Currently, .tome files store data matrices in a format similar to dgCMatrix from the Matrix package. This is great for making the files much smaller (nice for shinyapps.io deployments), but other implementations of on-disk matrices are more complete.
DelayedArray looks like an interesting set of methods for large on-disk arrays/matrices stored in an HDF5 file, and converting for compatibility with these arrays could have nice add-on effects - as packages adding matrixStats-style support for these arrays continue to mature, we could benefit from the added functionality.
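As a rough illustration of what that added functionality looks like, here's a small sketch assuming the DelayedArray and DelayedMatrixStats packages - the in-memory matrix stands in for an HDF5-backed one, since the same calls work on either seed:

```r
# Sketch only: DelayedMatrixStats provides matrixStats-style functions
# (colMeans2(), rowSums2(), etc.) that operate block-by-block on
# DelayedArray objects, so they also work when the seed lives on disk.
library(DelayedArray)
library(DelayedMatrixStats)

# Wrap an ordinary matrix as a DelayedArray; an HDF5Array seed would
# behave the same way without loading the data into memory.
m <- DelayedArray(matrix(rpois(1e4, lambda = 2), nrow = 100))

# Block-processed column means - no full realization of the matrix
cm <- colMeans2(m)
head(cm)
```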
To try this out, we'll probably first need low-level write_tome_darray() and read_tome_darray() functions. From there, I can see if it makes sense to replace the dgCMatrices for count data, or offer DelayedArrays as an option alongside dgCMatrices. The latter could be nice - one compact structure for portability and Shiny display and another larger format for computation.
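A minimal sketch of what those low-level functions might look like, assuming the HDF5Array package (the `write_tome_darray()`/`read_tome_darray()` names are placeholders from the plan above, not an existing API):

```r
# Hypothetical low-level helpers for DelayedArray-backed tome data.
# Assumes .tome files are HDF5 and uses HDF5Array for the heavy lifting.
library(HDF5Array)

write_tome_darray <- function(mat, tome, name) {
  # writeHDF5Array() stores the matrix as a chunked HDF5 dataset
  writeHDF5Array(mat, filepath = tome, name = name)
}

read_tome_darray <- function(tome, name) {
  # HDF5Array() returns a DelayedArray backed by the on-disk dataset;
  # nothing is read into memory until the array is realized
  HDF5Array(filepath = tome, name = name)
}

# Round-trip check with a small count-like matrix
tome <- tempfile(fileext = ".tome")
m <- matrix(rpois(20, lambda = 2), nrow = 4)
write_tome_darray(m, tome, "data/exon")
da <- read_tome_darray(tome, "data/exon")
```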
Here's a nice workshop chapter on implementation: https://bioconductor.github.io/BiocWorkshops/effectively-using-the-delayedarray-framework-to-support-the-analysis-of-large-datasets.html