Bioconductor / DelayedArray

A unified framework for working transparently with on-disk and in-memory array-like datasets
https://bioconductor.org/packages/DelayedArray

Move chunking utilities from HDF5Array to DelayedArray? #93

Open LTLA opened 3 years ago

LTLA commented 3 years ago

I'm referring to setHDF5DumpChunkLength, setHDF5DumpChunkShape, and any helper functions used to automatically choose "sensible" chunk dimensions for HDF5 arrays. Can these be moved to DelayedArray so that other backends can reuse them? I could then use them in TileDBArray and DelayedRandomArray, and I imagine SparseArray could reuse them as well. Besides making backend development easier, this could also improve efficiency by ensuring that operations spanning different backends use the same chunk dimensions. An obvious example is #89, where we could encourage consistent chunking between the RandomNormMatrix and the HDF5Matrix for optimal write speed.
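
To make the request concrete, here is a rough sketch of what a backend-agnostic chunk chooser could look like. It assumes a hypothetical helper, `choose_chunk_dim()`, which is not an existing function in DelayedArray or HDF5Array. The strategy shown (fill the first dimension, then the next, until a cap on chunk length is reached) is only one possible choice and may differ from what HDF5Array does internally.

```r
# Hypothetical helper, for illustration only; not part of any package.
# Chooses chunk dimensions for an array with dimensions 'dim' so that a
# chunk holds at most 'max_length' elements. Earlier dimensions are
# filled first, so chunks span whole columns where possible.
choose_chunk_dim <- function(dim, max_length = 1000000L) {
    stopifnot(is.numeric(dim), length(dim) >= 1L, all(dim >= 1),
              is.numeric(max_length), max_length >= 1)
    chunk <- rep.int(1L, length(dim))
    remaining <- max_length  # elements still available to the chunk
    for (i in seq_along(dim)) {
        # Take as much of this dimension as the remaining budget allows.
        chunk[i] <- as.integer(min(dim[i], remaining))
        remaining <- remaining %/% chunk[i]
    }
    chunk
}

# Two backends holding arrays of the same dimensions would call the same
# helper and therefore agree on the chunk layout.
dim <- c(20000L, 5000L)
chunk <- choose_chunk_dim(dim, max_length = 1000000L)
chunk        # 20000 rows by 50 columns
prod(chunk)  # 1e+06 elements, i.e. exactly at the cap
```

If something like this lived in DelayedArray, a source backend (e.g. a random matrix) and a sink backend (e.g. an HDF5 dump) could both derive their chunks from the same call, so blocks read from one would map directly onto chunks written to the other.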