Bioconductor / DelayedArray

A unified framework for working transparently with on-disk and in-memory array-like datasets
https://bioconductor.org/packages/DelayedArray

Implicitly controlling the grid for blockApply #7

Closed (LTLA closed this issue 6 years ago)

LTLA commented 6 years ago

If grid is not otherwise specified, blockApply calls defaultGrid to determine the size of the blocks of the matrix to be extracted in the block processing mechanism. There are two obvious questions here:

  1. Is it possible for the default grid choice to be aware of the internal chunking scheme (for HDF5Matrix and RLEMatrix objects)? I recall having these discussions with @hpages and I believe that this is on the agenda, but I'll just mention it here anyway.
  2. Is it possible for users to implicitly override the default grid choice, e.g., via global options? One can imagine that blockApply is called within some internal functions where it would be inconvenient to have to specify the grid as a user-visible parameter in the top-level exported function.

With respect to point number 2, my real issue is that beachmat relies on defaultGrid to choose the block size for realizing chunks of a DelayedMatrix object for data access. Unfortunately, it's not possible to pass any grid specifications explicitly to the beachmat API, as this would not be representation-agnostic. All information must either be present in or derivable from the matrix object (which would be the case for back-ends that have an inherent chunking, but not obvious for other types); or it should be extracted from global options, hence my request in the second point above.

LTLA commented 6 years ago

It seems this is solved by blockGrid respecting the DelayedArray.block.size global option. I can't control the exact shape of the blocks, but I guess this is good enough.
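To illustrate the "implicit override" idea from point 2, here is a minimal base-R sketch of an internal function that picks its block size from a global option instead of taking a grid argument. It does not use DelayedArray itself: rows_per_block() and block_col_sums() are hypothetical helpers, and the interpretation of DelayedArray.block.size as a size in bytes, the 8-byte element size, and the fallback value are all assumptions made for the example.

```r
# Hypothetical helper (not part of DelayedArray): choose how many rows go
# into each block, based on a global option assumed to hold a size in bytes.
rows_per_block <- function(ncol, elt_bytes = 8L, fallback_bytes = 4.5e6) {
  # fallback_bytes is a placeholder used only when the option is unset
  block_bytes <- getOption("DelayedArray.block.size", fallback_bytes)
  n <- floor(block_bytes / (max(ncol, 1L) * elt_bytes))
  max(1L, as.integer(n))  # always process at least one row per block
}

# Hypothetical internal function: column sums computed one row block at a
# time. The caller never passes a grid; the block size comes from options().
block_col_sums <- function(x) {
  total <- numeric(ncol(x))
  if (nrow(x) == 0L) return(total)
  n <- rows_per_block(ncol(x))
  for (start in seq(1L, nrow(x), by = n)) {
    end <- min(start + n - 1L, nrow(x))
    total <- total + colSums(x[start:end, , drop = FALSE])
  }
  total
}

# Usage: shrink the blocks via the option, without touching the function's
# signature, then restore the previous setting.
m <- matrix(runif(2000), nrow = 200)
old <- options(DelayedArray.block.size = 800)  # 10 rows of 10 doubles per block
stopifnot(isTRUE(all.equal(block_col_sums(m), colSums(m))))
options(old)
```

This shows why an option is enough for the beachmat use case described above: the top-level function stays representation-agnostic, while the user can still tune the block size. As noted, it gives no control over block shape.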

hpages commented 6 years ago

Hi Aaron,

FWIW, I added setDefaultBlockSize() and setDefaultBlockShape() to control the size and shape of the blocks produced by blockGrid(). I also added setDefaultGridMaker(), which lets you replace the current "grid maker" with your own, so you get full control. See ?setDefaultGridMaker. This is in DelayedArray 0.7.26.

H.