Prior to DelayedMatrixStats 1.23.1, the colVars() and rowVars() methods for DelayedMatrix objects used colblock_APPLY() and rowblock_APPLY() internally to handle block processing. These utilities use blocks made of full columns and full rows, respectively, regardless of the physical layout of the data on disk. However, this doesn't "play well" with some physical layouts. For example, loading full rows into memory is extremely inefficient in the case of a TENxMatrix object (from the HDF5Array package), because it triggers the loading of the entire dataset!
The new BLOCK_colVars() and BLOCK_rowVars() internal helpers implemented in the DelayedArray package address this by trying to choose a block geometry that "plays well" with the physical layout. By delegating the work to these functions, the colVars() and rowVars() methods for DelayedMatrix objects can be 3x to 10x faster (or more) for datasets with a "difficult" physical layout, while also consuming a lot less memory.
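To illustrate why the block geometry can be chosen freely, here is a minimal sketch of column variances accumulated over blocks that are neither full columns nor full rows. It is not the actual implementation of BLOCK_colVars(): the helper name block_col_vars() is hypothetical, the grid spacings are arbitrary, and the one-pass sum / sum-of-squares formula is an assumption chosen for brevity.

```r
library(DelayedArray)

# Hypothetical helper, NOT part of DelayedArray or DelayedMatrixStats.
# Computes column variances of 'x' by walking the blocks defined by 'grid',
# whatever their geometry, and accumulating per-column partial results.
block_col_vars <- function(x, grid) {
    n <- nrow(x)
    col_sum <- numeric(ncol(x))
    col_sumsq <- numeric(ncol(x))
    for (b in seq_along(grid)) {
        viewport <- grid[[b]]
        block <- read_block(x, viewport)  # load only this block in memory
        # Columns of 'x' covered by this block.
        j <- seq(start(viewport)[2L], end(viewport)[2L])
        col_sum[j] <- col_sum[j] + colSums(block)
        col_sumsq[j] <- col_sumsq[j] + colSums(block^2)
    }
    # One-pass formula (assumption: fine for a sketch, but less numerically
    # stable than centering the data first).
    (col_sumsq - col_sum^2 / n) / (n - 1)
}

set.seed(1)
m <- matrix(runif(600), nrow = 30, ncol = 20)
x <- DelayedArray(m)

# Blocks of 7 rows by 5 columns: neither full columns nor full rows.
grid <- RegularArrayGrid(dim(x), spacings = c(7L, 5L))
v <- block_col_vars(x, grid)

# Same result as computing the variance of each column directly.
stopifnot(isTRUE(all.equal(v, apply(m, 2, var))))
```

Because the partial sums are keyed by column index, the result does not depend on how the grid is cut, so the grid can be aligned with the physical chunks of the on-disk data instead of being forced to span full columns or full rows.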
Note that the other matrixStats methods defined in DelayedMatrixStats also use colblock_APPLY() and rowblock_APPLY() internally, so they will need to be modified in a similar way.