Closed by LTLA 2 years ago
Thanks a lot for the pull request! I had been planning to add something like this to the package for years, but never found the time to work out the right interface.
I have now added it to the package, so large matrices will be much easier to deal with!
You may consider providing support for large datasets via the DelayedArray block processing mechanism. Briefly, the idea would be to process large datasets in column-wise chunks, thus avoiding the need to create a ranked dense matrix (e.g., as in #11).
If I understand your algorithm correctly, this should be pretty simple: each cell is processed independently of every other cell, so you can process the cells in chunks without much effort. Each chunk would create a ranked matrix (dense, but because the chunk is small, it doesn't matter), and then you can compute scores for all gene sets therein.
Doing this will allow us to use AUCell directly with arbitrarily large matrices (e.g., `HDF5Matrix` objects). Another plus is that the block processing mechanism supports parallelization via BiocParallel, so you can just re-use that as well.
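
To make the suggestion concrete, here is a rough sketch of what column-wise block processing could look like. It is not AUCell's actual implementation: `score_block()` and `score_by_chunks()` are hypothetical names, and the score (fraction of each gene set among the top `top_n` ranked genes per cell) is a simple placeholder standing in for the real AUC calculation. Only `colAutoGrid()` and `blockApply()` come from DelayedArray.

```r
library(DelayedArray)
library(BiocParallel)

# Hypothetical helper: score every cell in one column chunk.
# `block` is an ordinary dense matrix (genes x cells in this chunk);
# `set_idx` is a list of integer row indices, one vector per gene set.
score_block <- function(block, set_idx, top_n) {
    # Rank genes within each cell; rank 1 = highest expression.
    # This is dense, but only as large as the chunk.
    ranks <- apply(block, 2, function(v) rank(-v, ties.method = "first"))
    # Placeholder score, NOT AUCell's AUC: fraction of each gene set
    # that falls within the top `top_n` ranked genes of each cell.
    per_set <- lapply(set_idx, function(i) {
        colMeans(ranks[i, , drop = FALSE] <= top_n)
    })
    do.call(cbind, per_set)  # cells-in-chunk x gene sets
}

# Hypothetical driver: split `x` into column chunks and score each one.
score_by_chunks <- function(x, gene_sets, top_n = 100,
                            BPPARAM = SerialParam()) {
    # Resolve gene names to row indices once, outside the blocks.
    set_idx <- lapply(gene_sets, function(g) which(rownames(x) %in% g))
    # Each block spans all rows (genes) and a subset of columns (cells).
    grid <- colAutoGrid(x)
    per_block <- blockApply(x, score_block, set_idx = set_idx,
                            top_n = top_n, grid = grid, BPPARAM = BPPARAM)
    scores <- do.call(rbind, per_block)  # cells x gene sets
    rownames(scores) <- colnames(x)
    scores
}

# Small in-memory example; a file-backed matrix would be passed the same way.
m <- matrix(rpois(2000, 5), nrow = 100,
            dimnames = list(paste0("g", 1:100), paste0("c", 1:20)))
scores <- score_by_chunks(DelayedArray(m),
                          list(setA = paste0("g", 1:10)), top_n = 20)
```

Passing a parallel backend such as `MulticoreParam()` as `BPPARAM` would distribute the chunks across workers, and the chunk size follows DelayedArray's automatic block size setting.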