aertslab / AUCell

AUCell: score single cells with gene regulatory networks

DelayedArray support and block processing #14

Status: Closed (LTLA closed this issue 2 years ago)

LTLA commented 3 years ago

You may consider providing support for large datasets via the DelayedArray block processing mechanism. Briefly, the idea would be to process large datasets in column-wise chunks, thus avoiding the need to create a ranked dense matrix (e.g., as in #11).

If I understand your algorithm correctly, this should be pretty simple: each cell is processed independently of every other cell, so you can process the cells in chunks without much effort. Each chunk would produce its own ranked matrix (dense, but small enough that it doesn't matter), and you can then compute scores for all gene sets within that chunk.
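
To make the chunking idea concrete, here is a minimal, language-agnostic sketch in plain Python. It is not AUCell's code and does not use the DelayedArray API. The function names (`column_chunks`, `rank_descending`, `score_in_chunks`) are hypothetical, and the score (the fraction of a gene set that falls in the top `top_k` ranked genes of a cell) is a simplified stand-in for the real AUC calculation. The point is only that ranks are built for one chunk of columns at a time, so the full ranked matrix never exists in memory.

```python
from typing import Dict, Iterator, List, Sequence, Set


def column_chunks(n_cols: int, chunk_size: int) -> Iterator[range]:
    """Yield consecutive ranges of column indices, each at most chunk_size wide."""
    for start in range(0, n_cols, chunk_size):
        yield range(start, min(start + chunk_size, n_cols))


def rank_descending(values: Sequence[float]) -> List[int]:
    """Rank 1 is the highest value; ties are broken by position (stable sort)."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks


def score_in_chunks(
    matrix: Sequence[Sequence[float]],  # matrix[g][c]: gene g (row), cell c (column)
    gene_sets: Dict[str, Set[int]],     # gene-set name -> row indices (assumed non-empty)
    top_k: int,
    chunk_size: int,
) -> Dict[str, List[float]]:
    n_genes = len(matrix)
    n_cells = len(matrix[0]) if matrix else 0
    scores = {name: [0.0] * n_cells for name in gene_sets}
    for cols in column_chunks(n_cells, chunk_size):
        # Dense ranks for this chunk only; discarded before the next chunk starts.
        chunk_ranks = {
            c: rank_descending([matrix[g][c] for g in range(n_genes)]) for c in cols
        }
        for name, genes in gene_sets.items():
            for c in cols:
                hits = sum(1 for g in genes if chunk_ranks[c][g] <= top_k)
                scores[name][c] = hits / len(genes)  # simplified stand-in score
    return scores
```

Because every column is ranked on its own, the result does not depend on `chunk_size`; the chunk size only trades peak memory against per-chunk overhead.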

Doing this will allow us to use AUCell directly with arbitrarily large matrices (e.g., HDF5Matrix objects). Another plus is that the block processing mechanism supports parallelization via BiocParallel so you can just re-use that as well.
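
Because the chunks are independent, they can also be handed to a pool of workers. The sketch below shows that pattern with Python's standard `concurrent.futures`. It is an illustration only, not BiocParallel or AUCell code. The helper names are hypothetical, the score is again the simplified top-`top_k` fraction, and the gene set is assumed to be non-empty. A thread pool keeps the example short; for CPU-bound work in CPython a process pool would be the more realistic choice.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Set


def top_k_fraction(column: Sequence[float], genes: Set[int], top_k: int) -> float:
    """Fraction of `genes` (row indices) among the top_k highest values of one cell."""
    order = sorted(range(len(column)), key=lambda i: -column[i])
    return len(set(order[:top_k]) & genes) / len(genes)


def score_chunk(matrix, cols: range, genes: Set[int], top_k: int) -> List[float]:
    """Score every cell (column) in one chunk; needs only that chunk's columns."""
    return [top_k_fraction([row[c] for row in matrix], genes, top_k) for c in cols]


def score_parallel(matrix, genes: Set[int], top_k: int,
                   chunk_size: int, workers: int = 2) -> List[float]:
    n_cells = len(matrix[0])
    chunks = [range(s, min(s + chunk_size, n_cells))
              for s in range(0, n_cells, chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in chunk order, so scores stay aligned with cells.
        parts = pool.map(lambda cols: score_chunk(matrix, cols, genes, top_k), chunks)
        return [score for part in parts for score in part]
```

The only coordination needed is concatenating the per-chunk results in order, which is why per-cell scoring fits a block-processing framework so naturally.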

s-aibar commented 2 years ago

Thanks a lot for the pull request! I had been planning to add something like this to the package for years, but never found the time to look for the right interface.

I have now added it to the package, so large matrices will be much easier to deal with!