We can easily handle large matrices without coercing them into an ordinary matrix - or, heaven forbid, a data.table. One may also consider using DelayedMatrixStats to generate a ranking in a more efficient manner, especially for sparse data.
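# Coerce the counts to a sparse matrix (`sce` and `geneSets` are assumed to exist already).
y <- as(counts(sce), "dgCMatrix")
# Compute the AUCs block by block using the internal function added in this PR.
output <- AUCell:::blocked_AUCell(y, geneIds(geneSets))
dim(output)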
Parallelization on a variety of backends (forking, SNOW, Slurm, etc.) is easily supported, though results can differ between runs and backends depending on how the RNG stream is set up on each core. Is ties.method="random" really necessary? I used the same approach for scran::correlatePairs in the past, but it was more trouble than it was worth.
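To make the RNG concern concrete, here is a minimal base R sketch. It is not code from this PR, and the toy vector is made up. It only shows that ranks broken with ties.method="random" depend on the state of the RNG, whereas a deterministic rule such as "min" does not. Sparse count data has many tied zeros, so the effect is far from a corner case.

```r
# Toy "expression" vector for one cell: 50 tied zeros plus a few counts.
x <- c(rep(0, 50), 5, 3, 3, 1)

# Random tie-breaking: the result depends on the RNG state.
set.seed(1)
r1 <- rank(x, ties.method = "random")
set.seed(2)
r2 <- rank(x, ties.method = "random")
set.seed(1)
r3 <- rank(x, ties.method = "random")

# Deterministic tie-breaking: the RNG is never consulted.
set.seed(1)
d1 <- rank(x, ties.method = "min")
set.seed(2)
d2 <- rank(x, ties.method = "min")

identical(r1, r3)  # same seed, same ranks
identical(r1, r2)  # different seeds, (almost surely) different ranks
identical(d1, d2)  # unaffected by the seed
```

With parallel workers, each worker effectively plays the role of a different seed unless the streams are controlled explicitly, which is why a deterministic tie-breaking rule would sidestep the problem.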
In any case, this PR contains the minimal components required to achieve block processing. The count matrix above can be substituted with any matrix representation, including arbitrarily large file-backed HDF5Arrays, and it should continue to work. I will leave it to you to decide how you would like to integrate this into the other AUCell functions, if at all. (Note that DelayedArray is already a dependency of SummarizedExperiment, so this does not add any further dependencies.)
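For readers unfamiliar with block processing, the sketch below shows the general pattern using DelayedArray. It is an illustration under assumptions, not the implementation in this PR: the toy matrix and the per-block function `detected_per_cell` are hypothetical stand-ins, chosen so that the result is easy to check. Blocks are made of complete columns because any per-cell statistic (such as a within-cell ranking) only needs the data for that cell.

```r
library(DelayedArray)

# Toy count matrix: 100 genes x 20 cells (made-up data for illustration).
set.seed(1)
counts <- matrix(rpois(2000, lambda = 2), nrow = 100, ncol = 20)
x <- DelayedArray(counts)

# Hypothetical per-block statistic: number of detected genes in each cell.
# Each block arrives as an ordinary matrix holding a subset of the columns.
detected_per_cell <- function(block) colSums(block > 0)

# Split the columns into blocks of 5 cells, so each block has complete columns.
grid <- colAutoGrid(x, ncol = 5)

# Apply the function to each block in turn; the result is a list, one per block.
per_block <- blockApply(x, detected_per_cell, grid = grid)

# Stitch the per-block results back together in column order.
blocked <- unlist(per_block)
all.equal(blocked, colSums(counts > 0))
```

Only one block is held in memory at a time, so the same code should work unchanged if `x` is a file-backed matrix rather than an in-memory one. `blockApply()` also accepts a `BPPARAM` argument, which is where the parallel backends come in.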
Closes #14.