carmonalab / UCell

Gene set scoring for single-cell data

Breaking for very large objects #40

Open edridgedsouza opened 1 month ago

edridgedsouza commented 1 month ago

Hi, I have an object with 1.1 million cells across 200 samples. I've used Seurat's BPCells integration to minimize the amount of processing done in-memory. However, when I try to run AddModuleScore_UCell, I get the following error:

```
Error in (function (cond) : error in evaluating the argument 'x' in selecting a method for function 'as.matrix': Error converting IterableMatrix to dgCMatrix
• dgCMatrix objects cannot hold more than 2^31 non-zero entries
• Input matrix has 2736736780 entries
```
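For reference, the cap comes from the Matrix package indexing dgCMatrix entries with 32-bit integers, which is easy to confirm in R:

```r
# dgCMatrix stores its non-zero indices as 32-bit integers,
# so the hard cap is .Machine$integer.max = 2^31 - 1
.Machine$integer.max                  # 2147483647
2736736780 > .Machine$integer.max     # TRUE: the conversion must fail
```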

Is there a way to bypass this conversion so that it doesn't break on conversion to an in-memory dgCMatrix? In other words, is there a way for UCell to work with very large data stored on disk in the BPCells format?

The alternative solution for me is to split my object into its 200 samples, run UCell on each sample individually, and then combine the metadata results. While UCell is more robust than the default method to changes in dataset composition, the obvious downside is that errors may snowball as the subsets deviate more and more from the full dataset.

What are the options for calculating UCell scores on massive datasets with on-disk processing?

mass-a commented 1 month ago

Hello and thanks for your message.

Currently we don't have an implementation for on-disk processing, such as the one from BPCells. I agree that we should start looking into supporting these kinds of strategies in UCell.

In these cases, what I would do is process one sample at a time (or batches of samples), as you also suggested. Note that because UCell scores are calculated individually for each cell, the results should be identical whether you load one sample or all of them into memory at once; there should therefore be no concern about diverging results.
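A rough sketch of that per-sample loop (assuming a Seurat object `obj` with a metadata column `sample` identifying the samples, and `signatures` as a named list of gene sets; names and details are illustrative, not a tested recipe) could look like this:

```r
library(Seurat)
library(UCell)

# Hypothetical sketch: score each sample separately, then recombine.
# Assumes `obj` has a metadata column "sample" and `signatures` is a
# named list of gene sets, e.g. list(Tcell = c("CD3D", "CD3E", "CD2")).
per_sample_scores <- lapply(
  SplitObject(obj, split.by = "sample"),
  function(chunk) {
    # each chunk should stay below the 2^31 non-zero dgCMatrix limit,
    # so UCell's internal matrix conversion succeeds
    chunk <- AddModuleScore_UCell(chunk, features = signatures)
    # keep only the UCell score columns (suffixed "_UCell" by default)
    chunk@meta.data[, grepl("_UCell$", colnames(chunk@meta.data)), drop = FALSE]
  }
)

# scores are per-cell, so row-binding and re-ordering by barcode is safe
scores <- do.call(rbind, per_sample_scores)
obj <- AddMetaData(obj, metadata = scores[colnames(obj), , drop = FALSE])
```

With 200 samples this keeps each in-memory conversion small; batching a few samples per chunk works the same way.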

Best -m