broadinstitute / infercnv

Inferring CNV from Single-Cell RNA-Seq

How to deal with multiple patients? #610

Open zhouzhendiao opened 11 months ago

zhouzhendiao commented 11 months ago

We possess datasets from 5 patients, each comprising 2 samples. The computation time exceeds 24 hours, and memory usage surpasses 400GB when merging all 10 samples together (60k cells, 8000 genes). To address this, can I execute inferCNV per patient—combining samples from the same patient—and then aggregate the CNV results for epithelial cells across different patients? Is there a concern about bias given that the reference cells are not uniform across patients?

GeorgescuC commented 10 months ago

Hi @zhouzhendiao,

Whether you run infercnv on each patient separately or on all patients combined should only affect the initial filtering of genes expressed below the filtering threshold, and the counts normalization that follows, as long as you use the same set of reference cells. I would always use the combined set for the reference cells. One workaround for the gene filtering would be to:
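For intuition on why the gene filter can differ between per-patient and combined runs, here is a toy sketch in Python. It only illustrates the filtering effect and is not a procedure from this thread. It assumes a filter that keeps genes whose mean count across all cells in a run exceeds a cutoff; whether infercnv computes its filter exactly this way is an assumption, and the gene names, counts, and helper functions are hypothetical.

```python
def genes_passing_cutoff(counts, cutoff):
    """Return the genes whose mean count across the given cells exceeds cutoff.

    counts maps gene name -> list of per-cell counts (one entry per cell).
    This mean-based filter is an assumed stand-in, not infercnv's own code.
    """
    return {gene for gene, values in counts.items()
            if values and sum(values) / len(values) > cutoff}


def combine(*matrices):
    """Concatenate the cells of several gene -> counts mappings."""
    genes = matrices[0].keys()
    return {g: [c for m in matrices for c in m[g]] for g in genes}


# Hypothetical toy counts: one shared reference set and two patients.
reference = {"GENE_A": [2, 3], "GENE_B": [0, 1], "GENE_C": [1, 1]}
patient_1 = {"GENE_A": [4, 5], "GENE_B": [0, 0], "GENE_C": [0, 0]}
patient_2 = {"GENE_A": [3, 2], "GENE_B": [6, 7], "GENE_C": [0, 0]}

cutoff = 1.0
kept_p1 = genes_passing_cutoff(combine(reference, patient_1), cutoff)
kept_p2 = genes_passing_cutoff(combine(reference, patient_2), cutoff)
kept_all = genes_passing_cutoff(combine(reference, patient_1, patient_2), cutoff)

print(sorted(kept_p1), sorted(kept_p2), sorted(kept_all))
```

In this toy case GENE_B is dropped in the patient 1 run but retained in the patient 2 and combined runs, even though the reference cells are identical, so per-patient outputs may not cover the same genes.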

In my experience, however, 60k cells x 8000 genes should not require 400GB+ of RAM if you use the Leiden subclustering option with a suitable leiden_resolution. Too high a resolution can produce many very small subclusters, and thus an absurdly high number of CNV regions that need to be evaluated by the costly Bayesian network step. If you are not already using a sparse matrix as input, there is also a script available that first converts your on-disk input matrix to a sparse matrix (and reads it much faster too), which reduces memory use from the start of the run until the smoothing step.
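To show why a sparse representation lowers the starting memory, here is a minimal Python sketch that streams a dense tab-delimited count matrix and stores only the nonzero entries. It is not the script mentioned above. The file layout (a header row with a leading label column followed by cell names, then one row per gene) and the function name are assumptions made for illustration.

```python
import csv
import io


def dense_tsv_to_triplets(handle):
    """Stream a dense genes x cells TSV and keep only nonzero entries.

    Assumed layout: a header row with a label column followed by cell names,
    then one row per gene holding the gene name and integer counts.
    Returns (genes, cells, triplets), where triplets holds
    (gene_index, cell_index, count) for nonzero counts only.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    cells = header[1:]
    genes = []
    triplets = []
    for gene_index, row in enumerate(reader):
        genes.append(row[0])
        for cell_index, value in enumerate(row[1:]):
            count = int(value)
            if count != 0:  # zeros are implicit in the sparse form
                triplets.append((gene_index, cell_index, count))
    return genes, cells, triplets


# Hypothetical toy matrix: 3 genes x 3 cells, mostly zeros.
toy = (
    "gene\tcell1\tcell2\tcell3\n"
    "GENE_A\t0\t5\t0\n"
    "GENE_B\t0\t0\t0\n"
    "GENE_C\t2\t0\t1\n"
)
genes, cells, triplets = dense_tsv_to_triplets(io.StringIO(toy))
dense_entries = len(genes) * len(cells)
print(f"{len(triplets)} nonzero of {dense_entries} entries stored")
```

Because the file is read one row at a time, the full dense matrix never has to be held in memory, and single-cell count matrices are typically dominated by zeros.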

Regards, Christophe.