GabrielHoffman / variancePartition

Quantify and interpret drivers of variation in multilevel gene expression experiments
http://gabrielhoffman.github.io/variancePartition/

Can variancePartition be used for analyzing single-cell data? #61

Closed DongzeHE closed 2 years ago

DongzeHE commented 2 years ago

Hello,

Thanks for providing such an excellent tool!

I am analyzing a single-cell RNA-sequencing dataset. The count matrix holds the UMI count of each gene (~30k) in each cell (~1k), and each cell has been assigned to a cluster. My goal is to quantify how much each gene contributes to the clustering result.

My question is:

  1. Can I use variancePartition to do this analysis?
  2. If I can, how should I normalize the data? Can I use a log transformation?

Best, Dongze

GabrielHoffman commented 2 years ago

I have been using variancePartition on pseudobulk computed from single-cell RNA-seq data. This works well at the pseudobulk level, but less well at the single-cell level. I am currently finishing a package for applying variancePartition and dream to large-scale pseudobulk data, which I'll release in a few weeks.
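
For readers unfamiliar with the term, pseudobulk means summing the per-cell counts within each group of cells (for example, each sample, or each sample and cluster pair) so that one count vector per group remains. The following is only a minimal sketch of that aggregation step in plain Python, not the code used by variancePartition or the package mentioned above. The function names `pseudobulk` and `zero_fraction`, the toy counts, and the group labels are all hypothetical.

```python
from collections import defaultdict


def pseudobulk(counts, groups):
    """Sum per-cell count vectors within each group.

    counts: list of per-cell lists of UMI counts (one entry per gene).
    groups: list of group labels, one per cell (hypothetical labels,
            e.g. a sample or sample+cluster identifier).
    Returns a dict mapping each group label to its summed count vector.
    """
    if len(counts) != len(groups):
        raise ValueError("need exactly one group label per cell")
    n_genes = len(counts[0]) if counts else 0
    totals = defaultdict(lambda: [0] * n_genes)
    for cell, group in zip(counts, groups):
        if len(cell) != n_genes:
            raise ValueError("all cells must have the same number of genes")
        row = totals[group]
        for gene_index, count in enumerate(cell):
            row[gene_index] += count
    return dict(totals)


def zero_fraction(rows):
    """Fraction of entries equal to zero across all rows."""
    values = [v for row in rows for v in row]
    return sum(v == 0 for v in values) / len(values)


# Toy data (made up for illustration): 4 cells x 4 genes, 2 groups.
cells = [
    [0, 3, 0, 1],
    [2, 0, 0, 0],
    [0, 1, 0, 4],
    [1, 0, 0, 0],
]
labels = ["s1_c1", "s1_c1", "s2_c1", "s2_c1"]

bulk = pseudobulk(cells, labels)
print(bulk)
print(zero_fraction(cells), zero_fraction(bulk.values()))
```

In this toy example the summed vectors contain a smaller share of zeros than the individual cells do, which illustrates one way aggregation can reduce sparsity.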

At the single-cell level, the zeros become a problem for variance partitioning. You can try log2 CPM, but I'm not sure how useful it will be.
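
As a rough illustration of the log2 CPM suggestion, the sketch below scales one cell's counts to counts per million and takes log2 after adding a pseudocount of 1. This is a simplified variant chosen for clarity; established implementations such as edgeR handle the prior count differently. The function name `log2_cpm` and the toy counts are hypothetical.

```python
import math


def log2_cpm(counts, pseudocount=1.0):
    """Return log2(CPM + pseudocount) for one cell or sample.

    A simplified variant for illustration only, assuming a plain
    pseudocount added after scaling to counts per million.
    """
    library_size = sum(counts)
    if library_size <= 0:
        raise ValueError("library size must be positive")
    return [
        math.log2(c / library_size * 1e6 + pseudocount)
        for c in counts
    ]


# Toy single-cell count vector (made up): most entries are zero.
cell = [0, 3, 0, 1, 0, 0]
values = log2_cpm(cell)
print(values)
```

Every zero count maps to the same transformed value, so a sparse gene ends up with many tied observations across cells. That may be part of why the transformed single-cell values are less informative than pseudobulk values.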

Gabriel

DongzeHE commented 2 years ago

Thanks for your answer! This makes total sense. I will see whether reducing the sparsity of the data helps, for example by applying an imputation method or by restricting the analysis to highly variable genes.