constantAmateur / SoupX

R package to quantify and remove cell-free mRNAs from droplet-based scRNA-seq data

Running SoupX with no channel info #94

Closed. MomenehForoutan closed this issue 2 years ago.

MomenehForoutan commented 2 years ago

Hi team, Thanks for the great tool! I have been using the latest version of SoupX (v1.5.2) on a very large dataset of ~400,000 cells with a substantial amount of ambient RNA. From the paper and the manual, it seems that SoupX is recommended to be run per channel. For this data I have the count matrix but no information about the channels. That said, I know the data were generated with two different versions of 10X chemistry (v2 and v3), using different sample-processing protocols (e.g. cell sorting) across different hospitals. What do you recommend when channel info is not available? And should we somehow account for other factors that may cause variation in the amount of ambient RNA?

Just so you know, at the moment I subset to all the cells I am interested in and perform a first-pass analysis in Seurat: I run SCT, regressing out several of the variables mentioned above, and get the clusters. Then I run SoupX on the data using the cluster info from Seurat and the Ig genes (as they are not supposed to be expressed in the cells I am interested in).
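The idea behind combining clusters with a gene set that should be absent from the cells of interest (here, Ig genes) can be sketched as a toy model. This is not SoupX's code or API: it is a hypothetical pure-Python illustration (`estimate_rho` is an invented name), using plain dicts instead of sparse matrices and a hard threshold instead of a statistical test. In cells that do not express the gene set, all of its counts must come from the soup, so the contamination fraction can be estimated as the ratio of observed to expected soup-derived counts, pooled over cells. A whole cluster is dropped if any of its cells has more gene-set counts than even 100% contamination could explain.

```python
"""Toy estimate of the contamination fraction (rho) from a gene set assumed
NOT to be expressed by the cells of interest (e.g. Ig genes).

Hypothetical sketch: plain dicts instead of sparse matrices, and a hard
threshold instead of a proper statistical test.
"""


def estimate_rho(counts, clusters, soup_profile, gene_set):
    """counts: {cell: {gene: umi}}; clusters: {cell: cluster_id};
    soup_profile: {gene: fraction of soup UMIs}; gene_set: marker genes."""
    # Fraction of soup UMIs that belong to the gene set.
    soup_frac = sum(soup_profile.get(g, 0.0) for g in gene_set)
    if soup_frac == 0:
        raise ValueError("gene set is absent from the soup; cannot estimate rho")

    obs, exp = {}, {}
    for cell, genes in counts.items():
        n_umi = sum(genes.values())
        obs[cell] = sum(genes.get(g, 0) for g in gene_set)
        # Expected gene-set UMIs if the cell were 100% soup (rho = 1).
        exp[cell] = n_umi * soup_frac

    # More gene-set UMIs than full contamination explains means the cell
    # genuinely expresses the genes: exclude its entire cluster.
    expressing = {clusters[c] for c in counts if obs[c] > exp[c]}
    use = [c for c in counts if clusters[c] not in expressing]
    if not use:
        raise ValueError("every cluster appears to express the gene set")

    # Pooled ratio of observed to expected soup-derived counts.
    return sum(obs[c] for c in use) / sum(exp[c] for c in use)
```

For example, with IGKC as the gene set, a soup in which IGKC makes up 5% of UMIs, two T cells with 2 of 200 and 3 of 300 UMIs from IGKC, and a B-cell cluster that clearly expresses IGKC, the B-cell cluster is excluded and the estimate is (2 + 3) / (10 + 15) = 0.2.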

Any advice/feedback will be highly appreciated! Cheers, Sepideh

constantAmateur commented 2 years ago

I'm unclear how you are running SoupX without channel information. Where are you getting the table of droplets from if not the cellranger output?
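The question matters because the soup profile is estimated from empty droplets, which are present only in the raw (unfiltered) droplet table, not in a filtered cell-by-gene matrix. A hypothetical pure-Python sketch of that idea follows; `estimate_soup` and the 0 to 100 UMI range used to call a droplet "empty" are illustrative assumptions, not SoupX's API.

```python
"""Toy estimate of the ambient 'soup' profile from raw (unfiltered) droplets."""
from collections import Counter


def estimate_soup(raw_droplets, lo=0, hi=100):
    """raw_droplets: {barcode: {gene: umi}} for ALL droplets, including empty ones.
    Droplets whose total UMI count lies strictly between lo and hi (illustrative
    thresholds) are treated as empty. Returns {gene: fraction of soup UMIs}."""
    soup = Counter()
    for genes in raw_droplets.values():
        total = sum(genes.values())
        if lo < total < hi:
            soup.update(genes)  # add this empty droplet's counts to the pool
    n = sum(soup.values())
    if n == 0:
        # A filtered matrix contains only cell-containing droplets.
        raise ValueError("no empty droplets in range; is this a filtered matrix?")
    return {g: c / n for g, c in soup.items()}
```

Given only a filtered matrix, every droplet is a cell, so there is nothing to pool and the sketch raises an error; this is the situation the question above points to.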

Otherwise, your approach of clustering with regression and using those clusters together with IG genes sounds sensible.