constantAmateur / SoupX

R package to quantify and remove cell free mRNAs from droplet based scRNA-seq data
248 stars 34 forks

SoupX on unprocessed (raw) data, but with cluster info? #123

Closed vkartha closed 1 year ago

vkartha commented 1 year ago

Hi, I'm interested in trying SoupX for ambient RNA detection and correction. In most cases we start from raw gene x "cell" counts, before any QC filtering, normalization, clustering, or cell annotation. Your vignettes, however, provide cluster annotations from the start to help SoupX estimate the ambient RNA contamination rate. Those usually aren't available yet, since the unprocessed data is exactly what we want to check for ambient RNA in the first place. Am I missing something?

derrik-gratz commented 1 year ago

I passed in the 10X clusters (found in a subdirectory of the 10X outputs, like .../count/analysis/clustering/gene_expression_graphclust/clusters.csv), since the documentation said the clustering doesn't seem to change the outcome much. This avoids doing any processing in R to produce clusters, so I can still do the normal processing / filtering / clustering etc. after SoupX. I also felt a little uncertain about this step and about using the 10X clusters, so I'm curious to hear more on this topic.

constantAmateur commented 1 year ago

The default workflow of load10X and autoEstCont does exactly as derrik-gratz suggests. In theory, you will get better results if your clusters correspond to well-annotated cell types. In practice though, any graph-based clustering, such as the one cellranger performs, is good enough.
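
For readers who want to see the workflow discussed in this thread as code, here is a minimal sketch in R. The wrapper `run_soupx` and its arguments `data_dir` and `clusters_csv` are hypothetical names used only for illustration. The sketch assumes `data_dir` points at a cellranger output folder that contains both the raw and the filtered count matrices, and that a cellranger `clusters.csv` has `Barcode` and `Cluster` columns. It is not a definitive recipe.

```r
# Sketch only: `run_soupx`, `data_dir` and `clusters_csv` are hypothetical names,
# not part of the SoupX API.
run_soupx <- function(data_dir, clusters_csv = NULL) {
  # load10X reads the raw (all droplets) and filtered (cells) matrices from a
  # cellranger output folder, and picks up cellranger's clustering when present.
  sc <- SoupX::load10X(data_dir)

  if (!is.null(clusters_csv)) {
    # Optional: set clusters explicitly from a cellranger clusters.csv.
    # Assumed columns: Barcode, Cluster.
    cl <- read.csv(clusters_csv, stringsAsFactors = FALSE)
    sc <- SoupX::setClusters(sc, setNames(cl$Cluster, cl$Barcode))
  }

  # Estimate the contamination fraction using the cluster information.
  sc <- SoupX::autoEstCont(sc)

  # Return the corrected count matrix for normal downstream processing.
  SoupX::adjustCounts(sc)
}
```

The corrected matrix returned by `adjustCounts` can then go through the usual QC filtering, normalization and clustering, so the cellranger clusters are only used to estimate the contamination and do not have to be the final annotation.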