constantAmateur / SoupX

R package to quantify and remove cell free mRNAs from droplet based scRNA-seq data

Working with inaccurate cell calling data #115

Closed RuiyuRayWang closed 1 year ago

RuiyuRayWang commented 2 years ago

Due to the nature of my bio sample, CellRanger's cell calling is inaccurate and my filtered_feature_bc_matrix contains many empty droplets.

I was able to confirm this because my UMAP contains a cluster with a very low number of detected genes and a very high level of mitochondrial transcripts.

[Image: Screen Shot 2022-06-22 at 13 12 33]

Tuning the --force-cells parameter in cellranger is difficult because it's hard to find the exact threshold for calling a cell a cell. See my barcode rank plot below:

[Image: newplot2]

My question is, can SoupX work with data like this? Should I manually remove the empty droplet population before or after SoupX?

Thanks! Ray

constantAmateur commented 1 year ago

Broadly speaking it shouldn't matter much, as long as there are more real cells than miscalled empty cells. Optimally you'd probably want to remove them before SoupX, but it really shouldn't make much difference.

If you'd like to check, try running SoupX twice, once with and once without these cells, and verify that the estimated contamination fraction is similar.
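
A minimal sketch of that check, under some assumptions not stated in the thread: `tod` and `toc` are the raw and filtered count matrices (genes x barcodes) already loaded into R, `clusters` is a named vector of cluster labels whose names are the barcodes in `toc`, and `suspect` holds the barcodes of the suspected empty-droplet cluster. All four names, the helper functions, and the 0.02 tolerance are hypothetical and chosen only for illustration. The SoupX calls used are `SoupChannel`, `setClusters` and `autoEstCont`.

```r
# Sketch only: compare the contamination fraction (rho) that SoupX estimates
# with and without the suspected empty-droplet barcodes.
# tod, toc, clusters and suspect are assumed inputs (see the text above).

# Estimate a single global rho for one set of cells.
estimate_rho <- function(tod, toc, clusters) {
  sc <- SoupX::SoupChannel(tod, toc)
  sc <- SoupX::setClusters(sc, clusters[colnames(toc)])
  sc <- SoupX::autoEstCont(sc, doPlot = FALSE)
  # autoEstCont stores the same global estimate for every cell.
  sc$metaData$rho[1]
}

# Run the estimate twice: all called cells, then with the suspects dropped.
compare_rho <- function(tod, toc, clusters, suspect) {
  keep <- setdiff(colnames(toc), suspect)
  c(
    with_suspect    = estimate_rho(tod, toc, clusters),
    without_suspect = estimate_rho(tod, toc[, keep, drop = FALSE], clusters[keep])
  )
}

# Hypothetical similarity check; the tolerance is an arbitrary assumption.
rho_similar <- function(rho_a, rho_b, tol = 0.02) {
  abs(rho_a - rho_b) <= tol
}
```

With real data this might be used as `rho <- compare_rho(tod, toc, clusters, suspect)` followed by `rho_similar(rho[["with_suspect"]], rho[["without_suspect"]])`. What counts as "similar" is a judgment call for the dataset at hand; the tolerance above is not a SoupX recommendation.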