constantAmateur / SoupX

R package to quantify and remove cell-free mRNAs from droplet-based scRNA-seq data

[Question] The way to choose soup specific genes #18

hanbyulcho closed this issue 4 years ago

hanbyulcho commented 4 years ago

In my project, no genes are known to be highly specific to just one population of cells (immune, luminal, fibroblast, endothelial).

So I plotted the top 60 candidate soup-specific genes using "plotMarkerDistribution", as described in the detailed vignette. However, apart from TGM4 (roughly in the middle of the top 20 candidates), none of the genes shows a clear bimodal distribution.

  1. Which genes could serve as soup-specific genes in this case, given that none of them shows a bimodal distribution?

  2. In the detailed vignette, all immunoglobulin genes are used because IGKC and IGLC2 were picked out from the plot. That is not the case in my data. How can I build a gene list around TGM4 without any prior biological information? Should I use TGM4 alone as the soup-specific gene?

I have attached my plots for reference.

  1. Plot: candidates 1-20

  2. Plot: candidates 21-40

  3. Plot: candidates 41-60

Thanks,

constantAmateur commented 4 years ago

Based on the plots you've posted, none of these genes looks suitable for estimating the contamination rate. I would not recommend proceeding with estimation using any of them, as you would likely get an inflated estimate of the contamination rate and do more harm than good to your data.
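Why a poorly chosen gene inflates the estimate can be seen from a minimal sketch, assuming a simplified model in which each droplet holds its cell's own mRNA plus a fraction rho of soup. For genes a cell does not express, every observed count must then come from the soup, so rho ≈ observed / (nUMI × soup fraction of those genes). The function `estimate_rho`, the gene `GENE_X` and all counts below are hypothetical, and this Python toy is not SoupX's R implementation, which additionally restricts the calculation to cells judged not to express the gene set.

```python
def estimate_rho(cells, genes, soup_profile):
    """Hypothetical sketch: estimate the contamination fraction rho from
    genes assumed NOT to be expressed by any of the given cells."""
    observed = 0.0
    expected_if_all_soup = 0.0
    # Fraction of soup UMIs that belong to the chosen gene set.
    soup_fraction = sum(soup_profile[g] for g in genes)
    for cell in cells:
        n_umi = sum(cell.values())
        # Counts of the gene set actually seen in this droplet.
        observed += sum(cell.get(g, 0) for g in genes)
        # Counts we would expect if the whole droplet were soup.
        expected_if_all_soup += n_umi * soup_fraction
    return observed / expected_if_all_soup


# Hypothetical soup profile (fraction of soup UMIs per gene) and two cells
# with 1,000 UMIs each, constructed with a true rho of 0.04.
soup_profile = {"IGKC": 0.05, "GENE_X": 0.05, "OTHER": 0.90}
cells = [
    # GENE_X is also expressed by the cell: 2 soup counts + 8 endogenous.
    {"IGKC": 2, "GENE_X": 10, "OTHER": 988},
    {"IGKC": 2, "GENE_X": 10, "OTHER": 988},
]

print(round(estimate_rho(cells, ["IGKC"], soup_profile), 3))    # 0.04
print(round(estimate_rho(cells, ["GENE_X"], soup_profile), 3))  # 0.2 (inflated)
```

Because the endogenous GENE_X counts are mistaken for soup, the estimate comes out five times too high, which is the failure mode described above.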

Are there any other genes that you think might work for your particular case? The genes picked algorithmically by plotMarkerDistribution are only a heuristic and can miss useful genes. Have you tried the haemoglobin (HB) genes?

If none of these is suitable, I would recommend either skipping contamination correction altogether or trying a range of sensible contamination fractions (2%-10%) and checking how each one affects your downstream analysis.
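The second option can be sketched under the same simplified mixture model: for each candidate contamination fraction in the suggested 2%-10% range, subtract the expected soup counts (rho × nUMI × soup fraction) from every gene, clip at zero, and inspect how the genes you care about change. `naive_correct` and the data are hypothetical, and plain subtraction is a deliberately crude stand-in for SoupX's own correction; in SoupX itself a fixed value would be set with setContaminationFraction() before calling adjustCounts().

```python
def naive_correct(cell, soup_profile, rho):
    """Hypothetical sketch: subtract the expected soup contribution from each
    gene's count and clip at zero (not SoupX's adjustCounts algorithm)."""
    n_umi = sum(cell.values())
    return {
        gene: max(0.0, count - rho * n_umi * soup_profile.get(gene, 0.0))
        for gene, count in cell.items()
    }


# Hypothetical soup profile and a single cell with 1,000 UMIs.
soup_profile = {"IGKC": 0.05, "GENE_X": 0.05, "OTHER": 0.90}
cell = {"IGKC": 2, "GENE_X": 10, "OTHER": 988}

# Sweep the suggested 2%-10% range and watch how the corrected counts move.
for rho in (0.02, 0.04, 0.06, 0.08, 0.10):
    corrected = naive_correct(cell, soup_profile, rho)
    print(f"rho={rho:.2f}  IGKC={corrected['IGKC']:.1f}  GENE_X={corrected['GENE_X']:.1f}")
```

If a downstream result holds across the whole range, it is unlikely to be an artefact of the particular contamination fraction chosen.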