When using different marker genes to start, the contamination ratios are quite different

constantAmateur / SoupX

R package to quantify and remove cell free mRNAs from droplet based scRNA-seq data


I have samples containing both neurons and non-neurons (the two major cell types here). When I estimate the contamination fraction manually, I try two sets of marker genes, one containing non-neuronal markers and the other containing neuronal markers, for example:

igGenes = c("Cldn5", "Opalin", "Siglech", "Aqp4", "C1qc", "Gja1")  ## non-neuronal markers

or

igGenes = c("Syp", "Rbfox3", "Elavl2")  ## neuronal markers

and I got the marker maps below, respectively: [image: Rplot] and [image: Rplot01]

When I then run calculateContaminationFraction, I get very different results: one is close to 17%, while the other is as high as 45%. I understand that in my datasets the neuronal markers are widely present as contamination across clusters. But my question is: should I correct the expression profiles using 45% instead of 17%?
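For context, the comparison described above could be sketched roughly as follows. This is a minimal sketch, not the exact code that was run: it assumes `sc` is an existing SoupChannel object with clusters already attached, and the labels `nonNeuron` and `neuron` are arbitrary names introduced here for illustration.

```r
library(SoupX)

# Assumption: `sc` is a SoupChannel that already has cluster labels attached
# (e.g. created with load10X(), or with setClusters() called beforehand).

# The two candidate gene sets from the post; the list names are arbitrary labels.
geneSets <- list(
  nonNeuron = c("Cldn5", "Opalin", "Siglech", "Aqp4", "C1qc", "Gja1"),
  neuron    = c("Syp", "Rbfox3", "Elavl2")
)

# Estimate the contamination fraction separately for each gene set.
rhoBySet <- sapply(names(geneSets), function(nm) {
  # Single-element named list, the form SoupX expects for a gene set.
  geneList <- geneSets[nm]
  # Logical cells-by-gene-set matrix: which cells can be assumed
  # not to genuinely express the genes in this set.
  useToEst <- estimateNonExpressingCells(sc, nonExpressedGeneList = geneList)
  # Returns a modified copy of sc with the estimate stored in metaData$rho.
  scEst <- calculateContaminationFraction(sc, geneList, useToEst = useToEst)
  unique(scEst$metaData$rho)
})

# One global estimate per gene set, for side-by-side comparison.
print(rhoBySet)
```

Running both gene sets from the same starting `sc` keeps everything else fixed, so any difference between the two estimates comes from the choice of genes and from the cells selected by estimateNonExpressingCells.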


When using different marker genes to start, the contamination ratios are quite different #86