constantAmateur / SoupX

R package to quantify and remove cell free mRNAs from droplet based scRNA-seq data

Lumpy contamination estimates #73

Closed: lazappi closed this issue 3 years ago

lazappi commented 3 years ago

I have a dataset with several samples and for all of them I seem to get irregular "lumpy" estimates of the contamination fraction from autoEstCont() rather than the nice smooth distribution shown in the vignette. For example:

[Image: contamination fraction distribution plot from autoEstCont()]

Not sure if this is normal/expected or not? In my case it usually leads to either very low or very high estimates depending on where the highest peak is. I've tried modifying the parameters to autoEstCont() and a few other things like adjusting the clustering but it doesn't make a lot of difference. Setting contaminationRange can prevent the really high values but it doesn't help with the jaggedness of the distribution.
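For reference, a minimal sketch of the kind of call described above. It uses the toy `scToy` channel bundled with SoupX as a stand-in for the real samples (an assumption, since the actual data are not part of this thread), and the bounds passed to `contaminationRange` are illustrative rather than recommended values.

```r
library(SoupX)

# Toy SoupChannel shipped with SoupX; stands in for a real channel
# that already has cluster labels attached.
data(scToy)

# Restrict the accepted contamination fraction to 1%-30% (illustrative bounds).
# tfidfMin = 0.8 follows the package's own example for the toy data.
sc <- autoEstCont(
  scToy,
  tfidfMin = 0.8,
  contaminationRange = c(0.01, 0.3),
  doPlot = FALSE
)

# The global estimate is stored for every cell in the metaData slot.
unique(sc$metaData$rho)
```

Narrowing the range only limits where the final estimate can land; it does not change the shape of the underlying distribution, which matches the behaviour described here.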

Any suggestions about what might be going on or things to try?

Thanks

constantAmateur commented 3 years ago

That peak around 0 is almost always artifactual and can be ignored. The general lumpiness is not something I've seen before. If I had to guess, I'd say the true value probably corresponds to the peak around 0.05. But really this looks like a case where the automated estimation procedure can't be relied upon (see the FAQ in the README).

If this is part of a series of similar experiments, I'd suggest just manually setting this channel to something similar to other channels that give clearer results. If you really need an accurate answer for this particular channel, you're probably stuck having to think a bit harder about which genes are likely to provide good estimates (haemoglobin (HB) and immunoglobulin (IG) genes are usually a good place to start) and using the values they provide.
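Both routes can be sketched roughly as follows. This again uses the bundled `scToy` channel as a placeholder for a real sample (an assumption), the value 0.05 is only the guess mentioned above, and `CD7` is the marker the package documentation uses for the toy data; on real data a list such as `list(HB = c("HBB", "HBA2"))` would be the more likely starting point.

```r
library(SoupX)
data(scToy)

# Option 1: set the fraction by hand, e.g. borrowed from a cleaner channel.
# 0.05 is the guessed value from this thread, not a measured one.
scManual <- setContaminationFraction(scToy, 0.05)

# Option 2: estimate from genes assumed to be unexpressed in most cells.
# CD7 suits the toy data; HB or IG genes are the usual choice on real tissue.
geneList <- list(CD7 = "CD7")
useToEst <- estimateNonExpressingCells(scToy, nonExpressedGeneList = geneList)
scGenes <- calculateContaminationFraction(scToy, geneList, useToEst = useToEst)

# Either object can then be passed on to produce corrected counts.
corrected <- adjustCounts(scManual)
```

The manual route is the simpler one when neighbouring channels give consistent estimates; the gene-based route depends on choosing genes that are truly absent from the cells used for estimation.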

lazappi commented 3 years ago

Thanks! I ended up just setting a value for all the samples in this dataset. I haven't looked into it, but I was wondering if the lumpiness could be due to a lack of heterogeneity in the sample? This is about 80% one fairly homogeneous cell type.