constantAmateur / SoupX

R package to quantify and remove cell-free mRNAs from droplet-based scRNA-seq data

Removal of general contaminants #12

Closed: Osynchronika closed this issue 4 years ago

Osynchronika commented 5 years ago

Hello,

Thanks for creating SoupX; it seems like a nice package for accounting for background noise in 10X data. I have a small question, though. I have two single-cell samples of a tissue, one from a healthy and one from a diseased patient, and I know a number of cell-type-specific genes that contaminate the "soup". I could successfully remove them with your algorithm.

However, in each sample I also see a number of general contaminants (such as lncRNAs and splicing regulators, most of them highly expressed) that are sample-specific. I assume they are contaminants because I also see them in the "empty droplets". When I look for DE genes between the samples, these contaminants of course appear at the top of the lists for every cluster, but, as I said, I assume they are just artifacts.

I haven't fully understood how SoupX deals with general contaminant genes that are present in all cell types. Do I just run calculateContaminationFraction() on the list of these genes for all of my cells? I tried that, but it didn't seem to do much. Or is there a different way to handle this?

Thanks in advance!

constantAmateur commented 4 years ago

SoupX does not require you to specify which genes are contaminants. All genes will exhibit some degree of contamination, although for many genes the amount will be vanishingly small.
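To illustrate why every gene picks up some contamination, here is a minimal Python sketch of a simplified additive model of the kind SoupX is built on: a fraction rho of each droplet's UMIs is assumed to come from the soup, distributed across genes according to the soup profile measured in empty droplets. This is not SoupX's R code; the helper name `expected_soup_counts`, the soup profile, the UMI total and rho = 0.1 are all hypothetical values chosen for illustration.

```python
from typing import Dict


def expected_soup_counts(
    total_umis: int, rho: float, soup_profile: Dict[str, float]
) -> Dict[str, float]:
    """Expected soup-derived UMIs per gene in one droplet (hypothetical helper).

    Simplified additive model, assumed for illustration:
        soup_counts[g] = rho * total_umis * soup_profile[g]
    where soup_profile[g] is gene g's fraction of the ambient pool.
    """
    total = sum(soup_profile.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("soup_profile fractions must sum to 1")
    return {g: rho * total_umis * frac for g, frac in soup_profile.items()}


# Hypothetical soup profile: one abundant gene, one moderate, one rare,
# with "OTHER" lumping together every remaining gene.
soup = {
    "MALAT1": 0.05,
    "HBB": 0.01,
    "RARE_GENE": 1e-6,
    "OTHER": 1 - 0.05 - 0.01 - 1e-6,
}
expected = expected_soup_counts(total_umis=5000, rho=0.1, soup_profile=soup)
for gene, count in expected.items():
    print(f"{gene}: {count:.4f}")
```

Under this model no gene with a non-zero soup fraction is contamination-free, but a gene that is rare in the soup contributes only a tiny fraction of a UMI per cell, which matches the "vanishingly small" case above.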

If you have particular genes you suspect are contaminating different populations, I suggest you run plotMarkerMap to see how their expression is distributed across cells.
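As a rough guide to what plotMarkerMap summarises: the SoupX documentation describes it as comparing, for each cell, the observed counts of the chosen genes with the counts expected if the cell contained nothing but soup. The minimal Python sketch below computes that ratio on a log10 scale for one cell. The helper `soup_ratio_log10`, the soup profile and the cell counts are hypothetical, and SoupX's own function does more than this (for example, it draws the result on a reduced-dimension map).

```python
import math
from typing import Dict, List


def soup_ratio_log10(
    cell_counts: Dict[str, int],
    soup_profile: Dict[str, float],
    gene_set: List[str],
) -> float:
    """log10(observed / expected-if-pure-soup) for one cell and one gene set.

    Hypothetical helper: expected = cell's total UMIs * summed soup fraction
    of the genes in gene_set.
    """
    total_umis = sum(cell_counts.values())
    observed = sum(cell_counts.get(g, 0) for g in gene_set)
    expected = total_umis * sum(soup_profile.get(g, 0.0) for g in gene_set)
    if expected <= 0:
        raise ValueError("gene_set has no soup signal; ratio undefined")
    if observed == 0:
        return float("-inf")  # none of the genes detected in this cell
    return math.log10(observed / expected)


# Hypothetical data: "OTHER" lumps together all remaining genes.
soup = {"MALAT1": 0.05, "HBB": 0.01, "OTHER": 0.94}
cells = {
    "cell_A": {"MALAT1": 50, "HBB": 0, "OTHER": 950},   # 5% MALAT1, same as the soup
    "cell_B": {"MALAT1": 500, "HBB": 0, "OTHER": 500},  # far above the soup level
}
for name, counts in cells.items():
    print(name, round(soup_ratio_log10(counts, soup, ["MALAT1"]), 3))
```

A cell whose log ratio sits near zero (like `cell_A`) shows no more of those genes than soup alone would explain, while a cell well above zero (like `cell_B`) is likely expressing them itself.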