Closed nicodemus88 closed 4 years ago
What are the contamination fractions calculated by the two approaches? This should be printed as a message when you run the functions, or you can check the `rho` column in `sc$metaData`.
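For anyone wanting to compare the two estimates side by side, here is a minimal sketch. It assumes `sc` is a SoupChannel that already has cluster labels attached (for example one returned by `load10X` on a cellranger output folder). The helper name `compare_rho` and the haemoglobin gene list are illustrative assumptions, not part of SoupX, so adjust them to your data.

```r
# Sketch only: compare the automatic and the marker-based contamination estimates.
# `compare_rho` is a hypothetical helper; the default gene list is an assumption.
compare_rho <- function(sc, rbc_genes = c("HBB", "HBA1", "HBA2")) {
  # Keep only genes that are actually present in the count table
  rbc_genes <- intersect(rbc_genes, rownames(sc$toc))

  # Automatic estimate (doPlot = FALSE skips the diagnostic plot)
  sc_auto <- SoupX::autoEstCont(sc, doPlot = FALSE)

  # Manual estimate from genes assumed not to be expressed outside RBCs
  gene_list <- list(HB = rbc_genes)
  use_to_est <- SoupX::estimateNonExpressingCells(sc, nonExpressedGeneList = gene_list)
  sc_manual <- SoupX::calculateContaminationFraction(sc, gene_list, useToEst = use_to_est)

  # rho is a single global value repeated for every cell, so take the first entry
  c(auto = sc_auto$metaData$rho[1], manual = sc_manual$metaData$rho[1])
}
```

Calling `compare_rho(sc)` should return a named vector with the two contamination fractions, which are the numbers being asked for here.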
@constantAmateur Thank you for your reply.
Thank you very much.
In my experience, having looked at several hundred 10X channels, contamination fractions in the 5-10% range are very common. Given this, I'd be inclined to use the 8% estimate. It's likely to be an accurate estimate, and even if it is a slight over-estimate, it is not by much and it seems to clean your data nicely.
Regarding doublets, it really shouldn't matter if you run the method before or after SoupX. The only proviso is that if the particular doublet finder requires integer counts, you should set `roundToInt=TRUE` when running `adjustCounts`.
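As a rough sketch of that last step, assuming `sc` is a SoupChannel with clustering information and that you settle on the 8% estimate, something like the following should work. The wrapper name `correct_counts` is hypothetical; `setContaminationFraction` and `adjustCounts` are the SoupX functions.

```r
# Sketch only: apply a fixed contamination fraction and return corrected counts.
# `correct_counts` is a hypothetical wrapper, not a SoupX function.
correct_counts <- function(sc, rho = 0.08, integer_counts = TRUE) {
  # Override whatever estimate is stored with the chosen global fraction
  sc <- SoupX::setContaminationFraction(sc, rho)

  # roundToInt = TRUE returns integer counts for tools that require them
  SoupX::adjustCounts(sc, roundToInt = integer_counts)
}
```

The returned matrix can then be passed to the doublet finder or to downstream clustering in place of the raw counts.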
Hi @constantAmateur, I remember that in this issue you recommended running a doublet detection tool first.
2. Ideally you should run doublet removal first and exclude these cells when running SoupX, but it will make little practical difference.
Hi, thank you for the wonderful program. I would like your opinion on the analysis I performed on my dataset.
My dataset is from PBMCs sequenced using the 10X 5'-GEX kit. I can see that there is very obvious background contamination from RBCs, so I tried to remove it.
As the attached image shows, I tried 2 different parameter settings. For "auto", I used the default, automated SoupX setting, while for "manual" I specified a list of RBC genes as the only possible source of background.
After correction, the background RBC signal was reduced, but to differing degrees. With manual correction, the background signal was removed from all cell clusters except the RBCs, but with auto I still see some signal in 1 specific island.
So my questions are:
1) In this case, which is the 'better' correction method? Would you suggest the automated or the manual method?
2) As I mentioned, there's an island which still expresses some RBC genes after the automated method. Strangely, that island does not express any known immune cell-type marker, but it does express CD45, which is a pan-immune cell marker. The same island / cluster of cells can be seen after manual correction as well (circled in blue). Could these potentially be doublets / multiplets? In this case, do you recommend doublet removal before removing ambient RNA?
3) In the case of PBMCs, what other gene sets do you recommend using to remove ambient RNA? As this was sequenced with a 5'-GEX kit, the capture of TCR / Ig genes is a bit poor and I could not get good expression of these genes, so they're not good candidates. I tried using canonical markers such as the CD3 genes for T-cells, but the results were not so impressive.
Hope to hear back from you soon. Thank you very much.
SoupX_test