constantAmateur / SoupX

R package to quantify and remove cell free mRNAs from droplet based scRNA-seq data

Opinion on analysis: (1) Auto/Manual; and (2) Before/After doublet removal #44

Closed nicodemus88 closed 4 years ago

nicodemus88 commented 4 years ago

Hi, thank you for the wonderful program. I would like your opinion on the analysis I performed on my dataset.

My dataset is from PBMCs sequenced using the 10X 5'-GEX kit. I can see that there is very obvious background contamination from RBCs, so I tried to remove it.

As the attached image shows, I tried two different settings. For "auto", I used the default automated SoupX estimation, while for "manual" I specified a list of RBC genes as the only possible source of background.

After correction, the background RBC signal was reduced, but to different extents. With manual correction, the background signal was removed from all cell clusters except the RBCs, but with auto I still see some signal in one specific island.

So my questions are:

  1. In this case, which is the 'better' correction method? Would you suggest the automated or the manual method?
  2. As I mentioned, there is an island that still expresses some RBC genes after the automated method. Strangely, that island does not express any known immune cell marker, but it does express CD45, which is a pan-immune cell marker. The same island / cluster of cells can be seen after manual correction as well (circled in blue). Could these potentially be doublets / multiplets? If so, do you recommend doublet removal before removing ambient RNA?
  3. In the case of PBMCs, what other gene sets do you recommend for removing ambient RNA? As this was sequenced with a 5'-GEX kit, the capture of TCR / Ig genes is rather poor and I could not get good expression of these genes, so they are not good candidates. I tried canonical markers such as the CD3 genes for T cells, but the results were not impressive.

Hope to hear back from you soon. Thank you very much.

[Attached image: SoupX_test]

constantAmateur commented 4 years ago

What are the contamination fractions calculated by the two approaches? This should be printed as a message when you run the functions, or you can check the column rho in sc$metaData.

  1. If your data has very obvious RBC contamination, I would trust the contamination fraction calculated. I am curious to see how different the automatic value is, though (see above).
  2. It is possible that these are doublets, but you should run a dedicated doublet detection method to check. Ideally you should run doublet removal first and exclude these cells when running SoupX, but it will make little practical difference.
  3. It sounds like HBG genes work well for you, so you shouldn't really need another set. I'd usually recommend Ig genes for immune cell data, but it really depends on what cells you capture. Mast cell markers can sometimes work well, as can S100A8/A9.
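For readers who want to reproduce the comparison discussed above, here is a minimal sketch of estimating the contamination fraction both ways and reading it back from the rho column of sc$metaData. The gene list hbGenes and the helper compareRho are hypothetical names introduced for illustration (the thread does not list the exact RBC genes used), and sc is assumed to be a SoupChannel that already carries cluster labels, for example one returned by load10X.

```r
library(SoupX)

# Hypothetical gene set: an assumed list of haemoglobin genes,
# not the list actually used in this thread.
hbGenes <- c("HBB", "HBA1", "HBA2")

# Estimate rho with the automated and the manual method on the same
# SoupChannel and return both values side by side.
compareRho <- function(sc, geneSet = list(HB = hbGenes)) {
  # Automated estimate; needs cluster labels on the SoupChannel.
  scAuto <- autoEstCont(sc, doPlot = FALSE)

  # Manual estimate: flag cells that should not express the gene set,
  # then use the counts of those genes in those cells to estimate rho.
  useToEst <- estimateNonExpressingCells(sc, nonExpressedGeneList = geneSet)
  scManual <- calculateContaminationFraction(sc, geneSet, useToEst = useToEst)

  c(auto   = unique(scAuto$metaData$rho),
    manual = unique(scManual$metaData$rho))
}

# Example call, assuming a real cellranger output folder:
# sc <- load10X("path/to/cellranger/outs")
# compareRho(sc)
```

If the two values differ a lot, plotting where the chosen genes are still detected after correction is probably the quickest way to judge which estimate is more plausible.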

nicodemus88 commented 4 years ago

@constantAmateur Thank you for your reply.

  1. The contamination fraction from the automated method is 3%, while the manual method gives 8%.
  2. OK, I will give it a shot. But do you suggest doublet removal pre- or post-SoupX? I'm currently using DoubletFinder for doublet detection, and the data needs to be at least partially cleaned to remove poor-quality cells first. Would that affect SoupX in any way?

Thank you very much.

constantAmateur commented 4 years ago

My experience from looking at several hundred 10X channels is that contamination fractions in the 5-10% range are very common. Given this, I'd be inclined to use the 8% estimate. It's likely to be accurate, and even if it is a slight overestimate, it is not by much and it seems to clean your data nicely.

Regarding doublets, it really shouldn't matter whether you run the method before or after SoupX. The only proviso is that if your particular doublet finder requires integer counts, you should set roundToInt=TRUE when running adjustCounts.
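As a rough sketch of the two suggestions above (fixing the contamination fraction at 8% and producing integer counts for a downstream doublet finder), something like the following should work. It uses scToy, the small example SoupChannel bundled with SoupX, purely as a stand-in; with real data you would substitute your own SoupChannel, for example the result of load10X.

```r
library(SoupX)

# Stand-in data: the toy SoupChannel shipped with the package.
data("scToy", package = "SoupX")

# Fix the contamination fraction at the manually estimated 8%
# instead of relying on the automated estimate.
sc <- setContaminationFraction(scToy, 0.08)

# Remove the background; roundToInt = TRUE returns integer counts,
# which some doublet detection tools require.
out <- adjustCounts(sc, roundToInt = TRUE)

# The corrected matrix has the same shape as the original counts.
dim(out)
```

The corrected matrix out can then be passed to whatever downstream tool you use in place of the raw counts.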

YiweiNiu commented 3 years ago

Hi @constantAmateur, I remember that in this issue you recommended running a doublet detection tool first.

Xyanhong commented 6 months ago

2. Ideally you should run doublet removal first and exclude these cells when running SoupX, but it will make little practical difference.