campbio / decontX

Methods for decontamination of single cell data
MIT License

decontPro with raw matrix #15

Open daymecita opened 6 months ago

daymecita commented 6 months ago

Hello!

I am testing decontPro and would like to know how to input the raw matrix that includes the empty droplets to better estimate the ambient profile. I can't see the option in the function nor in the vignettes.

Thanks! Dayme

yuan-yin-truly commented 6 months ago

Hey @daymecita, please refer to this discussion #14

Let me know if you have any other issues!

daymecita commented 6 months ago

Hi!

Thank you very much for your quick reply. I could run it after installing the devel version. Now I am wondering whether I should run DecontPro using my QC'ed count matrix (after removing low-quality cells and doublets) as the filtered matrix, or whether I should use the filtered matrix from Cellranger. In the first case, many droplets would be treated as empty (which wouldn't be true) because they are in the raw matrix but not in the filtered matrix. What is your suggestion?

Thanks!

yuan-yin-truly commented 6 months ago

Hi @daymecita,

I mostly used the filtered matrix from Cellranger, with a bit of QC on top. But the QC I applied is very mild (e.g. removing the droplets in the top 1% of RNA library size), so few droplets are removed. The rationale is that DecontPro uses cell droplets to approximate the ambient noise profile when the user doesn't supply an ambient droplet matrix. When there are outliers among the cell droplets, particularly droplets whose RNA or ADT library sizes are drastically different from those of a typical cell, that approximation of the noise profile may itself be noisy.
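To make the "top 1%" filter concrete, here is a minimal sketch in plain Python. It is only an illustration and not part of DecontPro or decontX; the function name `mild_library_size_filter` and its `top_fraction` parameter are hypothetical, and it assumes you have already computed one total RNA count per droplet.

```python
def mild_library_size_filter(library_sizes, top_fraction=0.01):
    """Return the indices of droplets kept after dropping the largest ones.

    library_sizes: one total count per droplet (e.g. total RNA UMIs).
    top_fraction:  fraction of droplets to drop from the high end
                   (0.01 mirrors the "top 1%" example; it is an assumption).
    """
    n = len(library_sizes)
    # Number of droplets to drop, rounded down; the small epsilon guards
    # against floating-point error in n * top_fraction.
    n_drop = int(n * top_fraction + 1e-9)
    # Rank droplet indices from smallest to largest library size.
    order = sorted(range(n), key=lambda i: library_sizes[i])
    # Keep everything except the n_drop largest, in the original order.
    return sorted(order[: n - n_drop])
```

The returned indices can then be used to subset the columns of both the RNA and ADT matrices, so that the two stay aligned.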

Also, to quickly refer to Figure 2A of the DecontPro paper: our analysis showed that the raw matrix can include cell-containing droplets (mostly in the filtered matrix), mislabeled cell-containing droplets (possibly neutrophils and similar cells that have low mRNA), ambient droplets (low RNA and low ADT), and what we described as "spongelets", droplets with low RNA and intermediate ADT whose distribution differs from the ambient. This is just for your information in case you are designing your own QC on the raw matrix to get the cell-containing droplets.

daymecita commented 6 months ago

Hi @yuan-yin-truly !

I am using the raw matrix along with the filtered matrix from Cellranger to estimate the ambient and the background noise. I use the Cellranger algorithm to call the cell-containing droplets, and afterwards I apply QC filtering of low-quality cells and doublet calling. Should I then run decontPro using the Cellranger matrices (filtered and raw), and then run my QC analysis? This is how I normally do it for RNA using decontX. Regarding the mislabeled cell-containing droplets from the raw matrix that are called as empty droplets, how can I get those?

Thanks!

yuan-yin-truly commented 6 months ago

Hi @daymecita, to make things simple: if you have ambient background droplets to supply to DecontPro, you can do QC on the filtered droplets and then run DecontPro on the QC'ed droplets with no issue. Otherwise, you can still do QC and then decontaminate, just like you normally do with decontX, but the approximation of the ambient noise may be affected to some extent, because some of the cells were QC'ed out. That said, my experience has been that the final results are largely the same for both decontPro and decontX with or without background input, so I don't think you need to be too worried unless the results look strange and you are troubleshooting.

In our Figure 2A, we plot total RNA counts vs. total ADT counts for the raw matrix to identify clusters, and note the droplets with low total RNA counts and high total ADT counts as potentially mislabeled cells. You can do something similar, although it would be a very manual process. If you have your own dataset and are generating biological insights, you can do some manual analysis; if you are creating a package or tool, I don't think you need to be too concerned about that. The filtered matrix should have captured the vast majority of the cells and hence most of the biology.
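As a rough illustration of that manual check, the sketch below flags droplets that are in the raw matrix but not in the filtered matrix and that have low total RNA and high total ADT. It is plain Python and not part of decontX; the function name and the `rna_max` and `adt_min` cutoffs are hypothetical, and in practice the cutoffs would be chosen by eye from your own RNA vs. ADT scatter plot.

```python
def flag_candidate_mislabeled(raw_totals, filtered_barcodes, rna_max, adt_min):
    """Flag raw-matrix droplets that might be mislabeled cells.

    raw_totals:        dict mapping barcode -> (total_rna, total_adt),
                       computed from the raw matrix.
    filtered_barcodes: barcodes called as cells (the filtered matrix).
    rna_max, adt_min:  manual cutoffs (assumptions; pick them from a plot).
    """
    called = set(filtered_barcodes)
    return sorted(
        barcode
        for barcode, (total_rna, total_adt) in raw_totals.items()
        # Not called as a cell, but low RNA together with high ADT.
        if barcode not in called
        and total_rna <= rna_max
        and total_adt >= adt_min
    )


# Toy values made up purely to show the call; they are not real data.
toy_raw_totals = {
    "AAAC": (5000, 3000),  # called cell
    "AAAG": (40, 2500),    # low RNA, high ADT: candidate
    "AAAT": (10, 15),      # ambient-like
    "AACA": (30, 300),     # low RNA, intermediate ADT
}
candidates = flag_candidate_mislabeled(
    toy_raw_totals, filtered_barcodes=["AAAC"], rna_max=100, adt_min=1000
)
```

Simple cutoffs like these will not separate the groups as well as looking at the clusters directly, so treat the flagged barcodes as candidates for inspection rather than as a final call.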