campbio / decontX

Methods for decontamination of single cell data
MIT License

Native signal removed #19

Open daymecita opened 3 months ago

daymecita commented 3 months ago

Hi!

I am a bit surprised by the output of decontPro. For some ADTs it seems to work quite well, but for others it seems to have removed native signal. For example, CD8 signal, which is expressed in several cell types including the T.NK group, was removed after decontamination. Something similar happened to CD206, which should mark myeloid cells: before decontamination several cell types expressed CD206 (mostly myeloid cells), but after decontPro the signal almost disappeared from all cell types.

Another thing that caught my attention is the strange density I get for CD3 after decontPro. In this case the results look good, because the signal from the other cell types was removed and the signal in T.NK cells was preserved, but why does the density look like that?

Below you can see the plots from decontPro and the UMAPs showing expression of the mentioned ADTs before and after decontPro. The code I used was as simple as:

out <- decontX::decontPro(counts, clusters)

where counts contains the QC-filtered, cell-containing droplets, and clusters contains the broad cell types I annotated using ADTs and marker genes.

[Images: decontPro density plots and UMAPs of ADT expression before and after decontPro]

joshua-d-campbell commented 2 months ago

Hi @daymecita, thanks for trying out our tool and giving us feedback. We haven't observed this aggressive behavior before in our benchmark datasets, but @yuan-yin-truly can correct me if I'm wrong. Could you let us know how many ADTs and how many cells you are looking at? Could you also plot a few additional things to help us troubleshoot:

1) The cluster labels on the UMAP, so we can see the resolution.
2) The level of ambient and background contamination on the UMAP. These can be calculated using the following code:

ambient <- colSums(out$ambient_counts) / colSums(counts)
background <- colSums(out$background_counts) / colSums(counts)

They can then be added to the colData of a SingleCellExperiment or to the metadata of a Seurat object using the AddMetaData function and plotted on the UMAP like any other score.
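As a minimal sketch of the step described above, the two fractions can be collected into one per-cell table before attaching them to an object. The helper name `contamination_fractions` and the toy matrices are illustrative assumptions, and the commented calls at the end assume existing objects named `seu` (Seurat) and `sce` (SingleCellExperiment).

```r
# Per-cell fraction of counts attributed to ambient and background contamination.
# `out` is the list returned by decontPro(); `counts` is the ADT x cell matrix
# that was passed to it. Cells with zero total counts would give NaN here.
contamination_fractions <- function(out, counts) {
  total <- colSums(counts)
  data.frame(
    ambient = colSums(out$ambient_counts) / total,
    background = colSums(out$background_counts) / total,
    row.names = colnames(counts)
  )
}

# Toy input so the sketch runs without decontX (values are made up).
counts <- matrix(c(10, 0, 5, 20, 4, 6), nrow = 2,
                 dimnames = list(c("CD3", "CD8"), c("cell1", "cell2", "cell3")))
toy_out <- list(ambient_counts = counts * 0.1,
                background_counts = counts * 0.05)

frac <- contamination_fractions(toy_out, counts)
print(frac)

# With real objects, the table can then be attached and plotted, e.g.:
# seu <- Seurat::AddMetaData(seu, metadata = frac)
# Seurat::FeaturePlot(seu, features = c("ambient", "background"))
# SummarizedExperiment::colData(sce)$ambient <- frac$ambient
# SummarizedExperiment::colData(sce)$background <- frac$background
```

Keeping the cell names as row names lets `AddMetaData` match rows to cells by name rather than by position.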

The other thing you could try is tweaking the prior parameters delta_sd and background_sd. You can make them slightly smaller (e.g. 2e-06 and 2e-07, respectively) to make the tool assume that there is less contamination.

If you still have problems and could potentially share the data (or a small subset), we can try to take a look as well.

yuan-yin-truly commented 2 months ago

Hi @daymecita, CD3 looks like a plotting artifact. We used ggplot's geom_density to plot the density, and I think there is a parameter controlling the kernel window size that can make the plot overly smooth or overly choppy (the adjust argument, I think; see the geom_density documentation).
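To illustrate how the bandwidth multiplier changes the look of a density curve, here is a small sketch on simulated values. The data are made up and log1p-transformed purely for illustration. The first part uses base R's `stats::density()`, which takes the same `adjust` argument, and the guarded second part shows the corresponding `geom_density(adjust = ...)` call.

```r
set.seed(1)
# Toy ADT-like values: a near-zero population plus a positive one (illustrative only).
x <- c(rpois(300, 1), rpois(200, 40))
lx <- log1p(x)

# `adjust` multiplies the default bandwidth: < 1 gives a choppier curve, > 1 a smoother one.
d_choppy  <- density(lx, adjust = 0.2)
d_default <- density(lx, adjust = 1)
d_smooth  <- density(lx, adjust = 3)
print(c(choppy = d_choppy$bw, default = d_default$bw, smooth = d_smooth$bw))

# The same argument in ggplot2, if the package is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(data.frame(value = lx), ggplot2::aes(x = value)) +
    ggplot2::geom_density(adjust = 3)
  # print(p)  # uncomment to draw the smoothed density
}
```

If the CD3 density looks jagged mainly because many decontaminated values pile up at a few discrete levels, a larger `adjust` should smooth it out without changing the underlying counts.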

For CD8, the density already looks low in the raw data, so I am not sure the algorithm can work well without over-decontaminating.
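One way to check how strong the raw signal was, and how much of it was removed, is a per-cluster summary of a single ADT before and after decontamination. The sketch below is only an illustration: `summarise_adt` is a hypothetical helper, the toy matrices are made up, and with real data the decontaminated matrix is assumed to be available as `out$decontaminated_counts`.

```r
# Per-cluster summary of one ADT before and after decontamination.
# `raw` and `decont` are ADT x cell matrices with matching dimensions;
# `clusters` holds one label per cell (column).
summarise_adt <- function(raw, decont, clusters, adt) {
  stopifnot(identical(dim(raw), dim(decont)), length(clusters) == ncol(raw))
  cl <- as.character(clusters)
  raw_sum <- tapply(raw[adt, ], cl, sum)
  decont_sum <- tapply(decont[adt, ], cl, sum)
  data.frame(
    cluster = names(raw_sum),
    raw_median = as.numeric(tapply(raw[adt, ], cl, median)),
    decont_median = as.numeric(tapply(decont[adt, ], cl, median)),
    # NaN if a cluster has no raw counts for this ADT
    fraction_removed = as.numeric(1 - decont_sum / raw_sum)
  )
}

# Toy data (values are made up).
cells <- paste0("c", 1:4)
raw <- matrix(c(40, 1, 60, 2, 2, 30, 4, 50), nrow = 2,
              dimnames = list(c("CD8", "CD206"), cells))
decont <- matrix(c(10, 0, 15, 0, 0, 25, 0, 45), nrow = 2,
                 dimnames = list(c("CD8", "CD206"), cells))
clusters <- c("T.NK", "T.NK", "Myeloid", "Myeloid")

s <- summarise_adt(raw, decont, clusters, adt = "CD8")
print(s)
```

A cluster whose raw median is clearly above the others but whose `fraction_removed` is close to that of the background clusters would be a candidate for over-decontamination.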

CD206 does look over-decontaminated, and I agree with @joshua-d-campbell's suggestion to try smaller priors. If you can run multiple instances, maybe try a grid search with priors 10, 30, and 100 times smaller?
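A rough sketch of such a grid search is shown below. The baseline values of 2e-5 and 2e-6 are an assumption; they are chosen so that the 10x row reproduces the 2e-06 and 2e-07 suggested earlier in this thread, and should be checked against ?decontX::decontPro. The helper `run_prior_grid` and its `fit_fun` hook are hypothetical, added only so the loop can be tried with a stub before launching real runs.

```r
# Assumed baseline priors (verify against the decontPro documentation).
base_delta_sd <- 2e-5
base_background_sd <- 2e-6
divisors <- c(10, 30, 100)

grid <- data.frame(
  divisor = divisors,
  delta_sd = base_delta_sd / divisors,
  background_sd = base_background_sd / divisors
)
print(grid)

# Run one decontamination per grid row and return a named list of results.
# By default `fit_fun` calls decontX::decontPro() with the two prior parameters.
run_prior_grid <- function(counts, clusters, grid,
                           fit_fun = function(counts, clusters, delta_sd, background_sd) {
                             decontX::decontPro(counts, clusters,
                                                delta_sd = delta_sd,
                                                background_sd = background_sd)
                           }) {
  results <- lapply(seq_len(nrow(grid)), function(i) {
    fit_fun(counts, clusters, grid$delta_sd[i], grid$background_sd[i])
  })
  names(results) <- paste0("div", grid$divisor)
  results
}

# With real data:
# fits <- run_prior_grid(counts, clusters, grid)
```

Each element of the returned list can then be compared on the markers in question (here CD8 and CD206) to see at which prior strength the native signal is retained.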