Closed by cdeanj 4 years ago
Yep, I concur with your analysis.
I'd probably pick a more stringent threshold (e.g. 0.6) or more relaxed threshold (e.g. 0.1) depending on whether it was more important to my study to maximally control contamination even at the risk of losing a few real taxa (stringent), or to remove the most egregious contaminants even at the risk of letting a few contaminants through (relaxed).
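To make the trade-off concrete, here is a minimal base R sketch of how the two thresholds change the number of flagged ASVs. The score vector is made up for illustration and stands in for contamdf.comb$p; the helper flag_contaminants is hypothetical and not part of decontam. It relies on the convention that an ASV is called a contaminant when its score falls below the threshold, and it assumes that missing scores are not flagged.

```r
# Hypothetical composite scores standing in for contamdf.comb$p (not real data).
scores <- c(0.02, 0.08, 0.35, 0.55, 0.82, 0.85, 0.97, 0.99, NA)

# Hypothetical helper: flag an ASV when its score is below the threshold.
# Assumption of this sketch: ASVs with a missing score are never flagged.
flag_contaminants <- function(p, threshold) {
  !is.na(p) & p < threshold
}

relaxed   <- flag_contaminants(scores, 0.1)  # only the most egregious cases
stringent <- flag_contaminants(scores, 0.6)  # also catches borderline ASVs

n_relaxed   <- sum(relaxed)
n_stringent <- sum(stringent)

cat("Flagged at 0.1:", n_relaxed, "\n")
cat("Flagged at 0.6:", n_stringent, "\n")
```

With real output, the same comparison can be made by passing contamdf.comb$p in place of scores and inspecting which ASVs are flagged only under the stringent threshold.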
Hi Dr. Callahan,
I have an additional question regarding the histogram of composite scores I presented a couple of days ago.
I take the two peaks to correspond to the composite scores generated by the contaminant and non-contaminant models, where the far right peak corresponds to ASVs better explained by the non-contaminant model and the left peak corresponds to ASVs better explained by the contaminant model.
If this interpretation is correct, why are the composite scores surrounding the left peak so high? I would have assumed that they would have been lower, since lower scores indicate that the contaminant model is a better fit.
Thanks! Chris
> If this interpretation is correct, why are the composite scores surrounding the left peak so high? I would have assumed that they would have been lower, since lower scores indicate that the contaminant model is a better fit.
All scores over 0.5 indicate that the non-contaminant model was a better fit. What you see here are two score modes, both of which are better fit by the non-contaminant model. My first guess is that the "lower-score" mode of ~0.85 might be ASVs that appear in fewer samples, and for that reason the non-contaminant model can't be preferred as strongly, but is still preferred.
When contamination is a major factor, the scores of the contaminant ASVs will have a mode below 0.5.
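One way to check that guess is to compare prevalence (the number of samples in which each ASV appears) between the two score modes. The sketch below uses a small made-up data frame with p and prev columns as a stand-in for the isContaminant output; the values and the 0.92 cut between modes are assumptions chosen only for illustration.

```r
# Hypothetical stand-in for the data frame returned by isContaminant():
# p is the composite score, prev is the number of samples containing the ASV.
contamdf <- data.frame(
  p    = c(0.84, 0.86, 0.83, 0.99, 1.00, 0.98),
  prev = c(3, 4, 2, 25, 30, 28)
)

# Assign each ASV to a score mode using an arbitrary cut between the two peaks.
cut_point <- 0.92
contamdf$mode <- ifelse(contamdf$p < cut_point, "lower", "upper")

# Median prevalence per mode; a much smaller value in the lower mode would be
# consistent with the "fewer samples" explanation.
median_prev <- tapply(contamdf$prev, contamdf$mode, median)
print(median_prev)
```

On real data, a scatter plot of prevalence against score would show the same relationship without having to pick a cut point.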
Hi Dr. Callahan,
I have sequenced negative controls and generated 16S qPCR values, and used them as input to decontam in the following way:

    contamdf.comb <- isContaminant(ps, method="combined", neg="is.neg", conc="CopyNumber")

Following this, I inspected the distribution of the composite scores assigned by the isContaminant function:

    hist(contamdf.comb$p, 100)

The scores appear to display a bimodal distribution with peaks at 0.8 and 1.0, indicating that most of the ASVs fall within this high score range and are likely not contaminants. Would I be justified in choosing a threshold between 0.1 and 0.6 to remove the putative contaminants? Just want to make sure I understand the purpose of this parameter.
Thanks! Chris