Open JayalalKJ opened 3 weeks ago
I have one control sample
decontam-prevalence is not an appropriate method when you have only one negative control sample. It relies on repeated observation of contaminants across multiple negative controls. We recommend a minimum of 5; see our original paper for more on that: https://doi.org/10.1186/s40168-018-0605-2
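To make the point about repeated observation concrete, here is a minimal base-R sketch of how little a presence/absence test can say with a single negative control. It uses a plain one-sided Fisher's exact test as a stand-in, so it illustrates the idea rather than reproducing decontam's exact score. The helper `prevalence_p` and the counts (20 true samples, taxon present in 2 of them) are hypothetical and chosen only for illustration.

```r
# Illustrative only: a one-sided Fisher's exact test on presence/absence counts.
# This is an assumption-laden stand-in, not decontam's internal scoring code.
prevalence_p <- function(neg_present, neg_total, true_present, true_total) {
  tab <- matrix(
    c(neg_present,  neg_total  - neg_present,
      true_present, true_total - true_present),
    nrow = 2, byrow = TRUE,
    dimnames = list(c("negative", "true"), c("present", "absent"))
  )
  # H1: the taxon is more prevalent in negative controls than in true samples
  fisher.test(tab, alternative = "greater")$p.value
}

# Hypothetical taxon seen in 2 of 20 true samples.
p_one_control   <- prevalence_p(1, 1, 2, 20)  # present in the only control
p_five_controls <- prevalence_p(5, 5, 2, 20)  # present in all 5 controls

print(c(one_control = p_one_control, five_controls = p_five_controls))
# With one control the p-value is 3/21 (about 0.14);
# with five controls it is 21/53130 (about 0.0004).
```

Under these assumptions, a taxon present in the single control and absent from all 20 true samples still only reaches 1/21 (about 0.048), so one control cannot separate contaminants sharply, whatever threshold is chosen.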
Hi, I have one control sample, and the prevalence method in decontam is not effectively identifying contaminants. The p-values seem evenly distributed, and isContaminant() isn't marking many sequences as contaminants even with an aggressive threshold. I would appreciate advice on how to proceed, alternative approaches, or changes to the following code.
What I attempted (e.g., using threshold = 0.3):
    library(phyloseq)
    library(decontam)
    library(ggplot2)

    # Identify the control samples
    sample_data(physeq)$is.neg <- sample_data(physeq)$Sample_or_Control == "Control Sample"

    # Identify contaminants using the prevalence method with an aggressive threshold
    contamdf.prev <- isContaminant(physeq, method = "prevalence", neg = "is.neg", threshold = 0.3)
    table(contamdf.prev$contaminant)

    # Visualize prevalence in true samples vs. negative controls
    ps.pa <- transform_sample_counts(physeq, function(abund) 1 * (abund > 0))
    ps.pa.neg <- prune_samples(sample_data(ps.pa)$Sample_or_Control == "Control Sample", ps.pa)
    ps.pa.pos <- prune_samples(sample_data(ps.pa)$Sample_or_Control == "True Sample", ps.pa)

    # Create a data frame for visualization
    df.pa <- data.frame(pa.pos = taxa_sums(ps.pa.pos),
                        pa.neg = taxa_sums(ps.pa.neg),
                        contaminant = contamdf.prev$contaminant)

    # Plot the prevalence of taxa in true samples vs. negative controls
    ggplot(data = df.pa, aes(x = pa.neg, y = pa.pos, color = contaminant)) +
      geom_point() +
      xlab("Prevalence (Negative Controls)") +
      ylab("Prevalence (True Samples)")

    # Prune contaminants
    physeq_clean <- prune_taxa(!contamdf.prev$contaminant, physeq)