benjjneb / decontam

Simple statistical identification and removal of contaminants in marker-gene and metagenomics sequencing data
https://benjjneb.github.io/decontam/

Negative control count best practices #126

Closed Rob-murphys closed 1 year ago

Rob-murphys commented 1 year ago

I have a dataset of 18 batches, each with 2 negative controls: one from the extraction and one from the PCR. By your suggestion, 2 negative controls is too few for prevalence-based decontamination. Would it be better to merge all 18 batches (which I plan to do later anyway) and use all 36 negative controls together, or is that bad practice, since they are technically negative controls from different extractions/PCR runs?

benjjneb commented 1 year ago

2 differently generated negative controls is not sufficient to use the prevalence method, so combining all the negative controls across sequencing batches would be preferred. That is actually what we did in the original decontam paper.
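
To illustrate why pooling helps, here is a minimal Python sketch of a prevalence-style comparison for a single taxon. It is a simplified stand-in, not decontam's implementation: it runs a one-sided Fisher's exact test on the 2x2 presence/absence table (negative controls vs. true samples) and ignores batch labels, so all negative controls are pooled. The function names and the `(batch, is_negative, taxon_present)` record layout are assumptions made for this example.

```python
from math import comb


def fisher_one_sided(neg_present, neg_total, samp_present, samp_total):
    """One-sided Fisher's exact test p-value.

    Probability of seeing the taxon in at least `neg_present` of the
    negative controls if presence were unrelated to sample type.
    """
    total = neg_total + samp_total
    present = neg_present + samp_present
    upper = min(neg_total, present)
    # Hypergeometric tail: sum over outcomes at least as extreme as observed.
    tail = sum(
        comb(present, k) * comb(total - present, neg_total - k)
        for k in range(neg_present, upper + 1)
    )
    return tail / comb(total, neg_total)


def pooled_prevalence_pvalue(records):
    """Pool records from all batches and test one taxon.

    `records` is an iterable of (batch, is_negative, taxon_present) tuples
    (hypothetical layout). The batch label is deliberately ignored here.
    """
    neg_present = neg_total = samp_present = samp_total = 0
    for _batch, is_negative, taxon_present in records:
        if is_negative:
            neg_total += 1
            neg_present += int(taxon_present)
        else:
            samp_total += 1
            samp_present += int(taxon_present)
    return fisher_one_sided(neg_present, neg_total, samp_present, samp_total)


# Made-up example: one batch, taxon in both negatives and in 1 of 10 samples.
example = [("b1", True, True), ("b1", True, True)]
example += [("b1", False, i < 1) for i in range(10)]
print(pooled_prevalence_pvalue(example))
```

With only 2 negative controls the test can take very few distinct values, so pooling the controls from all batches gives it more resolution.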

Rob-murphys commented 1 year ago

Thanks for the reply :) That makes sense and I shall proceed in that manner. I have one follow-up question, though:

How do you reconcile that not all batches would necessarily be subject to the same contaminants?

benjjneb commented 1 year ago

In my analysis, I would at least investigate whether a batch effect was evident in my data, and potentially include it in any inferential statistical modeling I did.
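
One crude way to start such an investigation is to tabulate, for each taxon, which batches' negative controls it appears in. The sketch below is only an exploratory check, not a statistical test and not part of decontam. The function names, the `(batch, taxon)` input layout, and the `max_fraction` cutoff are all hypothetical choices for this example.

```python
from collections import defaultdict


def negative_control_batches(observations):
    """Map each taxon to the set of batches whose negative controls contain it.

    `observations` is an iterable of (batch, taxon) pairs drawn from
    negative controls only (hypothetical layout).
    """
    batches_by_taxon = defaultdict(set)
    for batch, taxon in observations:
        batches_by_taxon[taxon].add(batch)
    return dict(batches_by_taxon)


def batch_specific_taxa(observations, n_batches, max_fraction=0.25):
    """Taxa seen in the negative controls of at most `max_fraction` of batches.

    The 0.25 default is an arbitrary illustrative cutoff, not a recommendation.
    """
    seen = negative_control_batches(observations)
    return sorted(
        taxon
        for taxon, batches in seen.items()
        if len(batches) / n_batches <= max_fraction
    )


# Made-up example: taxon "A" is in every batch's negatives, "B" in only one.
obs = [("b1", "A"), ("b2", "A"), ("b3", "A"), ("b4", "A"), ("b1", "B")]
print(batch_specific_taxa(obs, n_batches=4))
```

Taxa flagged this way are candidates for batch-specific contamination and may deserve a closer look before pooled results are trusted for them.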