benjjneb / decontam

Simple statistical identification and removal of contaminants in marker-gene and metagenomics sequencing data
https://benjjneb.github.io/decontam/

Amplicon sequencing : decontam #71

Open tabaresr opened 4 years ago

tabaresr commented 4 years ago

Dear all, I did amplicon sequencing (16S) from woodchip bioreactors that were used to clean water contaminated with pesticides. I am very new to the field and I am still learning how to analyze the data.

I used the decontam package but I am struggling a bit trying to understand the results. It seems I don't have a clear separation when I plot the prevalence of the decontam taxa in the true samples and in the negative controls.

Could you guide me a little more on what could be happening with the data? Why is it that I don't have a clear separation between the true samples and the control samples? Why is it that the number of reads in some of the control samples is higher than in some true samples, when it should be the opposite?

Thank you very much.

[Attached plots: Sample_or_Control, f2]

benjjneb commented 4 years ago

Could you guide me a little more on what could be happening with the data? Why is it that I don't have a clear separation between the true samples and the control samples? Why is it that the number of reads in some of the control samples is higher than in some true samples, when it should be the opposite?

That you don't have any real separation in the prevalence patterns of negative-control associated taxa and those from the samples, and that there is no difference between those groups in overall reads, is a cause for concern.

Typically, negative controls have lower read depths because they contain very little genetic material to start with relative to the samples, and there should be at least some characteristic-looking negative-control associated taxa with much higher prevalence in the negative controls than in the true samples.
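As a quick sanity check on the read-depth expectation described above, one could compare the median library size of the controls with that of the true samples. This is only a minimal sketch: the count table, the `is_control` flags, and the helper `median_depth_by_group` are hypothetical illustrations and are not part of decontam.

```python
from statistics import median


def median_depth_by_group(read_counts, is_control):
    """Return (median sample depth, median control depth).

    read_counts: one list of per-taxon read counts per library.
    is_control:  one boolean per library, True for negative controls.
    """
    samples = [sum(row) for row, neg in zip(read_counts, is_control) if not neg]
    controls = [sum(row) for row, neg in zip(read_counts, is_control) if neg]
    return median(samples), median(controls)


# Hypothetical toy data: rows are libraries, columns are taxa.
read_counts = [
    [5200, 3100, 40],   # true sample
    [4800, 2900, 10],   # true sample
    [6100, 3500, 25],   # true sample
    [12, 3, 310],       # negative control
    [8, 0, 270],        # negative control
]
is_control = [False, False, False, True, True]

sample_median, control_median = median_depth_by_group(read_counts, is_control)
print("median depth, true samples:", sample_median)
print("median depth, controls:    ", control_median)
```

If the control median is not clearly below the sample median, as in the data described in this thread, that suggests the "controls" may not be reagent-only negative controls.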

What kind of negative controls did you generate? Was there evidence of lower DNA concentrations in those samples prior to library preparation? Are woodchip bioreactors low microbial biomass?

tabaresr commented 4 years ago

Thank you for answering.

  1. What kind of negative controls did you generate? I have 20 bioreactors with nutrients added. Then, the same bioreactors are treated over time with nutrients and pesticides. Only 5 of the 20 are kept with nutrient over time (those are my controls). 
  2. Was there evidence of lower DNA concentrations in those samples prior to library preparation? I don't have that information since I just received the sequencing raw data.

benjjneb commented 4 years ago

Only 5 of the 20 are kept with nutrient over time (those are my controls).

Do you mean without nutrients? So that there would presumably be lower bacterial growth?

In any case, I don't think decontam is the tool you want here. It seems you have a "case/control" type of study design, with one set of treatment bioreactors and another set that didn't get the treatment (nutrients/pesticides), and you'd like to detect differences in the microbial communities between the two sets.

decontam's prevalence method is intended for negative controls in which no bacteria at all are expected, for instance DI water (or something like it) taken through the PCR and library preparation process. The purpose of these negative controls is to identify taxa that come from contamination, often from the reagents, so they can be removed from the data. It isn't the right tool for detecting differences between conditions.
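To illustrate the idea of a prevalence comparison, here is a simplified sketch that scores a taxon by how much more often it appears in negative controls than in true samples, using a one-sided Fisher's exact test. It is a stand-in for illustration only and should not be read as decontam's actual implementation. The function name `prevalence_p_value`, the toy counts, and the 0.1 cutoff are all assumptions made for this example.

```python
from math import comb


def prevalence_p_value(ctrl_present, ctrl_total, samp_present, samp_total):
    """One-sided Fisher's exact test on a 2x2 presence/absence table.

    Returns the probability of seeing the taxon in at least `ctrl_present`
    of the controls if presence were unrelated to being a control.
    Small values mean the taxon is enriched in negative controls.
    """
    total = ctrl_total + samp_total
    present = ctrl_present + samp_present
    denom = comb(total, ctrl_total)
    upper = min(ctrl_total, present)
    tail = sum(
        comb(present, k) * comb(total - present, ctrl_total - k)
        for k in range(ctrl_present, upper + 1)
    )
    return tail / denom


# Hypothetical toy data: 4 negative controls and 12 true samples.
# Taxon A: present in 4/4 controls and 1/12 samples (contaminant-like).
# Taxon B: present in 0/4 controls and 12/12 samples (sample-like).
p_a = prevalence_p_value(4, 4, 1, 12)
p_b = prevalence_p_value(0, 4, 12, 12)

cutoff = 0.1  # illustrative cutoff chosen for this sketch
for name, p in [("A", p_a), ("B", p_b)]:
    label = "contaminant-like" if p < cutoff else "not flagged"
    print(f"taxon {name}: p = {p:.4f} -> {label}")
```

A comparison like this only makes sense when the controls are expected to contain essentially no bacteria. With untreated bioreactors as "controls", a low score would reflect a treatment effect rather than contamination.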

tabaresr commented 4 years ago

I understand now. I have positive controls to compare two conditions. Therefore, I can't use this approach to identify taxa coming from contamination since I would need other information such as DNA concentration of the samples or negative controls. Thank you very much for your explanation and your time.

(Sent by email in reply to Benjamin Callahan's comment of Wed, 13 May 2020, 10:26.)
