benjjneb / dada2

Accurate sample inference from amplicon data with single nucleotide resolution
http://benjjneb.github.io/dada2/
GNU Lesser General Public License v3.0

Primers, cutadapt output and chimera #1978

Status: Closed (luigallucci closed this issue 2 months ago)

luigallucci commented 3 months ago

Hi @benjjneb,

I have several questions, hopefully not all stupid ones.

I'm working with sequences amplified with a classical set of primers:

FWD <- "CCTACGGGNGGCWGCAG"

REV <- "GACTACHVGGGTATCTAATCC"

I'm removing these with cutadapt, using a slightly modified version of the ITS tutorial workflow. The samples are environmental, and we expect them to contain organisms with intron-rich 16S rRNA genes (we are sequencing the V3-V4 region). I ran a pipeline with pooled sample inference in dada() and the following settings for chimera removal...

derepF <- derepFastq(filtFs)
dadaFs <- dada(derepF, err = errF, pool = TRUE, multithread = 30, verbose = TRUE)
derepR <- derepFastq(filtRs)
dadaRs <- dada(derepR, err = errR, pool = TRUE, multithread = 30, verbose = TRUE)
mergers1 <- mergePairs(dadaFs, filtFs, dadaRs, filtRs, minOverlap = 12, verbose = TRUE)

seqtab <- makeSequenceTable(mergers1)
seqtab.nochim <- removeBimeraDenovo(seqtab, method="pooled", multithread= 20, verbose=TRUE)

...and the output shows a high ASV removal rate, although most reads are preserved. There were 29890 ASVs before chimera removal, and 9822 passed the chimera step, with 82% of reads retained (sum(seqtab.nochim)/sum(seqtab)).

Since I used the ITS pipeline, I'm pretty sure all the primers were removed. Do you have any ideas or suggestions?

Another question: is there a "suggested" way to deal with a set of primers like this?

341F (5'-CCTACGGGNGGCWGCAG-3')
341Fb (5'-TCCTACGGGNGGCWGCAG-3')
341Fc (5'-ATCCTACGGGNGGCWGCAG-3')
341Fd (5'-TGTCCTACGGGNGGCWGCAG-3')
785R (5'-GACTACHVGGGTATCTAATCC-3')

These were all used together to deal better with variability in the region.

benjjneb commented 3 months ago

> There were 29890 ASVs before chimera removal, and 9822 passed the chimera step, with 82% of reads retained (sum(seqtab.nochim)/sum(seqtab)).
>
> Since I used the ITS pipeline, I'm pretty sure all the primers were removed. Do you have any ideas or suggestions?

This is within the range of results we see in real data. Chimeras tend to be low abundance and diverse, so it is not uncommon that a majority of ASVs are flagged as chimeric, but most reads (>75%) should remain. 18% chimeric reads isn't great though, and could suggest optimizing the PCR protocol in the future, perhaps by lowering the number of cycles or increasing elongation times, both of which have been shown to reduce chimera formation.
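The two numbers in the question measure different things: the fraction of ASVs retained and the fraction of reads retained. A minimal base-R sketch of both, using a small made-up count matrix in place of the real seqtab and seqtab.nochim (the toy counts and the choice of which ASVs are dropped are assumptions for illustration only):

```r
# Toy sequence table: rows are samples, columns are ASVs (hypothetical counts).
seqtab <- matrix(
  c(900,  50, 30, 20,
    800, 100, 60, 40),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("S1", "S2"), c("asvA", "asvB", "asvC", "asvD"))
)

# Pretend chimera removal dropped the two lowest-abundance ASVs.
seqtab.nochim <- seqtab[, c("asvA", "asvB"), drop = FALSE]

# Fraction of ASVs that survived chimera removal.
asv_kept <- ncol(seqtab.nochim) / ncol(seqtab)

# Fraction of reads that survived (same expression as in the question).
reads_kept <- sum(seqtab.nochim) / sum(seqtab)

# Per-sample read retention, useful for spotting outlier samples.
reads_kept_per_sample <- rowSums(seqtab.nochim) / rowSums(seqtab)

# The ASV counts reported in the question give the ASV-level fraction directly.
thread_asv_kept <- 9822 / 29890
```

With the counts from the question, roughly a third of the ASVs pass while 82% of the reads do, which is the "many chimeric ASVs, few chimeric reads" pattern described above.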

> Another question: is there a "suggested" way to deal with a set of primers like this?

This is sometimes referred to as "heterogeneity spacers". There are previous discussions on this. We have not developed a recommended workflow for this type of primer setup. You'll need to use an external tool (like cutadapt) to remove these sorts of primers. The key for DADA2 is that all the reads start at the same position when starting the DADA2 workflow.
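To illustrate what "start at the same position" means here: the four forward primers in the question share the 341F core and differ only by 0 to 3 extra bases at the 5' end. The base-R sketch below uses made-up reads and two hypothetical helpers, primer_to_regex and trim_through_primer (neither is a dada2 or cutadapt function). It expands the IUPAC codes into a regular expression and removes everything up to and including the first match of the core primer. It is only an illustration of the idea, not a replacement for cutadapt, and it does exact matching with no tolerance for sequencing errors.

```r
# IUPAC nucleotide codes mapped to regex character classes.
iupac <- c(A = "A", C = "C", G = "G", T = "T",
           R = "[AG]", Y = "[CT]", S = "[CG]", W = "[AT]",
           K = "[GT]", M = "[AC]", B = "[CGT]", D = "[AGT]",
           H = "[ACT]", V = "[ACG]", N = "[ACGT]")

# Hypothetical helper: turn a degenerate primer into a regular expression.
primer_to_regex <- function(primer) {
  paste(iupac[strsplit(primer, "")[[1]]], collapse = "")
}

# Hypothetical helper: drop everything up to and including the first primer
# match; reads without a match become NA so they can be discarded.
trim_through_primer <- function(reads, primer) {
  m <- regexpr(primer_to_regex(primer), reads)
  start <- as.integer(m)
  len <- attr(m, "match.length")
  out <- substring(reads, start + len)
  out[start < 0] <- NA_character_
  out
}

core <- "CCTACGGGNGGCWGCAG"          # 341F from the question
spacers <- c("", "T", "AT", "TGT")   # extra 5' bases of 341F, Fb, Fc, Fd
variants <- paste0(spacers, core)

# Made-up reads: each primer variant (N -> A, W -> T) followed by the same insert.
insert <- "TGGGGAATATTGCACAATGG"
reads <- paste0(chartr("NW", "AT", variants), insert)

# After trimming, every read begins at the first base after the primer.
trimmed <- trim_through_primer(reads, core)
```

In this toy case all four trimmed reads are identical to the insert, regardless of spacer length. A regular (non-anchored) 5' adapter in cutadapt has the same effect, removing the adapter together with anything preceding it, so supplying the core primer that way is one plausible route; whether it suits a given dataset is something to verify on the actual reads.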

luigallucci commented 2 months ago

@benjjneb thank you for the reply!

> This is within the range of results we see in real data. Chimeras tend to be low abundance and diverse, so it is not uncommon that a majority of ASVs are flagged as chimeric, but most reads (>75%) should remain. 18% chimeric reads isn't great though, and could suggest optimizing the PCR protocol in the future, perhaps by lowering the number of cycles or increasing elongation times, both of which have been shown to reduce chimera formation.

Indeed, this is what I checked, and these data were produced with an increased elongation time (relative to our standard protocol) because we know that some target organisms have introns in their 16S. Do you think this intron can lead to an increase in chimeras if a fragment containing it is targeted and amplified during the PCR?

benjjneb commented 2 months ago

> Do you think this intron can lead to an increase in chimeras if a fragment containing it is targeted and amplified during the PCR?

I don't know.

luigallucci commented 2 months ago

thank you!