Ability of DADA2 to detect sequencing and PCR error?

I am unsure if this is the most appropriate place to post this issue. I realize this may be more of a comment on the paper in question than DADA2, however I am specifically interested in the use of DADA2 here. Please feel free to comment if this is not an appropriate use of the issue tracker and of course suggestions of where a better place to post this would be welcome.

A recent paper about an improved marker gene analysis pipeline compared the ubiquitous Earth Microbiome Protocol (EMP) and a dual-indexing protocol (DI). The two methods were run on several mock communities and then both pipelines were analyzed using a 97% OTU picking protocol using USEARCH61 in QIIME 1. At a high level the conclusions of the paper were that the newer DI protocol performed better in nearly every regard. Furthermore, the use of an higher fidelity / error correcting polymerase was shown to be more accurate than using the standard Taq polymerase.

A concern I have is that the paper only used 97% OTUs for their comparison, which might be influencing the results. For example, if DADA2 was used to identify the sequence features, and error corrections were incorporated, perhaps the differences in the use of the two protocols and the two polymerases would be minimized. As it stands, the paper’s conclusions might be influenced by known OTU issues, like overestimating diversity, etc. In the context of a lab / project that only wanted to use 97% OTUs for their features, the paper stands. But the field in general is moving away from OTUs, and towards error corrected sequences such as those produced by DADA2, deblurring, or even dereplication.

There are a few different outcomes I could see by using different computational approaches to analyzing the data. First, the results may not change, both lab protocols after all would be analyzed using the same computational method. Presumably because the two lab protocols were sequenced in the same fashion any improvement in reconstituting the mock communities would be equal with regard to the two protocols. I could also imagine a situation in which the DI protocol shows minimal improvement using DADA2 over OTU picking whereas the EMP protocol showed substantial improvements, resulting in performance in the two protocols that are more similar than what was presented in the paper.

I am interested in knowing the outcome of this approach though unfortunately don’t have the time to carry it out. I am interested to know if anyone has looked at using a DADA2 to analyze the quality of the two protocols and if this would be valuable for the community to know.

This is an interesting question, and one I've had since these formal comparisons of methods/labs on the same reference material (mock communities) have been emerging. While your question about whether or not the OTU results would converge if DADA2 had been used on both workflows is perhaps worth considering, the more important question for the field is, "Why are they using OTUs in this context at all??"... These are mock communities for which the exact sequences are known in advance. This means that the objective evaluation of "which protocol was the most accurate, providing correct sequences more of the time" is not only possible, but highly advisable relative to a subjective comparison between protocols of somewhat unknown absolute accuracy, or an accuracy defined by some heuristic mapping of 97% sequence similarity to the known taxonomy of the members of the mock community.

I brought this point up at the NIST microbiome standards meeting this summer, along with a few other colleagues who were also incredulous how final evaluation of these efforts would still end up shoe-horned into an OTU approach. We can do so much better.

@johnchase I haven't looked closely at this particular paper, but I can comment that we have found that the DADA algorithm can reduce "batch effects" resulting from differences in error rates caused by different methodologies, so I think what you are suggesting would be worth looking into.

When writing the DADA1 paper we looked directly at two datasets generated by Taq and another high-fidelity PCR polymerase. Because the algorithm learned the differences in the error rates between the two enzymes from the data, its final output was basically unaffected by those differences. Fixed cutoff methods (eg. OTUs) don't work that way.

A somewhat related example, here are diversity estimates from QIIME(uclust)/Mothur(an)/DADA2 on an example mock community, filtered at different levels of stringency. Because of the DADA2 error model, its estimates are essentially independent of the stringency of quality filtering, while the OTU methods are highly sensitive to that:

crkmt7uusaacepe

All that said: There are types of differences, such as those relating to bias or misindexing, that DADA2 vs. OTU will have no impact on, as those error modes are not part of the DADA2 error model. So I agree it is worth trying what you suggest, but hard to say at this point whether there will be a significant change.

Courtesy of lead author Daryl Gohl, because I was having a hard time figuring this out, these are the SRX accessions for the samples analyzed in Figure 1 (of "Systematic improvement of amplicon marker gene methods for increased accuracy in microbiome studies"):

Figure 1g-i Kozich data: https://www.mothur.org/MiSeqDevelopmentData.html Nelson data: http://sra.dnanexus.com/studies/ERP003999

EMP(Taq) HM_276D_rep1_141006_M00262_0180_000000000-ABG6L SRX1582951 HM_276D_rep2_141006_M00262_0180_000000000-ABG6L SRX1582953 HM_276D_rep3_141006_M00262_0180_000000000-ABG6L SRX1582955

EMP(KAPA) HM_276D_rep1_KAPA_141006_M00262_0180_000000000-ABG6L SRX1582952 HM_276D_rep2_KAPA_141006_M00262_0180_000000000-ABG6L SRX1582954 HM_276D_rep3_KAPA_141006_M00262_0180_000000000-ABG6L SRX1582956

DI(Taq) Taq_20_HM_276D_rep1_150112_M00784_0182_000000000-AD1AR SRX1582892 Taq_20_HM_276D_rep2_150112_M00784_0182_000000000-AD1AR SRX1582893 Taq_20_HM_276D_rep3_150112_M00784_0182_000000000-AD1AR SRX1582894 Taq_20_HM_276D_100_150112_M00784_0182_000000000-AD1AR SRX1582886

DI(Q5) Q5_20_HM_276D_rep1_150112_M00784_0182_000000000-AD1AR SRX1582844 Q5_20_HM_276D_rep2_150112_M00784_0182_000000000-AD1AR SRX1582845 Q5_20_HM_276D_rep3_150112_M00784_0182_000000000-AD1AR SRX1582846 Q5_20_HM_276D_100_150112_M00784_0182_000000000-AD1AR SRX1582838

DI(KAPA) HM_276D_rep1_141006_M00784_0156_000000000-ABEPA SRX1582965 HM_276D_rep2_141006_M00784_0156_000000000-ABEPA SRX1582966 HM_276D_rep3_141006_M00784_0156_000000000-ABEPA SRX1582967 K_100_1x_20cyc_150627_M00262_0028_000000000-AFVU2 SRX1582991

Figure 1j-k EMP(Taq) HM_277D_rep1_141006_M00262_0180_000000000-ABG6L SRX1582957 HM_277D_rep1_KAPA_141006_M00262_0180_000000000-ABG6L SRX1582958 HM_277D_rep2_141006_M00262_0180_000000000-ABG6L SRX1582959

EMP(KAPA) HM_277D_rep2_KAPA_141006_M00262_0180_000000000-ABG6L SRX1582960 HM_277D_rep3_141006_M00262_0180_000000000-ABG6L SRX1582961 HM_277D_rep3_KAPA_141006_M00262_0180_000000000-ABG6L SRX1582962

DI(Taq) Taq_20_HM_277D_rep1_150112_M00784_0182_000000000-AD1AR SRX1582895 Taq_20_HM_277D_rep2_150112_M00784_0182_000000000-AD1AR SRX1582896 Taq_20_HM_277D_rep3_150112_M00784_0182_000000000-AD1AR SRX1582897

DI(Q5) Q5_20_HM_277D_rep1_150112_M00784_0182_000000000-AD1AR SRX1582847 Q5_20_HM_277D_rep2_150112_M00784_0182_000000000-AD1AR SRX1582848 Q5_20_HM_277D_rep3_150112_M00784_0182_000000000-AD1AR SRX1582849

DI(KAPA) HM_277D_rep1_141006_M00784_0156_000000000-ABEPA SRX1582968 HM_277D_rep2_141006_M00784_0156_000000000-ABEPA SRX1582969 HM_277D_rep3_141006_M00784_0156_000000000-ABEPA SRX1582970

Inspired by @johnchase questions above, I ran two samples through DADA2 to see what happened, one EMP/Taq/Even sample (SRX1582951), and one DI/KAPA/Even sample (SRX1582965). These are two of the samples used in Figure 1c/1e/1g/1h/1i.

Both samples are of the same mock community of 20 strains mixed at equimolar rRNA concentration: 19 with distinguishable v4 16S regions (two identical Stapylococci) and containing 22 unique SVs or sequence variants (B. vulgatus has 3, C. bejerincii has 2).

On figure 1h (% chimeric reads) I found that that EMP/Taq sample had 15.55% chimeric reads, and the DI/KAPA sample had 0.89% chimeric reads, in line with the findings in the paper, but a bit higher because the DADA2 pipeline detects more chimeras. As seen next, the DADA2 pipeline removes all those chimeras, so they have no effect on downstream results save in reducing the total number of reads.

On figure 1i (# OTUs or, for DADA2, SVs) I found that the EMP/Taq sample had 22 SVs - 0 FPs and 0 FNs - with the 22 coming, as noted above, from the 22 distinguishable SVs in these 20 strains. I found that the DI/KAPA sample had 49 SVs - 22 TPs, 22 shifted versions of the TPs, 3 contaminants, and 2 reverse complement versions of the TPs. The shifted versions were easily removed by filtering based on amplicon length. These SV numbers are much lower than the OTU numbers reported in Gohl et al. because of the far lower FP rate of the DADA2 pipeline compared to the OTU picking method used there.

On figure 1j (abundance rms) I found that the EMP/Taq sample had an RMS of 3.3%, and the DI/KAPA sample had an RMS of 1.75%, basically identical to the findings in Gohl et al.

I generated figures corresponding to figure 1c/1e after DADA2 processing (that basically reproduce the findings in Gohl et al.):

emp

In total, these findings suggest that the DADA2 pipeline solves some of the problems identified in Gohl et al, but does not (and cannot) change the differential bias issues between the DI and EMP methods. Better bioinformatics completely solved the issues with overinflation of OTUs. In fact, after DADA2 processing, the EMP protocol is more accurate in terms of presence/absence, as the DI protocol picks up additional contaminants, as well as generating reverse-complement and length-shifted variants.

However, the DI protocol produced frequencies much closer to the expected results for the 20 mock strains. While it is worth noting that the EMP protocol had an advantage in specificity (when processed with DADA2), the more uniform sensitivity of the DI protocol, which led to more accurate frequency estimates, makes a compelling case for its use, as argued by Gohl et al.

Edit: Link to the R script used to do this analysis. Should be completely reproducible for anyone, as long as you have dada2 version 1.2 or later: https://www.dropbox.com/sh/311c807uugdgm1w/AAAQ-BsGegctdTkj_JHpTsp7a?dl=0

I just saw this again when it was closed and was reminded that I had meant to respond. No need to re-open, but I appreciate the time that @benjjneb took to look into the issue, the findings are interesting and will be very useful for us. Thanks!

benjjneb / dada2

Ability of DADA2 to detect sequencing and PCR error? #173