Which workflow to default to?

This lesson is meant to be a bit modular, so that it can be adapted to a few different workflows depending on user preference. That said, we do need a default one. Here are my thoughts:

OTUs vs. ASVs: OTUs (operational taxonomic units) are made by clustering reads together at some identity threshold (commonly 97%) in order to absorb PCR and sequencing errors. ASVs (amplicon sequence variants) instead represent exact DNA sequences, and the tools that call them usually apply some form of error correction to deal with those errors.
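To make the distinction concrete, here is a toy Python sketch (not taken from any of the pipelines discussed here). It assumes equal-length, already-aligned reads, and the function names `greedy_otus` and `exact_variants` are made up for illustration. Real ASV callers also model and correct sequencing errors, which this sketch deliberately leaves out.

```python
from collections import Counter

# Hypothetical substitution map used only to build the toy data.
SWAP = {"A": "C", "C": "G", "G": "T", "T": "A"}


def mutate(seq, positions):
    """Return seq with the base at each given position swapped for another base."""
    chars = list(seq)
    for i in positions:
        chars[i] = SWAP[chars[i]]
    return "".join(chars)


def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("toy example assumes equal-length, pre-aligned reads")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(reads, threshold=0.97):
    """Greedy centroid clustering: the most abundant sequences seed clusters first."""
    centroids = {}  # centroid sequence -> number of reads assigned to it
    for seq, n in Counter(reads).most_common():
        for centroid in centroids:
            if identity(seq, centroid) >= threshold:
                centroids[centroid] += n
                break
        else:
            # No existing centroid is close enough, so this sequence starts a new OTU.
            centroids[seq] = n
    return centroids


def exact_variants(reads):
    """Count every distinct sequence separately (ASV-like, minus error correction)."""
    return dict(Counter(reads))


base = "ACGT" * 25                          # 100 bp reference sequence
close = mutate(base, [10])                  # 1 mismatch  -> 99% identity
distant = mutate(base, range(0, 100, 10))   # 10 mismatches -> 90% identity
reads = [base] * 50 + [close] * 30 + [distant] * 20

otus = greedy_otus(reads)
variants = exact_variants(reads)
print(len(otus), "OTUs vs.", len(variants), "exact variants")
```

With this toy data, the 99%-identical sequence gets absorbed into the most abundant sequence's OTU, while exact-sequence counting keeps it separate. Whether that extra variant is real biology or a sequencing error is exactly what the error-correction step in an ASV caller has to decide.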

The current consensus is that ASVs are superior to OTUs because they have finer resolution, are (mostly) directly comparable between studies, and have a specific biological meaning. As such, I think we should default to ASVs, and maybe not even include options for OTU-calling pipelines.

Choice of Workflow: The default workflow will affect a lot of things, since the candidate pipelines handle read-pair joining, filtering, singletons, etc. differently. Here are the major options I'm aware of (again, all for ASV calling). Each has pros and cons.

QIIME2 is one of the most common options and is very easy to get started with thanks to its virtual machines, plugins, and the like. I dislike having to pull everything out of its artifacts, though, and Deblur tends to filter out huge numbers of reads (as seen here). It currently has ~1600 citations, but that doesn't count the ~24,000 for QIIME1.
DADA2 is an R-based platform with ~4500 citations, so it's relatively common. (It probably helped that it got integrated into QIIME2, too.) It's a little odd in that it calls ASVs on the forward and reverse reads separately before merging them. However, this benchmarking paper says it's the best if you want to tell close relatives apart, and recommends it over QIIME2-Deblur.
USEARCH seems to be pretty popular (14,000 citations), and that same benchmarking paper scores it as the all-around best option (unless you need to tell close relatives apart). Unfortunately, it's a commercial program: the 32-bit version is free but the 64-bit version is licensed, even for academic use ($800 each, I think), so I don't think this is a good option. Costs aside, it seems to go a bit against the open-source ethos of the Carpentries.
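
Since read-pair joining is one of the places these pipelines differ, here is a toy Python sketch of what merging a forward and reverse read means. It is not how DADA2, QIIME2, or USEARCH actually do it: this version requires an exact overlap and ignores quality scores, and `merge_pair` and its `min_overlap` default are made-up names and values for illustration only.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(forward, reverse, min_overlap=12):
    """Join a forward read with its reverse mate via an exact overlap.

    The reverse read is reverse-complemented so both reads are on the same
    strand, then the longest exact overlap between the end of the forward
    read and the start of the reverse-complemented mate is used.
    Returns None if no overlap of at least min_overlap bases is found.
    """
    rc = reverse_complement(reverse)
    for overlap in range(min(len(forward), len(rc)), min_overlap - 1, -1):
        if forward[-overlap:] == rc[:overlap]:
            return forward + rc[overlap:]
    return None


# Hypothetical 40 bp amplicon, sequenced from both ends with a 12 bp overlap.
amplicon = "ATGGCGTACGTTAGCCTGAAGTCCATGGATCGTTACGGCA"
forward_read = amplicon[:28]
reverse_read = reverse_complement(amplicon[16:])

merged = merge_pair(forward_read, reverse_read)
print(merged == amplicon)
```

The ordering question is whether denoising happens before or after this joining step. DADA2 denoises the forward and reverse reads separately and then merges them, as noted above.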

So...having typed all that out, I think I favor DADA2 as the default pipeline, with QIIME2-Deblur as the second choice. Specifically, I think it's better to default to DADA2 in R, and then have side/alternate lessons on QIIME2 with DADA2 or Deblur. Defaulting to R also means it will be easy to bring data into phyloseq for visualization. So the workflows could be:

DADA2 in R (default)
DADA2 in QIIME2
Deblur in QIIME2

Those are my thoughts. Anyone object?

carpentries-incubator / microbial-amplicon-analysis

Which workflow to default to? #3