carpentries-incubator / microbial-amplicon-analysis

A lesson teaching analysis of microbial amplicon data
https://carpentries-incubator.github.io/microbial-amplicon-analysis/

Which workflow to default to? #3

Open wallacelab opened 3 years ago

wallacelab commented 3 years ago

This lesson is meant to be a bit modular, so that it can be adapted to a few different workflows depending on user preference. That said, we do need a default one. Here are my thoughts:

OTUs vs. ASVs: OTUs (Operational Taxonomic Units) are made by clustering reads together at some similarity threshold in order to overcome PCR and sequencing errors. ASVs (amplicon sequence variants) instead represent exact DNA sequences, and ASV methods usually rely on some form of error correction (denoising) to deal with those same errors.
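To make the distinction concrete, here is a toy sketch in Python. It is not how DADA2, Deblur, or any real pipeline is implemented: the function names are made up for illustration, identity is a simple per-position match on equal-length reads (real tools align sequences), and the "exact variant" side only counts distinct sequences without the error-correction step that real ASV methods apply.

```python
from collections import Counter


def identity(a: str, b: str) -> float:
    """Fraction of matching positions (toy stand-in for alignment identity).

    Assumption: both reads have the same length.
    """
    if len(a) != len(b):
        raise ValueError("this toy example only compares equal-length reads")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def exact_variants(reads):
    """ASV-like view: every distinct sequence is kept as its own unit.

    Real ASV callers also denoise; this only counts exact sequences.
    """
    return Counter(reads)


def greedy_otus(reads, threshold=0.97):
    """OTU-like view: greedy centroid clustering at an identity threshold.

    Sequences are visited from most to least abundant. Each one joins the
    first centroid it matches at >= threshold, or becomes a new centroid.
    """
    otus = {}  # centroid sequence -> total read count
    for seq, n in Counter(reads).most_common():
        for centroid in otus:
            if identity(seq, centroid) >= threshold:
                otus[centroid] += n
                break
        else:
            otus[seq] = n
    return otus


# Hypothetical 100-base reads: a reference, a one-mismatch neighbor (99%
# identity), and a ten-mismatch sequence (90% identity).
base = "ACGT" * 25
near = "T" + base[1:]
distant = "CATGCATGCA" + base[10:]
reads = [base] * 5 + [near] * 2 + [distant] * 3

variants = exact_variants(reads)
otus = greedy_otus(reads, threshold=0.97)
print(len(variants), "exact variants;", len(otus), "OTUs at 97%")
```

With these toy reads, exact counting keeps three variants, while clustering at 97% folds the one-mismatch read into its more abundant neighbor and leaves two OTUs. That lost distinction is the "finer resolution" trade-off in miniature.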

The current consensus is that ASVs are superior to OTUs because they have finer resolution, are (mostly) directly comparable between studies, and have a specific biological meaning. As such, I think we should default to ASVs, and maybe not even include options for OTU-calling pipelines.

Choice of Workflow: The default workflow will affect a lot of things, since the workflows differ in how they handle read-pair joining, filtering, singletons, etc. Here are the major options I'm aware of (again, all for ASV calling). Each has pros and cons.

So... having typed all that out, I favor using DADA2 as the default pipeline, with QIIME2-Deblur as a second alternative. I think it's better to default to using DADA2 in R, and then have side/alternate lessons on QIIME2 with DADA2 or Deblur. Defaulting to R also means it will be easy to bring data into phyloseq for visualization. So possibly the workflows would be:

Those are my thoughts. Anyone object?

kescobo commented 3 years ago

Agree completely re: OTU vs ASV, though it's probably worth putting in a note that people sometimes refer to "OTUs" generically to encompass any taxonomic assignment from amplicon sequencing, at least colloquially.

I've used QIIME2 primarily, and generally prefer Python to R, but I don't think it matters overmuch. Probably good to mention all of the options at the top and just focus on one, but if designed properly, we could probably use the same structure for separate Python/QIIME2-focused and R/DADA2-focused workshops. Conceptually, they do the same things, and I'd personally prefer to focus on concepts and have the hands-on stuff be just "one way to implement concepts."