AstrobioMike / AstrobioMike.github.io

Site to help biologists wade their way into bioinformatics.
https://astrobiomike.github.io/

Regarding the full example workflow DADA2 #36

Closed: Cesar2112 closed this issue 3 years ago

Cesar2112 commented 3 years ago

Hi Mike! First of all, thanks for the tutorial and the clear info on your page!

I'm following both the DADA2 tutorial (https://benjjneb.github.io/dada2/tutorial.html) and your pipeline (https://astrobiomike.github.io/amplicon/dada2_workflow_ex#inferring-asvs). I wanted to ask why, in your example, the ASV table creation comes before the merging of sequences. This is the opposite of the DADA2 tutorial. With my limited experience in metabarcoding pipelines, I would tend to think the reads should be merged first, before inferring ASVs. Do you have any thoughts on this?

Thanks in advance and have a good day. César

AstrobioMike commented 3 years ago

Hey there, César!

Thanks for the kind words :)

I'm not sure what part you are referring to as being the opposite of the dada2 tutorial. That tutorial also infers the sequences on the forward and reverse reads separately (the dada() steps) before merging them (the mergePairs() step): first comes the "Sample Inference" section, then the "Merge paired reads" section. Mine on Happy Belly goes the same way. Am I misunderstanding, or maybe not seeing what you are saying is opposite?
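To make that order concrete, here is a minimal toy sketch in Python. It is not dada2 (which is an R package) and does not use its error model: the hypothetical `toy_denoise()` only snaps low-abundance reads onto a more abundant sequence within one mismatch, and the hypothetical `merge_pair()` requires an exact overlap of at least 12 bases, which is an assumed threshold. What it does show is the ordering: forward and reverse reads are denoised separately, and only the denoised sequences are merged.

```python
from __future__ import annotations

from collections import Counter
from typing import Optional

# Translation table used to complement DNA bases
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def toy_denoise(reads: list[str], max_dist: int = 1) -> dict[str, str]:
    """Hypothetical stand-in for one dada() call: map each unique read to a
    more abundant sequence within max_dist mismatches, else to itself.
    Unlike DADA2, this ignores quality scores and learned error rates."""
    centers: list[str] = []
    assignment: dict[str, str] = {}
    # most_common() visits unique reads from most to least abundant
    for seq, _count in Counter(reads).most_common():
        for center in centers:
            if len(center) == len(seq) and hamming(center, seq) <= max_dist:
                assignment[seq] = center
                break
        else:
            # No close, more abundant sequence: treat this one as real
            centers.append(seq)
            assignment[seq] = seq
    return assignment


def merge_pair(fwd: str, rev: str, min_overlap: int = 12) -> Optional[str]:
    """Merge a forward read with the reverse complement of its reverse read,
    requiring an exact overlap of at least min_overlap bases (assumed rule)."""
    rc = reverse_complement(rev)
    # Try the longest possible overlap first, then shorter ones
    for olap in range(min(len(fwd), len(rc)), min_overlap - 1, -1):
        if fwd[-olap:] == rc[:olap]:
            return fwd + rc[olap:]
    return None


def toy_pipeline(fwd_reads: list[str], rev_reads: list[str]) -> Counter:
    """Denoise each read direction separately, then merge the denoised pairs."""
    # Step 1: forward and reverse reads are denoised on their own
    fwd_map = toy_denoise(fwd_reads)
    rev_map = toy_denoise(rev_reads)
    # Step 2: only now are pairs merged, using their denoised sequences
    table: Counter = Counter()
    for f, r in zip(fwd_reads, rev_reads):
        merged = merge_pair(fwd_map[f], rev_map[r])
        if merged is not None:
            table[merged] += 1
    return table


if __name__ == "__main__":
    amplicon = "ACGTTGCAAGGCTTACCGATGGATCCTAGC"
    fwd = amplicon[:22]                       # forward read covers the start
    rev = reverse_complement(amplicon[8:])    # reverse read covers the end
    fwd_err = fwd[:5] + "A" + fwd[6:]         # one read with a sequencing error
    print(toy_pipeline([fwd] * 5 + [fwd_err], [rev] * 6))
    # Counter({'ACGTTGCAAGGCTTACCGATGGATCCTAGC': 6})
```

Because the erroneous forward read is corrected before merging, all six pairs end up as the same merged sequence.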

As for why ASV inference happens before merging the pairs, it has to do with how dada2 works. dada2 uses the quality scores and error profiles of the reads, which can be very different between the forward and reverse reads, and which become artificial in the overlapping region once the reads are merged. I also remember this being something unique to the dada2 approach. That's just my rough, from-memory explanation of why it does things this way, though. If you want an actual technical explanation, haha, I'd look at the original dada2 paper, then maybe search the dada2 GitHub issues and post a question there if you can't find anything :)
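As a rough illustration of the "artificial" part, here is a small Python sketch. Phred scores convert to error probabilities as P = 10^(-Q/10). Where a forward read and the reverse complement of its reverse read overlap, a merger has to replace two quality scores per position with one. The keep-the-higher-score rule in the hypothetical `merge_overlap_quals()` helper is only an assumption for illustration; real merging tools use their own rules. Either way, the merged scores no longer match either read's original error profile.

```python
from __future__ import annotations


def phred_to_error(q: int) -> float:
    """Convert a Phred quality score to an error probability: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def merge_overlap_quals(fwd_quals: list[int], rev_rc_quals: list[int]) -> list[int]:
    """Collapse two aligned quality tracks into one for the overlap region.
    Keeping the higher score is a hypothetical rule chosen for illustration."""
    return [max(f, r) for f, r in zip(fwd_quals, rev_rc_quals)]


if __name__ == "__main__":
    # Forward read quality drops toward its 3' end (right side of the overlap);
    # the reverse-complemented reverse read is weakest on the left side.
    fwd_overlap = [38, 35, 30, 20]
    rev_rc_overlap = [20, 28, 34, 38]
    print(merge_overlap_quals(fwd_overlap, rev_rc_overlap))  # [38, 35, 34, 38]
    # Per-position error probabilities implied by each source read
    print([phred_to_error(q) for q in fwd_overlap])
    print([phred_to_error(q) for q in rev_rc_overlap])
```

The merged track [38, 35, 34, 38] matches neither input, which is one way to picture why a per-read error model is easier to apply before merging.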

Cesar2112 commented 3 years ago

Thanks for the clarification, Mike, and for your quick answer. I was misunderstanding the process, and now it is clearer. You are right: both pipelines follow the same steps in the same order.

Cheers, César