benjjneb / dada2

Accurate sample inference from amplicon data with single nucleotide resolution
http://benjjneb.github.io/dada2/
GNU Lesser General Public License v3.0

Parameters optimal to produce COI exact variants. #2015

Open aimirza opened 1 month ago

aimirza commented 1 month ago

Hi,

What parameters do you suggest changing when working on Illumina short-amplicon reads of COI genes? For example, would you:

benjjneb commented 1 month ago

I wouldn't recommend any parameter changes. Parameter settings should be appropriate for the sequencing technology, because the errors from PCR and sequencing are what DADA2 models. Those errors don't change between amplicon targets (outside of the filterAndTrim stage, anyway).

aimirza commented 1 month ago

Not even the alignment parameters, given that alignment is done before error modelling? I am worried that important sequences will not align to each other because the COI gene is more variable than 16S. For example, the paper says:

"Both heuristics [BAND_SIZE and KDIST_CUTOFF] can be disabled by the user, and the default values should be re-examined if the algorithm is applied to genetic regions with significantly different characteristics, such as the indel-rich ITS region."

benjjneb commented 1 month ago

"I am worried that important sequences will not align to each other because the COI gene is more variable than 16S."

That's fine. If they don't align because they are so different, then they will be split into different ASVs as they should be.

"the default values should be re-examined if the algorithm is applied to genetic regions with significantly different characteristics, such as the indel-rich ITS region"

We now realize that isn't the right advice. The alignment parameters should be reconsidered when the sequencing technology has different characteristics (e.g. a high indel rate), not when the amplicon target changes.
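To illustrate why the band size relates to indels rather than to how variable the target gene is, here is a minimal Python sketch of a banded global alignment. It is not DADA2's implementation: the function name `banded_nw_score`, the scoring values, and the toy sequences are assumptions chosen for illustration. Only the idea of restricting the alignment to a band around the diagonal comes from the discussion above.

```python
def banded_nw_score(a, b, band, match=5, mismatch=-4, gap=-8):
    """Global alignment score of a and b (illustrative sketch).

    Only cells with |i - j| <= band are filled, so the alignment can never
    contain a net shift of more than `band` gaps. A negative band disables
    banding and fills the full matrix. Scoring values are assumptions.
    """
    n, m = len(a), len(b)
    neg_inf = float("-inf")
    banded = band >= 0

    # Row 0: aligning an empty prefix of a against prefixes of b (all gaps).
    prev = [neg_inf] * (m + 1)
    for j in range(m + 1):
        if not banded or j <= band:
            prev[j] = j * gap

    for i in range(1, n + 1):
        cur = [neg_inf] * (m + 1)
        lo = max(0, i - band) if banded else 0
        hi = min(m, i + band) if banded else m
        for j in range(lo, hi + 1):
            if j == 0:
                cur[j] = i * gap  # prefix of a aligned against gaps only
                continue
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + sub,  # match or mismatch
                         prev[j] + gap,      # gap in b
                         cur[j - 1] + gap)   # gap in a
        prev = cur

    # -inf means the end cell lies outside the band (length difference > band).
    return prev[m]


# Toy example: the same 12-base core, shifted by 3 positions between reads.
core = "ACGTTGCAATCG"
read_a = "GGG" + core
read_b = core + "GGG"

for band in (0, 3, -1):
    print(band, banded_nw_score(read_a, read_b, band))
```

With a band of 0 the two reads can only be compared position by position, so the shared core is missed. A band of 3 is wide enough to recover the same score as the unbanded alignment. Substitutions never move the alignment off the diagonal, so extra substitution-level variability in a gene does not by itself call for a wider band, whereas an indel-prone sequencing technology can. In DADA2 itself the corresponding setting is the `BAND_SIZE` option.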

aimirza commented 1 month ago

Got it, thank you!