Can a variable amplicon start position compromise learnErrors (or other downstream functions)?

Hello, I have variable-length barcodes, and I trimmed every read by the maximum barcode length (that of my longest barcode). As a result, reads with shorter barcodes have lost a few true biological bases at the start, and an alignment of the trimmed sequences looks like this:

AAAGTTATCGGC (for longest barcoded sequence)
--AGTTATCGGC
------ATCGGC (for the shortest barcoded sequence)
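
To make the offsets concrete, here is a minimal sketch of the trimming described above. The barcodes are hypothetical (only their lengths, 8, 6 and 2, matter), and the amplicon is the toy sequence from the alignment; none of this is DADA2 code.

```python
# Toy amplicon from the alignment above.
AMPLICON = "AAAGTTATCGGC"
# Hypothetical barcodes of different lengths (assumption, not from my real data).
BARCODES = ["ACGTACGT", "ACGTAC", "AC"]

# Raw reads: barcode immediately followed by the amplicon.
reads = [barcode + AMPLICON for barcode in BARCODES]

# Fixed-length trimming: remove as many bases as the longest barcode has,
# regardless of the actual barcode length of each read.
max_len = max(len(barcode) for barcode in BARCODES)
trimmed = [read[max_len:] for read in reads]

# Number of biological bases lost at the start of each trimmed read.
offsets = [max_len - len(barcode) for barcode in BARCODES]

# Pad with '-' to show where each trimmed read starts within the amplicon.
aligned = ["-" * offset + seq for offset, seq in zip(offsets, trimmed)]
for line in aligned:
    print(line)
```

Position 1 of the last read is therefore position 7 of the amplicon (offset 6), which is the mismatch between read position and amplicon position I am asking about.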

Since version 1.3.3 (see NEWS), DADA2 has allowed variable-length amplicons in the dada aligner in order to handle ITS amplicons, and collapseNoMismatch can merge sequences of variable length. But what about learnErrors? Does it take the variable length/start into account, for example by aligning the subsampled sequences before learning the errors? In my example above, that would make error learning for the shortest-barcoded sequence start at position 7 instead of position 1.

In short: can a variable amplicon start position compromise learnErrors (or other downstream functions)? Thanks in advance, Rémi

benjjneb/dada2, issue #923