benjjneb / dada2

Accurate sample inference from amplicon data with single nucleotide resolution
http://benjjneb.github.io/dada2/
GNU Lesser General Public License v3.0

truncLenKeep #2020

Open LukeLikesDirt opened 4 weeks ago

LukeLikesDirt commented 4 weeks ago

Hi Benjamin

Could you please consider implementing a truncLenKeep option in future versions, similar to --fastq_trunclen_keep in VSEARCH? This would be beneficial for quality filtering in the ITS pipeline, where one could remove reverse-complement primers to address read-through before filtering, but also truncate poor-quality distal ends for long ITS regions without losing shorter reads.

Have you considered this previously? Am I missing a quality filtering option that could achieve this?

Kind regards Luke

benjjneb commented 3 weeks ago

Not missing anything, we don't have an option like this implemented.

Can you clarify the use case for ITS? Is it that you would use an external tool (e.g. cutadapt) to remove reverse primers, and then use truncLenKeep on the cutadapted reads to truncate only the longer remaining reads?

LukeLikesDirt commented 3 weeks ago

Thanks for your reply.

Yes exactly.

Because the ITS1 or ITS2 regions can be as short as 50 bp in some groups, I generally remove the reverse complement of primers to address read-through. However, some groups have ITS1 or ITS2 regions greater than 300 bp in length. For these cases, particularly when distal ends are noisy, it can be beneficial to truncate forward and reverse reads to a length that allows merging after quality filtering and denoising. I find this approach allows more reads to pass quality filtering and denoising. Currently, I use VSEARCH to quality filter and truncate noisy ends while retaining reads representing short ITS1 or ITS2, before denoising with DADA2. To be fair, this isn’t a major issue, but I would prefer the option to do this directly in DADA2 if it were available.

Does this generally seem like a reasonable approach? Am I missing a similar or better way to achieve my goal in DADA2?

Cheers 
Luke
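
For readers following the thread, the behaviour requested above can be sketched in a few lines of Python. This is only an illustration of the two truncation semantics being contrasted, not DADA2 or VSEARCH code: `FastqRecord`, `truncate_keep`, and `truncate_discard` are hypothetical names invented for this sketch. It assumes reads have already had reverse-complement primers removed by an external tool, so short ITS reads arrive shorter than the truncation length.

```python
from typing import Iterable, Iterator, NamedTuple


class FastqRecord(NamedTuple):
    """Hypothetical minimal FASTQ record used only for this illustration."""
    header: str
    sequence: str
    quality: str


def truncate_keep(records: Iterable[FastqRecord], trunc_len: int) -> Iterator[FastqRecord]:
    """Truncate reads longer than trunc_len; pass shorter reads through unchanged.

    This mirrors the behaviour described for a truncLenKeep-style option.
    """
    for rec in records:
        # Slicing past the end of a string is a no-op, so short reads survive intact.
        yield rec._replace(
            sequence=rec.sequence[:trunc_len],
            quality=rec.quality[:trunc_len],
        )


def truncate_discard(records: Iterable[FastqRecord], trunc_len: int) -> Iterator[FastqRecord]:
    """Truncate reads to trunc_len and drop any read shorter than trunc_len.

    This mirrors a fixed-length truncation that discards short reads.
    """
    for rec in records:
        if len(rec.sequence) < trunc_len:
            continue  # short read (e.g. a short ITS region) is lost here
        yield rec._replace(
            sequence=rec.sequence[:trunc_len],
            quality=rec.quality[:trunc_len],
        )


if __name__ == "__main__":
    reads = [
        FastqRecord("short_its", "ACGT", "IIII"),
        FastqRecord("long_its", "ACGTACGTAC", "IIIIIIII##"),
    ]
    for rec in truncate_keep(reads, 6):
        print(rec.header, rec.sequence, rec.quality)
```

With a truncation length of 6, `truncate_keep` returns both reads (the long one trimmed to 6 bases), while `truncate_discard` returns only the long read. The difference is the short ITS reads that the second approach drops.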

benjjneb commented 2 weeks ago

Linking this to a related issue raised in the GH repo for the QIIME2 DADA2 plugin: https://github.com/qiime2/q2-dada2/issues/129#issuecomment-2218757351