benjjneb / dada2

Accurate sample inference from amplicon data with single nucleotide resolution
http://benjjneb.github.io/dada2/
GNU Lesser General Public License v3.0

ITS2 trimming issues #1723

Closed peggy17688 closed 4 months ago

peggy17688 commented 1 year ago

Hi Benjamin,

Peggy here. Thank you for your help last time with the filterAndTrim issues.

This time I have another trimming issue, with ITS2 sequences. I followed your protocol for the ITS pipeline, including the use of cutadapt. However, 50% of the reads were removed by the filterAndTrim step (even when I specify minLen = 0). I also checked that the primers had already been removed completely. I was just wondering if I could use the TrimLeft function (minus TruncLen) instead of cutadapt to retain the reads?

An earlier file worked fine with cutadapt, so I was wondering whether the problem arose this time because there were 2 species for this library.

I have attached images of the sequence quality and the read numbers after trimming with cutadapt.

Thanks in advance for the help!

Best, Peggy

[two attached images]

                                  reads.in reads.out
1201K1ma_ITS2_S26_R1_001.fastq.gz    89337     34885
1201K1pa_ITS2_S74_R1_001.fastq.gz    86173     26931
1201K2pa_ITS2_S86_R1_001.fastq.gz    71748      8663
1201K3ma_ITS2_S38_R1_001.fastq.gz   125259     50783
1201K3pa_ITS2_S3_R1_001.fastq.gz    105200     41949
1201K4ma_ITS2_S50_R1_001.fastq.gz   106018     42317

benjjneb commented 1 year ago

What is the exact filterAndTrim command you are using?

> because there were 2 species for this library

What does this mean? This is from a test dataset with two species in it?

peggy17688 commented 1 year ago

It's at this step:

    out <- filterAndTrim(cutFs, filtFs, cutRs, filtRs, maxN = 0, maxEE = c(2, 2),
                         truncQ = 2, minLen = 50, rm.phix = TRUE, compress = TRUE,
                         multithread = TRUE)  # on Windows, set multithread = FALSE

I am using this to process my ITS2 sequences from coral zooxanthellae and they came from two coral species. I am not sure if that would be a problem for the pipeline. Earlier on, it worked fine for another sample set.

benjjneb commented 1 year ago

Eyeballing the quality plots, I wouldn't have guessed you would be losing 50% of the reads at the filtering step. You can try relaxing maxEE to a higher number to see if that helps.
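To make the maxEE suggestion concrete, here is a minimal base-R sketch of how a read's expected errors are computed from its Phred quality scores (EE = sum(10^(-Q/10))) and how raising the threshold changes which reads are kept. The quality profiles and the relaxed threshold of 5 are hypothetical values chosen for illustration; they are not taken from the plots or data in this thread, and the helper `expected_errors` is not a DADA2 function.

```r
# Expected errors for one read: EE = sum(10^(-Q/10)) over its Phred quality scores.
# `expected_errors` is a hypothetical helper for illustration, not part of DADA2.
expected_errors <- function(quals) sum(10^(-quals / 10))

# Toy quality profiles (invented values, not from the data in this thread).
reads <- list(
  high_quality = rep(35, 250),
  noisy_tail   = c(rep(35, 200), rep(12, 50)),
  low_quality  = rep(15, 250)
)

ee <- sapply(reads, expected_errors)

# A read is kept when its expected errors do not exceed the threshold.
# 2 is the value used in the command above; 5 is an arbitrary relaxed value.
keep_default <- ee <= 2
keep_relaxed <- ee <= 5

print(data.frame(ee = round(ee, 2), keep_default, keep_relaxed))
```

In this toy case the read with a noisy tail fails at maxEE = 2 but passes at the relaxed threshold, which is the kind of read that a higher maxEE recovers.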

> I was just wondering if I could use the TrimLeft function (minus TruncLen) instead of cutadapt to retain the reads?

The issue with ITS sequencing (or most ITS amplicons anyway) is that some of the reads will be longer than the sequenced amplicon because of the high levels of length variation at that locus, and so some of the reads need to have the opposite primer trimmed off the end. This is what cutadapt is needed for, as DADA2 has not implemented a native function to do that (on Illumina-scale data anyway). So, replacing with trimLeft is not a solution.
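The read-through problem described above can be sketched in base R. All sequences here are invented toy strings (not real ITS2 primers or amplicons), and the exact-match search is a simplification standing in for cutadapt's error-tolerant adapter matching. The point is only that removing a fixed number of bases from the start of the read leaves the reverse-complemented opposite primer at the 3' end.

```r
# Reverse-complement a DNA string with base R (plain A/C/G/T only, no IUPAC codes).
rev_comp <- function(s) {
  paste(rev(strsplit(chartr("ACGT", "TGCA", s), "")[[1]]), collapse = "")
}

# Toy sequences, invented for illustration.
fwd_primer <- "GTGAATTGCA"
rev_primer <- "TTCCTCCGCT"
insert     <- "ACGTTGCATGCCATGACTGA"  # a short amplicon, shorter than the read

# A forward read that runs through the insert into the opposite primer.
read <- paste0(fwd_primer, insert, rev_comp(rev_primer))

# trimLeft-style trimming: drop a fixed number of bases from the start only.
left_only <- substring(read, nchar(fwd_primer) + 1)

# The reverse-complemented opposite primer is still present at the 3' end.
still_has_primer <- grepl(rev_comp(rev_primer), left_only, fixed = TRUE)

# cutadapt-style trimming (simplified to an exact match): cut at the primer.
pos <- regexpr(rev_comp(rev_primer), left_only, fixed = TRUE)
fully_trimmed <- if (pos > 0) substr(left_only, 1, pos - 1) else left_only

print(still_has_primer)
print(fully_trimmed)
```

Because ITS amplicon lengths vary, the position of that trailing primer differs from read to read, so no single fixed truncation length removes it cleanly.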

> I am using this to process my ITS2 sequences from coral zooxanthellae and they came from two coral species. I am not sure if that would be a problem for the pipeline.

That shouldn't be a problem.

peggy17688 commented 1 year ago

Ah I see. Will try doing that. Thanks for the help!

I just tried adjusting maxEE with minLen = 0, and it works!