benjjneb / dada2

Accurate sample inference from amplicon data with single nucleotide resolution
http://benjjneb.github.io/dada2/
GNU Lesser General Public License v3.0

Issue regarding merging data #1984

Open Aziz4306 opened 1 month ago

Aziz4306 commented 1 month ago

Hello everyone, I am currently analyzing amoA gene sequence data. However, I am encountering an issue at the mergePairs step, where no reads merge successfully. I have attached the plotQualityProfile output and other related images for reference. Could anyone please help me resolve this issue? Thank you in advance for your assistance.

[Attached images: Screenshot 2024-07-19 180134, Screenshot 2024-07-19 180050, plotQualityProfile2, plotQualityProfile 1]

hjarnek commented 1 month ago

Usually when merging fails, it's because the samples were quality-trimmed too much, effectively leaving no overlap for aligning the reads. How long is your amoA region, and what did you put for truncLen?

Aziz4306 commented 1 month ago

Thank you so much for your time. The primers I used for amoA are Cren 104F and Cren 616R, and I set truncLen=c(280,220).

Aziz4306 commented 1 month ago

I also tried truncLen=c(240,160), but I am facing the same problem.

hjarnek commented 1 month ago

> and I also tried truncLen=c(240,160). but facing the same problem.

Trimming even more won't help. First and foremost, you should remove primers with e.g. cutadapt. Then try to quality-trim even less than you did first, or maybe even not at all, just to check what happens. Find out what the expected amplicon length with your primers is. If you get an increased merge success rate with less trimming, you've found the problem.

benjjneb commented 1 month ago

I am not familiar with the amoA primer set you are working with, but I am guessing, based on the names of the primers, that we are expecting an amplicon of roughly 616-104 = 512 nts. This may be off by the length of your primers, depending on whether or not they are on your reads.

But working with 512 nts as our estimate, the reads after truncation must be long enough to sufficiently overlap to allow merging. The rule of thumb for "sufficient overlap" is at least 20 nts: 12 required by default, plus 8 additional nucleotides to allow for some biological length variation in the amplicon. This means your forward and reverse truncation lengths need to add up to 532 nts (or more). The parameters you posted don't do that: 280+220=500, and 240+160=400. So you are losing almost everything at the merging step, because your reads no longer overlap after truncation.

So, either pick a new set of truncation lengths that add up to at least 532, or if sequence quality is too bad for that, you can also consider using just the forward reads alone.
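
For anyone who wants to check their own numbers, the overlap arithmetic above can be sketched in base R. This is only an illustration: `check_trunc_len` and its arguments are hypothetical helper names, not dada2 functions, and the 512 nt amplicon length is the rough estimate from the primer names, which may be off by the primer lengths.

```r
# Hypothetical helper (not part of dada2): checks whether a pair of
# truncation lengths leaves enough overlap for merging.
#   trunc_len    : c(forward, reverse) truncation lengths, as in truncLen
#   amplicon_len : estimated amplicon length in nts (an assumption)
#   min_overlap  : overlap required for merging (12 by default, per the thread)
#   length_slack : extra nts for biological length variation (8, per the thread)
check_trunc_len <- function(trunc_len, amplicon_len,
                            min_overlap = 12, length_slack = 8) {
  required <- amplicon_len + min_overlap + length_slack
  total <- sum(trunc_len)
  list(
    total = total,                          # forward + reverse length
    required = required,                    # minimum sum needed
    shortfall = max(0, required - total),   # nts missing, 0 if none
    ok = total >= required
  )
}

# Rough estimate from the primer names; may be off by the primer lengths.
amplicon_len <- 616 - 104

res1 <- check_trunc_len(c(280, 220), amplicon_len)
res2 <- check_trunc_len(c(240, 160), amplicon_len)

print(unlist(res1))
print(unlist(res2))
```

Both posted settings fail this check. Any pair whose sum reaches the required total passes the arithmetic, but whether the read quality supports truncating that late is a separate question that the quality profiles have to answer.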