benjjneb / dada2

Accurate sample inference from amplicon data with single nucleotide resolution
http://benjjneb.github.io/dada2/
GNU Lesser General Public License v3.0

Weird error plots on merged reads #1989

Open msalamon2 opened 1 month ago

msalamon2 commented 1 month ago

Hi,

I ran DADA2 on a 36-sample dataset of merged reads (merged before filtering), generated with the COI primer pair BF3/BR2 and sequenced on a NovaSeq 6000 (250 bp paired-end).

I initially used truncLen to remove the lower-quality tails, but more than 50% of the reads were lost during the chimera removal step for two samples, so I also trimmed the front of the reads. This fixed the issue with chimera removal, but the error plots did not change between the two filtering options and look rather weird: many points show low error rates across quality scores 15-40 and lie far from the estimated black line. I wondered if this could be due to the merging, which I had no control over, but a colleague also received merged reads after sequencing and did not have the same issue.
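For readers who want to reproduce this kind of plot, a minimal sketch is given below. It assumes the dada2 package is installed; the wrapper name `plot_merged_read_errors` and its argument `filt_files` (paths to already filtered single-end FASTQ files) are hypothetical and not taken from the original post. In the resulting plot, the points are the observed error rates for each consensus quality score, the black line is the fitted error model, and with `nominalQ = TRUE` a red line shows the rates expected from the nominal definition of the quality score.

```r
# Hypothetical helper: learn an error model from filtered merged reads
# and draw the standard dada2 error-rate plot.
# 'filt_files' is an assumed character vector of filtered FASTQ paths.
plot_merged_read_errors <- function(filt_files) {
  err <- dada2::learnErrors(filt_files, multithread = TRUE)
  # nominalQ = TRUE adds the red line of nominal (Phred-implied) error rates
  dada2::plotErrors(err, nominalQ = TRUE)
}
```

Calling `plot_merged_read_errors(extendedfragments.filtN)` would be one way to use it with the filtered files named in the post.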

Here are the options for the filtering, and I am providing the plots of the errors and two QC profiles before and after filtering for one of the samples:

out <- filterAndTrim(extendedfragments, extendedfragments.filtN, maxN = 0, maxEE = 2, rm.phix = TRUE, truncLen = 445, trimLeft = 30, compress = FALSE, multithread = FALSE)
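As a rough illustration of what `maxEE = 2` does in the call above: the expected number of errors in a read is the sum of its per-base error probabilities, 10^(-Q/10), and reads whose expected errors exceed `maxEE` are discarded. The base R sketch below uses made-up quality vectors, and the helper names `expected_errors` and `passes_maxEE` are hypothetical. The read length of 415 assumes, as the dada2 documentation describes, that reads keep `truncLen - trimLeft` bases when both options are set.

```r
# Expected errors for one read: sum of per-base error probabilities
# implied by its Phred quality scores.
expected_errors <- function(phred) sum(10^(-phred / 10))

# A read passes when its expected errors do not exceed maxEE.
passes_maxEE <- function(phred, maxEE = 2) expected_errors(phred) <= maxEE

# Made-up quality profiles for illustration only (415 = 445 - 30 bases).
high_q  <- rep(37, 415)                   # uniformly high quality
mixed_q <- c(rep(37, 300), rep(15, 115))  # long stretch of Q15 bases

passes_maxEE(high_q)   # TRUE: expected errors are well below 2
passes_maxEE(mixed_q)  # FALSE: the Q15 stretch alone contributes over 3
```

A stretch of low synthetic quality scores in the overlap region could therefore affect how many merged reads survive this filter, which is one reason to check the quality profiles after merging.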

QC_profile_GMP45905.pdf plot_errors_DADA2_MalaiseTrapsv1.pdf QC_profile_GMP45905_trimmed.pdf

Thank you for your help, Best wishes, Mathilde Salamon

benjjneb commented 1 month ago

For the DADA2 workflow, we do not recommend merging beforehand. Instead, we recommend denoising the forward and reverse reads separately, and then merging, as per our tutorial workflow: https://benjjneb.github.io/dada2/tutorial.html
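The order of operations recommended above can be sketched roughly as follows. This is a minimal outline, not the tutorial itself. It assumes dada2 is installed and that separate forward and reverse FASTQ files are available. The wrapper name `denoise_then_merge` and its arguments are hypothetical, and `trunc_len` is deliberately left without a default because it has to be chosen from the quality profiles of each dataset.

```r
# Hypothetical wrapper: filter, learn errors, denoise forward and reverse
# reads separately, and only then merge the denoised pairs.
# fnFs/fnRs: raw forward/reverse FASTQ paths (assumed inputs)
# filtFs/filtRs: output paths for the filtered files (assumed inputs)
# trunc_len: length-2 vector of truncation lengths, chosen per dataset
denoise_then_merge <- function(fnFs, fnRs, filtFs, filtRs, trunc_len) {
  dada2::filterAndTrim(fnFs, filtFs, fnRs, filtRs,
                       truncLen = trunc_len, maxN = 0, maxEE = c(2, 2),
                       rm.phix = TRUE, multithread = TRUE)
  # Error models are learned independently for each read direction
  errF <- dada2::learnErrors(filtFs, multithread = TRUE)
  errR <- dada2::learnErrors(filtRs, multithread = TRUE)
  # Denoise each direction with its own error model
  dadaFs <- dada2::dada(filtFs, err = errF, multithread = TRUE)
  dadaRs <- dada2::dada(filtRs, err = errR, multithread = TRUE)
  # Merging happens after denoising, on the inferred sequences
  mergers <- dada2::mergePairs(dadaFs, filtFs, dadaRs, filtRs)
  seqtab <- dada2::makeSequenceTable(mergers)
  dada2::removeBimeraDenovo(seqtab, method = "consensus", multithread = TRUE)
}
```

Because merging comes last, the original per-base quality scores from the sequencer are what the error model sees, rather than scores reassigned by a merging program.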

The error model plots you provided don't look that bad though, and would not on their own concern me.

The QC profile plots show why we denoise the F/R reads separately, though. As is clear, there is a totally different set of quality scores in the middle of those reads, where whatever read-merging program was used is arbitrating the overlap and assigning new synthetic quality scores.

msalamon2 commented 1 month ago

Hi Benjamin,

Thank you for your answer and explanations. Unfortunately, the raw reads were provided to me already merged, and I do not have a way to access the R1 and R2 reads from before merging.

Thank you for having a look at the error model plots. I checked all previously posted issues on error plots and could not find plots that looked similar.

I will keep the issue about synthetic quality scores in mind.

Best wishes, Mathilde Salamon