benjjneb / dada2

Accurate sample inference from amplicon data with single nucleotide resolution
http://benjjneb.github.io/dada2/
GNU Lesser General Public License v3.0

Errors with merger (dada2) #1585

Closed AgEntoGirl closed 4 months ago

AgEntoGirl commented 2 years ago

I am trying to run 16S reads through dada2, but I cannot get my reads to merge. I just get this message:

mergers <- mergePairs(dadaFs, filtFs, dadaRs, filtRs, verbose=TRUE)

"0 paired-reads (in 0 unique pairings) successfully merged out of 26572 (in 3113 pairings) input."

Can anyone help? I am completely new to this process and have no idea what I am doing.

benjjneb commented 2 years ago

Merging of paired reads can only happen if those reads overlap with one another. Did you truncate your reads to lengths too short for them to overlap? The numbers you need to know here are the length of your sequenced amplicon (e.g. V4 is usually ~250 nts if primers aren't sequenced), and the truncLen parameters you picked at the filterAndTrim stage. The sum of the forward and reverse truncation lengths should be ~20 nts more (at least) than the length of the sequenced amplicon.
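
The rule of thumb above can be sketched as a quick check in base R. This is only an illustration: `expected_overlap` and `can_merge` are hypothetical helpers, not dada2 functions, and the 20 nt margin is the approximate figure from the comment above rather than an exact requirement.

```r
# Hypothetical helpers (not part of dada2) for the rule of thumb above.

# Nucleotides of overlap left after truncation; negative means a gap.
expected_overlap <- function(trunc_len, amplicon_len) {
  stopifnot(is.numeric(trunc_len), length(trunc_len) == 2,
            is.numeric(amplicon_len), length(amplicon_len) == 1)
  sum(trunc_len) - amplicon_len
}

# TRUE when the truncation lengths leave at least `min_margin` nts of overlap.
# min_margin = 20 is the approximate margin suggested in the thread.
can_merge <- function(trunc_len, amplicon_len, min_margin = 20) {
  expected_overlap(trunc_len, amplicon_len) >= min_margin
}

# Illustrative values only: a ~250 nt V4 amplicon without primers.
expected_overlap(c(150, 150), 250)  # 50 nts of overlap
can_merge(c(150, 150), 250)         # TRUE
can_merge(c(120, 120), 250)         # FALSE: the reads would not even meet
```

If the check fails for the `truncLen` values passed to `filterAndTrim`, zero merged reads is the expected outcome rather than a bug.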

EJS01 commented 2 years ago

Good day! My case is the same. I even tried the following filtering call:

out <- filterAndTrim(fnFs, filtFs, fnRs, filtRs, truncLen=c(230,150), maxN=0, maxEE=c(3,5), truncQ=2, rm.phix=TRUE, compress=TRUE, multithread=FALSE)

I've also tried the following values: truncLen=c(200,150), and maxEE=c(2,2) and maxEE=c(5,5).

For the mergers:

mergers <- mergePairs(dadaFs, derepFs, dadaRs, derepRs, verbose=TRUE)
mergers <- mergePairs(dadaFs, filtFs, dadaRs, filtRs, verbose=TRUE)

The result from both calls is still the same: "0 paired-reads (in 0 unique pairings) successfully merged out of __ inputs"

I tried the justConcatenate option, but it greatly alters the assignTaxonomy part of the pipeline.

Attachments: 16s taxa plot.pdf, error f.pdf, forward qc plot.pdf, reverse qc plot.pdf, shannon simpson.pdf, error r.pdf

benjjneb commented 2 years ago

@EJS01 The relevant questions that need to be answered are the same as above. In particular, what is the length of your sequenced amplicon? If you aren't sure, a place to start is identifying what primers you are using, and whether or not they are included on your reads.

AgEntoGirl commented 2 years ago

We use V3/V4 primers (341F and 785R), but I do not know what to set the truncLen parameter to for merging to work.

EJS01 commented 2 years ago

Hi, thank you for the response. I will look into it. Meanwhile, how can I check those parameters (length of seq amplicon, primers used, presence of primers)? Please pardon me for asking, I am new to R.

benjjneb commented 2 years ago

@AgEntoGirl Are you using the Illumina 16S protocol described here? https://support.illumina.com/documents/documentation/chemistry_documentation/16s/16s-metagenomic-library-prep-guide-15044223-b.pdf

If so, the sequenced amplicon will contain the primers at the start, and is up to 460-ish nts long. So, the sum of your forward and reverse truncation lengths should be at least 475.
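
As a worked version of that arithmetic, the sketch below tabulates a few candidate `truncLen` pairs against the 460 nt amplicon and the 475 nt target. The 300 nt read length is an assumption (a 2x300 run) and the candidate pairs are illustrative, not recommendations; the quality profiles of the actual run still decide where truncation is sensible.

```r
amplicon_len <- 460  # upper end quoted above, primers included
target_sum   <- 475  # minimum truncLen sum suggested above
read_len     <- 300  # ASSUMPTION: 2x300 run; check your own read lengths

# Total bases that can be truncated across both reads while still
# reaching the target sum.
trim_budget <- 2 * read_len - target_sum
trim_budget  # 125

# Illustrative candidate (forward, reverse) truncation lengths.
candidates <- data.frame(fwd = c(230, 250, 280), rev = c(150, 200, 200))
candidates$sum          <- candidates$fwd + candidates$rev
candidates$overlap      <- candidates$sum - amplicon_len
candidates$meets_target <- candidates$sum >= target_sum
print(candidates)
```

Under these assumptions only the last pair reaches the target; the first two leave the reads short of each other, which would produce zero merged reads.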

benjjneb commented 2 years ago

how can I check those parameters (length of seq amplicon, primers used, presence of primers)?

@EJS01 You should talk to whoever is doing your amplicon sequencing. They will be able to tell you what primers you are using, and hopefully whether the primers are present on the reads and how long the amplicon should be. At a bare minimum they must know what the primers are, and you could start by looking for those primers at the start of your reads (to see if they are sequenced) and searching them against a reference db of your marker gene to get a distribution of sequenced amplicon lengths.
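
Looking for a primer at the start of the reads can be sketched in base R as below. The helper names are hypothetical, the toy reads are made up, and the primer string is the commonly cited 341F sequence, which should be confirmed with the sequencing provider. The check also assumes the primer sits at position 1 of the read, which will not hold if the library uses heterogeneity spacers.

```r
# IUPAC nucleotide codes mapped to regular-expression character classes.
iupac <- c(A = "A", C = "C", G = "G", T = "T",
           R = "[AG]", Y = "[CT]", S = "[CG]", W = "[AT]",
           K = "[GT]", M = "[AC]", B = "[CGT]", D = "[AGT]",
           H = "[ACT]", V = "[ACG]", N = "[ACGT]")

# Hypothetical helper: turn a (possibly degenerate) primer into a regex
# anchored at the start of the read.
primer_to_regex <- function(primer) {
  bases <- strsplit(toupper(primer), "")[[1]]
  stopifnot(all(bases %in% names(iupac)))
  paste0("^", paste(iupac[bases], collapse = ""))
}

# Hypothetical helper: fraction of reads that begin with the primer.
frac_with_primer <- function(reads, primer) {
  mean(grepl(primer_to_regex(primer), toupper(reads)))
}

# ASSUMPTION: commonly cited 341F sequence; confirm with your provider.
fwd_primer <- "CCTACGGGNGGCWGCAG"

# Made-up reads for illustration: two start with the primer, one does not.
toy_reads <- c("CCTACGGGAGGCAGCAGTGGGGAATATTG",
               "CCTACGGGTGGCTGCAGTAAGGAATATTC",
               "TGGGGAATATTGCACAATGGGCG")

frac_with_primer(toy_reads, fwd_primer)  # 2 of the 3 toy reads match
```

With real data, the read sequences could be loaded with something like `as.character(ShortRead::sread(ShortRead::readFastq(path)))`, where `path` is one of the forward FASTQ files. A fraction near 1 suggests the primers are sequenced and count toward the amplicon length; a fraction near 0 suggests they were already removed or are offset from the read start.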