benjjneb / dada2

Accurate sample inference from amplicon data with single nucleotide resolution
http://benjjneb.github.io/dada2/
GNU Lesser General Public License v3.0

mergePairs problem #1000

Closed MatS792 closed 4 years ago

MatS792 commented 4 years ago

Hi, I'm trying to use dada2 to analyze 16S sequences, but the mergePairs function merges 0 paired reads. I tried different truncation lengths, and different maxEE and truncQ values, to balance overlap against read quality. Nevertheless, merging gave 0 for most of the samples. This is the pipeline I used, with some output:

library(dada2)
packageVersion("dada2")  # 1.14.1

path1 <- "/home/issue_github"
list.files(path1)  # Verify the file list

fnFs1 <- sort(list.files(path1, pattern="_R1_001.fastq", full.names = TRUE))
fnRs1 <- sort(list.files(path1, pattern="_R2_001.fastq", full.names = TRUE))

sample.names1 <- sapply(strsplit(basename(fnFs1), "_"), `[`, 1)

filtFs1 <- file.path(path1, "filtered", paste0(sample.names1, "_F_filt.fastq.gz"))
filtRs1 <- file.path(path1, "filtered", paste0(sample.names1, "_R_filt.fastq.gz"))

names(filtFs1) <- sample.names1
names(filtRs1) <- sample.names1

# Forward sequences
plotQualityProfile(fnFs1[3:7])
[Figure: plotQF]

# Reverse sequences
plotQualityProfile(fnRs1[3:7])
[Figure: plotQR]

out1 <- filterAndTrim(fnFs1, filtFs1, fnRs1, filtRs1, truncLen=c(221,171), maxN=0, maxEE=c(5,5), truncQ=9, rm.phix=TRUE, compress=TRUE, multithread=TRUE)
out1

  reads.in reads.out
1_S4_L001_R1_001.fastq.gz 30660 30624
10_S17_L001_R1_001.fastq.gz 21668 21634
11_S29_L001_R1_001.fastq.gz 40759 40726
12_S41_L001_R1_001.fastq.gz 20546 20522
13_S53_L001_R1_001.fastq.gz 35109 35056
14_S65_L001_R1_001.fastq.gz 37469 37430
15_S77_L001_R1_001.fastq.gz 45975 45935
16_S89_L001_R1_001.fastq.gz 28013 27978
17_S6_L001_R1_001.fastq.gz 19992 19963
18_S18_L001_R1_001.fastq.gz 29256 29222
19_S30_L001_R1_001.fastq.gz 33458 33412
2_S16_L001_R1_001.fastq.gz 28956 28935
20_S42_L001_R1_001.fastq.gz 39858 39813
21_S54_L001_R1_001.fastq.gz 31244 31196
3_S28_L001_R1_001.fastq.gz 41611 41574
4_S40_L001_R1_001.fastq.gz 13891 13871
5_S52_L001_R1_001.fastq.gz 42859 42806
6_S64_L001_R1_001.fastq.gz 38051 38021
7_S76_L001_R1_001.fastq.gz 41910 41872
8_S88_L001_R1_001.fastq.gz 27451 27414
9_S5_L001_R1_001.fastq.gz 21935 21906

errF1 <- learnErrors(filtFs1, multithread=TRUE, randomize=TRUE)
plotErrors(errF1, nominalQ=TRUE)
[Figure: errF]

errR1 <- learnErrors(filtRs1, multithread=TRUE, randomize=TRUE)
plotErrors(errR1, nominalQ=TRUE)
[Figure: errR]

derepFs1 <- derepFastq(filtFs1, verbose=TRUE)
derepRs1 <- derepFastq(filtRs1, verbose=TRUE)

names(derepFs1) <- sample.names1
names(derepRs1) <- sample.names1

dadaFs1 <- dada(derepFs1, err=errF1, pool="pseudo", multithread=TRUE)
dadaRs1 <- dada(derepRs1, err=errR1, pool="pseudo", multithread=TRUE)

mergers1 <- mergePairs(dadaFs1, derepFs1, dadaRs1, derepRs1, verbose=TRUE)

128 paired-reads (in 5 unique pairings) successfully merged out of 27732 (in 9091 pairings) input.
0 paired-reads (in 0 unique pairings) successfully merged out of 19166 (in 7612 pairings) input.
0 paired-reads (in 0 unique pairings) successfully merged out of 37072 (in 13859 pairings) input.
0 paired-reads (in 0 unique pairings) successfully merged out of 18236 (in 6702 pairings) input.
0 paired-reads (in 0 unique pairings) successfully merged out of 31660 (in 11839 pairings) input.
0 paired-reads (in 0 unique pairings) successfully merged out of 34434 (in 10305 pairings) input.
0 paired-reads (in 0 unique pairings) successfully merged out of 41381 (in 13940 pairings) input.
16 paired-reads (in 1 unique pairings) successfully merged out of 25080 (in 7844 pairings) input.
0 paired-reads (in 0 unique pairings) successfully merged out of 17700 (in 7060 pairings) input.
0 paired-reads (in 0 unique pairings) successfully merged out of 26761 (in 9365 pairings) input.
0 paired-reads (in 0 unique pairings) successfully merged out of 30259 (in 9612 pairings) input.
6 paired-reads (in 1 unique pairings) successfully merged out of 26275 (in 8955 pairings) input.
0 paired-reads (in 0 unique pairings) successfully merged out of 36094 (in 12412 pairings) input.
0 paired-reads (in 0 unique pairings) successfully merged out of 28188 (in 10335 pairings) input.
0 paired-reads (in 0 unique pairings) successfully merged out of 37389 (in 14522 pairings) input.
0 paired-reads (in 0 unique pairings) successfully merged out of 11890 (in 5013 pairings) input.
24 paired-reads (in 1 unique pairings) successfully merged out of 38616 (in 13027 pairings) input.
0 paired-reads (in 0 unique pairings) successfully merged out of 34671 (in 12492 pairings) input.
0 paired-reads (in 0 unique pairings) successfully merged out of 38302 (in 13731 pairings) input.
0 paired-reads (in 0 unique pairings) successfully merged out of 24090 (in 7063 pairings) input.
0 paired-reads (in 0 unique pairings) successfully merged out of 19264 (in 7941 pairings) input.

Thanks for your help.

benjjneb commented 4 years ago

What is your library setup? I.e. what primers are you using, are they included on the reads, and what is the length of the sequenced amplicon?
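
The amplicon length matters because mergePairs can only join reads that still overlap after truncation; by default it requires at least 12 overlapping bases (minOverlap = 12). A rough sketch of that arithmetic follows. The truncLen values come from the post above, but the amplicon lengths are hypothetical placeholders, not values from this thread, and the helper name expected_overlap is made up for illustration:

```r
# Expected overlap (in bases) between truncated forward and reverse reads.
# A negative value means the reads do not reach each other at all.
expected_overlap <- function(trunc_len_f, trunc_len_r, amplicon_len) {
  trunc_len_f + trunc_len_r - amplicon_len
}

min_overlap <- 12                  # mergePairs() default for minOverlap

trunc_len <- c(221, 171)           # truncLen used in the original post
amplicon_lens <- c(253, 380, 460)  # HYPOTHETICAL amplicon lengths (bp)

overlap <- expected_overlap(trunc_len[1], trunc_len[2], amplicon_lens)

# One row per hypothetical amplicon length
data.frame(amplicon_len = amplicon_lens,
           overlap = overlap,
           can_merge = overlap >= min_overlap)
```

This is only a plausibility check. If the primers are still on the reads, the amplicon length used here has to include them, and a sufficient overlap on paper does not guarantee that merging succeeds.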

MatS792 commented 4 years ago

Following your suggestions, I checked the sequences and found that I was working on sequences that had already been filtered and trimmed. With the right sequences I have no problems. Thank you for your help, I will be more careful next time.

EJS01 commented 2 years ago

What is your library setup? I.e. what primers are you using, are they included on the reads, and what is the length of the sequenced amplicon?

Good day, I am new to bioinformatics... how can I check these parameters?
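
The primers and the expected amplicon length are best taken from the sequencing provider or the lab protocol. The reads themselves can still be inspected as a sanity check. Below is a minimal base-R sketch, assuming ordinary four-line FASTQ records; the function names (primer_regex, read_seqs, check_reads), the file name, and the primer in the usage comment are illustrative assumptions, not part of dada2 or of this thread:

```r
# Map IUPAC ambiguity codes to regex character classes.
iupac <- c(A = "A", C = "C", G = "G", T = "T",
           R = "[AG]", Y = "[CT]", S = "[CG]", W = "[AT]",
           K = "[GT]", M = "[AC]", B = "[CGT]", D = "[AGT]",
           H = "[ACT]", V = "[ACG]", N = "[ACGT]")

# Build a regex that matches the primer at the start of a read.
primer_regex <- function(primer) {
  paste0("^", paste(iupac[strsplit(primer, "")[[1]]], collapse = ""))
}

# Return the sequence lines (line 2 of each 4-line record) for the first
# n records. gzfile() reads both gzipped and uncompressed files.
read_seqs <- function(path, n = 10000) {
  con <- gzfile(path, "r")
  on.exit(close(con))
  lines <- readLines(con, n = 4 * n)
  if (length(lines) < 2) return(character(0))
  lines[seq(2, length(lines), by = 4)]
}

# Summarize read lengths and the fraction of reads starting with the primer.
check_reads <- function(path, primer, n = 10000) {
  seqs <- read_seqs(path, n)
  list(length_summary = summary(nchar(seqs)),
       frac_with_primer = mean(grepl(primer_regex(primer), seqs)))
}

# Usage (HYPOTHETICAL file name and primer; substitute your own):
# check_reads("sample_R1_001.fastq.gz", "GTGYCAGCMGCCGCGGTAA")
```

A fraction near 1 suggests the primer is still on the reads and a fraction near 0 suggests it is absent or a different primer was used. Read lengths that vary widely, or that are shorter than the sequencing run length, can indicate that the files were already trimmed, which was the cause of the problem in this thread.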