benjjneb / dada2

Accurate sample inference from amplicon data with single nucleotide resolution
http://benjjneb.github.io/dada2/
GNU Lesser General Public License v3.0

Error-rate estimation on very low-depth samples should be handled gracefully #469

Closed ndanckert closed 6 years ago

ndanckert commented 6 years ago

Hi Ben,

I've recently been working through the big data paired-end workflow. This afternoon I updated dada2 to version 1.8.0, and since the update the script cannot complete the mergePairs() step.

I now get a combined warning/error, "Error rates could not be estimated." followed by "Error in outs[, 1] : subscript out of bounds", halfway through processing my dataset. I should note that prior to the update the same dataset processed without the error, so I am unsure what I've done or what has happened.

My script for this section is the same as the tutorial (https://benjjneb.github.io/dada2/bigdata_paired.html); the only change I had to make was swapping 'nreads = X' to 'nbases = X' in the learnErrors() section. That said, I don't think this has contributed to the error, as I've tested both separately with no success.
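
For reference, a sketch of that swap (the value shown is just the package default, not necessarily what I ran):

# dada2 1.8: learnErrors() takes nbases instead of the old nreads
errF <- learnErrors(filtFs, nbases=1e8, multithread=TRUE)
errR <- learnErrors(filtRs, nbases=1e8, multithread=TRUE)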

Any help would be much appreciated. Thanks

benjjneb commented 6 years ago

Can you add a bit more detail. What sort of data (e.g. 2x150 HiSeq) is being processed here?

Where exactly is the "Error rates could not be estimated." being thrown? On both forward and reverse reads, or just one?

Did you install via Bioconductor or via Github? If not BioC, what is R.version and packageVersion("ShortRead")?

ndanckert commented 6 years ago

Thanks for the quick response.

My data is 2x300 bp MiSeq reads. I updated R yesterday morning too, so I am now using R 3.5.0 and ShortRead 1.38.0.

I downloaded dada2 from bioconductor ( https://bioconductor.org/packages/release/bioc/html/dada2.html).

As for the errors, "Error rates could not be estimated." seems to appear for a few samples (poor-quality reads / blanks), occurring for both the forward and reverse reads, but the script can continue processing. The "Error in outs[, 1] : subscript out of bounds" stops the script and seems to occur at the exact same sample each time.

If it helps I can send a subset of my data.

Thanks for your help.

benjjneb commented 6 years ago

Can you post the output of the following?

out <- filterAndTrim(...) # Whatever your filtering params are
out

(There was a small change in the error-estimation in the 1.8 release. Typically the change has almost no effect, but it might not be dealing with extremely low-read samples as gracefully.)

ndanckert commented 6 years ago

Not a problem.

I am using 341F - 805R primers. I also recently noticed that the sequencing company was processing our sequence data in reverse (reverse reads first and forward reads second), hence truncLen() cuts the forward reads shorter, as they typically deteriorate in quality the way you would normally expect of the reverse reads, and vice versa. Otherwise I've followed your tutorial as mentioned above and done some brief trial and error.

filterAndTrim(fwd=file.path(pathF, fastqFs), filt=file.path(filtpathF, fastqFs),
              rev=file.path(pathR, fastqRs), filt.rev=file.path(filtpathR, fastqRs),
              truncLen=c(220,280), trimLeft=c(23,27), maxEE=c(3,3), truncQ=2, maxN=0,
              rm.phix=TRUE, compress=TRUE, verbose=TRUE, multithread=FALSE)

Thanks.

benjjneb commented 6 years ago

Sorry, I wasn't clear enough; can you post the output that prints to the screen when you run the following?

out <- filterAndTrim(...) # Whatever your filtering params are
print(out)

It will be a matrix showing the numbers of input and filtered reads, and will help diagnose what might be going wrong.
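
For instance, sorting that matrix by filtered read count is a quick way to spot problem samples (a small base-R sketch on the out object above):

head(out[order(out[,"reads.out"]), ], 10)  # the 10 samples with the fewest filtered reads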

ndanckert commented 6 years ago

No problem.

Here is the printout. Apologies for the length.

I've highlighted the sample in red which seems to result in the code stopping. Interestingly, it doesn't have a problem with "T11r1_r2.fastq.gz", which also appears to be another failed sample...

Thanks.

print(out)
reads.in reads.out
BLANK1_r2.fastq.gz 15985 254
BLANK2_r2.fastq.gz 19936 307
BLANK3_r2.fastq.gz 3931 243
BLANK4_r2.fastq.gz 16188 411
BLANK5_r2.fastq.gz 2089 98
BLANK6_r2.fastq.gz 7246 718
IC1_r2.fastq.gz 13810 7265
IC10_r2.fastq.gz 18771 14339
IC11_r2.fastq.gz 14772 9911
IC12_r2.fastq.gz 16230 6477
IC13_r2.fastq.gz 16342 12110
IC14_r2.fastq.gz 19835 5353
IC15_r2.fastq.gz 13356 9928
IC16_r2.fastq.gz 10597 8755
IC17_r2.fastq.gz 19184 10479
IC18_r2.fastq.gz 22584 10441
IC2_r2.fastq.gz 23989 6962
IC3_r2.fastq.gz 8085 6948
IC4_r2.fastq.gz 18175 9995
IC5_r2.fastq.gz 20765 11492
IC6_r2.fastq.gz 18115 10492
IC7_r2.fastq.gz 13381 8835
IC8_r2.fastq.gz 15921 8220
IC9_r2.fastq.gz 12781 9634
T10r1_r2.fastq.gz 16145 13109
T10r2_r2.fastq.gz 16269 10978
T10r3_r2.fastq.gz 16668 11996
T11r1_r2.fastq.gz 15 1
T11r2_r2.fastq.gz 21817 6387
T11r3_r2.fastq.gz 12161 10320
T12r1_r2.fastq.gz 12451 9189
T12r2_r2.fastq.gz 22385 17755
T12r3_r2.fastq.gz 20984 15277
T13r1_r2.fastq.gz 16035 11023
T13r2_r2.fastq.gz 18284 9972
T13r3_r2.fastq.gz 12304 9801
T14r1_r2.fastq.gz 11953 9146
T14r2_r2.fastq.gz 17518 10414
T14r3_r2.fastq.gz 17344 12330
T15r1_r2.fastq.gz 11706 8976
T15r2_r2.fastq.gz 9456 7914
T15r3_r2.fastq.gz 17790 13873
T16r1_r2.fastq.gz 11971 9310
T16r2_r2.fastq.gz 14081 9849
T16r3_r2.fastq.gz 25261 3143
T17r1_r2.fastq.gz 17750 12368
T17r2_r2.fastq.gz 19885 7849
T17r3_r2.fastq.gz 29 2
T18r1_r2.fastq.gz 21051 18146
T18r2_r2.fastq.gz 14033 10997
T18r3_r2.fastq.gz 21557 11984
T19r1_r2.fastq.gz 19206 12151
T19r2_r2.fastq.gz 15807 13346
T19r3_r2.fastq.gz 15541 10026
T1r1_r2.fastq.gz 17006 10092
T1r2_r2.fastq.gz 19504 15418
T1r3_r2.fastq.gz 12754 8905
T20r1_r2.fastq.gz 12049 6324
T20r2_r2.fastq.gz 16098 12825
T20r3_r2.fastq.gz 12153 9386
T21r1_r2.fastq.gz 20478 13438
T21r2_r2.fastq.gz 21046 15505
T21r3_r2.fastq.gz 13632 12106
T22r1_r2.fastq.gz 4133 3376
T22r2_r2.fastq.gz 9432 7467
T22r3_r2.fastq.gz 13685 9804
T23r1_r2.fastq.gz 23573 5149
T23r2_r2.fastq.gz 16725 13064
T23r3_r2.fastq.gz 19390 14218
T24r1_r2.fastq.gz 16431 11904
T24r2_r2.fastq.gz 13200 9471
T24r3_r2.fastq.gz 10313 7864
T25r1_r2.fastq.gz 14559 12461
T25r2_r2.fastq.gz 18697 16904
T25r3_r2.fastq.gz 13638 12535
T26r1_r2.fastq.gz 14173 11754
T26r2_r2.fastq.gz 13727 12211
T26r3_r2.fastq.gz 28912 8588
T27r1_r2.fastq.gz 30738 8299
T27r2_r2.fastq.gz 14009 10795
T27r3_r2.fastq.gz 14023 11306
T28r1_r2.fastq.gz 6736 5219
T28r2_r2.fastq.gz 16228 8320
T28r3_r2.fastq.gz 11830 8674
T29r1_r2.fastq.gz 14517 11941
T29r2_r2.fastq.gz 18020 12771
T29r3_r2.fastq.gz 17545 10739
T2r1_r2.fastq.gz 13804 11342
T2r2_r2.fastq.gz 11532 9241
T2r3_r2.fastq.gz 23899 18872
T30r1_r2.fastq.gz 11647 9983
T30r2_r2.fastq.gz 17257 11186
T30r3_r2.fastq.gz 17759 12641
T31r1_r2.fastq.gz 18195 12760
T31r2_r2.fastq.gz 13764 12370
T31r3_r2.fastq.gz 17562 10216
T32r1_r2.fastq.gz 10771 8397
T32r2_r2.fastq.gz 13493 9838
T32r3_r2.fastq.gz 25200 16885
T3r1_r2.fastq.gz 16676 11338
T3r2_r2.fastq.gz 17698 10234
T3r3_r2.fastq.gz 20855 13993
T4r1_r2.fastq.gz 18648 5072
T4r2_r2.fastq.gz 15136 11107
T4r3_r2.fastq.gz 42021 394
T5r1_r2.fastq.gz 16439 11747
T5r2_r2.fastq.gz 15880 13635
T5r3_r2.fastq.gz 12810 10590
T6r1_r2.fastq.gz 19590 15094
T6r2_r2.fastq.gz 15599 13672
T6r3_r2.fastq.gz 17915 13198
T7r1_r2.fastq.gz 18417 9497
T7r2_r2.fastq.gz 16847 9644
T7r3_r2.fastq.gz 13048 9319
T8r1_r2.fastq.gz 18913 12791
T8r2_r2.fastq.gz 11829 9816
T8r3_r2.fastq.gz 14321 12737
T9r1_r2.fastq.gz 18897 13460
T9r2_r2.fastq.gz 13152 9149
T9r3_r2.fastq.gz 20303 10387

benjjneb commented 6 years ago

The issue is probably the failed samples with just 1 or 2 reads. What I'd recommend is to drop those samples after filtering, i.e. by:

keep <- out[,"reads.out"] > 20 # Or other cutoff
filtFs <- file.path(filtpathF, fastqFs)[keep]
filtRs <- file.path(filtpathR, fastqRs)[keep]

And then proceed from there and hopefully all will be well. I'll have to think about a way to make that simpler...
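
To spell out "proceed from there": with the subsetted filtFs and filtRs, error learning runs as usual, e.g. (a sketch, with nbases at its default):

errF <- learnErrors(filtFs, nbases=1e8, multithread=TRUE)
errR <- learnErrors(filtRs, nbases=1e8, multithread=TRUE)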

ndanckert commented 6 years ago

Great! I'll run this overnight and let you know if it works.

Appreciate the help.

ndanckert commented 6 years ago

Problem solved. You were spot on; for whatever reason, multiple failed samples were causing the script to crash. Removing them solved the issue. Thanks.

MarieBLund commented 6 years ago

Hi Ben, I've run into the exact same problem in dada2 v1.8. Removing my sample with only 50 reads solved the problem. My sample with the fewest reads.out now has 4680.

benjjneb commented 6 years ago

Re-opening because this should be handled more gracefully. Either by catching and handling the exception when error-model estimation fails on very low-read samples, or by automatically declining to estimate error rates on low-read samples.
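
In the meantime, a rough user-side sketch of the first option, wrapping the per-sample loop from the big-data workflow in tryCatch so a failing sample is skipped with a warning instead of stopping the run (this assumes filtFs/filtRs are named by sample and errF/errR have already been learned):

mergers <- vector("list", length(sample.names))
names(mergers) <- sample.names
for(sam in sample.names) {
  mergers[[sam]] <- tryCatch({
    derepF <- derepFastq(filtFs[[sam]])
    ddF <- dada(derepF, err=errF, multithread=TRUE)
    derepR <- derepFastq(filtRs[[sam]])
    ddR <- dada(derepR, err=errR, multithread=TRUE)
    mergePairs(ddF, derepF, ddR, derepR)
  }, error = function(e) {
    warning("Skipping sample ", sam, ": ", conditionMessage(e))
    NULL
  })
}
mergers <- mergers[!sapply(mergers, is.null)]  # drop samples that failed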

alexweisberg commented 6 years ago

I also get a similar error ("Error in outs[, 1] : subscript out of bounds"), however I do not get the message about error rates. This also happens when I only include samples with >1000 reads using the code you suggested. mergePairs still fails partway through, as it looks like all or most of the reads are filtered out for one of the samples:

> mergers <- mergePairs(dadaFs, derepFs, dadaRs, derepRs, verbose=TRUE)
2399 paired-reads (in 19 unique pairings) successfully merged out of 2499 (in 34 pairings) input.
2988 paired-reads (in 18 unique pairings) successfully merged out of 3402 (in 34 pairings) input.
8185 paired-reads (in 50 unique pairings) successfully merged out of 10746 (in 152 pairings) input.
6378 paired-reads (in 32 unique pairings) successfully merged out of 6501 (in 50 pairings) input.
13535 paired-reads (in 52 unique pairings) successfully merged out of 19120 (in 179 pairings) input.
22867 paired-reads (in 42 unique pairings) successfully merged out of 24428 (in 126 pairings) input.
44332 paired-reads (in 51 unique pairings) successfully merged out of 45920 (in 120 pairings) input.
60736 paired-reads (in 84 unique pairings) successfully merged out of 64171 (in 205 pairings) input.
66398 paired-reads (in 68 unique pairings) successfully merged out of 67615 (in 151 pairings) input.
16849 paired-reads (in 90 unique pairings) successfully merged out of 24121 (in 345 pairings) input.
24607 paired-reads (in 116 unique pairings) successfully merged out of 31250 (in 480 pairings) input.
66496 paired-reads (in 57 unique pairings) successfully merged out of 68044 (in 131 pairings) input.
111381 paired-reads (in 105 unique pairings) successfully merged out of 114455 (in 334 pairings) input.
90861 paired-reads (in 81 unique pairings) successfully merged out of 93401 (in 235 pairings) input.
15441 paired-reads (in 38 unique pairings) successfully merged out of 17480 (in 149 pairings) input.
1411 paired-reads (in 20 unique pairings) successfully merged out of 1653 (in 32 pairings) input.
1818 paired-reads (in 16 unique pairings) successfully merged out of 1869 (in 21 pairings) input.
12226 paired-reads (in 41 unique pairings) successfully merged out of 12543 (in 88 pairings) input.
25014 paired-reads (in 26 unique pairings) successfully merged out of 25436 (in 63 pairings) input.
437465 paired-reads (in 384 unique pairings) successfully merged out of 494201 (in 1657 pairings) input.
11958 paired-reads (in 49 unique pairings) successfully merged out of 13411 (in 150 pairings) input.
8761 paired-reads (in 33 unique pairings) successfully merged out of 9146 (in 61 pairings) input.
19405 paired-reads (in 76 unique pairings) successfully merged out of 22069 (in 246 pairings) input.
36842 paired-reads (in 88 unique pairings) successfully merged out of 39761 (in 306 pairings) input.
27 paired-reads (in 2 unique pairings) successfully merged out of 27 (in 2 pairings) input.
10 paired-reads (in 1 unique pairings) successfully merged out of 11 (in 2 pairings) input.
289 paired-reads (in 13 unique pairings) successfully merged out of 757 (in 22 pairings) input.
0 paired-reads (in 0 unique pairings) successfully merged out of 1 (in 1 pairings) input.
Error in outs[, 1] : subscript out of bounds

Is there a way to automatically drop samples like this and give a warning message rather than causing the entire run to crash? Thanks!

benjjneb commented 6 years ago

Is there a way to automatically drop samples like this and give a warning message rather than causing the entire run to crash? Thanks!

There isn't an "automatic" way right now. But you can screen the number of output filtered reads and drop samples that don't pass a reasonable threshold (e.g. 50 reads), which should avoid this issue (see the post above for example code).

We're thinking about ways to handle this better though.

alexweisberg commented 6 years ago

Yes, sorry, I should have mentioned, I filtered to samples with at least 1000 reads and it still gave an error. It looks like there are some samples that are so poor that they have 0 reads after filtering.

benjjneb commented 6 years ago

I filtered to samples with at least 1000 reads and it still gave an error. It looks like there are some samples that are so poor that they have 0 reads after filtering.

If you filter on the number of reads post-filtering, it will solve that issue, e.g. from the earlier post:

out <- filterAndTrim(...)
keep <- out[,"reads.out"] > 20 # Or other cutoff
filtFs <- file.path(filtpathF, fastqFs)[keep]
filtRs <- file.path(filtpathR, fastqRs)[keep]

Shellfishgene commented 6 years ago

Hi, I also have this problem. What do fastqFs and filtpathF in your solution refer to? They're not mentioned in the current tutorial.

benjjneb commented 6 years ago

What do fastqFs and filtpathF in your solution refer to? They're not mentioned in the current tutorial.

That is from the Big Data workflow. You can do the same with the tutorial workflow though:

# After out <- filterAndTrim(...)
keep <- out[,"reads.out"] > 20 # Or other cutoff
filtFs <- filtFs[keep]
filtRs <- filtRs[keep]
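
One caveat worth noting (not part of the tutorial code itself): if a sample.names vector has already been derived, it presumably needs the same subsetting so the downstream objects stay in sync, i.e.:

sample.names <- sample.names[keep]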

Also upgrading this to bug status (higher priority) as this seems to be affecting multiple people.

ecastron commented 6 years ago

Just a note on this: the same error can arise if you analyze a batch of samples and leave the negative control in. As expected, the negative control yielded very few reads, and at the merging step I got the "Error in outs[, 1] : subscript out of bounds". Deleting that element from dadaFs and derepFs, and doing the same for the reverse reads, solved the problem.
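
For anyone wanting to do the same, a rough sketch of that deletion (the negative-control name "NEG1" here is hypothetical; this assumes the dada and derep objects are named lists, as in the tutorial workflow):

drop <- "NEG1"                        # hypothetical negative-control sample name
dadaFs  <- dadaFs[names(dadaFs) != drop]
derepFs <- derepFs[names(derepFs) != drop]
dadaRs  <- dadaRs[names(dadaRs) != drop]
derepRs <- derepRs[names(derepRs) != drop]
mergers <- mergePairs(dadaFs, derepFs, dadaRs, derepRs, verbose=TRUE)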

casett commented 6 years ago

I have a low-complexity ITS2 dataset where I get this error with 1.8 (I had no issues with 1.4), even when I keep only samples with > 1000 reads:

out <- filterAndTrim(fnFs, filtFs, fnRs, filtRs, maxN=0, maxEE=c(2,2), truncQ=2, matchIDs=TRUE,
                     rm.phix=TRUE, compress=TRUE, multithread=TRUE, verbose=TRUE)
keep <- out[,"reads.out"] > 1000
filtFs <- filtFs[keep]
filtRs <- filtRs[keep]

However, I find that if I relax the filtering parameters (and let WAY more reads through), I still get the error, but can get merging to happen successfully using samples with > 150 reads:

out <- filterAndTrim(fnFs, filtFs, fnRs, filtRs, maxN=0, maxEE=c(10,10), truncQ=2, matchIDs=TRUE,
                     rm.phix=TRUE, compress=TRUE, multithread=TRUE, verbose=TRUE)
keep <- out[,"reads.out"] > 150
filtFs <- filtFs[keep]
filtRs <- filtRs[keep]

As an aside, previously with v1.4, using the first filterAndTrim parameters (i.e. maxEE = 2) and following the tutorial, I was able to get a sequence table with ~1000 ASVs, but with v1.8 using the same files and code I only get ~16. Were changes made to the merging algorithm between versions? I was unable to replicate the issue with 16S data, so I assume it's an ITS merging issue. Changing maxEE to 10, I was able to increase my number of ASVs back to ~1000.

benjjneb commented 6 years ago

were changes made to the merging algorithm between versions?

Yes, see ae200adfb6632659016a81d0267c32c7dbcbbb67

benjjneb commented 6 years ago

So after some digging, it looks like this was already fixed previously. The error here is in the merging step, which now handles samples with zero merging reads gracefully (7d8d2df8ad06b47c49fb627d8b8c13a90aa00632).

"Error rates could not be estimated." is just a warning (or a message, technically) that is output when error rate estimation fails on a sample, but is already correctly handled by the code and can almost always be ignored, as it really only matters if the error rates are estimated during the learnErrors step, not the subsequent dada step.

For that reason, I changed the logic on outputting this warning so that it only is seen when selfConsist=TRUE or when called via learnErrors.

mosabdel commented 11 months ago

I am facing the same problem with DADA2: I couldn't estimate the error rates for the reverse (R2) reads. Even when I use keep > 20 to retain only the higher-read samples, it still isn't solved. Any advice?

> out <- filterAndTrim(fnFs, filtFs, fnRs, filtRs, truncLen=c(250,150), maxN=0, maxEE=c(2,5), truncQ=2, rm.phix=TRUE, compress=TRUE, multithread=FALSE)
> out
reads.in reads.out
AKBAC049_6021_S49_L001_R1_001.fastq.gz 9355 6242
AKBAC050_6021_S50_L001_R1_001.fastq.gz 15388 11285
AKBAC051_6021_S51_L001_R1_001.fastq.gz 12776 9503
AKBAC052_6021_S52_L001_R1_001.fastq.gz 14326 10284
AKBAC053_6021_S53_L001_R1_001.fastq.gz 11764 8709
AKBAC054_6021_S54_L001_R1_001.fastq.gz 10962 8159
AKBAC055_6021_S55_L001_R1_001.fastq.gz 13240 9487
AKBAC056_6021_S56_L001_R1_001.fastq.gz 11528 8185
AKBAC057_6021_S57_L001_R1_001.fastq.gz 8328 5607
AKBAC058_6021_S58_L001_R1_001.fastq.gz 11070 8340
AKBAC059_6021_S59_L001_R1_001.fastq.gz 11017 8022
AKBAC060_6021_S60_L001_R1_001.fastq.gz 11395 8543
> truncLen_F <- 250
> truncLen_R <- 150
> total_bases_F <- sum(out[,"reads.out"]) * truncLen_F
> total_bases_R <- sum(out[,"reads.out"]) * truncLen_R
> print(paste("Total bases in forward reads:", total_bases_F))
[1] "Total bases in forward reads: 25591500"
> print(paste("Total bases in reverse reads:", total_bases_R))
[1] "Total bases in reverse reads: 13332800"
> keep <- out[,"reads.out"] > 20
> filtFs <- filtFs[keep]
> filtRs <- filtRs[keep]
> errF <- learnErrors(filtFs, multithread=TRUE, nbases=25640000)
25640000 total bases in 102560 reads from 12 samples will be used for learning the error rates.
> errR <- learnErrors(filtRs, multithread=TRUE, nbases=13332800)
13332800 total bases in 102560 reads from 12 samples will be used for learning the error rates.
Error rates could not be estimated (this is usually because of very few reads).
Error in getErrors(err, enforce = TRUE) : Error matrix is NULL.

plot1-QC.pdf
plot2_QC.pdf

benjjneb commented 11 months ago

@mosabdel Could you describe a bit more your data: What amplicon? What sequencing technology? What sample types?

Is this issue only happening in the reverse reads?

mosabdel commented 11 months ago

Hi Benjamin,
We used the NextSeq 2000 platform. Primer set: bacterial amplicon 515F (Parada) (TGTGYCAGCMGCCGCGGTAA) – 806R (Apprill) (GGACTACNVGGGTWTCTAAT); so forward: TGTGYCAGCMGCCGCGGTAA, reverse: GGACTACNVGGGTWTCTAAT. Samples are derived from bulk soil, roots, and the phyllosphere, and all give the same error. The error occurs with the reverse reads only; errF was estimated, but it did not show a good model.

Thanks, Mostafa


benjjneb commented 11 months ago

@mosabdel What is your version of the dada2 package and R version?

The output of sessionInfo() is an easy way to get both, after you have loaded the dada2 R package.

mosabdel commented 11 months ago

Hi Benjamin,

I am using R 4.3.2

dada2_1.30.0

Thanks, Mostafa


benjjneb commented 11 months ago

Error occurs with Reverse reads only

I don't recall this ever being reported before. Can you make a "minimal example"? That would be a single R2 fastq file that causes this error, and the exact command that causes the error. Ideally a subsampled fastq file.
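
For reference, one way to make such a subsampled file is with ShortRead's FastqSampler (the file names below are placeholders):

library(ShortRead)
set.seed(100)                                              # reproducible subsample
sampler <- FastqSampler("problem_R2.fastq.gz", n=5000)     # draw 5000 random reads
writeFastq(yield(sampler), "problem_R2_subsampled.fastq.gz")
close(sampler)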