benjjneb / dada2

Accurate sample inference from amplicon data with single nucleotide resolution
http://benjjneb.github.io/dada2/
GNU Lesser General Public License v3.0

Some input samples had no reads pass the filter --> Error rates could not be estimated (this is usually because of very few reads). Error in getErrors(err, enforce = TRUE) : Error matrix is NULL. #1738

Closed chuynh96 closed 4 months ago

chuynh96 commented 1 year ago

As stated in the title, I'm running into this issue and I'm not sure how to proceed or fix this?

    out <- filterAndTrim(fnFs, filtFs, fnRs, filtRs,
                         maxN=0, maxEE=2, truncQ=2, rm.phix=TRUE,
                         compress=TRUE, multithread=TRUE)
    Some input samples had no reads pass the filter.

    head(out)
                           reads.in reads.out
    SRR14825040_1.fastq.gz   644802    221908
    SRR14825041_1.fastq.gz     2294       504
    SRR14825042_1.fastq.gz   864765    280319
    SRR14825043_1.fastq.gz   678906    241955
    SRR14825044_1.fastq.gz   865014    289631
    SRR14825045_1.fastq.gz  7741596   1746684
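
The "Some input samples had no reads pass the filter" warning can be checked directly against the `reads.out` column. The following is a minimal sketch in base R, assuming `out`, `filtFs`, and `filtRs` are the objects from the call above. The helper names `keep_existing_pairs` and `low_read_samples` and the `min_reads` threshold are hypothetical and not part of dada2. It also assumes that samples with no surviving reads have no filtered output file on disk, which is worth verifying on your own run. It only addresses this first warning and does not by itself explain a `learnErrors` failure.

```r
# Hypothetical helper (not part of dada2): keep only sample pairs whose
# filtered forward AND reverse files are actually present on disk.
keep_existing_pairs <- function(filtFs, filtRs) {
  stopifnot(length(filtFs) == length(filtRs))
  ok <- file.exists(filtFs) & file.exists(filtRs)
  list(filtFs = filtFs[ok], filtRs = filtRs[ok], dropped = filtFs[!ok])
}

# Hypothetical helper: names of samples in the filterAndTrim() summary matrix
# that retained fewer than `min_reads` reads (the threshold is arbitrary).
low_read_samples <- function(out, min_reads = 1000) {
  rownames(out)[out[, "reads.out"] < min_reads]
}
```

With the table above, `low_read_samples(out)` would flag `SRR14825041_1.fastq.gz` (504 reads out), and `keep_existing_pairs(filtFs, filtRs)$filtFs` could be passed to `learnErrors` in place of `filtFs`.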

    Learn the error rates of your samples.

    Forward

    errF <- learnErrors(filtFs, multithread=TRUE)
    111973169 total bases in 744686 reads from 4 samples will be used for learning the error rates.
    Error rates could not be estimated (this is usually because of very few reads).
    Error in getErrors(err, enforce = TRUE) : Error matrix is NULL.

    Reverse

    errR <- learnErrors(filtRs, multithread=TRUE)
    111975043 total bases in 744686 reads from 4 samples will be used for learning the error rates.
    Error rates could not be estimated (this is usually because of very few reads).
    Error in getErrors(err, enforce = TRUE) : Error matrix is NULL.

benjjneb commented 1 year ago

Can you provide some more information on these samples? What environment/amplicon is being sequenced? What is the sequencing machine?

Also, what does the output of plotQualityProfile look like? Does it look like real data, or has everything perhaps been assigned the same quality score?

chuynh96 commented 1 year ago

These samples are skin microbiome samples from a publicly available dataset https://www.ebi.ac.uk/ena/browser/view/PRJNA736108

benjjneb commented 1 year ago

What amplicon is being sequenced? What is the sequencing machine?

Also, what does the output of plotQualityProfile look like? Does it look like real data, or has everything perhaps been assigned the same quality score?

cresil commented 1 year ago

This dataset appears to be a shotgun library not amplicon.

benjjneb commented 1 year ago

"This dataset appears to be a shotgun library"

DADA2 is not an appropriate tool for analyzing this data then.

chuynh96 commented 1 year ago

I am running into this problem again, this time on a dataset I've confirmed to be 16S data from an Illumina sequencer. How should I proceed?

Error rates could not be estimated (this is usually because of very few reads). Error in getErrors(err, enforce = TRUE) : Error matrix is NULL.

benjjneb commented 1 year ago

Your initial diagnostics should be the same as what I mentioned previously in this thread.

Can you provide some more information on these samples? What environment/amplicon is being sequenced? What is the sequencing machine?

Also, what does the output of plotQualityProfile look like? Does it look like real data, or has everything perhaps been assigned the same quality score?

chuynh96 commented 1 year ago

They are skin microbiome samples, 16S rRNA (V3/V4 region), sequenced on Illumina. The quality profile looks like real data; the reads are not all assigned the same quality score.

benjjneb commented 1 year ago

Can you post an example plotQualityProfile output? One for forward reads, one for reverse, from a typical sample.
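
One way to produce the two plots requested here is sketched below. It assumes dada2 is installed and that `fnFs` and `fnRs` are the raw forward and reverse FASTQ paths from the original post. The wrapper name `plot_pair_quality` and its index argument `i` are hypothetical conveniences, not part of dada2.

```r
# Hypothetical wrapper (not part of dada2): quality profiles for the forward
# and reverse reads of sample number `i`.
plot_pair_quality <- function(fnFs, fnRs, i = 1) {
  list(
    forward = dada2::plotQualityProfile(fnFs[i]),
    reverse = dada2::plotQualityProfile(fnRs[i])
  )
}
```

Calling `plots <- plot_pair_quality(fnFs, fnRs)` and then printing `plots$forward` and `plots$reverse` displays the two profiles for the first sample.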

chuynh96 commented 1 year ago

[Image: Screenshot 2023-06-27 at 10 52 10 PM]

Sorry for the delay. This is an image of one of the quality profiles.

benjjneb commented 1 year ago

This is what the quality profile looks like when every quality value is the same. This data didn't come off a sequencer; it was either simulated or had fake (constant) quality values added back to it later.
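
Constant quality values can also be confirmed without a plot. The following is a minimal base R sketch, not a dada2 function: the helper names `quality_char_table` and `has_constant_quality` are hypothetical, and the code assumes the standard four-line FASTQ record layout with no wrapped sequence lines.

```r
# Hypothetical helper (base R only): tabulate the quality characters found in
# the first `n_reads` records of a FASTQ file.
quality_char_table <- function(fastq, n_reads = 10000) {
  con <- gzfile(fastq, open = "rt")  # gzfile() also reads uncompressed files
  on.exit(close(con))
  lines <- readLines(con, n = 4 * n_reads)
  if (length(lines) < 4) return(table(character(0)))
  qual <- lines[seq(4, length(lines), by = 4)]  # every 4th line holds qualities
  table(unlist(strsplit(qual, "", fixed = TRUE)))
}

# TRUE when every sampled base carries the same quality character.
has_constant_quality <- function(fastq, n_reads = 10000) {
  length(quality_char_table(fastq, n_reads)) == 1
}
```

If `has_constant_quality(fnFs[1])` returns `TRUE`, the file matches the situation described here: a single quality value repeated for every base.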

chuynh96 commented 1 year ago

Does this mean I can't proceed at all with this data on the dada2 pipeline?

benjjneb commented 1 year ago

You can run learnErrors and dada with USE_QUALS=FALSE, that is, ignoring the quality scores.
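
A minimal sketch of that suggestion follows. It assumes dada2 is installed and that `filtFs` and `filtRs` are the filtered file paths from `filterAndTrim`. The wrapper name `denoise_ignoring_quals` is hypothetical. The sketch relies on `learnErrors` forwarding extra arguments such as `USE_QUALS` to `dada`, and on `dada` accepting file paths directly, which is my understanding of recent dada2 versions.

```r
# Hypothetical wrapper (not part of dada2): learn error rates and denoise
# while ignoring quality scores, as suggested for constant-quality data.
denoise_ignoring_quals <- function(filtFs, filtRs, multithread = TRUE) {
  errF <- dada2::learnErrors(filtFs, multithread = multithread, USE_QUALS = FALSE)
  errR <- dada2::learnErrors(filtRs, multithread = multithread, USE_QUALS = FALSE)
  list(
    errF = errF,
    errR = errR,
    dadaFs = dada2::dada(filtFs, err = errF, multithread = multithread, USE_QUALS = FALSE),
    dadaRs = dada2::dada(filtRs, err = errR, multithread = multithread, USE_QUALS = FALSE)
  )
}
```

The option is passed to both steps so that the learned error model and the denoising step treat quality scores the same way.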

I'd keep in mind though that this is either simulated or manipulated data when thinking about interpreting your results.

chuynh96 commented 1 year ago

That's unfortunate...thank you for letting me know!