tresacool commented 2 years ago

Hello,

I am trying to analyze paired 16S rRNA reads from prokaryotes, which I extracted from metagenome data (sequenced on Illumina NextSeq) using SortMeRNA. The metagenome reads were trimmed (minimum read length 50 bp, Phred 20) before the extraction. This is also visible in the plotQualityProfile output.

This is the code I was using:

```r
# Sort samples
fastqs <- fns[grepl(".fastqsanger$", fns)]
fastqs <- sort(fastqs)
fnFs <- fastqs[grepl("_forward.fastqsanger", fastqs)]
fnRs <- fastqs[grepl("_reverse.fastqsanger", fastqs)]

sample.names <- sapply(strsplit(fnFs, "_"), `[`, 1)

# Specify the full path to the fnFs and fnRs
fnFs <- file.path(path, fnFs)
fnRs <- file.path(path, fnRs)

# Quality Plot
plotQualityProfile(RH2020F[1:4])

# Make directory and filenames for the filtered fastqs
filt_path <- file.path(path, "filtered")
if(!file_test("-d", filt_path)) dir.create(filt_path)
filtFs <- file.path(filt_path, paste0(sample.names, "_F_filt.fastq.gz"))
filtRs <- file.path(filt_path, paste0(sample.names, "_R_filt.fastq.gz"))

# Filter and dereplicate
for(i in seq_along(fnFs)) {
  fastqPairedFilter(c(fnFs[i], fnRs[i]), c(filtFs[i], filtRs[i]),
                    maxN=0, maxEE=c(2,2), rm.phix=TRUE,
                    compress=TRUE, verbose=TRUE)
}

out <- filterAndTrim(RH2020F, filtFs, RH2020R, filtRs, rm.phix=TRUE, maxEE=c(2,2),
                     minLen = 50, compress=TRUE, multithread=TRUE)
head(out)

derepFs <- derepFastq(filtFs, qualityType="FastqQuality", verbose=TRUE)
derepRs <- derepFastq(filtRs, qualityType="FastqQuality", verbose=TRUE)

names(derepFs) <- sample.names
names(derepRs) <- sample.names

# Learn error rates
errF <- learnErrors(derepFs, randomize = TRUE, multithread=TRUE, MAX_CONSIST = 20, nbases=1e12)
# or
errR <- learnErrors(filtRs, multithread=TRUE)
# or
dadaFs.lrn <- dada(derepFs, err=NULL, selfConsist = TRUE, multithread=TRUE)
```

I can run the pipeline up to the `learnErrors()` function, where I get the following message: `Error in getErrors(err, enforce = TRUE) : Error matrix is NULL`

The error persists even after changing the parameters. Is this caused by the low error rate of the reads (because they were trimmed beforehand)? Do you have any idea how I can solve the issue?

Thanks and kind regards!

benjjneb commented 2 years ago

Is this amplicon sequencing data?

tresacool commented 2 years ago

The reads are paired 16S rRNA reads extracted from a shotgun metagenome dataset.

benjjneb commented 5 months ago

DADA2 is not appropriate for analyzing shotgun metagenome data.

benjjneb / dada2