benjjneb / dada2

Accurate sample inference from amplicon data with single nucleotide resolution
http://benjjneb.github.io/dada2/
GNU Lesser General Public License v3.0

Having zero Reads mapping to samples after changing filter parameters to accommodate all samples #885

Closed kmalki123 closed 4 years ago

kmalki123 commented 4 years ago

I am having an issue running some 16S amplicon sequences through DADA2. I have successfully run two other datasets before, but I am having trouble with the one I am currently working on: 3 samples do not pass the filter. (The FastQC scores look normal, though I had some concerns about the quality scores of the reverse reads of the three "problem" samples; I contacted the sequencing facility and they assured me the quality was fine. The error rates also look normal.) When I relax the filter parameters, those samples pass, but then several other samples that previously had sequences mapping to ASVs end up with 0 sequences mapping.

I should note that I have successfully run this data using only the forward reads. However, since I have paired data, I am trying to get the paired-end version running as well.

Have you encountered this before? And if so do you have any suggestions?

benjjneb commented 4 years ago

The issue is that 3 samples do not pass the filter. (The fastqc scores look normal, though I did have some concerns regarding the quality scores of the reverse reads

Can you post the results of plotQualityProfile on one of these samples, and on another sample that passes the filter? Can you also post the exact filtering command you are performing?
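For reference, `plotQualityProfile` accepts a vector of fastq paths and draws one panel per file, so a failing and a passing sample can be compared in one call (the filenames below are placeholders, not the poster's actual paths):

```r
library(dada2)

# Hypothetical paths: substitute one "problem" sample and one sample that passes
fnRs <- c("R_F1_trim_rev_paired.fq", "good_sample_trim_rev_paired.fq")
plotQualityProfile(fnRs)
```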

kmalki123 commented 4 years ago

FnFs_QualityProfile.pdf fnRs_QualityPlot.pdf

I have attached the quality scores of the forward and reverse reads; the three problem samples I see are R_F1, R_F2, and R_W1. Would it be better if I removed these samples before running the pipeline?

kmalki123 commented 4 years ago

also my code:

```r
library(dada2); packageVersion("dada2")

# File parsing
path <- "/Users/kemamalki/Desktop/Lab_Stuff/Projects/Freshwater_Springs/Temporal_16s/16S_trimmed" # CHANGE ME to the directory containing your demultiplexed fastqs
filtpath <- file.path(path, "filtered") # Filtered files go into the filtered/ subdirectory
fastqFs <- sort(list.files(path, pattern="_trim_for_paired.fq"))
fastqRs <- sort(list.files(path, pattern="_trim_rev_paired.fq"))
sample.names <- sapply(strsplit(fastqFs, "_trim"), `[`, 1)
sample.names
plotQualityProfile(file.path(path, fastqFs[1:36]))
plotQualityProfile(file.path(path, fastqRs[1:36]))
if(length(fastqFs) != length(fastqRs)) stop("Forward and reverse files do not match.")

# Filtering: THESE PARAMETERS AREN'T OPTIMAL FOR ALL DATASETS,
# e.g. needed to change truncLen for the forward reads from 240 to 200 for this data
filterAndTrim(fwd=file.path(path, fastqFs), filt=file.path(filtpath, fastqFs),
              rev=file.path(path, fastqRs), filt.rev=file.path(filtpath, fastqRs),
              truncLen=c(240,160), maxEE=c(4,4), truncQ=11, maxN=0, rm.phix=TRUE,
              compress=TRUE, verbose=TRUE, multithread=TRUE)
# truncLen: position at which reads are truncated; maxEE: maximum expected errors (lower is stricter);
# truncQ: truncate each read at the first instance of a quality score <= truncQ
```

```r
# File parsing
filtpathF <- "/Users/kemamalki/Desktop/Lab_Stuff/Projects/Freshwater_Springs/Temporal_16s/16S_trimmed/filtered" # CHANGE ME to the directory containing your filtered forward fastqs
filtpathR <- "/Users/kemamalki/Desktop/Lab_Stuff/Projects/Freshwater_Springs/Temporal_16s/16S_trimmed/filtered" # CHANGE ME
filtFs <- list.files(filtpathF, pattern="_F_filt.fq", full.names = TRUE)
filtRs <- list.files(filtpathR, pattern="_R_filt.fq", full.names = TRUE)
sample.namesF <- sapply(strsplit(basename(filtFs), "_F_filt"), `[`, 1) # Assumes filename = samplename_XXX.fastq.gz
sample.namesR <- sapply(strsplit(basename(filtRs), "_R_filt"), `[`, 1) # Assumes filename = samplename_XXX.fastq.gz
if(!identical(sample.namesF, sample.namesR)) stop("Forward and reverse files do not match.")
names(filtFs) <- sample.namesF
names(filtRs) <- sample.namesF
set.seed(100)

# Learn forward error rates
errF <- learnErrors(filtFs, nbases=1e8, multithread=TRUE)

# Learn reverse error rates
errR <- learnErrors(filtRs, nbases=1e8, multithread=TRUE)

# Check the error model: the observed error rates (points) should follow the fitted black line
plotErrors(errF, nominalQ=TRUE)

# Sample inference and merger of paired-end reads
mergers <- vector("list", length(sample.namesF))
names(mergers) <- sample.namesF
for(sam in sample.namesF) {
  cat("Processing:", sam, "\n")
  derepF <- derepFastq(filtFs[[sam]])
  ddF <- dada(derepF, err=errF, multithread=TRUE)
  derepR <- derepFastq(filtRs[[sam]])
  ddR <- dada(derepR, err=errR, multithread=TRUE)
  mergers[[sam]] <- mergePairs(ddF, derepF, ddR, derepR)
}
rm(derepF); rm(derepR)
```

```r
# Construct sequence table
seqtab <- makeSequenceTable(mergers)
seqtabt <- t(seqtab)
write.csv(seqtab, "/Users/kemamalki/Desktop/Lab_Stuff/Projects/Freshwater_Springs/Temporal_16s/output/16S_Temp_seqtable.csv") # CHANGE ME to where you want the sequence table saved

# Remove chimeras
seqtabclean <- removeBimeraDenovo(seqtab, method="consensus", multithread=TRUE)

# Assign taxonomy
tax <- assignTaxonomy(seqtabclean, "/Users/kemamalki/Desktop/Lab_Stuff/Projects/Freshwater_Springs/Temporal_16s/silva_nr_v132_train_set.fa", multithread=TRUE)
tax <- addSpecies(tax, "/Users/kemamalki/Desktop/Lab_Stuff/Projects/Freshwater_Springs/Temporal_16s/silva_species_assignment_v132.fa")

# Write to disk
write.csv(seqtabclean, "/Users/kemamalki/Desktop/Lab_Stuff/Projects/Freshwater_Springs/Temporal_16s/output/seqtabclean.csv") # CHANGE ME to where you want the sequence table saved
write.csv(tax, "/Users/kemamalki/Desktop/Lab_Stuff/Projects/Freshwater_Springs/Temporal_16s/output/tax.csv") # CHANGE ME to where you want the taxonomy table saved
```

benjjneb commented 4 years ago

Those 3 samples have very few reads (30-120) whereas the other samples have tens of thousands. That indicates library preparation was not successful for those samples, and they likely contain little if any legitimate DNA. That they aren't passing the filtering step is not a surprise, and you can drop them from the analysis at that stage.
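One way to make that drop automatic is to use the read-tracking matrix that `filterAndTrim()` returns, sketched below with the variable names from the script above; the 500-read cutoff is an arbitrary placeholder to adjust for your data:

```r
# filterAndTrim() returns a matrix with one row per input file and
# columns reads.in / reads.out
out <- filterAndTrim(fwd=file.path(path, fastqFs), filt=file.path(filtpath, fastqFs),
                     rev=file.path(path, fastqRs), filt.rev=file.path(filtpath, fastqRs),
                     truncLen=c(240,160), maxEE=c(4,4), truncQ=11, maxN=0,
                     rm.phix=TRUE, compress=TRUE, multithread=TRUE)
head(out)

# Keep only samples that retained a reasonable number of reads after filtering;
# 500 is an arbitrary cutoff
keep <- out[, "reads.out"] > 500
fastqFs <- fastqFs[keep]
fastqRs <- fastqRs[keep]
```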