kmalki123 closed this issue 4 years ago
Can you post the results of plotQualityProfile on one of these samples, and on another sample that passes the filter? Can you also post the exact filtering command you are performing?
FnFs_QualityProfile.pdf fnRs_QualityPlot.pdf
I have attached the quality scores of the forward and reverse reads. The three problem samples I see are R_F1, R_F2, and R_W1. Would it be better if I removed these samples before running the pipeline?
Also, my code:
library(dada2); packageVersion("dada2")
path <- "/Users/kemamalki/Desktop/Lab_Stuff/Projects/Freshwater_Springs/Temporal_16s/16S_trimmed" # CHANGE ME to the directory containing your demultiplexed forward-read fastqs
filtpath <- file.path(path, "filtered") # Filtered forward files go into the pathF/filtered/ subdirectory
fnFs <- sort(list.files(path, pattern="_trim_for_paired.fq"))
fnRs <- sort(list.files(path, pattern="_trim_rev_paired.fq"))
fnFs <- file.path(path, fnFs)
fnFs
sample.names <- sapply(strsplit(basename(fnFs), "_trim"), `[`, 1)
sample.names
fnRs <- file.path(path, fnRs)
fnRs
plotQualityProfile(fnFs[1:36])
plotQualityProfile(fnRs[1:36])
fastqFs <- basename(fnFs); fastqRs <- basename(fnRs)
if(length(fastqFs) != length(fastqRs)) stop("Forward and reverse files do not match.")
# Write filtered files under names matching the "_F_filt"/"_R_filt" patterns used below
out <- filterAndTrim(fwd=file.path(path, fastqFs), filt=file.path(filtpath, paste0(sample.names, "_F_filt.fq.gz")),
                     rev=file.path(path, fastqRs), filt.rev=file.path(filtpath, paste0(sample.names, "_R_filt.fq.gz")),
                     truncLen=c(240,160), maxEE=c(4,4), truncQ=11, maxN=0, rm.phix=TRUE,
                     compress=TRUE, verbose=TRUE, multithread=TRUE)
filtpathF <- "/Users/kemamalki/Desktop/Lab_Stuff/Projects/Freshwater_Springs/Temporal_16s/16S_trimmed/filtered" # CHANGE ME to the directory containing your filtered forward fastqs
filtpathR <- "/Users/kemamalki/Desktop/Lab_Stuff/Projects/Freshwater_Springs/Temporal_16s/16S_trimmed/filtered" # CHANGE ME
filtFs <- list.files(filtpathF, pattern="_F_filt.fq", full.names = TRUE)
filtFs
filtRs <- list.files(filtpathR, pattern="_R_filt.fq", full.names = TRUE)
filtRs
sample.namesF <- sapply(strsplit(basename(filtFs), "_F_filt"), `[`, 1) # Assumes filename = samplename_XXX.fastq.gz
sample.namesF
sample.namesR <- sapply(strsplit(basename(filtRs), "_R_filt"), `[`, 1) # Assumes filename = samplename_XXX.fastq.gz
class(filtFs)
if(!identical(sample.namesF, sample.namesR)) stop("Forward and reverse files do not match.")
names(filtFs) <- sample.namesF
names(filtRs) <- sample.namesF
set.seed(100)
errF <- learnErrors(filtFs, nbases=1e8, multithread=TRUE)
errR <- learnErrors(filtRs, nbases=1e8, multithread=TRUE)
plotErrors(errF, nominalQ=TRUE)
mergers <- vector("list", length(sample.namesF))
names(mergers) <- sample.namesF
for(sam in sample.namesF) {
  cat("Processing:", sam, "\n")
  derepF <- derepFastq(filtFs[[sam]])
  ddF <- dada(derepF, err=errF, multithread=TRUE)
  derepR <- derepFastq(filtRs[[sam]])
  ddR <- dada(derepR, err=errR, multithread=TRUE)
  merger <- mergePairs(ddF, derepF, ddR, derepR)
  mergers[[sam]] <- merger
}
rm(derepF); rm(derepR)
class(mergers)
seqtab <- makeSequenceTable(mergers)
seqtabt <- t(seqtab)
write.csv(seqtab, "/Users/kemamalki/Desktop/Lab_Stuff/Projects/Freshwater_Springs/Temporal_16s/output/16S_Temp_seqtable.csv") # CHANGE ME to where you want sequence table saved
seqtabclean <- removeBimeraDenovo(seqtab, method="consensus", multithread=TRUE)
tax <- assignTaxonomy(seqtabclean, "/Users/kemamalki/Desktop/Lab_Stuff/Projects/Freshwater_Springs/Temporal_16s/silva_nr_v132_train_set.fa", multithread=TRUE)
tax <- addSpecies(tax, "/Users/kemamalki/Desktop/Lab_Stuff/Projects/Freshwater_Springs/Temporal_16s/silva_species_assignment_v132.fa")
write.csv(seqtabclean, "/Users/kemamalki/Desktop/Lab_Stuff/Projects/Freshwater_Springs/Temporal_16s/output/seqtabclean.csv") # CHANGE ME to where you want sequence table saved
write.csv(tax, "/Users/kemamalki/Desktop/Lab_Stuff/Projects/Freshwater_Springs/Temporal_16s/output/tax.csv") # CHANGE ME to where you want taxonomy table saved
Those 3 samples have very few reads (30-120), whereas the other samples have tens of thousands. That indicates library preparation was not successful for those samples, and they likely have little if any legitimate DNA in them. That they aren't passing the filtering step is not a surprise, and you can drop them from the analysis at that stage.
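Since filterAndTrim() returns a matrix of per-sample input/output read counts, the failed libraries can be dropped programmatically at that stage. A minimal sketch, assuming an illustrative cutoff of 1000 surviving reads; the matrix below is a mock stand-in for the real filterAndTrim() output, with made-up counts:

```r
# Mock of the matrix returned by filterAndTrim(): rows are samples,
# columns are reads in / reads surviving the filter (illustrative numbers).
out <- matrix(c(52000, 120, 90, 30,
                47000,  45, 33, 12),
              ncol = 2,
              dimnames = list(c("R_A1", "R_F1", "R_F2", "R_W1"),
                              c("reads.in", "reads.out")))

# Hypothetical cutoff: failed libraries (tens to hundreds of reads)
# fall far below it, real samples (tens of thousands) far above.
failed <- rownames(out)[out[, "reads.out"] < 1000]
failed  # "R_F1" "R_F2" "R_W1"
```

The named vectors of filtered files could then be subset before learnErrors(), e.g. `filtFs <- filtFs[!names(filtFs) %in% failed]`, so the near-empty samples never enter error learning or denoising.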
I am having an issue with running some 16S amplicon sequences through DADA2. I have successfully run two other datasets before, but for some reason I have been having issues with the one I am currently working on. The issue is that 3 samples do not pass the filter. (The FastQC scores look normal, though I did have some concerns regarding the quality scores of the reverse reads of the three "problem" samples. I contacted the sequencing facility and they assured me that the quality was fine. The error rates also seem normal.) When I adjust the filter to make it less stringent, they pass, but then several different samples that previously had sequences mapping to ASVs end up with 0 sequences mapping.
I should note that I have successfully run this dataset using only the forward reads. However, as I have paired data, I am trying to get the paired-data version running successfully.
Have you encountered this before? And if so do you have any suggestions?