Closed jesika0123 closed 4 months ago
Your primer-checking results suggest that you are working with a mixed orientation library. That is, the sequenced reads are coming from an even mixture of those in the expected orientation (FWD primer on R1 and rc(REV) primer on R2), and the opposite orientation (FWD primer on R2 and rc(REV) on R2). Your cutadapt command is only trimming primers when the reads are in the expected orientation.
The easiest solution is to discard untrimmed reads at the cutadapt step. This will throw away half the reads, but also avoid a number of complications downstream. For other approaches, see this previous thread https://github.com/benjjneb/dada2/issues/1357 and there are probably other examples on the forum as well.
Thank you for your reply. I have continued to edit the cutadapt parameters and settings, but still cannot get all 0s at the "sanity check" after primer removal.
I have added your suggestion of adding --discard-untrimmed, but also have added other suggested parameters.
Questions: 1) For the following script (below) primers are still present, but these are the best results so far (a full summary is below). What happens if it is not possible to get to all 0s at this step?
2) Is there a way not to lose a large percentage of reads at this step? Or is just the beast of this type of sequence data?
Best script and summary of results: for (i in seq_along(fnFs)) { system2(cutadapt, args = c(R1.flags, R2.flags, "-j", 0, "--nextseq-trim=20", "--match-read-wildcards", "--discard-untrimmed", "-n", 2, "--minimum-length", 214, "-e", 0, "-o", fnFs.cut[i], "-p", fnRs.cut[i], # output files fnFs.filtN[i], fnRs.filtN[i])) # input files }
Don't worry about getting perfect zeros. The goal is to get rid of almost all the primers, and you're achieving that.
A few stray ones show up because there are minor differences in how primers are matched by cutadapt and the R code, and possibly some malformed amplicons where the primers might appear in the middle of the sequences. But these are at very low numbers, so not a concern.
Thank you!
I am following the pipeline posted on "https://github.com/ErnakovichLab/dada2_ernakovichlab".
I am working with NovaSeq 2*250bp data. Primer amplicon length: 347bp Issue: rbind results (see below) Solution: although I have coding experience, I am far from an expert, so please keep this in mind when reply. Thanks in advance for all help!
I have defined the primers as 1) FWD <- "18 bp with wobbles" and 2) REV <- "18 bp no wobbles" (sorry I can't share the actual bps).
Next I ran the following:
Function that creates a list of all orientations of the primers
Pre-filter to remove sequences with ambiguous bases (Ns)
fnFs.filtN <- file.path(preprocess, "filtN", basename(fnFs)) fnRs.filtN <- file.path(preprocess, "filtN", basename(fnRs)) filterAndTrim(fnFs, fnFs.filtN, fnRs, fnRs.filtN, maxN = 0, multithread = FALSE)
Count the number of times the primers appear in the forward and reverse read, while considering all possible primer orientations.
primerHits <- function(primer, fn) {
Counts number of reads in which the primer is found
}
rbind(FWD.ForwardReads = sapply(FWD.orients, primerHits, fn = fnFs.filtN[[1]]), FWD.ReverseReads = sapply(FWD.orients, primerHits, fn = fnRs.filtN[[1]]), REV.ForwardReads = sapply(REV.orients, primerHits, fn = fnFs.filtN[[1]]), REV.ReverseReads = sapply(REV.orients, primerHits, fn = fnRs.filtN[[1]]))
Not the expect results: Forward Complement Reverse RevComp FWD.ForwardReads 314219 0 0 8724 FWD.ReverseReads 305418 0 0 8770 REV.ForwardReads 204238 0 0 424 REV.ReverseReads 210007 0 0 458
But...just to humor myself I keep going:
Create directory to hold the output from cutadapt
if (!dir.exists(trimmed)) dir.create(trimmed) fnFs.cut <- file.path(trimmed, basename(fnFs)) fnRs.cut <- file.path(trimmed, basename(fnRs))
Save the reverse complements of the primers to variables
FWD.RC <- dada2:::rc(FWD) REV.RC <- dada2:::rc(REV)
Trim FWD and the reverse-complement of REV off of R1 (forward reads)
R1.flags <- paste("-g", FWD, "-a", REV.RC)
Trim REV and the reverse-complement of FWD off of R2 (reverse reads)
R2.flags <- paste("-G", REV, "-A", FWD.RC)
Run Cutadapt
for (i in seq_along(fnFs)) { system2(cutadapt, args = c(R1.flags, R2.flags, "-m", 20, #based on a suggestion by benjjneb "-n", 2, # -n 2 required to remove FWD and REV from reads "-o", fnFs.cut[i], "-p", fnRs.cut[i], # output files fnFs.filtN[i], fnRs.filtN[i])) # input files }
As expected: not the results I want:
FWD.ForwardReads 0 0 0 8723 FWD.ReverseReads 305352 0 0 0 REV.ForwardReads 204238 0 0 0 REV.ReverseReads 0 0 0 458