jesika0123 commented 4 months ago

I am following the pipeline posted on "https://github.com/ErnakovichLab/dada2_ernakovichlab".

I am working with NovaSeq 2*250bp data. Primer amplicon length: 347bp Issue: rbind results (see below) Solution: although I have coding experience, I am far from an expert, so please keep this in mind when reply. Thanks in advance for all help!

I have defined the primers as 1) FWD <- "18 bp with wobbles" and 2) REV <- "18 bp no wobbles" (sorry I can't share the actual bps).

Next I ran the following:

Function that creates a list of all orientations of the primers

allOrients <- function(primer) {
  # Create all orientations of the input sequence
  require(Biostrings)
  dna <- DNAString(primer)  #The Biostrings works w/ DNAString objects rather than character vectors
  orients <- c(Forward = dna, Complement = complement(dna), Reverse = reverse(dna), 
               RevComp = reverseComplement(dna))
  return(sapply(orients, toString))  #Convert back to character vector
}

# Save the primer orientations to pass to cutadapt
FWD.orients<- allOrients(FWD)
FWD.orients

REV.orients <- allOrients(REV)
REV.orients

Pre-filter to remove sequences with ambiguous bases (Ns)

fnFs.filtN <- file.path(preprocess, "filtN", basename(fnFs)) fnRs.filtN <- file.path(preprocess, "filtN", basename(fnRs)) filterAndTrim(fnFs, fnFs.filtN, fnRs, fnRs.filtN, maxN = 0, multithread = FALSE)

Count the number of times the primers appear in the forward and reverse read, while considering all possible primer orientations.

primerHits <- function(primer, fn) {

Counts number of reads in which the primer is found

nhits <- vcountPattern(primer, sread(readFastq(fn)), fixed = FALSE)
return(sum(nhits > 0))

}

rbind(FWD.ForwardReads = sapply(FWD.orients, primerHits, fn = fnFs.filtN[[1]]), FWD.ReverseReads = sapply(FWD.orients, primerHits, fn = fnRs.filtN[[1]]), REV.ForwardReads = sapply(REV.orients, primerHits, fn = fnFs.filtN[[1]]), REV.ReverseReads = sapply(REV.orients, primerHits, fn = fnRs.filtN[[1]]))

Not the expect results: Forward Complement Reverse RevComp FWD.ForwardReads 314219 0 0 8724 FWD.ReverseReads 305418 0 0 8770 REV.ForwardReads 204238 0 0 424 REV.ReverseReads 210007 0 0 458

But...just to humor myself I keep going:

Create directory to hold the output from cutadapt

if (!dir.exists(trimmed)) dir.create(trimmed) fnFs.cut <- file.path(trimmed, basename(fnFs)) fnRs.cut <- file.path(trimmed, basename(fnRs))

Save the reverse complements of the primers to variables

FWD.RC <- dada2:::rc(FWD) REV.RC <- dada2:::rc(REV)

Trim FWD and the reverse-complement of REV off of R1 (forward reads)

R1.flags <- paste("-g", FWD, "-a", REV.RC)

Trim REV and the reverse-complement of FWD off of R2 (reverse reads)

R2.flags <- paste("-G", REV, "-A", FWD.RC)

Run Cutadapt

for (i in seq_along(fnFs)) { system2(cutadapt, args = c(R1.flags, R2.flags, "-m", 20, #based on a suggestion by benjjneb "-n", 2, # -n 2 required to remove FWD and REV from reads "-o", fnFs.cut[i], "-p", fnRs.cut[i], # output files fnFs.filtN[i], fnRs.filtN[i])) # input files }

rbind(FWD.ForwardReads = sapply(FWD.orients, primerHits, fn = fnFs.cut[[1]]),

FWD.ReverseReads = sapply(FWD.orients, primerHits, fn = fnRs.cut[[1]]),

REV.ForwardReads = sapply(REV.orients, primerHits, fn = fnFs.cut[[1]]),

REV.ReverseReads = sapply(REV.orients, primerHits, fn = fnRs.cut[[1]]))

As expected: not the results I want:

                                Forward Complement Reverse RevComp

FWD.ForwardReads 0 0 0 8723 FWD.ReverseReads 305352 0 0 0 REV.ForwardReads 204238 0 0 0 REV.ReverseReads 0 0 0 458

benjjneb commented 4 months ago

Your primer-checking results suggest that you are working with a mixed orientation library. That is, the sequenced reads are coming from an even mixture of those in the expected orientation (FWD primer on R1 and rc(REV) primer on R2), and the opposite orientation (FWD primer on R2 and rc(REV) on R2). Your cutadapt command is only trimming primers when the reads are in the expected orientation.

The easiest solution is to discard untrimmed reads at the cutadapt step. This will throw away half the reads, but also avoid a number of complications downstream. For other approaches, see this previous thread https://github.com/benjjneb/dada2/issues/1357 and there are probably other examples on the forum as well.

jesika0123 commented 4 months ago

Thank you for your reply. I have continued to edit the cutadapt parameters and settings, but still cannot get all 0s at the "sanity check" after primer removal.

I have added your suggestion of adding --discard-untrimmed, but also have added other suggested parameters.

Questions: 1) For the following script (below) primers are still present, but these are the best results so far (a full summary is below). What happens if it is not possible to get to all 0s at this step?

FWD.ForwardReads 0 0 0 0

FWD.ReverseReads 1 0 0 0

REV.ForwardReads 3 0 0 0

REV.ReverseReads 0 0 0 0

2) Is there a way not to lose a large percentage of reads at this step? Or is just the beast of this type of sequence data?

Best script and summary of results: for (i in seq_along(fnFs)) { system2(cutadapt, args = c(R1.flags, R2.flags, "-j", 0, "--nextseq-trim=20", "--match-read-wildcards", "--discard-untrimmed", "-n", 2, "--minimum-length", 214, "-e", 0, "-o", fnFs.cut[i], "-p", fnRs.cut[i], # output files fnFs.filtN[i], fnRs.filtN[i])) # input files }

Summary of testing different parameters

no minimal read length

Forward Complement Reverse RevComp

FWD.ForwardReads 0 0 0 7

FWD.ReverseReads 10 0 0 0

REV.ForwardReads 4 0 0 0

REV.ReverseReads 0 0 0 0

"--minimum-length", 214 (250 - 18 (forward primer) - 18 (reverse primer) vs 230) = same results and best results

Forward Complement Reverse RevComp

FWD.ForwardReads 0 0 0 0

FWD.ReverseReads 1 0 0 0

REV.ForwardReads 3 0 0 0

REV.ReverseReads 0 0 0 0

increasing -n from 2 to 16 did not change results

Forward Complement Reverse RevComp

FWD.ForwardReads 0 0 0 7

FWD.ReverseReads 10 0 0 0

REV.ForwardReads 4 0 0 0

REV.ReverseReads 0 0 0 0

-e from 0 to default changed results

Forward Complement Reverse RevComp

FWD.ForwardReads 0 0 0 7

FWD.ReverseReads 12 0 0 0

REV.ForwardReads 5 0 0 0

REV.ReverseReads 0 0 0 1

benjjneb commented 4 months ago

Don't worry about getting perfect zeros. The goal is to get rid of almost all the primers, and you're achieving that.

A few stray ones show up because there are minor differences in how primers are matched by cutadapt and the R code, and possibly some malformed amplicons where the primers might appear in the middle of the sequences. But these are at very low numbers, so not a concern.

jesika0123 commented 4 months ago

Thank you!

benjjneb / dada2

Unexpected results after cutadapt #1955

Function that creates a list of all orientations of the primers

Pre-filter to remove sequences with ambiguous bases (Ns)

Count the number of times the primers appear in the forward and reverse read, while considering all possible primer orientations.

Counts number of reads in which the primer is found

Create directory to hold the output from cutadapt

Save the reverse complements of the primers to variables

Trim FWD and the reverse-complement of REV off of R1 (forward reads)

Trim REV and the reverse-complement of FWD off of R2 (reverse reads)

Run Cutadapt

FWD.ForwardReads 0 0 0 0

FWD.ReverseReads 1 0 0 0

REV.ForwardReads 3 0 0 0

REV.ReverseReads 0 0 0 0

Summary of testing different parameters

no minimal read length

Forward Complement Reverse RevComp

FWD.ForwardReads 0 0 0 7

FWD.ReverseReads 10 0 0 0

REV.ForwardReads 4 0 0 0

REV.ReverseReads 0 0 0 0

"--minimum-length", 214 (250 - 18 (forward primer) - 18 (reverse primer) vs 230) = same results and best results

Forward Complement Reverse RevComp

FWD.ForwardReads 0 0 0 0

FWD.ReverseReads 1 0 0 0

REV.ForwardReads 3 0 0 0

REV.ReverseReads 0 0 0 0

increasing -n from 2 to 16 did not change results

Forward Complement Reverse RevComp

FWD.ForwardReads 0 0 0 7

FWD.ReverseReads 10 0 0 0

REV.ForwardReads 4 0 0 0

REV.ReverseReads 0 0 0 0

-e from 0 to default changed results

Forward Complement Reverse RevComp

FWD.ForwardReads 0 0 0 7

FWD.ReverseReads 12 0 0 0

REV.ForwardReads 5 0 0 0

REV.ReverseReads 0 0 0 1