benjjneb / dada2

Accurate sample inference from amplicon data with single nucleotide resolution
http://benjjneb.github.io/dada2/
GNU Lesser General Public License v3.0
470 stars 142 forks source link

ITS workflow. Problem with primer revoval using cutadapt #2039

Open Guillermouceda opened 3 weeks ago

Guillermouceda commented 3 weeks ago

Hello I am wokring with a dataset of ITS-2 plant sequences. I am following the tutorial provided in github. However, I am having problems with the removal of mi primer sequences.

After filtering for seqs with Ns the primer count of one of my samples looks like this:

> rbind(FWD.ForwardReads = sapply(FWD.orients, primerHits, fn = fnFs.filtN[[16]]), 
+       FWD.ReverseReads = sapply(FWD.orients, primerHits, fn = fnRs.filtN[[16]]), 
+       REV.ForwardReads = sapply(REV.orients, primerHits, fn = fnFs.filtN[[16]]), 
+       REV.ReverseReads = sapply(REV.orients, primerHits, fn = fnRs.filtN[[16]]))

Forward Complement Reverse RevComp FWD.ForwardReads 213 0 0 0 FWD.ReverseReads 0 0 0 1 REV.ForwardReads 1 0 0 2 REV.ReverseReads 223 0 0 0

After using cutdapt the primer count still find some primer sequences in my reads from the same sample:

> rbind(FWD.ForwardReads = sapply(FWD.orients, primerHits, fn = fnFs.cut[[16]]), 
+       FWD.ReverseReads = sapply(FWD.orients, primerHits, fn = fnRs.cut[[16]]), 
+       REV.ForwardReads = sapply(REV.orients, primerHits, fn = fnFs.cut[[16]]), 
+       REV.ReverseReads = sapply(REV.orients, primerHits, fn = fnRs.cut[[16]]))

Forward Complement Reverse RevComp FWD.ForwardReads 0 0 0 0 FWD.ReverseReads 0 0 0 0 REV.ForwardReads 1 0 0 0 REV.ReverseReads 0 0 0 0

My cutadapt code is as follows (basically the same as in the tutorial):

# Defining where cuadapt is
cutadapt <- "C:/Users/Guill/Desktop/Dada_2_pipeline_Tereza_ITS_version/cutadapt-3.4.exe" # CHANGE ME to the cutadapt path on your machine

#Running shell commands to run Cutadapt
system2(cutadapt, args = "--version") # Run shell commands from R

# Creating a directory where the output of Cutadapt will be put in
path.cut <- file.path(path, "cutadapt")
if(!dir.exists(path.cut)) dir.create(path.cut)
fnFs.cut <- file.path(path.cut, basename(fnFs))
fnRs.cut <- file.path(path.cut, basename(fnRs))

FWD.RC <- dada2:::rc(FWD)
REV.RC <- dada2:::rc(REV)
# Trim FWD and the reverse-complement of REV off of R1 (forward reads)
R1.flags <- paste("-g", FWD, "-a", REV.RC) 
# Trim REV and the reverse-complement of FWD off of R2 (reverse reads)
R2.flags <- paste("-G", REV, "-A", FWD.RC) 
# Run Cutadapt
for(i in seq_along(fnFs)) {
  system2(cutadapt, args = c(R1.flags, R2.flags, "-n", 2, # -n 2 required to remove FWD and REV from reads
                             "-o", fnFs.cut[i], "-p", fnRs.cut[i], # output files
                             fnFs.filtN[i], fnRs.filtN[i])) # input files
}

If I understand well, there should not be any primer sequences as this can artificially increased the number of ASVs.

How can I solve this issue? Should I make the Cutadapt step more stringent by increasing the minimum overlap? If so, how should I modify the Cutadapt flags for this purpose? Moreover, when I was running Cutadapt I wa getting a warning message saying that some of my primer sequences might be incomplete. I checked that the primer sequences are correct. The primers used are the same as in Chen et al. (2010); PlosOne

Thank you very much in advance.

Guillermouceda commented 3 weeks ago

I have exploring how other samples looks like. For example this is a sample with way more reads: Before cutadapt:

Forward Complement Reverse RevComp FWD.ForwardReads 119234 0 0 0 FWD.ReverseReads 623 0 0 1 REV.ForwardReads 563 0 0 2 REV.ReverseReads 130313 0 0 0

After cutadapt:

Forward Complement Reverse RevComp FWD.ForwardReads 0 0 0 0 FWD.ReverseReads 623 0 0 0 REV.ForwardReads 563 0 0 0 REV.ReverseReads 0 0 0 0

Should I be concern about this ?

benjjneb commented 1 week ago

What it seems like is that a small number of reads are "switched", that is R2 is the forward read and R1 is the reverse read. That cutadapt command you are running is not set up to detect that scenario. My first recommendation would be to add the --discard-untrimmed parameter to the cutadapt call. I think this should throw away those ~600 "switched" reads, as the FWD primer will not be detected on R1 nor will the REV primer be detected on R2.

But test it, the interaction between various cutadapt flags can be confusing sometimes.

Guillermouceda commented 1 week ago

Hello @benjjneb

Thank you for our suggestion indeed it solved the issue.