Open claraqin opened 4 years ago
Hi Clara - one of the graduate students in my lab, Glade, found that one of the keys for him was setting the TruncQ parameter higher than the default. While this may not seem critical, his reasoning was that with ITS seqs there is a long, low quality tail that can cause reads to get rejected later. If you are more aggressive with trimming early that actually saves more down the line. Here are the settings he found worked best:
out <- filterAndTrim(cutFs, filtFs, cutRs, filtRs, maxN = 0, maxEE = c(2, 2), truncQ = 9, minLen = 50, rm.phix = TRUE, compress = TRUE, multithread = TRUE) Best, Kabir
On Mar 26, 2020, at 9:00 PM, Clara Qin notifications@github.com<mailto:notifications@github.com> wrote:
Currently, error tolerance is assumed to be default -e 0.1. Changing the error tolerance could allow more primer sequences to be identified (and thus removed) by cutadapt.
See https://cutadapt.readthedocs.io/en/stable/guide.html#error-tolerance
To be fair, this might not be the most consequential parameter. Error tolerance of 0.1 corresponds to maximum mismatch of 2 bases. Allowing up to 3 mismatched bases would have removed an additional 138 reverse-primer matches from the reverse reads – approximately 10% of all the matches (with max.mismatch=3) in the pre-cutadapt reverse reads.
runB69PP, first R1 sample Forward Complement Reverse RevComp FWDPrimer.R1.reads 423 0 0 3 FWDPrimer.R2.reads 26 0 0 13 REVPrimer.R1.reads 647 0 0 298 REVPrimer.R2.reads 737 0 0 4
runB69PP, first R1 sample, setting max.mismatch = 2 Forward Complement Reverse RevComp FWDPrimer.R1.reads 731 0 0 4 FWDPrimer.R2.reads 31 0 0 314 REVPrimer.R1.reads 1518 0 0 807 REVPrimer.R2.reads 1196 0 0 13
runB69PP, first R1 sample, setting max.mismatch = 3 Forward Complement Reverse RevComp FWDPrimer.R1.reads 757 0 0 4 FWDPrimer.R2.reads 31 0 0 343 REVPrimer.R1.reads 1528 0 0 944 REVPrimer.R2.reads 1342 0 0 19
runB69PP, post cutadapt (1 of 2), first R1 sample Forward Complement Reverse RevComp FWDPrimer.R1.reads 0 0 0 3 FWDPrimer.R2.reads 26 0 0 0 REVPrimer.R1.reads 646 0 0 0 REVPrimer.R2.reads 0 0 0 4
runB69PP, post cutadapt (1 of 2), first R1 sample, setting max.mismatch = 2 Forward Complement Reverse RevComp FWDPrimer.R1.reads 0 0 0 4 FWDPrimer.R2.reads 31 0 0 0 REVPrimer.R1.reads 1509 0 0 0 REVPrimer.R2.reads 0 0 0 13
runB69PP, post cutadapt (1 of 2), first R1 sample, setting max.mismatch = 3 Forward Complement Reverse RevComp FWDPrimer.R1.reads 1 0 0 4 FWDPrimer.R2.reads 31 0 0 5 REVPrimer.R1.reads 1519 0 0 132 REVPrimer.R2.reads 138 0 0 19
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHubhttps://github.com/claraqin/NEON_soil_microbe_processing/issues/6, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AC5DJWWSQ6BBRMX2JX7C6QLRJQQFFANCNFSM4LUW33NQ.
Kabir Peay Associate Professor Dept. of Biology Stanford University (650) 723-0552
Hi Kabir,
If I'm not mistaken, I think you're bringing up a separate issue, so I've just opened up a new issue (#7) and tagged you in it.
Best, Clara
Just a note that while I haven't made the cutadapt error tolerance into another parameter, I did modify utils.R
in commit f3c1119
so that the hard-coded error tolerance -e
was 0.2
instead of the default 0.1
. This allows up to 4 mismatched bases instead of the default 2, which is necessary because of the low primer identification rates under the default error tolerance.
I'm actually not sure whether it would be a good idea to make this into a parameter, so I'll leave this issue open until we can reach more of a consensus.
Currently, error tolerance is assumed to be default
-e 0.1
. Changing the error tolerance could allow more primer sequences to be identified (and thus removed) bycutadapt
.See https://cutadapt.readthedocs.io/en/stable/guide.html#error-tolerance
To be fair, this might not be the most consequential parameter. Error tolerance of 0.1 corresponds to maximum mismatch of 2 bases. Allowing up to 3 mismatched bases would have removed an additional 138 reverse-primer matches from the reverse reads – approximately 10% of all the matches (with
max.mismatch=3
) in the pre-cutadapt reverse reads.