claraqin / neonMicrobe

Processing NEON soil microbe marker gene sequence data into ASV tables.
GNU Lesser General Public License v3.0

DADA2::filterAndTrim removes majority of ITS reads #7

Closed claraqin closed 4 years ago

claraqin commented 4 years ago

The filterAndTrim step of the DADA2 processing pipeline removes a majority of ITS reads from most samples in most sequencing runs. For example, after filterAndTrim with the default parameters (maxN = 0, maxEE = c(2, 2), truncQ = 2, minLen = 50), the median percentage of reads remaining in a sample from sequencing run B69PP is only 16%.
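
For reference, a retention figure like the one quoted above can be computed from the matrix that filterAndTrim returns, which has one row per sample and the columns reads.in and reads.out. The following is a minimal sketch in base R; the small matrix stands in for real output, and its sample names and counts are invented for illustration only.

```r
# Toy stand-in for the matrix returned by dada2::filterAndTrim()
# (columns reads.in / reads.out). Sample names and counts are hypothetical.
out <- matrix(
  c(1000, 160,
    2000, 300,
     500, 100),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("sampleA", "sampleB", "sampleC"),
                  c("reads.in", "reads.out"))
)

# Percentage of reads each sample retains after filtering
pct_remaining <- 100 * out[, "reads.out"] / out[, "reads.in"]

# Median across the samples of one sequencing run
median_pct <- median(pct_remaining)
print(median_pct)
```

With real data, `out` would be the return value of the filterAndTrim call for a single sequencing run.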

@mykophile commented:

Hi Clara - one of the graduate students in my lab, Glade, found that one of the keys for him was setting the truncQ parameter higher than the default. While this may not seem critical, his reasoning was that ITS seqs have a long, low-quality tail that can cause reads to get rejected later. If you are more aggressive with trimming early, that actually saves more reads down the line. Here are the settings he found worked best:

out <- filterAndTrim(cutFs, filtFs, cutRs, filtRs, maxN = 0, maxEE = c(2, 2), truncQ = 9, minLen = 50, rm.phix = TRUE, compress = TRUE, multithread = TRUE)

Best, Kabir
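
The reasoning above can be illustrated with a small base-R sketch. It assumes the documented DADA2 behaviour: a read is truncated at the first base with quality less than or equal to truncQ, and afterwards discarded if it is shorter than minLen or if its expected errors, the sum of 10^(-Q/10) over the remaining bases, exceed maxEE. The helper names and the example read are hypothetical; this is not DADA2's own implementation.

```r
# Truncate a vector of Phred scores at the first score <= trunc_q,
# mirroring the documented behaviour of truncQ.
truncate_at_q <- function(quals, trunc_q) {
  low <- which(quals <= trunc_q)
  if (length(low) == 0) return(quals)
  quals[seq_len(low[1] - 1)]
}

# Expected errors: sum of per-base error probabilities, 10^(-Q/10).
expected_errors <- function(quals) sum(10^(-quals / 10))

# A read passes if, after truncation, it is long enough and its
# expected errors do not exceed max_ee.
passes <- function(quals, trunc_q, max_ee = 2, min_len = 50) {
  kept <- truncate_at_q(quals, trunc_q)
  length(kept) >= min_len && expected_errors(kept) <= max_ee
}

# Hypothetical read: 60 good bases followed by a 40-base low-quality tail.
quals <- c(rep(35, 60), rep(8, 40))

# With truncQ = 2 the tail is kept and its expected errors sink the read.
pass_q2 <- passes(quals, trunc_q = 2)

# With truncQ = 9 the tail is trimmed first, so the read survives maxEE.
pass_q9 <- passes(quals, trunc_q = 9)

print(c(truncQ2 = pass_q2, truncQ9 = pass_q9))
```

The same sketch shows the trade-off: more aggressive truncation shortens reads, so more of them can fall below minLen.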

claraqin commented 4 years ago

Following @mykophile's advice, I was able to increase the proportion of reads remaining from run B69PP to over 50% (median). This required that I also relax the minLen argument to minLen = 20. Is this too lenient?

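[Image: filterAndTrim_params_test] https://user-images.githubusercontent.com/12421420/77839117-e30dfb00-712e-11ea-8057-ac2cbb2e42eb.png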

mykophile commented 4 years ago

Hi Clara - I think minLen 20 is too low. There are some yeasts with very small ITS sequences, but my experience is that 20 bp is not enough to be useful in downstream steps like OTU calling or taxonomic assignment. I would be more comfortable with minLen = 50 or 100, but if most of the seqs are only 50 or 100 bp, then I think that is going to be problematic, and I would rather recommend keeping a lower fraction of higher-quality sequences. I am playing around with some other filter and trim approaches on BMI Plate 3 and will let you know how that goes.

Kabir Peay Associate Professor Dept. of Biology Stanford University (650) 723-0552

claraqin commented 4 years ago

Closing this issue because we found out during the sensitivity analysis that we can increase the maxEE parameter(s) without negative consequences to read merging, taxonomy assignment, etc. https://people.ucsc.edu/~claraqin/test_dada2_params_plots.html

maxEE can also be varied across sequencing runs.

maxEE = 8 is a reasonable parameter to use for most NEON sequencing runs.
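
One way to vary maxEE across sequencing runs while keeping 8 as the default is a simple lookup, sketched below in base R. The override table, the run name "RUN_X", its value, and the helper max_ee_for_run are hypothetical placeholders and not part of neonMicrobe.

```r
# Default taken from the conclusion of this thread.
default_max_ee <- 8

# Hypothetical per-run overrides; "RUN_X" and its value are placeholders.
max_ee_overrides <- c(RUN_X = 4)

# Return the override for a run if one exists, otherwise the default.
max_ee_for_run <- function(run_id) {
  if (run_id %in% names(max_ee_overrides)) {
    unname(max_ee_overrides[run_id])
  } else {
    default_max_ee
  }
}

ee_default  <- max_ee_for_run("B69PP")  # no override, falls back to 8
ee_override <- max_ee_for_run("RUN_X")  # uses the placeholder override

# The value would then be passed to dada2::filterAndTrim(), for example as
# maxEE = rep(max_ee_for_run(run_id), 2) for paired-end reads. Using the
# same threshold for forward and reverse reads is an assumption here.
print(c(ee_default, ee_override))
```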