benjjneb / dada2

Accurate sample inference from amplicon data with single nucleotide resolution
http://benjjneb.github.io/dada2/
GNU Lesser General Public License v3.0

Losing too many reads after filter and trim? #1133

Closed termithorbor closed 3 years ago

termithorbor commented 4 years ago

Dear all,

I am wondering if I might be losing too many reads after filter and trim. I am using V3V4 16S primers, and my expected amplicon size without primers is 427. I used cutadapt to remove the primers from my sequences, and these are sample quality plots:

Forward image

Reverse image

I used the following parameters for filter and trim:

filterAndTrim(fnFs, filtFs, fnRs, filtRs, truncLen=c(250,200), maxN=0, maxEE=c(2,2), truncQ=2, rm.phix=TRUE, compress=TRUE, multithread=TRUE)

The result is this:

reads.in  reads.out
97304     58515
71739     44064

Am I losing too much, or is it still acceptable for soil samples? And roughly what number of reads passing filter and trim is acceptable in general?

Note: I have also tried it with maxEE=c(3,5) and now I get more reads through, but is it still reasonable?

reads.in  reads.out
97304     68665
71739     51593

And is it generally recommended to analyse different samples, let's say soil and water samples, at the same time with DADA2 as long as the same amplicon was sequenced, or is it better to do separate analyses?

Thank you very much in advance.

benjjneb commented 4 years ago

Am I losing too much, or is it still acceptable for soil samples? And roughly what number of reads passing filter and trim is acceptable in general?

Anything above ~25% is "acceptable" for filtering and trimming (and even below that in extreme cases). Keeping more reads is nice, but lower quality reads are less useful than you might think in improving the resolution of the sampled community, so removing them usually does more good than harm, even if it reduces the raw number of reads you are working with. Here you still have tens of thousands of good quality reads passing your filter -- just fine!
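To put the numbers from this thread next to that ~25% rule of thumb, here is a minimal base R sketch that computes the percentage of reads retained for each reported row. The counts are copied from the output quoted above; the `setting` labels and the `pct.retained` and `above.25pct` column names are illustrative assumptions, not part of the filterAndTrim() output.

```r
# Read counts copied from the filterAndTrim() output quoted in this thread.
# The 'setting' labels are illustrative and not part of the original output.
counts <- data.frame(
  setting   = c("maxEE=c(2,2)", "maxEE=c(2,2)", "maxEE=c(3,5)", "maxEE=c(3,5)"),
  reads.in  = c(97304, 71739, 97304, 71739),
  reads.out = c(58515, 44064, 68665, 51593)
)

# Percentage of input reads that passed filtering and trimming.
counts$pct.retained <- round(100 * counts$reads.out / counts$reads.in, 1)

# Compare against the rough ~25% rule of thumb from the reply above.
counts$above.25pct <- counts$pct.retained > 25

print(counts)
```

With these counts, roughly 60-61% of reads pass with maxEE=c(2,2) and roughly 71-72% with maxEE=c(3,5), so both settings sit well above the ~25% mark.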

termithorbor commented 4 years ago

Thank you very much. So would you stick with maxEE=c(2,2), or is c(3,5) still fine?

And what about this question? Is it generally recommended to analyse different samples, let's say soil and water samples, at the same time with DADA2 as long as the same amplicon was sequenced, or is it better to do separate analyses?

benjjneb commented 4 years ago

Either is fine. I would just stick with maxEE=2.

termithorbor commented 4 years ago

Okay, and there is still this one: is it generally recommended to analyse different samples, let's say soil and water samples, at the same time with DADA2 as long as the same amplicon was sequenced, or is it better to do separate analyses?

Thanks again and sorry if I am annoying you.

benjjneb commented 4 years ago

Samples that are processed in the same fashion (same PCR, same sequencing), and thus share common error rates, should be processed together. It doesn't matter if they are from different environments; fundamentally, DADA2 is modeling the error process to reveal the true sample composition.
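One way to read this advice is to batch samples by sequencing run rather than by environment. The base R sketch below illustrates that grouping under stated assumptions: the sample sheet, file names, and run labels are all hypothetical, and treating "processed together" as "one batch per run" is my interpretation rather than something stated in the thread. The dada2 calls appear only as comments, because they need the package and real files.

```r
# Hypothetical sample sheet: file names, environments, and run labels are
# made up for illustration and do not come from the thread.
samples <- data.frame(
  file        = c("soil1_F_filt.fastq.gz", "soil2_F_filt.fastq.gz",
                  "water1_F_filt.fastq.gz", "water2_F_filt.fastq.gz"),
  environment = c("soil", "soil", "water", "water"),
  run         = c("run1", "run2", "run1", "run2"),
  stringsAsFactors = FALSE
)

# Group by sequencing run, not by environment: samples from one run are
# assumed to share the same error process.
batches <- split(samples$file, samples$run)

for (run in names(batches)) {
  cat(run, ":", paste(batches[[run]], collapse = ", "), "\n")
  # With dada2 installed and the files present, each batch could then be
  # denoised together, for example:
  #   errF <- dada2::learnErrors(batches[[run]], multithread = TRUE)
  #   ddF  <- dada2::dada(batches[[run]], err = errF, multithread = TRUE)
}
```

In this sketch the soil and water samples from the same run end up in the same batch, while samples from different runs are kept apart.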