benjjneb / dada2

Accurate sample inference from amplicon data with single nucleotide resolution
http://benjjneb.github.io/dada2/
GNU Lesser General Public License v3.0

Query regarding DADA2 - Metagenomics preliminary analysis #958

Closed rajavarman21 closed 4 years ago

rajavarman21 commented 4 years ago

Background: We sequenced 10 samples using an Illumina AmpliSeq 16S metagenomics panel. The data set consists of paired-end reads, 301 bp each. Read quality drops after about 200 bp, so we compared trimming the reads at Q20, Q25, and Q30 against using the full-length reads, to see which setting merges (stitches) best. The Q25 data merged better than all the others. We are sharing the parameters used (DADA2 script) and the resulting Excel sheet for your reference.

Key points (read lengths per condition):

  - Full raw data: forward read length 301 bp, reverse read length 301 bp
  - Q20 quality: forward read length 301 bp, reverse read length 260 bp
  - Q25 quality: forward read length 285 bp, reverse read length 235 bp
  - Q30 quality: forward read length 215 bp, reverse read length 190 bp

Queries:

  1. When we used the raw (full-length) data without any trimming, a very large number of reads were filtered out, as you can see in the table. What could be the reason for this?

  2. Looking at the comparison table, the Q25 setting (forward read length 285 bp, reverse read length 235 bp) seems to give the best merging results (highlighted in bold). Can we consider that optimal and take it forward for further analysis?

  3. According to the DADA2 parameters, there has to be a minimum overlap of 12 bp between forward and reverse reads. Even though we trimmed the reads according to quality, merging still happens at Q25. Can you help us by explaining the possible reason behind this, i.e. which algorithm is used in such situations?

Attachments: 10_samples_DADA2_output_comparison.xlsx, Example_dada2_scripts.txt

benjjneb commented 4 years ago

The key unanswered question here is: what amplicon are you sequencing? It is 16S, but which 16S region and which primer set? This determines how long the expected amplicon is, and therefore what the appropriate truncation lengths are (the truncated reads must still be long enough to overlap).
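To make the overlap requirement concrete, here is a minimal sketch of the arithmetic in Python. It is not DADA2 code. The truncation lengths and the 12 bp minimum overlap are the ones quoted in the question. The amplicon length of 460 bp is purely a hypothetical placeholder, since the actual region and primer set are not stated in this thread. The function names are also made up for illustration.

```python
# Rough overlap arithmetic for paired-end merging (illustrative only, not DADA2 itself).
# Assumption: the amplicon length is counted the same way as the reads,
# i.e. it includes any primer bases that are still present on the reads.

MIN_OVERLAP = 12  # minimum overlap quoted in the question above

# (forward, reverse) truncation lengths reported in the question
TRUNC_LENGTHS = {
    "full": (301, 301),
    "Q20": (301, 260),
    "Q25": (285, 235),
    "Q30": (215, 190),
}

# HYPOTHETICAL value: replace with the real expected amplicon length
# once the 16S region and primer set are known.
HYPOTHETICAL_AMPLICON_LEN = 460


def expected_overlap(trunc_f, trunc_r, amplicon_len):
    """Number of bases shared by the two truncated reads (negative means a gap)."""
    return trunc_f + trunc_r - amplicon_len


def can_merge(trunc_f, trunc_r, amplicon_len, min_overlap=MIN_OVERLAP):
    """True if the truncated reads overlap by at least min_overlap bases."""
    return expected_overlap(trunc_f, trunc_r, amplicon_len) >= min_overlap


for name, (trunc_f, trunc_r) in TRUNC_LENGTHS.items():
    overlap = expected_overlap(trunc_f, trunc_r, HYPOTHETICAL_AMPLICON_LEN)
    status = "ok" if can_merge(trunc_f, trunc_r, HYPOTHETICAL_AMPLICON_LEN) else "too short"
    print(f"{name}: expected overlap {overlap} bp ({status})")
```

Under that assumed length, the Q25 truncation (285 + 235 = 520 bp) would still leave well over 12 bp of overlap, while the Q30 truncation (215 + 190 = 405 bp) would not reach across the amplicon at all. With a different amplicon length the picture changes, which is why the region and primer set matter.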

benjjneb commented 4 years ago

Please re-open this issue with updates if you are still looking for assistance.