Retaining 50% of the reads. Is it fine to assign taxonomy?

benjjneb / dada2

Accurate sample inference from amplicon data with single nucleotide resolution

http://benjjneb.github.io/dada2/

GNU Lesser General Public License v3.0


Retaining 50% of the reads. Is it fine to assign taxonomy? #891

Closed sandipansamaddar closed 4 years ago

sandipansamaddar commented 4 years ago

Hello, I am using dada2 to analyze 16S amplicon data. Raw sequences were obtained by sequencing the V4-V5 region of the 16S gene with primers 515F/926R on a 2x300 bp MiSeq platform. I didn't encounter any errors following the tutorial, but from input through chimera removal I retained on average 50% of the reads. Is it fine at this point to proceed to assigning taxonomy, or should I re-analyze with different parameters? The truncLen option in filterAndTrim was set to c(250,180). Primers were removed beforehand using cutadapt.

I am attaching a table to show how it went in every step.

Summarytable_dada2.docx
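To read a tracking table like the attached one, it can help to express each step as the fraction of reads retained from the previous step, so the step with the largest loss stands out. The following is a minimal base-R sketch. The read counts and the step names are hypothetical placeholders, not values from the attached table; substitute one row of your own table.

```r
# HYPOTHETICAL read counts for one sample (illustrative only, not from the thread).
# Replace with a row of your own per-step tracking table.
track <- c(input = 100000, filtered = 85000, denoised = 80000,
           merged = 55000, nonchim = 50000)

# Fraction of reads surviving each step, relative to the step before it.
# The result keeps the names of the later step (filtered, denoised, ...).
step_retention <- track[-1] / track[-length(track)]

# Fraction of the original input that survives the whole pipeline.
overall_retention <- track[["nonchim"]] / track[["input"]]

print(round(step_retention, 3))
print(overall_retention)

# Name of the step with the largest proportional loss.
worst_step <- names(which.min(step_retention))
print(worst_step)
```

An overall figure such as 50% says little on its own; the per-step fractions show whether the loss is spread across steps or concentrated in one of them.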

Thank you for this wonderful pipeline.

benjjneb commented 4 years ago

The one step I would be mildly concerned about is merging, where you are losing a larger fraction of reads than I would like. I would try two things to see if they help there: increase truncLen slightly (maybe +5-10 nts) and see if that increases the fraction merging, and try dada(..., pool="pseudo") to see if pseudo-pooling increases the fraction that merge.

If neither step increases the fraction merging noticeably then I would just go with what you have.

sandipansamaddar commented 4 years ago

Thanks for the prompt response. I already used pool="pseudo" in my commands, so I am fine on that point:

    dada_forward <- dada(derep_forward, err = err_forward_reads, pool = "pseudo", multithread = TRUE)
    dada_reverse <- dada(derep_reverse, err = err_reverse_reads, pool = "pseudo", multithread = TRUE)

But as you suggested, I will try truncLen with c(260,190) and see what happens. I will keep you updated. Thank you again.
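The effect of a longer truncLen on merging can be estimated with simple arithmetic: the expected overlap is the sum of the two truncation lengths minus the amplicon length. Below is a rough base-R sketch comparing the two settings mentioned in this thread. The amplicon length of 373 nt is an assumed placeholder that does not come from the thread and should be replaced with a value measured from your own primer-trimmed data; the minimum overlap of 12 is the default `minOverlap` of dada2's `mergePairs`. The helper `expected_overlap` is a hypothetical name used only for illustration.

```r
# ASSUMPTION: approximate primer-trimmed amplicon length in nt.
# Placeholder value, not stated in the thread; measure it from your own data.
amplicon_len <- 373

# Default minimum overlap required by dada2::mergePairs (minOverlap = 12).
min_overlap <- 12

# Hypothetical helper: expected overlap between forward and reverse reads
# after truncation, for an amplicon of the given length.
expected_overlap <- function(trunc_len, amplicon_len) {
  sum(trunc_len) - amplicon_len
}

# The two truncLen settings discussed in the thread.
settings <- list(original = c(250, 180), extended = c(260, 190))

overlaps <- sapply(settings, expected_overlap, amplicon_len = amplicon_len)
print(overlaps)

# Longest amplicon that could still merge with the minimum required overlap.
longest_mergeable <- sapply(settings, sum) - min_overlap
print(longest_mergeable)
```

Under these assumptions both settings leave more than the minimum overlap for a typical amplicon, and the extra 20 nt mainly helps longer length variants. The extended tails are also usually lower in quality, so the fraction passing the filter may drop, which is why comparing the full tracking table before and after is worthwhile.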

Best,

Sandipan