benjjneb / dada2

Accurate sample inference from amplicon data with single nucleotide resolution
http://benjjneb.github.io/dada2/
GNU Lesser General Public License v3.0

Removing chimeras from Sequel full-length 16S rRNA data using DADA2 #1300

Status: Closed (closed by yuharasatoshi 3 years ago)

yuharasatoshi commented 3 years ago

We obtained full-length 16S rRNA sequencing data for the ZymoBIOMICS Microbial Community Standards on a Sequel, following the manufacturer's instructions: https://www.pacb.com/wp-content/uploads/Procedure-Checklist-%E2%80%93-Amplification-of-Full-Length-16S-Gene-with-Barcoded-Primers-for-Multiplexed-SMRTbell-Library-Preparation-and-Sequencing.pdf

We are now analyzing the data following the workflow from your paper: https://benjjneb.github.io/LRASManuscript/LRASms_Zymo.html

However, the majority of the data was lost after removing chimeras.

     ccs    primers filtered denoised
[1,] 222830 197310  189307   185756

bim <- isBimeraDenovo(dd, minFoldParentOverAbundance=3.5)
table(bim)

bim
FALSE  TRUE
   34   162

Any advice or comments would be appreciated.

benjjneb commented 3 years ago

What fraction of the reads were lost?

sum(dd$denoised[bim])/sum(dd$denoised)

yuharasatoshi commented 3 years ago

Thank you for your reply.

sum(dd$denoised[bim])/sum(dd$denoised)
0.03910054

benjjneb commented 3 years ago

4% of reads being identified as chimeras is totally normal. Not a cause for concern.
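
For readers following along, here is a minimal sketch of what that expression computes: the share of denoised reads that belong to variants flagged as chimeric. The values are made up and are not the data from this thread; `abund` is a hypothetical stand-in for `dd$denoised` and `flag` for `bim`.

```r
# Toy stand-ins (made-up values, NOT the data from this thread).
# abund plays the role of dd$denoised: reads per inferred variant.
abund <- c(ASV1 = 5000L, ASV2 = 3000L, ASV3 = 1500L,
           ASV4 = 20L, ASV5 = 10L, ASV6 = 5L)

# flag plays the role of bim: TRUE where a variant was called chimeric.
flag <- c(FALSE, FALSE, FALSE, TRUE, TRUE, TRUE)

# Fraction of reads (not of variants) assigned to flagged variants.
read_frac <- sum(abund[flag]) / sum(abund)
read_frac
```

In this toy case half of the variants are flagged, yet they hold only 35 of 9535 reads.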

yuharasatoshi commented 3 years ago

Thank you for your comment.

According to the analysis log, 196 variants were obtained after denoising.

189,307 reads in 38,141 unique seqs.
196 seq variants were inferred from 38,141 input unique seqs.

165 out of 196 variants were removed by removeBimeraDenovo. Is that also normal?

benjjneb commented 3 years ago

Many ASVs but few reads being chimeric is totally normal, especially in low diversity samples like a mock community.

Lots of very low abundance chimeras can be produced by PCR. That's what you are seeing.
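
The gap between "most ASVs" and "few reads" can be illustrated with a small sketch. It uses base R only and made-up numbers, so it is an illustration of the arithmetic rather than of this dataset. `chimera_summary` is a hypothetical helper, not a dada2 function.

```r
# Hypothetical helper (not part of dada2): summarise flagged variants
# both by count and by the reads they carry.
chimera_summary <- function(abund, flag) {
  stopifnot(length(abund) == length(flag), is.logical(flag))
  c(
    asv_frac         = mean(flag),                       # share of variants flagged
    read_frac        = sum(abund[flag]) / sum(abund),    # share of reads flagged
    median_flagged   = median(abund[flag]),              # typical flagged abundance
    median_unflagged = median(abund[!flag])              # typical retained abundance
  )
}

# Toy data (made up): 4 abundant true variants and 16 rare chimeras.
abund <- c(rep(10000L, 4), rep(25L, 16))
flag  <- c(rep(FALSE, 4), rep(TRUE, 16))

res <- chimera_summary(abund, flag)
res
```

Here 80% of the variants are flagged but they account for roughly 1% of the reads, because each flagged variant is far less abundant than the retained ones. Assuming `dd` and `bim` are defined as earlier in the thread, the same helper could be called as `chimera_summary(dd$denoised, bim)`.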