benjjneb / dada2

Accurate sample inference from amplicon data with single nucleotide resolution
http://benjjneb.github.io/dada2/
GNU Lesser General Public License v3.0

Question about using DADA2 for functional genes #492

Closed: jonathanylin closed this issue 6 years ago

jonathanylin commented 6 years ago

I am analyzing a functional gene amplicon sequence dataset and have a few questions that I hope you can answer.

I did some preliminary processing using DADA2 on my functional gene dataset, and I ended up with thousands of ESVs. This number was actually higher than the number of OTUs I obtained from 97% de novo OTU picking on the same dataset. I am wondering if you could comment on the efficacy of using DADA2 for functional genes (higher error rates, etc.). I would greatly appreciate any advice that you may have on this!

benjjneb commented 6 years ago

> I'm curious whether DADA2 is an appropriate pipeline to process functional genes to infer ESVs.

Absolutely. Outside of assignTaxonomy, there is nothing 16S-specific about the rest of the DADA2 pipeline. It is a tool for amplicon sequencing data generally.

> I did some preliminary processing using DADA2 on my functional gene dataset, and I ended up with thousands of ESVs. This number was actually higher than the number of OTUs I obtained from 97% de novo OTU picking on the same dataset. I am wondering if you could comment on the efficacy of using DADA2 for functional genes (higher error rates, etc.). I would greatly appreciate any advice that you may have on this!

To me that is not unexpected. Resolving to the single-nucleotide level will reveal variation that is missed when binning sequences by similarity. Using DADA2 on functional genes should be fine. Recall, the core idea of DADA2 is to model the error rates, and the error rates produced by PCR/sequencing don't depend on whether the sequenced locus is ribosomal or not.
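The error-modeling idea can be sketched in a few lines. The sketch below is loosely modeled on the abundance p-value described in the DADA2 paper: given a more abundant "parent" sequence, it computes the expected number of reads of a "child" variant produced purely by errors, then asks how surprising the observed child abundance would be under a Poisson model. It is a toy under loudly stated assumptions: a single fixed substitution rate (DADA2's `learnErrors` instead estimates rates for each nucleotide transition as a function of quality score), no indels, and no iterative partitioning. The function names are hypothetical and are not part of DADA2's R API. Note that nothing in the model refers to 16S; it only sees sequences and abundances.

```python
import math


def expected_error_reads(parent: str, child: str, n_parent: int, sub_rate: float) -> float:
    """Expected number of reads of `child` produced by errors from `parent`.

    Toy error model (an assumption, not DADA2's learned model): each position
    is read correctly with probability 1 - sub_rate, and a substitution goes
    to each of the 3 other bases with probability sub_rate / 3.
    No indels, no quality scores.
    """
    if len(parent) != len(child):
        raise ValueError("toy model handles equal-length substitutions only")
    lam = 1.0  # probability that one read of `parent` is sequenced as `child`
    for p, c in zip(parent, child):
        lam *= (1.0 - sub_rate) if p == c else (sub_rate / 3.0)
    return n_parent * lam


def poisson_upper_tail(a: int, mu: float) -> float:
    """P(X >= a) for X ~ Poisson(mu), assuming mu > 0."""
    if a <= 0:
        return 1.0
    if a <= mu:
        # The lower tail is well below 1 here, so 1 - lower is numerically safe.
        lower = sum(math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1)) for k in range(a))
        return 1.0 - lower
    # a > mu: terms shrink from k = a onward, so sum the tail directly
    # (avoids cancellation when the tail probability is tiny).
    term = math.exp(-mu + a * math.log(mu) - math.lgamma(a + 1))
    total, k = 0.0, a
    while term > 0.0 and term >= total * 1e-17:
        total += term
        k += 1
        term *= mu / k
    return total


def abundance_pvalue(a_child: int, mu: float) -> float:
    """P(X >= a_child | X >= 1): how surprising a_child reads would be if they
    were all error copies, given that the variant was observed at all."""
    if mu <= 0.0:
        return 0.0  # under the toy model this variant cannot arise by error
    return poisson_upper_tail(a_child, mu) / -math.expm1(-mu)


# Hypothetical 250-nt parent and a child differing at one position.
parent = "ACGT" * 62 + "AC"
child = "T" + parent[1:]
mu = expected_error_reads(parent, child, n_parent=10_000, sub_rate=0.005)
print(abundance_pvalue(40, mu))  # tiny: 40 copies are hard to explain as errors
print(abundance_pvalue(5, mu))   # large: 5 copies are consistent with errors
```

A one-nucleotide variant like this one sits at 99.6% identity to its parent, so 97% OTU clustering would merge it, whereas an error-model test can keep it as a separate variant when its abundance is too high to be explained by errors.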

> Do you know of any papers/studies that used DADA2 to process functional gene datasets? I have not been able to find much in the literature regarding this. If you know of any, would you be willing to forward them to me?

I don't know whether these really count as functional genes, but here are some non-16S uses in the literature:

COI in fish: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5389620/
Tumor barcodes: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5495136/
Fungal ITS: https://nph.onlinelibrary.wiley.com/doi/full/10.1111/nph.15035

jonathanylin commented 6 years ago

@benjjneb Thank you so much for your help!

LaraMacheriotou commented 5 years ago

Hey guys,

I am also having issues with CO1 ASVs from DADA2 that contain stop codons. I checked the CO1 fish paper mentioned above: its Materials and Methods section does not mention a step for removing ASVs with stop codons. I opened their raw ASV file (S2file) and translated it into amino acids using the vertebrate mitochondrial genetic code, and it contains many stop codons, so perhaps the ASV numbers should be reconsidered!
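One way to screen for this is to translate each ASV and flag those that cannot be read without hitting a stop codon. The sketch below is a stdlib-only Python illustration, not a DADA2 function; `VERT_MITO_STOPS`, `frames_with_stops` and `flag_asvs` are hypothetical names. It assumes the ASVs are oriented in the coding direction (only the three forward frames are checked) and uses the stop codons of NCBI genetic code table 2 (vertebrate mitochondrial), in which TGA codes for tryptophan while AGA and AGG are stops. If the primers fix the reading frame, passing that frame is stricter than accepting any open frame.

```python
# Stop codons in NCBI genetic code table 2 (vertebrate mitochondrial).
# In this code TGA encodes Trp rather than stop, while AGA and AGG are stops.
VERT_MITO_STOPS = frozenset({"TAA", "TAG", "AGA", "AGG"})
# Standard code (table 1), for comparison.
STANDARD_STOPS = frozenset({"TAA", "TAG", "TGA"})


def frames_with_stops(seq, stops=VERT_MITO_STOPS):
    """Return the forward reading frames (0, 1, 2) that contain a stop codon.

    Only complete codons are checked; a trailing partial codon is ignored.
    """
    seq = seq.upper()
    bad = []
    for frame in range(3):
        codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
        if any(codon in stops for codon in codons):
            bad.append(frame)
    return bad


def flag_asvs(asvs, frame=None, stops=VERT_MITO_STOPS):
    """Return the IDs of ASVs that cannot be translated without a stop codon.

    frame=None: flag an ASV only if all three forward frames contain a stop.
    frame=0, 1 or 2: flag an ASV if that (known) frame contains a stop.
    """
    flagged = []
    for asv_id, seq in asvs.items():
        bad = frames_with_stops(seq, stops)
        is_bad = len(bad) == 3 if frame is None else frame in bad
        if is_bad:
            flagged.append(asv_id)
    return flagged


# Toy sequences, not real CO1 data.
asvs = {"ASV1": "ATGGCTTGA", "ASV2": "TAACTAACTAA"}
print(flag_asvs(asvs))  # ['ASV2']: every forward frame of ASV2 hits TAA
print(frames_with_stops("ATGGCTTGA"))                  # []: TGA is Trp in table 2
print(frames_with_stops("ATGGCTTGA", STANDARD_STOPS))  # [0]: TGA is a stop in table 1
```

If Biopython is already part of the pipeline, `Seq.translate(table=2)` performs the same translation; the hand-rolled version only keeps the sketch free of dependencies.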

bioinfonext commented 5 years ago

Hi LaraMacheriotou,

were you able to optimize the workflow for functional gene amplicon analysis? If so, could you share how you proceeded?