benjjneb / dada2

Accurate sample inference from amplicon data with single nucleotide resolution
http://benjjneb.github.io/dada2/
GNU Lesser General Public License v3.0

Is dada2 suitable for Nextera XT + viral genome? #1529

Closed by omarkr8 3 months ago

omarkr8 commented 2 years ago

Hi,

I have been wrangling some metabarcoding MiSeq data using dada2 and OTU clustering to help simplify the search for recombinant sequences. I have managed to run through my makeshift workflow, but am wondering whether dada2 was suitable for this work in the first place.

I'll explain my experimental setup and then how I used dada2.

A cell culture is infected with a virus and an element that should result in recombinant viral genomes in some cases. This heterogeneous culture is extracted and 2-3 amplicons (2-4 kb) are generated. These amplicons are indexed using Nextera XT (so they are fragmented) and then sequenced on MiSeq (2 x 150), so the resulting reads are at most 150 bp.

My concerns are as follows. Since my sample is very mixed (amplicon loci, different tagmentation fragments, possible recombinants), does that interfere with learnErrors?

Since I'm looking for recombinants, I found a way to extract the list of chimeras that would normally be filtered out. I can then scan this subset for genuine mutants.

Does this make sense? Or am I way off?

benjjneb commented 2 years ago

This isn't the kind of data for which DADA2 is designed, and performance on it would likely range from marginal at best to completely wrong.

It sounds like you want to map your fragments onto amplicon reference sequences and go from there.

In another potential future, you could consider using long-read amplicon sequencing (e.g. PacBio HiFi or LoopSeq) combined with DADA2 to get single-nucleotide variant resolution on amplicons of that length. See our papers here: https://doi.org/10.1093/nar/gkz569 and https://microbiomejournal.biomedcentral.com/articles/10.1186/s40168-021-01072-3
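
To make the mapping suggestion above a bit more concrete, here is a minimal toy sketch in Python of the general idea: place each read on a set of reference sequences, and flag reads that only fit when split into two pieces as candidate recombination junctions. Everything in it is an assumption for illustration and none of it comes from dada2 or from this thread: the reference names and sequences are made up, `find_exact` and `classify_read` are hypothetical helpers, and matching is exact only (no mismatches, no reverse complement, no quality scores). Real data would go through a proper read aligner such as BWA-MEM or minimap2, which can report split (supplementary) alignments.

```python
from typing import Dict, Optional, Tuple

def find_exact(seq: str, refs: Dict[str, str]) -> Optional[Tuple[str, int]]:
    """Return (reference name, 0-based offset) of the first exact match, or None."""
    for name, ref in refs.items():
        pos = ref.find(seq)
        if pos != -1:
            return name, pos
    return None

def classify_read(read: str, refs: Dict[str, str], min_anchor: int = 8) -> dict:
    """Classify a read as 'mapped', a candidate 'junction', or 'unmapped'.

    min_anchor is the minimum number of bases required on each side of a
    split; it is an arbitrary toy value, not a recommended setting.
    """
    # Case 1: the whole read matches one reference end to end.
    hit = find_exact(read, refs)
    if hit is not None:
        return {"status": "mapped", "ref": hit[0], "pos": hit[1]}

    # Case 2: try each split point; if both halves match somewhere,
    # report the first such split as a candidate junction.
    for cut in range(min_anchor, len(read) - min_anchor + 1):
        left = find_exact(read[:cut], refs)
        right = find_exact(read[cut:], refs)
        if left is not None and right is not None:
            return {"status": "junction", "cut": cut, "left": left, "right": right}

    # Case 3: nothing fits under exact matching.
    return {"status": "unmapped"}

# Made-up reference sequences standing in for the amplicon and the element.
refs = {
    "virus": "ACGTTGCAAGGCTTAACCGGATCGTACGAT",
    "element": "TTTGGGCCCAAATTTCCCGGGAAATCTCTC",
}

plain_read = refs["virus"][3:20]                                # lies fully inside "virus"
recombinant_read = refs["virus"][5:17] + refs["element"][10:22] # artificial junction read

for label, read in [("plain", plain_read), ("recombinant", recombinant_read)]:
    print(label, classify_read(read, refs))
```

Note that a split read found this way is only a candidate: PCR chimeras formed during amplification would look the same as true recombinants, so junctions would still need support from multiple independent fragments or an orthogonal method.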