benjjneb / dada2

Accurate sample inference from amplicon data with single nucleotide resolution
http://benjjneb.github.io/dada2/
GNU Lesser General Public License v3.0
469 stars 142 forks source link

issue with amplicon directionality #1396

Closed HansCastorp77 closed 4 months ago

HansCastorp77 commented 3 years ago

Hi @benjjneb,

I've been processing some old amplicon sequencing sets with dada2 recently with no trouble (I'm a big fan) . However the last set of paired reads seems to have a mix of primer orientations across both R1 and R2. e.g the 5' end of R1 reads are roughly 50% fwd primer and 50% rev primer (same for R2). I'm wondering if this will pose a problem for dada2 analysis? Or am I better served separating fwd -> rev and rev -> fwd reads (e.g. based on 5' primer using cutadapt) and running them through dada2 separately? (Is there a way to reverse complement the rev -> fwd sequence table to allow combination with the fwd -> rev table before chimera removal?)

Very best, David

benjjneb commented 3 years ago

Basically there's two important things: The primers need to be properly removed in both orientations, and you want the orientations to merge together at the end (rather than having a forward and rev-comp version of each ASV).

Typically I have dealt with this by separating out the orientations at the start (e.g. with cutadapt), processing each orientation separately, then combining them into a common sequence table afterwards. This can be done by simply reverse-complementing the sequences in the sequence-table corresponding to the RC orientation, e.g.:

seqtab.rc.fixed <- seqtab.rc
colnames(seqtab.rc.fixed) <- dada2::rc(colnames(seqtab.rc))

Then you can merge that sequence table with the one from the forward orientation data:

seqtab <- mergeSequenceTables(seqtab.fwd, seqtab.rc.fixed, repeats="sum")
# the repeats flag described what to do if the same sample is found in different sequence tables
HansCastorp77 commented 3 years ago

Thanks for getting back to me on this - I ended up using:

colnames(seqtabrev) <- as.character(rc(DNAStringSet(colnames(seqtabrev))))

and then merged this with the forward orientation as mentioned. This seems to have done the trick

Thank you for producing such a good pipeline.