benjjneb / dada2

Accurate sample inference from amplicon data with single nucleotide resolution
http://benjjneb.github.io/dada2/
GNU Lesser General Public License v3.0

Single-orientation ASV in mixed-orientation amplicon library? #1880

Closed DollyPrrrton closed 3 months ago

DollyPrrrton commented 7 months ago

Hi Benjamin, Thanks for all your work on dada2 and this page.

I've come across something in a recent library that I haven't been able to figure out after a few days of exploring possibilities and trawling through the discussions here. I thought I would post it here and see if anyone has any experience or knowledge of something similar. Apologies if I've missed something very obvious. I'm still getting my head round some aspects of metabarcoding and dada2.

The library is a small test library (MiSeq Nano run) to check primers and barcoding effectiveness. It's a mixed-orientation amplicon library prepared by ligation. For almost all ASVs I'm therefore getting two ASVs, forward and reverse, with counts at around a 50/50 ratio. What's bothering me is that one of the ASVs, which happens to be from a taxon in which I am particularly interested in capturing diversity, doesn't have a reverse counterpart. I'm therefore trying to figure out whether the ASV is a PCR or sequencing artifact, a processing error, or a dropout of the reverse counterpart, and whether I should remove it from the downstream analysis.
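A minimal Python sketch of how the orientation pairing described above could be checked outside of R. It assumes the ASV table has been exported as a mapping of sequence to total read count, and that both orientations were trimmed to exactly the same span so that a reverse counterpart is an exact reverse complement. The function names are hypothetical and not part of dada2.

```python
# Hypothetical helpers (not part of dada2): pair each ASV with its exact
# reverse complement and list the ASVs that have no counterpart.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def orientation_pairs(asv_counts):
    """Split ASVs into orientation pairs and unpaired sequences.

    asv_counts: dict of sequence -> read count (assumed export format).
    Returns (paired, unpaired), where paired holds tuples of
    (sequence, reverse_complement, fraction of the pair's reads in `sequence`).
    """
    paired, unpaired, seen = [], [], set()
    for seq, count in asv_counts.items():
        if seq in seen:
            continue
        rc = reverse_complement(seq)
        if rc != seq and rc in asv_counts:
            seen.update((seq, rc))
            total = count + asv_counts[rc]
            fraction = count / total if total else float("nan")
            paired.append((seq, rc, fraction))
        else:
            seen.add(seq)
            unpaired.append(seq)
    return paired, unpaired
```

A fraction near 0.5 corresponds to the roughly 50/50 split described above, and anything in the unpaired list is a candidate for closer inspection.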

A few more details and avenues I've looked at:

I'm scratching my head here. Any help or insight would be greatly appreciated. At the end of the day I can remove the ASV from further analysis if I can't figure it out, but I would like to understand it if possible.

Many thanks

benjjneb commented 7 months ago

That does sound strange and worthy of follow-up.

I would start by trying to nail down exactly where the observed variant and expected reverse-complement variant fail to both appear in the data.

Since the discriminating bases are in the forward-orientation R1 part of the read, it should be possible to create the observed R1 sequence, and the expected (but missing) R2 sequence from the reverse complement orientation.

Beginning at the beginning: Are those sequences present in the raw sequence data? (maybe using a sample where the observed ASV was seen at relatively high abundance).
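The raw-data check suggested above could be sketched in Python as follows. This is only an illustration: it assumes standard four-line FASTQ records (optionally gzipped), and the function names are hypothetical. Because the strand on which the discriminating bases appear depends on the library layout, the sketch counts the query both as given and as its reverse complement.

```python
import gzip

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_reads_containing(fastq_path, query):
    """Count reads that contain `query` or its reverse complement.

    Assumes four-line FASTQ records; a read matching both forms is
    counted once, under "as_given".
    """
    query = query.upper()
    rc = reverse_complement(query)
    path = str(fastq_path)
    opener = gzip.open if path.endswith(".gz") else open
    counts = {"reads": 0, "as_given": 0, "as_revcomp": 0}
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 != 1:  # the sequence is the second line of each record
                continue
            read = line.strip().upper()
            counts["reads"] += 1
            if query in read:
                counts["as_given"] += 1
            elif rc in read:
                counts["as_revcomp"] += 1
    return counts
```

Running this on the R1 and R2 files of a sample where the ASV is abundant gives a quick view of whether the variant is present in both read directions before any filtering or denoising.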

DollyPrrrton commented 7 months ago

Hi Benjamin,

Thanks very much for the response, that was helpful. I went back and checked the raw files for one of the samples. The SNP of interest is located at the 5' end in the forward orientation, so it should be present in Forw_Orientation.R1 and Rev_Orientation.R2. I checked with BLAST and grep -c and it was, at fairly even counts, and at around the same count as the forward ASV in the output of the original run. (As an aside, can this approach be used as a "belts-and-braces" way to validate ASVs of interest?)

I've run out of time for now, but will check a few other steps next. If it's present in the raw Rev.orientation.R2, then it should have merged and been present as the reverse-orientation ASV. I'm wondering if it has been removed by removeBimeraDenovo? I'll try to find time next week to check.
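One way to narrow down where the reverse-orientation variant drops out is to compare the sequences kept after each step. The Python sketch below assumes the sequences from each step (for example after merging and after chimera removal) have been exported from R as plain lists. The stage names and the function are hypothetical. A reverse-orientation ASV is expected to carry the reverse complement of the discriminating region.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def trace_variant(stages, query):
    """Report, per stage, how many sequences carry the variant.

    stages: list of (stage_name, list_of_sequences) in pipeline order
            (assumed export format, hypothetical stage names).
    Returns a list of (stage_name, n_with_query, n_with_reverse_complement).
    """
    query = query.upper()
    rc = reverse_complement(query)
    report = []
    for name, seqs in stages:
        n_fwd = sum(1 for s in seqs if query in s.upper())
        n_rev = sum(1 for s in seqs if rc in s.upper())
        report.append((name, n_fwd, n_rev))
    return report
```

The first stage where the reverse-complement count falls to zero while the forward count stays above zero points to the step worth inspecting.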

Thanks again for the help!