benjjneb / dada2

Accurate sample inference from amplicon data with single nucleotide resolution
http://benjjneb.github.io/dada2/
GNU Lesser General Public License v3.0

DADA2 Not recovering known community members in mock community samples #1005

Closed skelto3 closed 4 years ago

skelto3 commented 4 years ago

Hello,

I constructed simple mock communities composed of short synthetic genes that vary only in a 6 bp region in the middle, combined at various known concentrations (example sequences from one such mock community are pasted below). In every attempt so far, at least one of the known mock community members is not recovered after denoising, despite an abundance of perfect matches in the raw reads. I have included the known sequences as priors (forward sequences and their reverse complements, prior to merging), tried both pooling and no pooling, and tried selfConsist = TRUE and FALSE, all to no avail. In the example mock community below, judging by the known concentrations of the different variants going into the mock, it appears that the first and second sequences are being assigned to the same ASV, which is given the sequence of the second mock member, so I recover zero perfect matches for the first mock member. This is particularly puzzling because the first mock member comprises roughly a third of the raw reads in some samples.

Is there anything else I can try to get DADA2 to discriminate among these similar sequences?

Thank you.

Pmb.F.priors <- c("AGCTATTCTATTCCTAAATAATACATCCAACACTCCAACACTATTATTCCTAGCAACC", "AGCTATTCTATTCCTAAATAATACTCTCAACACTCCAACACTATTATTCCTAGCAACC", "AGCTATTCTATTCCTAAATAATAAGAGCAACACTCCAACACTATTATTCCTAGCAACC", "AGCTATTCTATTCCTAAATAATAATGACAACACTCCAACACTATTATTCCTAGCAACC", "AGCTATTCTATTCCTAAATAATATACACAACACTCCAACACTATTATTCCTAGCAACC")

benjjneb commented 4 years ago

To clarify, the sequences listed here are the mock community sequences you are trying to recover? And is it just these sequences being denoised, or are they part of a longer sequenced region?

Also, do "first" and "second" in your text correspond to the 1st and 2nd sequences in Pmb.F.priors?

skelto3 commented 4 years ago

Yes, the sequences listed are those that I am trying to recover, and they should be the only sequences present in the samples. These sequences are the complete amplicon (after removing primers); they are not part of a longer sequenced region. I realize it is not obvious, but I promise there are good reasons for metabarcoding such a tiny region. Yes, "first" and "second" correspond to the order in the Pmb.F.priors vector.

-- James Skelton Community Ecologist

webpage: poetsworm.com

email: skelto3@g skelto3@vt.edumail.com

benjjneb commented 4 years ago

That is... strange. When I use the dada2 alignment from within the R package, these sequences are all clearly distinguished from one another, so what is going on?

unname(outer(Pmb.F.priors, Pmb.F.priors, nwhamming, vec=TRUE))

What version of the dada2 R package are you using? Could you share an example fastq file with me?
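For readers without dada2 installed, a rough cross-check of the comparison above is possible in base R, because all five sequences in Pmb.F.priors have the same length. The sketch below counts position-by-position mismatches without aligning first, so it is not a substitute for nwhamming; the helper name `hamming` is hypothetical and not part of dada2.

```r
# The five mock-community sequences from the original post.
Pmb.F.priors <- c(
  "AGCTATTCTATTCCTAAATAATACATCCAACACTCCAACACTATTATTCCTAGCAACC",
  "AGCTATTCTATTCCTAAATAATACTCTCAACACTCCAACACTATTATTCCTAGCAACC",
  "AGCTATTCTATTCCTAAATAATAAGAGCAACACTCCAACACTATTATTCCTAGCAACC",
  "AGCTATTCTATTCCTAAATAATAATGACAACACTCCAACACTATTATTCCTAGCAACC",
  "AGCTATTCTATTCCTAAATAATATACACAACACTCCAACACTATTATTCCTAGCAACC"
)

# Hypothetical helper (not a dada2 function): number of positions at which
# two equal-length sequences differ. No alignment is performed.
hamming <- function(a, b) {
  stopifnot(nchar(a) == nchar(b))
  sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

# Pairwise mismatch counts; Vectorize() lets outer() call the scalar helper.
dist_mat <- unname(outer(Pmb.F.priors, Pmb.F.priors, Vectorize(hamming)))
print(dist_mat)
```

Every off-diagonal entry is nonzero, so the variants are distinct as plain strings; the first and second sequences differ at three positions.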

skelto3 commented 4 years ago

I am using v1.14.0. I would be willing to share a FASTQ file privately. How may I do so?

benjjneb commented 4 years ago

You can email me: benjamin DOT j DOT callahan AT gmail DOT com

benjjneb commented 4 years ago

Did we get this figured out over email?

skelto3 commented 4 years ago

Yes. Changing gap_penalty to 20 resolved the issue. Thank you for checking back.
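The thread does not record why the gap penalty mattered, so the following is only one plausible reading, sketched in base R. It scores the first and second mock sequences with a plain global Needleman-Wunsch recursion under two gap penalties. The scoring values (match 5, mismatch -4, gap -8) are assumed to mirror dada2's defaults, the function names `nw_score` and `ungapped_score` are hypothetical, and dada2's own aligner (ends-free and banded) is not reproduced here.

```r
seq1 <- "AGCTATTCTATTCCTAAATAATACATCCAACACTCCAACACTATTATTCCTAGCAACC"
seq2 <- "AGCTATTCTATTCCTAAATAATACTCTCAACACTCCAACACTATTATTCCTAGCAACC"

# Hypothetical helper: best global alignment score under a linear gap penalty.
# Assumed scoring: match = 5, mismatch = -4 (believed to be dada2's defaults).
nw_score <- function(a, b, match = 5, mismatch = -4, gap = -8) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  n <- length(x)
  m <- length(y)
  # S[i + 1, j + 1] = best score for the first i bases of a vs the first j of b.
  S <- matrix(0, n + 1, m + 1)
  S[, 1] <- gap * (0:n)
  S[1, ] <- gap * (0:m)
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      sub <- if (x[i] == y[j]) match else mismatch
      S[i + 1, j + 1] <- max(S[i, j] + sub,      # align x[i] with y[j]
                             S[i, j + 1] + gap,  # gap in b
                             S[i + 1, j] + gap)  # gap in a
    }
  }
  S[n + 1, m + 1]
}

# Hypothetical helper: score of the alignment with no gaps at all.
ungapped_score <- function(a, b, match = 5, mismatch = -4) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  sum(ifelse(x == y, match, mismatch))
}

no_gaps <- ungapped_score(seq1, seq2)        # 55 matches, 3 mismatches: 263
mild    <- nw_score(seq1, seq2, gap = -8)    # 269: a two-gap alignment wins
strict  <- nw_score(seq1, seq2, gap = -20)   # 263: the ungapped alignment wins
print(c(no_gaps = no_gaps, mild = mild, strict = strict))
```

Under the milder penalty, the best-scoring alignment explains the three differences between these two variants as a pair of indels rather than three substitutions. If, as I understand it, DADA2's error model considers substitutions only, that could leave the two variants looking identical to the denoiser. In dada2 the corresponding option is, to my knowledge, `GAP_PENALTY`, which takes negative values, so the "20" above presumably corresponds to something like `dada(..., GAP_PENALTY = -20)`; treat that exact call as an assumption rather than a record of what was run.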
