benjjneb / dada2

Accurate sample inference from amplicon data with single nucleotide resolution
http://benjjneb.github.io/dada2/
GNU Lesser General Public License v3.0

Final amplicon size for Illumina v3v4 16S #1749

Closed otaviolovison closed 4 months ago

otaviolovison commented 1 year ago

Hello!

I am having difficulty reconciling the expected and the actual final amplicon size (after preprocessing). I have read some forum posts in which @benjjneb suggests that the v3v4 amplicon size should be between about 444 and 464 bp. Is that the final size "after preprocessing"? Let's perform some calculations:

Sequencing was performed with a v3 kit, so the reads are 300 + 300 bp. Based on my quality profiles I truncated at 280 and 230. That gives me 510 bp, with a very good margin for merging. Removing the primers (27 and 31: primer length + 10, as suggested), I lose 58 bp, and I lose another 12 bp to the overlap. That should give me a 440 bp amplicon.

In practice, my sequences range from 381 to 409 bp. So I believe (please correct me if I'm wrong) that the 444-464 bp amplicon size mentioned is the theoretical expected size, without preprocessing. When I remove 58 bp for the primers and 12 bp for the overlap, I get a ~394 bp sequence, which matches my range (381 to 409).

Please tell me if I am doing something wrong here. Thanks.

benjjneb commented 1 year ago

The expected amplicon lengths are based on a specific V3V4 primer set and library setup (the "Illumina" v3v4 protocol), before trimming. Depending on your specific v3v4 primers, this may vary somewhat.

It's not clear to me how you chose 27/31 for your primer lengths. Are those the lengths of the sequenced primers (plus sequenced padding) at the start of the forward/reverse reads respectively?

Also, the size of the overlap is variable. 12 is the minimum overlap that mergePairs requires by default to merge the reads together, but any overlap larger than that will also pass mergePairs.
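To make the point about variable overlap concrete, here is a minimal arithmetic sketch (not dada2 code). It assumes that both reads start at the ends of the amplicon, that the truncation lengths (280 and 230) are counted from the start of each raw read, and that the 444-464 bp figure quoted earlier in the thread is the full pre-trimming amplicon length. The function name `overlap_after_truncation` is hypothetical.

```python
# Hypothetical helper: how many bases the truncated forward and reverse
# reads share, assuming each read starts at one end of the amplicon.
def overlap_after_truncation(amplicon_len, trunc_fwd, trunc_rev):
    return trunc_fwd + trunc_rev - amplicon_len

MIN_OVERLAP = 12  # default minimum overlap mentioned above

# Amplicon lengths taken from the 444-464 bp range quoted in this thread.
for amplicon_len in (444, 464):
    overlap = overlap_after_truncation(amplicon_len, 280, 230)
    print(amplicon_len, overlap, overlap >= MIN_OVERLAP)
```

Under these assumptions the overlap comes out well above 12, which is why the overlap should not be subtracted as a fixed 12 bp: overlapping bases are merged into one sequence and do not shorten the amplicon.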

My guess is that you are using the "Illumina" v3v4 setup, which starts at ~440-460 before trimming. Then you have trimmed off ~60 nts by trimLeft and so have ~380-400 post-trimming. If that is true, I'd just update this to use the correct primer lengths as trimLeft parameters (I think it is c(17,21)?).
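The length estimate above can be checked with a short arithmetic sketch (again plain Python, not dada2 code). It assumes the merged sequence length is simply the pre-trimming amplicon length minus the bases removed from the start of each read; `merged_length` is a hypothetical helper, and the 17/21 primer lengths are the tentative values suggested above.

```python
# Hypothetical helper: merged length after removing trim_left bases
# from the start of the forward and reverse reads.
def merged_length(amplicon_len, trim_left_fwd, trim_left_rev):
    return amplicon_len - trim_left_fwd - trim_left_rev

# Compare the original trimming (primer + 10) with primer-only trimming.
for label, trim in (("primer + 10", (27, 31)), ("primer only", (17, 21))):
    shortest = merged_length(440, *trim)
    longest = merged_length(460, *trim)
    print(f"{label}: {shortest}-{longest} bp")
```

With 27/31 this gives roughly 382-402 bp, close to the 381-409 bp range reported in the question; with 17/21 the same amplicons would come out about 20 bp longer.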

otaviolovison commented 1 year ago

Yes, you are right!

About the 'extra 10' trimming with primers: in your paper 'Bioconductor Workflow for Microbiome Data Analysis: from raw reads to community analyses' you say: "We also choose to trim the first 10 nucleotides of each read based on empirical observations across many Illumina datasets that these base positions are particularly likely to contain pathological errors."

That's why I am trimming an extra 10 bases.

benjjneb commented 1 year ago

"We also choose to trim the first 10 nucleotides of each read based on empirical observations across many Illumina datasets that these base positions are particularly likely to contain pathological errors."

This is no longer a recommendation. I would advise against trimming the extra 10 nts here.

otaviolovison commented 1 year ago

Ok, thanks!

