benjjneb / dada2

Accurate sample inference from amplicon data with single nucleotide resolution
http://benjjneb.github.io/dada2/
GNU Lesser General Public License v3.0

High number of bimeras / messy error plots? #1996

Open brittneybrowning opened 2 months ago

brittneybrowning commented 2 months ago

Hi there,

I am getting messy error plots for the data I am currently working with, or at least messier than for data I have worked with before. I have tried changing the parameters quite a few times (truncLen, trimLeft, maxEE, nbases), but the plots look much the same. I think the final read counts look okay (pasted below), but there were 72,670 bimeras out of 81,257 input sequences identified, which seems quite high, right? Is this a cause for concern?

[three attached images]

sum(seqtab.nochim)/sum(seqtab) = 0.7819456

         Input filtered denoisedF denoisedR merged nonchim
Y0171.2 294860   269403    267242    266813 247298  198713
Y0358.2 294411   268469    265398    264269 238815  197863
Y0420.2 313627   284503    282315    281946 265204  211598
Y0478   826820   745033    740476    740225 699440  543107
Y0479   224503   194656    194116    194161 188096  166462
Y0480   396939   358467    356391    355846 337321  266075

Thanks in advance for any thoughts!

benjjneb commented 2 months ago

Your error models look fine. The slightly funny "hook" at the end is due to the binned quality scores of your data. A long thread on that is here: https://github.com/benjjneb/dada2/issues/1307. If you want to force a cleaner looking error model, option 4 in this comment has been used previously: https://github.com/benjjneb/dada2/issues/1307#issuecomment-957680971

> there were 72,670 bimeras out of 81,257 input sequences identified which seems quite high, right?

It is not uncommon that a majority of ASVs are identified as chimeric. Chimeras are very diverse (consider how many possibilities there are of combining different pairs of real ASVs) and are typically lower abundance. So the preferred check is on the number of reads that are removed as chimeric.
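To illustrate why the read-level check is the more informative one, here is a minimal base-R sketch. The counts and the `is_chimera` flags are made up for illustration; in a real workflow `seqtab.nochim` would presumably come from `removeBimeraDenovo(seqtab)` rather than from a hand-written vector.

```r
# Toy sequence table: rows are samples, columns are ASVs (made-up counts).
seqtab <- matrix(
  c(900, 50, 30, 20,
    800, 90, 60, 50),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("S1", "S2"), c("asv1", "asv2", "asv3", "asv4"))
)

# Hypothetical chimera calls: three of the four ASVs are flagged,
# but they are the low-abundance ones.
is_chimera <- c(asv1 = FALSE, asv2 = TRUE, asv3 = TRUE, asv4 = TRUE)
seqtab.nochim <- seqtab[, !is_chimera, drop = FALSE]

# Fraction of ASVs (columns) removed as chimeric.
frac_asvs_removed <- 1 - ncol(seqtab.nochim) / ncol(seqtab)

# Fraction of reads (counts) removed as chimeric.
frac_reads_removed <- 1 - sum(seqtab.nochim) / sum(seqtab)

cat("ASVs removed: ", frac_asvs_removed, "\n")
cat("Reads removed:", frac_reads_removed, "\n")
```

In this toy case most ASVs are flagged while only a small share of reads is lost, which is the same pattern as in the numbers reported above.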

> sum(seqtab.nochim)/sum(seqtab) = 0.7819456

This is high (about 22% of reads removed as chimeric) but within the range of what we see in real data. It is not necessarily a cause for concern, but I would double-check that the primers have been removed, and consider altering the PCR parameters in future experiments (fewer cycles, longer elongation times) to cut down on chimera formation.
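The same check can be made per sample from the tracking table posted above. The sketch below retypes the `merged` and `nonchim` columns by hand; the data frame name `track` and the `frac_chimeric` column are illustrative choices, not part of the original workflow. Only the six samples shown in the table are included, so the pooled value need not match the 0.78 reported for the full `seqtab`.

```r
# Merged and non-chimeric read counts copied from the table in this thread.
track <- data.frame(
  sample  = c("Y0171.2", "Y0358.2", "Y0420.2", "Y0478", "Y0479", "Y0480"),
  merged  = c(247298, 238815, 265204, 699440, 188096, 337321),
  nonchim = c(198713, 197863, 211598, 543107, 166462, 266075)
)

# Per-sample fraction of merged reads removed as chimeric.
track$frac_chimeric <- round(1 - track$nonchim / track$merged, 3)
print(track)

# Pooled fraction over the samples listed here only.
overall <- 1 - sum(track$nonchim) / sum(track$merged)
cat("Pooled fraction of reads removed:", round(overall, 3), "\n")
```

A sample whose chimeric fraction sits far above the others would be a natural place to start checking for leftover primers.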

brittneybrowning commented 2 months ago

This is so helpful!! Thank you very much :)