DerrickWood / kraken2

The second version of the Kraken taxonomic sequence classification system
MIT License

Massive changes in taxonomic assignment when using different versions of kraken2 #500

Open JeremyCourtin opened 3 years ago

JeremyCourtin commented 3 years ago

Hello,

I am using kraken2 to assign sedaDNA reads against a custom database (made of mitochondrial and chloroplast genomes). I had to redo some sequencing and my project ran over a long period, so I ended up using different kraken2 versions. That is how I noticed a massive difference in taxonomic assignment between kraken2 versions 2.0.8 and 2.1.1.

I used similar parameters and a confidence of 0.05 (not the default confidence threshold) on the same dataset against the same database, and got these results:

Number of vertebrates assigned:

| Kraken2 version | 2.15m | 2.3m | 25.8m | 47.6m | 49.5m | extblank | libblank |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2.0.8 | 18604 | 17636 | 15158 | 16229 | 18569 | 48 | 103 |
| 2.1.1 | 537 | 753 | 493 | 556 | 752 | 4 | 0 |

In addition to this massive difference in the number of reads assigned, I get quite different taxonomic assignments, and more noise, with version 2.0.8.

I checked the changelogs for a potential explanation of the difference but could not find anything that would apply when using a fixed threshold...

Do you have any idea why there is such a difference, and how I could explain it? Or did I do something wrong? I would be happy to share the .kraken and .report files by email if needed. This greatly impacts the composition of my samples and makes me wonder which version to trust, and how much we can rely on studies using kraken2 2.0.8 or 2.1.1, as they give such different results.

Thank you for the tool and your help.

Best regards,

Jeremy Courtin

jenniferlu717 commented 3 years ago

I'm not certain exactly when we incorporated it, but we added a new parameter (--minimum-hit-groups) to greatly reduce false positives and increase confidence in the kraken2 results. It requires more minimizers to match consecutively before a classification is called. So you may have more unclassified reads in the later versions, but you will have greater confidence in what kraken2 does call.
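One way to see where the two versions diverge is to compare their report files clade by clade. The following is only a minimal sketch, assuming both reports use the standard six-column tab-separated layout (percentage, clade reads, direct reads, rank code, taxid, name), i.e. they were produced without --report-minimizer-data. The function names and the file names in the usage comment are hypothetical.

```python
def read_clade_counts(path):
    """Return {taxid: (clade_read_count, name)} from a Kraken2 report.

    Assumes the standard six-column report layout; lines with a
    different number of columns are skipped.
    """
    counts = {}
    with open(path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) != 6:
                continue
            # fields: percent, clade reads, direct reads, rank, taxid, name
            taxid = fields[4].strip()
            counts[taxid] = (int(fields[1]), fields[5].strip())
    return counts


def compare_reports(old_path, new_path):
    """List (taxid, name, old_count, new_count), largest drop first."""
    old = read_clade_counts(old_path)
    new = read_clade_counts(new_path)
    rows = []
    for taxid in sorted(set(old) | set(new), key=int):
        old_n, old_name = old.get(taxid, (0, None))
        new_n, new_name = new.get(taxid, (0, None))
        rows.append((taxid, old_name or new_name, old_n, new_n))
    # Taxa that lost the most reads in the newer run come first.
    rows.sort(key=lambda r: r[2] - r[3], reverse=True)
    return rows


# Usage (hypothetical file names):
# for row in compare_reports("sample_2.0.8.report", "sample_2.1.1.report"):
#     print(*row, sep="\t")
```

Taxa that lose most of their reads in the newer run are the ones to inspect first, since those are the calls that the stricter version no longer makes.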