DerrickWood / kraken2

The second version of the Kraken taxonomic sequence classification system
MIT License

Question: Matching process #792

Closed: tramelliwe closed this 2 months ago

tramelliwe commented 8 months ago

Hey, thank you for this powerful tool!

I've got a question about the k-mer matching process. From what I understand, it relies only on exact matching, unlike popular alignment tools that accept mismatches. So I'm wondering why, when my input consists of reads that failed to align to the human genome, Kraken2 still assigns 96% of them to human. I would expect that if a read matched the human genome exactly, the aligner would have aligned it successfully. For reference, my reads are ~200 bp long, come from ONT sequencing, and were aligned to the human genome with Minimap2. Any help on this would be appreciated!

salzberg commented 8 months ago

The matches reported by KrakenUniq require only a single 31-bp match. Kraken2 requires a similar match, and just one k-mer is enough. That is not enough for the default settings of most aligners, such as Minimap2 or Bowtie2: they won't report a match to human if all that matches in a 200 bp read is a single 31-mer. For Bowtie2, you can just adjust its settings so that it reports much shorter matches if you want it to.
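To illustrate the point above, here is a toy sketch, assuming a much simplified model: the "database" is just the set of all plain 31-mers in a random reference, and a read counts as classified if at least one of its 31-mers occurs there verbatim. This is not Kraken2's implementation (Kraken2 uses minimizers, a compact hash table and taxonomy-based LCA assignment), and the helper names and the `min_hits` parameter are hypothetical. It shows that a read with errors every 5 bp, except in one short clean stretch, still yields exact 31-mer hits even though its overall identity is only 84%.

```python
import random

K = 31  # k-mer length from the thread; assumption: plain k-mers, not Kraken2's minimizers


def kmers(seq: str, k: int = K):
    """Yield every overlapping substring of length k."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(reference: str, k: int = K) -> set:
    """Toy 'database': the set of all k-mers in the reference."""
    return set(kmers(reference, k))


def exact_hits(read: str, index: set, k: int = K) -> int:
    """Count read k-mers that occur verbatim in the index (no mismatches allowed)."""
    return sum(1 for km in kmers(read, k) if km in index)


def is_classified(read: str, index: set, min_hits: int = 1, k: int = K) -> bool:
    """Hypothetical rule: classify the read if at least `min_hits` k-mers match exactly."""
    return exact_hits(read, index, k) >= min_hits


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Random 5 kb "reference" and a 200 bp read drawn from it.
rng = random.Random(0)
reference = "".join(rng.choice("ACGT") for _ in range(5000))
original = reference[1000:1200]

# Substitute a base every 5 bp, except inside a clean window (positions 80-119),
# so the read is full of errors but still contains a few error-free 31-mers.
swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
read = "".join(
    swap[b] if i % 5 == 2 and not 80 <= i < 120 else b
    for i, b in enumerate(original)
)

index = build_index(reference)
print(f"identity to source: {identity(read, original):.2f}")
print(f"exact {K}-mer hits: {exact_hits(read, index)}")
print(f"classified: {is_classified(read, index)}")
```

With these settings the read has 32 substitutions (identity 0.84), and the only error-free 31-mers come from the 44 bp unmutated stretch, yet a one-hit rule already classifies it. An aligner using default scoring may well reject such a read, which is the asymmetry the comment describes.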

tramelliwe commented 2 months ago

Thank you that makes sense!