DerrickWood / kraken2

The second version of the Kraken taxonomic sequence classification system
MIT License

Why does Kraken assign a read to NCBI tax ID 1 rather than 'unclassified'? #762

Closed CatInTheLab closed 11 months ago

CatInTheLab commented 1 year ago

I have contigs in my sample that kraken2 assigns directly to NCBI tax ID 1. NCBI describes this tax ID as follows: "This is the top level of the taxonomy database maintained by NCBI/GenBank." I was wondering why contigs get assigned this tax ID rather than being marked as 'unclassified'. In other words, what is the difference between a sequence that is assigned to tax ID 1 and a sequence that is unclassified? Thank you.

salzberg commented 1 year ago

Kraken uses the lowest common ancestor (LCA) to assign taxonomy IDs to k-mers. If a k-mer occurs in two widely divergent things, such as a vector (synthetic construct) and E. coli, then it gets assigned the ID 1, the root of the taxonomy. Then, when classifying contigs/reads, it looks at all the k-mers that it recognized and basically votes - although it's more complicated than that. If all the k-mers that it recognized are "1", then the contig is assigned "1". That's probably what happened in your case.
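
To make the distinction concrete, here is a rough Python sketch of the idea, not Kraken's actual code. It assumes a toy taxonomy with shortened lineages (the `PARENT` map is made up for illustration) and a simplified vote in which each hit taxon is scored by the k-mer hits along its path to the root. The helper names `lineage`, `lca`, and `classify` are hypothetical. Real Kraken 2 adds details that are ignored here, such as minimizers and the confidence threshold.

```python
from collections import Counter

# Toy taxonomy (child -> parent). Lineages are shortened for illustration;
# this is an assumption, not the full NCBI tree.
PARENT = {
    1: 1,          # root
    131567: 1,     # cellular organisms
    2: 131567,     # Bacteria
    562: 2,        # E. coli (intermediate ranks omitted)
    28384: 1,      # other sequences
    32630: 28384,  # synthetic construct (intermediate ranks omitted)
}

def lineage(taxid):
    """Return the path from taxid up to the root, inclusive."""
    path = [taxid]
    while PARENT[taxid] != taxid:
        taxid = PARENT[taxid]
        path.append(taxid)
    return path

def lca(a, b):
    """Lowest common ancestor of two taxa in the toy tree."""
    ancestors = set(lineage(a))
    for node in lineage(b):
        if node in ancestors:
            return node
    return 1

def classify(kmer_taxids):
    """Simplified vote. kmer_taxids holds one entry per k-mer:
    a taxid if the k-mer is in the database, None if it is not."""
    hits = Counter(t for t in kmer_taxids if t is not None)
    if not hits:
        return 0  # no k-mer was recognized at all: unclassified
    # Score each hit taxon by the hits on its path to the root.
    scores = {t: sum(hits[a] for a in lineage(t)) for t in hits}
    best = max(scores.values())
    winners = [t for t, s in scores.items() if s == best]
    # Resolve ties by taking the LCA of the tied taxa.
    result = winners[0]
    for t in winners[1:]:
        result = lca(result, t)
    return result

# A k-mer shared by E. coli and a synthetic construct is stored at the root.
print(lca(562, 32630))
# Every recognized k-mer maps to the root, so the contig is assigned 1.
print(classify([1, None, 1, 1]))
# No k-mer is in the database, so the contig is unclassified.
print(classify([None, None, None]))
```

In this simplified model, tax ID 1 means "k-mers were recognized, but only ones too widely shared to narrow the assignment below the root", while unclassified means "no k-mers were recognized".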