CatInTheLab closed 11 months ago
Kraken uses the lowest common ancestor (LCA) to assign taxonomy IDs to k-mers. If a k-mer occurs in two widely divergent things, such as a vector (synthetic construct) and E. coli, then it gets assigned taxonomy ID 1, the root of the taxonomy. Then, when classifying contigs/reads, it looks at all the k-mers that it recognized and basically votes, although it's more complicated than that. If all the k-mers that it recognized are "1", then the contig is assigned "1". That's probably what happened in your case. An unclassified sequence, by contrast, is typically one where Kraken did not recognize any of the k-mers at all.
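To make the difference concrete, here is a rough Python sketch of the idea. It is not Kraken's actual code: the tiny taxonomy is deliberately simplified (the real NCBI tree has intermediate ranks between these nodes), the function names `path_to_root`, `lca`, and `classify` are made up for illustration, and the "voting" is a simplified version of root-to-node path scoring. I'm also assuming here that 0 stands for a k-mer that is not in the database and for an unclassified result.

```python
from collections import Counter

# Simplified toy taxonomy (child -> parent), NOT the real NCBI tree.
# 1 = root, 2 = Bacteria, 562 = E. coli, 32630 = synthetic construct.
PARENT = {1: 1, 2: 1, 562: 2, 32630: 1}


def path_to_root(taxid):
    """Return the list of nodes from taxid up to the root (inclusive)."""
    path = [taxid]
    while PARENT[taxid] != taxid:
        taxid = PARENT[taxid]
        path.append(taxid)
    return path


def lca(a, b):
    """Lowest common ancestor of two taxonomy IDs in the toy tree."""
    ancestors = set(path_to_root(a))
    for node in path_to_root(b):
        if node in ancestors:
            return node
    return 1  # the root is an ancestor of everything


def classify(kmer_taxids):
    """Assign a taxid to a sequence from its per-k-mer taxids.

    A value of 0 means the k-mer was not found in the database.
    Returns 0 (unclassified) if no k-mer was recognized.
    """
    hits = Counter(t for t in kmer_taxids if t != 0)
    if not hits:
        return 0  # nothing recognized: unclassified
    # Score each hit node by summing the hits along its path to the root.
    scores = {
        node: sum(hits.get(anc, 0) for anc in path_to_root(node))
        for node in hits
    }
    best = max(scores.values())
    winners = [node for node, score in scores.items() if score == best]
    # Break ties by taking the LCA of the tied nodes.
    result = winners[0]
    for node in winners[1:]:
        result = lca(result, node)
    return result


# A k-mer shared by E. coli and a synthetic construct is stored as the root.
shared_kmer_taxid = lca(562, 32630)

print(shared_kmer_taxid)                # k-mer level: root
print(classify([1, 1, 0, 1]))           # only root-level k-mers recognized
print(classify([0, 0, 0]))              # no k-mers recognized
print(classify([562, 562, 1, 0]))       # more specific hits win
```

In this sketch a contig ends up at tax ID 1 when the only k-mers it shares with the database are ones that were themselves collapsed to the root, whereas it is unclassified only when no k-mer matches anything.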
I have contigs in my sample that kraken2 assigns directly to the NCBI tax ID of 1. NCBI describes this tax ID as follows: "This is the top level of the taxonomy database maintained by NCBI/GenBank." I was wondering why contigs get assigned this tax ID rather than being assigned as 'Unclassified'? In other words, what is the difference between a sequence that is assigned to tax ID 1 and sequences that are unclassified? Thank you.