DerrickWood / kraken2

The second version of the Kraken taxonomic sequence classification system
MIT License

Classified taxid not reported in LCA mappings #578

Status: open. edseabolt opened this issue 2 years ago.

edseabolt commented 2 years ago

Hello,

I have a custom Kraken2 database built from domain amino acid sequences (via the --protein option), and I am classifying microbiome data with the translated search feature. I'm trying to understand the per-read classification output. In Example 1, Kraken2 reports the classified taxid as 1196, but taxid 1196 does not appear in the LCA mappings. In Example 2, the classified taxid 14809 does appear in the LCA mappings. What is the reason for this difference, and should the classified taxid always appear in the LCA mappings?

Example 1: C SRR848953.2152 f6a34686ee3c4736acf31eab955e0216 (taxid 1196) 101|100 0:24 -:- 0:24 -:- 0:15 11237:1 0:2 5528:1 0:5 -:- 0:17 5831:1 0:3 629:1 0:2 -:- 0:5 14134:2 0:16 5730:1 -:- 0:24 |:| 0:24 -:- 0:23 -:- 0:1 1251:1 0:2 15367:1 0:19 -:- 0:24 -:- 0:23 -:- 0:24

Example 2: C SRR848953.15259 c81e4e8e77428add37061f300dce19b3 (taxid 14809) 100|98 0:24 -:- 0:23 -:- 0:24 -:- 0:24 -:- 0:23 -:- 0:15 125:1 0:1 15296:1 0:6 |:| 0:11 626:1 0:8 1284:1 0:2 -:- 0:1 74:1 0:1 6052:1 7377:1 0:12 8888:1 0:5 -:- 0:23 -:- 6391:1 0:16 15327:1 0:5 -:- 0:23 -:- 14809:2 0:6 1044:1 0:1 10328:1 6148:1 0:2 951:1 0:8

Regards,

Ed
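One way to check the observation in both examples is to parse each output line and test whether the classified taxid is among the taxids in the LCA mapping column. The sketch below is a hypothetical helper written for this thread, not part of Kraken2. It assumes the line was produced with --use-names, so the third column reads `name (taxid N)`. It treats taxid 0 as "no hit" and skips tokens that are not `taxid:count` pairs, such as `|:|` (the mate separator), the `-:-` markers seen in these translated-search lines, and ambiguous `A:n` entries.

```python
import re
from collections import Counter

# With --use-names, the classified taxon is printed as "name (taxid N)".
CLASSIFIED_RE = re.compile(r"\(taxid (\d+)\)")
# An LCA mapping entry is "taxid:count". Separators such as "|:|" or "-:-"
# and ambiguous "A:n" entries do not match and are skipped.
MAPPING_RE = re.compile(r"^(\d+):(\d+)$")


def parse_kraken2_line(line):
    """Return (classified taxid, Counter mapping taxid -> minimizer hits)."""
    match = CLASSIFIED_RE.search(line)
    if match is None:
        raise ValueError("no '(taxid N)' field found; expected --use-names output")
    classified = int(match.group(1))
    hits = Counter()
    # Scan only the text after the taxid field (sequence length, then mappings).
    for token in line[match.end():].split():
        m = MAPPING_RE.match(token)
        if m and int(m.group(1)) != 0:  # taxid 0 means no database hit
            hits[int(m.group(1))] += int(m.group(2))
    return classified, hits
```

Applied to Example 1, the classified taxid is 1196 but `1196 in hits` is False. The largest entry in the mappings is 14134 with 2 hits. Applied to Example 2, `hits[14809]` is 2.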

bextra commented 2 years ago

+1 to follow this as well. @DerrickWood any suggestions on what we should look into further?

jenniferlu717 commented 2 years ago

This happens when two leaves have identical scores. Because the algorithm cannot distinguish between the two leaves, it chooses their LCA as the classification, and that LCA need not itself appear in the LCA mappings.
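The tie rule can be made concrete with a small model. The sketch below is a simplified Python illustration of the behaviour described in this reply, not Kraken2's actual implementation. Each taxon that received hits is scored by summing the hits of every taxon on its path to the root. The highest score wins, and a tie is resolved by taking the LCA of the tied taxa. The taxonomy (`PARENT`) and all taxids are hypothetical, and Kraken2's --confidence threshold step is left out.

```python
# Hypothetical toy taxonomy: child -> parent. The root (1) is its own parent.
#        1
#       / \
#     10   20
#    /  \
#  11    12
PARENT = {1: 1, 10: 1, 20: 1, 11: 10, 12: 10}


def path_to_root(taxon, parent):
    """Yield taxon, then each ancestor up to and including the root."""
    while True:
        yield taxon
        nxt = parent.get(taxon)
        if nxt is None or nxt == taxon:
            return
        taxon = nxt


def lca(a, b, parent):
    """Lowest common ancestor of taxa a and b."""
    ancestors_of_a = set(path_to_root(a, parent))
    for t in path_to_root(b, parent):
        if t in ancestors_of_a:
            return t
    raise ValueError("taxa share no common ancestor")


def resolve_tree(hits, parent):
    """Pick the taxon with the highest root-to-taxon score; ties go to the LCA."""
    best_taxon, best_score = 0, 0
    for taxon in hits:
        on_path = set(path_to_root(taxon, parent))
        # Score = hits on this taxon plus hits on any of its ancestors.
        score = sum(n for t, n in hits.items() if t in on_path)
        if score > best_score:
            best_taxon, best_score = taxon, score
        elif score == best_score:
            best_taxon = lca(best_taxon, taxon, parent)
    return best_taxon
```

With hits `{11: 2, 12: 2}`, both leaves score 2, so `resolve_tree` returns taxon 10 even though 10 never appears among the hits. This is the same pattern as Example 1. With `{11: 2, 12: 1}`, leaf 11 wins outright and is itself one of the mapped taxids, as in Example 2.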

bextra commented 2 years ago

I see. Thank you for clarifying.