DerrickWood / kraken2

The second version of the Kraken taxonomic sequence classification system
MIT License

Classification of reads as higher taxon (family) but not as the lower taxon (genus/species) #681

Open danarte opened 1 year ago

danarte commented 1 year ago

Hi, I'm classifying samples with a database I built by combining several RefSeq databases (including the mammalian one). In many cases, a lot of reads are assigned to a higher taxonomic rank but only a few to the ranks below it. For example, here is part of the report file:

    0.01  71   5  C4  314145  Laurasiatheria
    0.01  66  17  O    33554  Carnivora
    0.01  48  12  O1  379583  Feliformia
    0.01  36  33  F     9681  Felidae
    0.00   3   2  F1  338152  Felinae
    0.00   1   0  G     9682  Felis
    0.00   1   1  S     9685  Felis catus
    0.00   1   0  O1  379584  Caniformia
    0.00   1   1  F     9608  Canidae

So here I see 36 reads under the family Felidae (33 of them assigned directly at the family level), but only 1 read is assigned to a specific species within that family.
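To make the family-versus-species gap easier to see, here is a minimal Python sketch that parses the rows above and prints, for each taxon, how many reads stopped at that taxon and how many were resolved further down. It assumes the standard six-column Kraken 2 report layout (percentage, reads in clade, reads assigned directly, rank code, NCBI taxonomy ID, name); `ReportRow` and `parse_report_line` are hypothetical helper names, not part of Kraken 2.

```python
from typing import NamedTuple


class ReportRow(NamedTuple):
    percent: float      # percentage of reads in the clade rooted at this taxon
    clade_reads: int    # reads in the clade (this taxon plus all descendants)
    direct_reads: int   # reads assigned directly to this taxon
    rank: str           # rank code, e.g. F = family, G = genus, S = species
    taxid: int          # NCBI taxonomy ID
    name: str           # scientific name


def parse_report_line(line: str) -> ReportRow:
    # Hypothetical helper: split on whitespace at most 5 times so that
    # names containing spaces ("Felis catus") stay in one field.
    pct, clade, direct, rank, taxid, name = line.split(None, 5)
    return ReportRow(float(pct), int(clade), int(direct), rank, int(taxid), name.strip())


# Rows copied from the report excerpt in this issue.
REPORT = """\
0.01 71 5 C4 314145 Laurasiatheria
0.01 66 17 O 33554 Carnivora
0.01 48 12 O1 379583 Feliformia
0.01 36 33 F 9681 Felidae
0.00 3 2 F1 338152 Felinae
0.00 1 0 G 9682 Felis
0.00 1 1 S 9685 Felis catus
0.00 1 0 O1 379584 Caniformia
0.00 1 1 F 9608 Canidae
"""

rows = [parse_report_line(line) for line in REPORT.splitlines() if line.strip()]

for row in rows:
    # Reads in the clade that were classified below this taxon.
    below = row.clade_reads - row.direct_reads
    print(f"{row.name:<15} rank={row.rank:<3} direct={row.direct_reads:>3} below={below:>3}")
```

For Felidae this separates the 33 reads that stopped at the family level from the 3 that were classified to a lower rank.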

Is there any documentation or other reading on this topic? I couldn't find anything, but perhaps I didn't phrase my search well. What does this mean, and how do reads end up assigned to a higher taxon rather than to a specific species?