DerrickWood / kraken2

The second version of the Kraken taxonomic sequence classification system
MIT License

Specific k-mers no longer classified when additional sequences added to DB #541

Open meenachakra opened 2 years ago

meenachakra commented 2 years ago

We classified metagenomic reads using a custom database.

This is the Kraken2 classification for an example paired read (from the .krak file):

taxid 2161 83|86 0:42 2161:5 0:2 |:| 0:1 2161:1 0:3 2161:5 0:42

We then added sequences to the custom database, all from a separate domain that was not present in the original database.

This is the new Kraken2 classification of the same paired read:

taxid 49542 83|86 49542:12 0:37 |:| 49542:1 0:39 49542:12

We're not sure why specific k-mers would map successfully with the original database but not map at all (taxid 0) once additional sequences were added to the database.

Why, for example, are the five k-mers near the end of the forward read, which mapped to taxid 2161 with the original database, unclassified in the new classification? They should at least receive the same classification as they did with the original database, correct?
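
To make the difference concrete, the two k-mer LCA strings above can be compared position by position. The following is a minimal Python sketch, not part of Kraken2: the helper names (`parse_lca_field`, `expand`, `kmer_totals`, `lost_hits`) are hypothetical, and it assumes the standard Kraken2 output convention that each `taxid:count` token is a run of consecutive k-mers and that `|:|` separates the two mates. The run lengths sum to 49 and 52 k-mers, which would match reads of 83 and 86 bp if the database uses the default k of 35 (an assumption; the k value is not stated here).

```python
from collections import Counter

PAIR_SEPARATOR = "|:|"  # Kraken2 marks the boundary between mates with this token


def parse_lca_field(field):
    """Split an LCA mapping string into one list of (taxid, count) runs per mate."""
    mates = [[]]
    for token in field.split():
        if token == PAIR_SEPARATOR:
            mates.append([])
            continue
        # Keep the taxid as a string so non-numeric codes such as "A" still parse.
        taxid, count = token.rsplit(":", 1)
        mates[-1].append((taxid, int(count)))
    return mates


def expand(runs):
    """Turn run-length pairs into one taxid per k-mer position."""
    return [taxid for taxid, count in runs for _ in range(count)]


def kmer_totals(runs):
    """Count k-mers per taxid for one mate."""
    totals = Counter()
    for taxid, count in runs:
        totals[taxid] += count
    return totals


def lost_hits(old_runs, new_runs):
    """0-based k-mer positions that had a taxid before and are unclassified (0) now."""
    pairs = zip(expand(old_runs), expand(new_runs))
    return [pos for pos, (old, new) in enumerate(pairs) if old != "0" and new == "0"]


# LCA strings copied from the two classifications of the example read.
original = "0:42 2161:5 0:2 |:| 0:1 2161:1 0:3 2161:5 0:42"
rebuilt = "49542:12 0:37 |:| 49542:1 0:39 49542:12"

mates = zip(parse_lca_field(original), parse_lca_field(rebuilt))
for mate_number, (old_runs, new_runs) in enumerate(mates, start=1):
    print(f"mate {mate_number}: original {dict(kmer_totals(old_runs))}, "
          f"rebuilt {dict(kmer_totals(new_runs))}, "
          f"lost positions {lost_hits(old_runs, new_runs)}")
```

For this read, the sketch reports k-mer positions 42 to 46 of the forward read and positions 1 and 5 to 9 of the reverse read (0-based) as having changed from taxid 2161 to unclassified.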

Thank you for your help!

jenniferlu717 commented 2 years ago

How did you add the additional sequences? Did you rebuild after adding the sequences?