DerrickWood / kraken2

The second version of the Kraken taxonomic sequence classification system
MIT License

Kraken2 custom build skips high identity added fastas #833

Open annabel-dekker opened 1 month ago

annabel-dekker commented 1 month ago

Dear developers,

First and foremost, congratulations on an amazing tool. We have been trying to create a custom build that merges our in-house 16S sequences with NCBI. We used TaxonKit to generate nodes.dmp and names.dmp, adjusted the FASTA headers, added the sequences with kraken2-build --add-to-library, and then built the database. The problem is that this works only some of the time: in some builds all FASTA sequences are included, in others they are not. I noticed the failure with kraken2-inspect, where not all of the taxa appear in the list. I have been testing different scenarios, and it seems that highly identical sequences don't make the cut (for example, 16S sequences of length 1500 with ~6 mismatches). We are testing the limits of kraken2, krakenuniq and bracken in sensitively and specifically identifying the correct pathogens with our sequencing techniques. I find it difficult to tell whether there is a step in kraken2-build --build that skips taxa with too little variation, so I'm curious whether this rings a bell. Could you help me think about a solution? Or are there parameters I could still try so that these taxa are included in the database after all?

Kind regards, Annabel Dekker
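
For reference, the check described above (comparing the taxa added to the library with the taxa listed by kraken2-inspect) could be scripted roughly as follows. This is only a minimal sketch under stated assumptions: the FASTA headers carry the `kraken:taxid|<id>` tag, and the kraken2-inspect output is a tab-separated report with the taxonomy ID in the fifth column and comment lines starting with `#`. The function names and the `taxid_col` parameter are hypothetical helpers, not part of Kraken 2.

```python
import re

# Assumption: custom FASTA headers carry a tag such as ">seq1|kraken:taxid|12345".
TAXID_RE = re.compile(r"kraken:taxid\|(\d+)")


def taxids_from_fasta(lines):
    """Collect the taxonomy IDs tagged in FASTA header lines."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            match = TAXID_RE.search(line)
            if match:
                ids.add(int(match.group(1)))
    return ids


def taxids_from_inspect(lines, taxid_col=4):
    """Collect the taxonomy IDs listed in a kraken2-inspect style report.

    Assumption: tab-separated columns with the taxonomy ID at index
    `taxid_col` (0-based); lines starting with '#' are skipped.
    """
    ids = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) > taxid_col and fields[taxid_col].strip().isdigit():
            ids.add(int(fields[taxid_col].strip()))
    return ids


def missing_taxids(fasta_lines, inspect_lines, taxid_col=4):
    """Return the sorted taxonomy IDs present in the FASTA but absent from the report."""
    expected = taxids_from_fasta(fasta_lines)
    found = taxids_from_inspect(inspect_lines, taxid_col)
    return sorted(expected - found)
```

Both inputs are plain iterables of lines, so open file handles for the added FASTA file and for the saved kraken2-inspect output can be passed directly to `missing_taxids`; the returned list then shows exactly which taxa did not make it into the database.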