Dear developers,
First and foremost, congratulations on an amazing tool.
We have been trying to build a custom database that merges our in-house 16S sequences with the NCBI ones. We used TaxonKit to generate nodes.dmp and names.dmp, adjusted the FASTA headers, added the sequences with kraken2-build --add-to-library, and then built the database. The problem is that this works only some of the time: in some builds every FASTA record is included, in others some are missing. I detect the failures with kraken2-inspect, where not all of the taxa I added appear in the output.
I have been testing different scenarios, and it seems that near-identical sequences don't make the cut (for example, 16S sequences of length 1500 that differ by ~6 mismatches). We are testing the limits of kraken2, krakenuniq and bracken in identifying the correct pathogens, with both sensitivity and specificity, using our sequencing techniques.
I find it hard to tell whether some step in kraken2-build --build skips taxa that show too little variation, so I'm curious whether this rings a bell. Could you help me think about a solution? Or are there parameters I could still try so that these taxa are included in the database after all?
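For reference, the check I describe can be reproduced with something like the sketch below. It is only an illustration, and it assumes two things that are not stated above: that the adjusted FASTA headers carry a kraken:taxid|<id> tag, and that the kraken2-inspect output is tab-separated with the taxid in the fifth column and optional comment lines starting with #. The function names are my own, not part of Kraken 2.

```python
import re

# Assumption: headers were adjusted to contain a "kraken:taxid|<id>" tag.
TAXID_TAG = re.compile(r"kraken:taxid\|(\d+)")


def taxids_in_fasta(fasta_text):
    """Collect the taxids found in the FASTA header lines."""
    ids = set()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            match = TAXID_TAG.search(line)
            if match:
                ids.add(int(match.group(1)))
    return ids


def taxids_in_inspect(inspect_text, taxid_column=4):
    """Collect the taxids listed in kraken2-inspect output.

    Assumption: tab-separated lines with the taxid in the fifth
    column (index 4); blank lines and '#' comment lines are skipped.
    """
    ids = set()
    for line in inspect_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) > taxid_column:
            value = fields[taxid_column].strip()
            if value.isdigit():
                ids.add(int(value))
    return ids


def missing_taxids(fasta_text, inspect_text):
    """Return taxids present in the FASTA but absent from the inspect output."""
    return sorted(taxids_in_fasta(fasta_text) - taxids_in_inspect(inspect_text))
```

Calling missing_taxids with the contents of the added FASTA file and the saved kraken2-inspect output returns the sorted list of taxids that were added to the library but do not show up after the build.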
Kind regards, Annabel Dekker