DerrickWood / kraken2

The second version of the Kraken taxonomic sequence classification system
MIT License

build_db: compact hash table capacity exceeded #321

Open mw55309 opened 4 years ago

mw55309 commented 4 years ago

Hello

I am running into this error. I googled it and no one else seems to have hit it, so I thought I would raise an issue.

I am building a very small test database, but crucially I have edited the names.dmp and nodes.dmp files to add extra rows, because I want to use the NCBI taxonomy with some custom species added to it.
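For anyone reproducing this setup, here is a minimal sketch of appending a custom species to the taxonomy dump files. It assumes the standard NCBI taxdump layout, where fields are separated by `\t|\t` and every line ends with `\t|`. The helper names (`dmp_line`, `add_custom_species`) and any taxids you pass in are hypothetical. Only the leading columns are written; NCBI's nodes.dmp carries further columns (division id, genetic code, and so on) that you may want to copy from an existing row.

```python
from pathlib import Path


def dmp_line(fields):
    """Format one row in the NCBI .dmp style (assumed: '\t|\t' separators, '\t|' terminator)."""
    return "\t|\t".join(str(f) for f in fields) + "\t|\n"


def add_custom_species(tax_dir, taxid, parent_taxid, name, rank="species"):
    """Append a hypothetical custom taxon to nodes.dmp and names.dmp in tax_dir.

    Only tax_id, parent tax_id and rank are written to nodes.dmp; the
    remaining NCBI columns are omitted in this sketch.
    """
    tax_dir = Path(tax_dir)
    with open(tax_dir / "nodes.dmp", "a") as nodes:
        nodes.write(dmp_line([taxid, parent_taxid, rank]))
    with open(tax_dir / "names.dmp", "a") as names:
        # names.dmp columns: tax_id, name_txt, unique name, name class
        names.write(dmp_line([taxid, name, "", "scientific name"]))
```

The custom taxid should not collide with an existing NCBI taxid, and the parent taxid must already exist in nodes.dmp so that the new node joins the tree.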

The command is simply:

kraken2-build --threads 16 --build --db test

The output is:

Creating sequence ID to taxonomy ID map (step 1)...
Sequence ID to taxonomy ID map already present, skipping map creation.
Estimating required capacity (step 2)...
Estimated hash table requirement: 10240 bytes
Capacity estimation complete. [0.017s]
Building database files (step 3)...
Taxonomy parsed and converted.
CHT created with 4 bits reserved for taxid.
build_db: compact hash table capacity exceeded
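The message means the table ran out of free slots during insertion. The toy sketch below is not Kraken 2's code. It is a generic fixed-capacity hash table with linear probing that fails in the same way once every slot is occupied. The class names are hypothetical, and converting the logged 10240 bytes into 2560 slots assumes 4-byte (32-bit) cells.

```python
class CapacityExceeded(Exception):
    """Raised when no free slot remains (mirrors the build_db message)."""


class FixedCapacityTable:
    """Toy open-addressing hash table with a fixed number of slots."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = [None] * capacity
        self.size = 0

    def insert(self, key, value):
        idx = hash(key) % self.capacity
        # Linear probing: visit each slot at most once.
        for _ in range(self.capacity):
            slot = self.slots[idx]
            if slot is None:
                self.slots[idx] = (key, value)
                self.size += 1
                return
            if slot[0] == key:
                # Existing key: overwrite for simplicity.
                self.slots[idx] = (key, value)
                return
            idx = (idx + 1) % self.capacity
        raise CapacityExceeded("compact hash table capacity exceeded")


# Assumed: 10240 bytes / 4 bytes per cell = 2560 slots.
table = FixedCapacityTable(10240 // 4)
```

Inserting 2560 distinct keys into `table` succeeds, and the 2561st distinct key raises `CapacityExceeded`. So an estimate that undercounts the distinct minimizers in the input is enough to trigger the error. When a minimizer occurs in several genomes, Kraken 2 stores the lowest common ancestor of their taxa instead of overwriting the value; the sketch does not model that.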

Within test/, both seqid2taxid.map and taxo.k2d.tmp have at least begun to be created. Within test/taxonomy, prelim_map.txt has been created.

Of course, I have no idea whether the error message is related to my editing of nodes.dmp and names.dmp!

I am building on a server with a 16 x 16 GB configuration.

Kraken2 was installed from Bioconda.

Cheers, Mick

mw55309 commented 4 years ago

OK, I solved this: basically, my database was too small.

I was building a test database from 3 tiny genomes. It looks like the estimator arrives at an unrealistically small amount of memory for storing the taxid:

CHT created with 4 bits reserved for taxid.
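To show where a figure like "4 bits" could come from, here is a hedged sketch under two assumptions that are not confirmed by this thread. First, the taxid width is the smallest b with 2**b at least the number of nodes in the taxonomy used for the build. Second, each cell is 32 bits wide and holds a key fragment in the bits not reserved for the taxid. The function names are hypothetical.

```python
CELL_BITS = 32  # assumed cell width


def bits_for_taxid(node_count):
    """Smallest b (at least 1) with 2**b >= node_count (assumed sizing rule)."""
    bits = 1
    while (1 << bits) < node_count:
        bits += 1
    return bits


def pack_cell(key_hash, taxid, value_bits):
    """Pack the high bits of a 64-bit key hash with a value_bits-wide taxid."""
    if taxid >= (1 << value_bits):
        raise ValueError("taxid does not fit in the reserved bits")
    key_bits = CELL_BITS - value_bits
    key_fragment = (key_hash >> (64 - key_bits)) & ((1 << key_bits) - 1)
    return (key_fragment << value_bits) | taxid


def unpack_taxid(cell, value_bits):
    """Recover the taxid from the low value_bits of a cell."""
    return cell & ((1 << value_bits) - 1)
```

Under this assumed rule, 4 bits corresponds to a taxonomy of 9 to 16 nodes, which is plausible for a build restricted to 3 genomes and their ancestors.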

When I added more and bigger genomes, the database built with no problems.

You probably need to set a minimum for the number of bits reserved for the taxid :-D
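The suggested floor could be sketched as follows. This is a hypothetical sizing rule, not a Kraken 2 setting: `reserved_taxid_bits` and its `min_bits=8` default are illustrative choices, and the 32-bit cell width is an assumption.

```python
def reserved_taxid_bits(node_count, min_bits=8, cell_bits=32):
    """Hypothetical taxid width with a lower bound, as suggested in this issue.

    Takes the smallest width that can index node_count values, then
    raises it to at least min_bits.
    """
    bits = max(1, (node_count - 1).bit_length())
    bits = max(bits, min_bits)
    if bits >= cell_bits:
        raise ValueError("taxid field would leave no room for the key fragment")
    return bits
```

With this rule, a 10-node test taxonomy gets 8 bits instead of 4, while a large taxonomy (for example 3,000,000 nodes, giving 22 bits) is unaffected. Reserving more taxid bits leaves fewer bits for the key fragment in a fixed-width cell, so a floor trades some key precision for headroom.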