DerrickWood / kraken2

The second version of the Kraken taxonomic sequence classification system
MIT License

kraken2-build doesn't work when hash table is too large (v 2.1.2) #694

Closed ericproffitt closed 1 year ago

ericproffitt commented 1 year ago

When the estimated size of the hash table is larger than roughly 2GB, kraken2-build --build fails silently: once the command completes, the hash.k2d file is still empty (32 bytes).

For example, with the default fungi database, the following code

mkdir kraken2-db
kraken2-build --download-taxonomy --db kraken2-db
kraken2-build --download-library fungi --db kraken2-db
kraken2-build --build --db kraken2-db

results in a hash.k2d file that is only 32 bytes (essentially empty), yet no error is raised during the build process. The taxo.k2d file may also be malformed, but that is harder to tell.
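Because the build reports no error, the only symptom is the file size. A minimal sketch of a check that could be run after the build is shown here; `check_hash_file` is a hypothetical helper, not part of Kraken 2, and the 32-byte threshold is simply the size observed in this report, not a documented header size.

```shell
# Hypothetical helper (not part of Kraken 2): flag a hash file that looks empty.
# The 32-byte threshold is taken from the size observed in this issue.
check_hash_file() {
    file="$1"
    if [ ! -f "$file" ]; then
        echo "missing"
        return 2
    fi
    # wc -c prints the byte count; tr strips the padding some platforms add.
    size=$(wc -c < "$file" | tr -d ' ')
    if [ "$size" -le 32 ]; then
        echo "empty ($size bytes)"
        return 1
    fi
    echo "ok ($size bytes)"
}
```

For the example above, it would be called as `check_hash_file kraken2-db/hash.k2d`; a non-zero exit status means the file is missing or no larger than 32 bytes.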

If I instead run this code with archaea or viral, the hash file is generated correctly, presumably because archaea and viral are smaller databases.

I'm on an M1 MacBook Pro with 64GB of RAM and a 2TB hard drive, so I should have enough memory and disk space to build the fungi database.

ericproffitt commented 1 year ago

For anyone else who's having this problem: originally I was using Kraken 2 built from source, and I was able to resolve it by switching to the version installed via anaconda3. I have no idea why this problem only occurs when building from source, as the two are ostensibly the same version.
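When a source build and a conda install coexist, it helps to confirm which executable the shell actually picks up. This is a small sketch under that assumption; `resolve_tool` is a hypothetical helper that only wraps the standard `command -v` lookup.

```shell
# Hypothetical helper: show where a command resolves on the current PATH.
resolve_tool() {
    if path=$(command -v "$1" 2>/dev/null); then
        echo "$1 -> $path"
    else
        echo "$1 -> not found"
        return 1
    fi
}
```

Running `resolve_tool kraken2-build` in each environment shows whether the source build or the anaconda3 copy is the one being invoked.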