DerrickWood / kraken2

The second version of the Kraken taxonomic sequence classification system
MIT License

How can I tell if kraken2-build is "stuck"? #492

Open jessicarowell opened 3 years ago

jessicarowell commented 3 years ago

My kraken2-build command has been running for a few days (building a ginormous database), but I noticed that it's been "stuck" here since at least this morning. I expect it to count up the number of processed sequences...

It's using pretty much all 48 threads I gave it at 100%, and about 295 of the 384 GB of available RAM. How can I tell whether it's actually stuck, or should I expect it to sit at this point for such a long time?

Creating sequence ID to taxonomy ID map (step 1)...
Sequence ID to taxonomy ID map already present, skipping map creation.
Estimating required capacity (step 2)...
Estimated hash table requirement: 307880598672 bytes
Capacity estimation complete. [1h43m15.360s]
Building database files (step 3)...
Taxonomy parsed and converted.
CHT created with 22 bits reserved for taxid.
Processed 13391809 sequences (68077272874 bp)...

Thanks.
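For a quick sanity check on the memory side, the "Estimated hash table requirement" from the log above can be converted into more familiar units and compared with the RAM on the machine. This is a minimal sketch, not part of kraken2: the helper names `bytes_to_gb` and `bytes_to_gib` are made up for illustration, and whether the "295 GB" and "384 GB" figures in the post are decimal GB or binary GiB is an assumption the reader has to settle for their own monitoring tool.

```python
# Convert the hash table estimate printed by kraken2-build (in bytes)
# into decimal gigabytes and binary gibibytes.
# Helper names are hypothetical; they are not kraken2 functions.

def bytes_to_gb(n_bytes: int) -> float:
    """Decimal gigabytes (1 GB = 10**9 bytes)."""
    return n_bytes / 10**9


def bytes_to_gib(n_bytes: int) -> float:
    """Binary gibibytes (1 GiB = 2**30 bytes)."""
    return n_bytes / 2**30


if __name__ == "__main__":
    # Value copied from the "Estimated hash table requirement" log line.
    estimate = 307_880_598_672
    print(f"{bytes_to_gb(estimate):.1f} GB")    # 307.9 GB
    print(f"{bytes_to_gib(estimate):.1f} GiB")  # 286.7 GiB
```

Either way the estimate sits below the 384 GB installed, so the numbers in the log do not by themselves point to the build running out of memory.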

RaverJay commented 3 years ago

Seeing the same here.

30 threads building an nt database, with 500+ GB RAM available, has now been running for 348 hours O_O

$> nice kraken2-build --threads 30 --build --db /data/fass1/database/kraken2_nt_2021-07-14
Creating sequence ID to taxonomy ID map (step 1)...
Sequence ID to taxonomy ID map already present, skipping map creation.
Estimating required capacity (step 2)...
Estimated hash table requirement: 306927991952 bytes
Capacity estimation complete. [3h41m17.841s]
Building database files (step 3)...
Taxonomy parsed and converted.
CHT created with 22 bits reserved for taxid.
Processed 13391500 sequences (68077340435 bp)...
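
One way to check whether the counter is still moving is to capture the build output twice, some time apart, and compare the last "Processed N sequences (M bp)" line in each capture. The sketch below assumes the output of kraken2-build was redirected to a log file whose text is passed in as a string; the function names `last_progress` and `has_advanced` are hypothetical and not part of kraken2. A frozen counter only shows that no new sequences were counted between the two samples; it does not prove that the process is hung.

```python
import re

# Matches progress lines such as:
#   Processed 13391500 sequences (68077340435 bp)...
PROGRESS_RE = re.compile(r"Processed (\d+) sequences \((\d+) bp\)")


def last_progress(log_text: str):
    """Return (sequences, base_pairs) from the last progress line, or None."""
    matches = PROGRESS_RE.findall(log_text)
    if not matches:
        return None
    seqs, bp = matches[-1]
    return int(seqs), int(bp)


def has_advanced(earlier, later) -> bool:
    """True if the later sample shows more sequences or more base pairs."""
    if later is None:
        return False
    if earlier is None:
        return True
    return later[0] > earlier[0] or later[1] > earlier[1]


if __name__ == "__main__":
    # Two samples of the same log text; here they are identical on purpose.
    sample = "Processed 13391500 sequences (68077340435 bp)..."
    first = last_progress(sample)
    second = last_progress(sample)
    print(first)                        # (13391500, 68077340435)
    print(has_advanced(first, second))  # False
```

In practice the two samples would come from reading the same log file at two different times, for example an hour apart.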

gitamahm commented 3 years ago

@jessicarowell, @RaverJay: judging from previous posts, using the --fast-build option seems to help.

RaverJay commented 3 years ago

Thanks @gitamahm! The --fast-build option got it to build, in about 30 hours with 40 threads on a machine with 512 GB RAM.

RaverJay commented 2 years ago

Well I would be glad to, but it is almost 300GB in size, so I do not know how/where to upload that.