DaehwanKimLab / centrifuge

Classifier for metagenomic sequences
GNU General Public License v3.0

Bus error (core dumped) while building database #215

Closed vscmarques closed 2 years ago

vscmarques commented 2 years ago

Hello everyone,

I am trying to build a database for centrifuge but have run into several issues. First, the taxid files were discontinued; I searched for replacements and downloaded the nucl_gb.accession2taxid.gz and nucl_wgs.accession2taxid.gz files from here. I merged them into a new file named gi_taxid_nucl.map and proceeded with the database build.
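For readers attempting the same merge, here is a minimal sketch of one way to combine the two files. It assumes the accession2taxid files are tab-separated with a single header line and the columns accession, accession.version, taxid, gi, and that the conversion table should hold two tab-separated columns (sequence ID, taxonomy ID). The function name and the column defaults are hypothetical, not part of centrifuge, and the ID column must match whatever identifiers appear in the nt.fa headers.

```python
import gzip


def merge_accession2taxid(inputs, output, id_column=1, taxid_column=2):
    """Merge gzipped accession2taxid files into a two-column map.

    Assumption: each input has one header line and tab-separated columns
    accession, accession.version, taxid, gi (so index 1 is
    accession.version and index 2 is taxid).
    Returns the number of rows written.
    """
    written = 0
    with open(output, "w") as out:
        for path in inputs:
            with gzip.open(path, "rt") as handle:
                next(handle, None)  # skip the header line of each file
                for line in handle:
                    fields = line.rstrip("\n").split("\t")
                    if len(fields) <= max(id_column, taxid_column):
                        continue  # skip malformed or truncated rows
                    out.write(f"{fields[id_column]}\t{fields[taxid_column]}\n")
                    written += 1
    return written


# Example call (file names taken from the post above):
# merge_accession2taxid(
#     ["nucl_gb.accession2taxid.gz", "nucl_wgs.accession2taxid.gz"],
#     "gi_taxid_nucl.map",
# )
```

The files are streamed line by line, so the merge itself needs very little memory regardless of how large the inputs are.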

This was the first error I got:

Calculating joined length
Writing header
Reserving space for joined string
Could not allocate space for a joined string of 441911838016 elements.
Please try running centrifuge-build on a computer with more memory.

Now, after allocating 200GB of RAM and 16 cores to the process, I get this error instead:

Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
/var/spool/slurmd/job18134077/slurm_script: line 29: 60366 Bus error (core dumped) centrifuge-build -p 16 --bmax 1342177280 --conversion-table gi_taxid_nucl.map --taxonomy-tree taxonomy/nodes.dmp --name-table taxonomy/names.dmp nt.fa nt

I googled and searched the issues here but still could not figure out what is causing this. Has anyone had the same problem, or does anyone have an idea what it could be?

Thank you in advance for any help you can provide.

mourisl commented 2 years ago

What is the total size of your sequences? Is it 441G (seems too large)? Thanks.

vscmarques commented 2 years ago

The nt.fa file is 468G, and gi_taxid_nucl.map is 35G. Are the files I downloaded to replace the old mapping file too big? I cannot find any other replacement for it, though.

Thank you for the reply!

mourisl commented 2 years ago

The nt file is very large; you may need around 1.5T of memory for this. (Simply loading the sequences would take 468G of memory.)
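To make the arithmetic behind this estimate concrete, here is a rough back-of-the-envelope sketch. The element count comes from the first error message, and the 468G and 1.5T figures are the ones quoted in this thread. Storing one byte per element of the joined string is an assumption for illustration only, not a statement about centrifuge-build's internals.

```python
# Figures taken from the thread.
JOINED_ELEMENTS = 441_911_838_016  # "joined string of 441911838016 elements"
ALLOCATED_GB = 200                 # memory requested for the Slurm job
FASTA_GB = 468                     # size of nt.fa
ESTIMATED_NEED_GB = 1500           # "around 1.5T" suggested above

# Assumption: one byte per element of the joined string.
BYTES_PER_ELEMENT = 1

joined_gb = JOINED_ELEMENTS * BYTES_PER_ELEMENT / 1e9
shortfall_gb = joined_gb - ALLOCATED_GB
overhead_factor = ESTIMATED_NEED_GB / FASTA_GB

print(f"joined string alone: ~{joined_gb:.0f} GB")
print(f"shortfall against {ALLOCATED_GB} GB allocation: ~{shortfall_gb:.0f} GB")
print(f"suggested total is ~{overhead_factor:.1f}x the FASTA size")
```

Under that assumption, the joined string by itself is already more than twice the 200GB allocation, before any index construction takes place.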

vscmarques commented 2 years ago

Makes sense... But this is the file I get when I download the sequences from NCBI as stated in the instructions... Am I doing anything wrong?

mourisl commented 2 years ago

Sorry for the late reply. You are not doing anything wrong; for nt.fa, that size makes sense.