vscmarques closed this issue 2 years ago
What is the total size of your sequences? Is it 441G (seems too large)? Thanks.
The nt.fa file is 468G, and gi_taxid_nucl.map is 35G. Are the files I downloaded to replace the old mapping file too big? They are the only replacements I could find.
Thank you for the reply!
The nt file is very large; you may need around 1.5T of memory for this. (Simply loading the sequences would take 468G of memory.)
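As a rough cross-check of these figures, the sketch below counts the sequence characters in a FASTA file and converts the element count from the error message into gigabytes. It assumes one byte per element, which is an illustrative assumption rather than a statement about how centrifuge-build stores the joined string; `joined_length` and `to_gigabytes` are hypothetical helper names.

```python
import io

def joined_length(fasta_lines):
    """Count sequence characters, skipping header lines and line breaks."""
    total = 0
    for line in fasta_lines:
        if line.startswith(">"):
            continue  # header line, not part of the sequence
        total += len(line.strip())
    return total

def to_gigabytes(n_bytes):
    """Convert a byte count to decimal gigabytes (10**9 bytes)."""
    return n_bytes / 1e9

# Tiny in-memory FASTA to show the counting; for a real file, pass open("nt.fa").
example = io.StringIO(">seq1\nACGT\nAC\n>seq2\nGG\n")
print(joined_length(example))

# Element count reported by centrifuge-build in this thread.
REPORTED_ELEMENTS = 441_911_838_016
# ASSUMPTION: one byte per element, for a lower-bound estimate only.
print(f"{to_gigabytes(REPORTED_ELEMENTS):.1f} GB at one byte per element")
```

Under that assumption the joined string alone is about 441.9 GB, which is in line with the 441G asked about above; the gap to the 468G file size is presumably headers and line breaks. It is also more than twice the 200GB allocation mentioned in this thread.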
Makes sense... But this is the file I get when I download the sequences from NCBI as stated in the instructions... Am I doing anything wrong?
Sorry for the late reply. There is nothing wrong; for nt.fa, that size makes sense.
Hello everyone,
I am working on building a database for centrifuge but have been encountering various issues. Firstly, the taxid files were discontinued; I searched for replacements and downloaded the nucl_gb.accession2taxid.gz and nucl_wgs.accession2taxid.gz files from here. I merged them into a new file named gi_taxid_nucl.map and proceeded with the database build.
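For readers repeating the merge step, here is a minimal sketch of one way to do it. It assumes the NCBI accession2taxid files carry a tab-separated header containing `accession.version` and `taxid` columns, that the conversion table should be a two-column, tab-separated file of sequence ID and taxonomy ID, and that the sequence IDs in nt.fa are accession.version strings. The last point in particular should be checked against the actual FASTA headers. `write_conversion_table` is a hypothetical helper, not part of centrifuge.

```python
import gzip

def write_conversion_table(accession2taxid_paths, out_path):
    """Merge gzipped accession2taxid files into a two-column ID-to-taxid table.

    Returns the number of rows written.
    """
    rows = 0
    with open(out_path, "w") as out:
        for path in accession2taxid_paths:
            with gzip.open(path, "rt") as handle:
                # ASSUMPTION: first line is a tab-separated header naming the columns.
                header = handle.readline().rstrip("\n").split("\t")
                acc_col = header.index("accession.version")
                tax_col = header.index("taxid")
                for line in handle:
                    fields = line.rstrip("\n").split("\t")
                    # ASSUMPTION: nt.fa sequence IDs match accession.version.
                    out.write(f"{fields[acc_col]}\t{fields[tax_col]}\n")
                    rows += 1
    return rows

# Example call with the file names from this thread (not executed here):
# write_conversion_table(
#     ["nucl_gb.accession2taxid.gz", "nucl_wgs.accession2taxid.gz"],
#     "gi_taxid_nucl.map",
# )
```

Streaming line by line keeps memory use flat regardless of how large the mapping files are, which matters given the 35G size of the merged map reported in this thread.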
This was the first error I got:
Calculating joined length
Writing header
Reserving space for joined string
Could not allocate space for a joined string of 441911838016 elements.
Please try running centrifuge-build on a computer with more memory.
Now, allocating 200GB of RAM and 16 cores to the process, I get this error:
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
/var/spool/slurmd/job18134077/slurm_script: line 29: 60366 Bus error (core dumped) centrifuge-build -p 16 --bmax 1342177280 --conversion-table gi_taxid_nucl.map --taxonomy-tree taxonomy/nodes.dmp --name-table taxonomy/names.dmp nt.fa nt
I googled and searched around here and still could not figure out what this could be. Has anyone had the same problem, or does anyone have an idea what might be causing it?
Thank you in advance for any help you can provide.