cae803 closed this issue 2 years ago
The manual might be a bit out of date. To build the nt database, you can try the "make nt" command in the indices folder. Check the Makefile there for more details.
Dear @mourisl , Thank you for the suggestion. I'll try the command!
Hello @cae803 Have you successfully built the nt database? I am having a lot of difficulties. If you could please tell me if you managed to do it and how, I would be extremely thankful. Thank you!
Hi @vscmarques I had to stop building the database because of insufficient memory. The error is as follows.
Calculating joined length
Writing header
Reserving space for joined string
Could not allocate space for a joined string of 441911838016 elements.
Please try running centrifuge-build on a computer with more memory.
Total time for call to driver() for forward index: 01:47:33
Error: Encountered internal Centrifuge exception (#1)
Command: centrifuge-build-bin --wrapper basic-0 -p 1 --ftabchars=14 --conversion-table /dev/fd/63 --taxonomy-tree taxonomy/nodes.dmp --name-table taxonomy/names.dmp nt-dusted.fna tmp_nt/nt
Deleting "tmp_nt/nt.1.cf" file written during aborted indexing attempt.
Deleting "tmp_nt/nt.2.cf" file written during aborted indexing attempt.
Deleting "tmp_nt/nt.3.cf" file written during aborted indexing attempt.
Deleting "tmp_nt/nt.1.cf" file written during aborted indexing attempt.
Deleting "tmp_nt/nt.2.cf" file written during aborted indexing attempt.
Deleting "tmp_nt/nt.3.cf" file written during aborted indexing attempt.
I am currently arranging additional memory.
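For anyone hitting the same allocation failure, it may help to check installed RAM before starting a build that runs for hours. This is only a minimal sketch, assuming Linux (it reads /proc/meminfo). The helper names are hypothetical, and the 300 GiB threshold is simply the rough figure reported in this thread, not an official Centrifuge requirement.

```shell
# Hypothetical helpers; Linux-only because they rely on /proc/meminfo.
kb_to_gib() {
  # Integer GiB from a kB count as printed in /proc/meminfo.
  echo $(( $1 / 1024 / 1024 ))
}

has_enough_mem() {
  # $1 = required GiB, $2 = available kB; succeeds if memory is sufficient.
  [ "$(kb_to_gib "$2")" -ge "$1" ]
}

if [ -r /proc/meminfo ]; then
  total_kb=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
  # 300 is an assumed threshold taken from this thread, adjust as needed.
  if has_enough_mem 300 "$total_kb"; then
    echo "OK: $(kb_to_gib "$total_kb") GiB installed"
  else
    echo "Warning: only $(kb_to_gib "$total_kb") GiB installed"
  fi
fi
```

The check uses total installed memory rather than free memory, so it only rules out machines that could never fit the build.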
Hi @vscmarques
I finally finished building the nt database using the following command!
make THREADS=32 nt
I referred to this wiki: https://github.com/khyox/recentrifuge/wiki/Centrifuge-nt
It required about 300 GB of memory on my workstation. Here is the time consumed:
real 513m38.670s
user 4125m4.993s
sys 75m53.097s
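To put the timings above in more familiar units, here is a small sketch that converts the `time` output to seconds and derives wall-clock hours and the user/real ratio. The function name `to_seconds` is hypothetical, and the input format is assumed to be the `NmS.SSSs` form shown above.

```shell
to_seconds() {
  # Convert a value such as 513m38.670s to seconds (minutes * 60 + seconds).
  echo "$1" | awk -F'm' '{ sub(/s$/, "", $2); printf "%.3f\n", $1 * 60 + $2 }'
}

real_s=$(to_seconds 513m38.670s)
user_s=$(to_seconds 4125m4.993s)

# Wall-clock hours, and CPU time per unit of wall-clock time.
awk -v r="$real_s" -v u="$user_s" \
  'BEGIN { printf "wall-clock hours: %.2f\nuser/real ratio: %.2f\n", r / 3600, u / r }'
```

The build took roughly 8.6 hours of wall-clock time, and the user/real ratio of about 8 suggests that, on average, far fewer than the 32 requested threads were busy.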
Hi, authors. Thank you for distributing Centrifuge!
I'd like to build the nt database by following the manual. However, I am having trouble obtaining the map file.
The gi_taxid_nucl.dmp.gz seems to be out of date. The readme file of the gi_taxid file (ftp://ftp.ncbi.nih.gov/pub/taxonomy//obsolete/gi_taxid.readme) says "the gi_taxid* files update in this directory has been discontinued. Please use files from directory ./accession2taxid".
Are there any plans to support the accession2taxid file? It would be nice to be able to use the new taxid file.
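In the meantime, a conversion table can probably be derived from an accession2taxid file by hand. The sketch below assumes the file is tab-separated with a header row and the columns accession, accession.version, taxid, gi, and that the sequence headers in nt use the accession.version form. The function name and the output file name `nt.map` are hypothetical, and this is not necessarily what the Makefile in the indices folder does.

```shell
acc2taxid_to_table() {
  # Skip the header row, then emit "accession.version<TAB>taxid".
  # Reads the named files, or standard input when no file is given.
  awk -F'\t' 'NR > 1 { print $2 "\t" $3 }' "$@"
}

# Example usage (not run here; file names are assumptions):
#   gzip -dc nucl_gb.accession2taxid.gz | acc2taxid_to_table > nt.map
#   centrifuge-build --conversion-table nt.map ...
```

The two-column output matches the general shape of a sequence-ID-to-taxid table, but it is worth checking a few lines against the actual sequence headers before starting a long build.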