narsapuramvijaykumar commented 6 years ago

Hello Team Centrifuge,

I am trying to build a bacterial database with Centrifuge. The build has been running for more than 60 hours and has still not finished. Any idea when it will complete, and how long does it usually take to build a database of 36 GB?

Command used

I am using 250 GB of memory and 10 processors. Below is the stdout from the program:

Settings:
  Output files: "/home/vj/centifuge/abv..cf"
  Line rate: 7 (line is 128 bytes)
  Lines per side: 1 (side is 128 bytes)
  Offset rate: 4 (one in 16)
  FTable chars: 10
  Strings: unpacked
  Local offset rate: 3 (one in 8)
  Local fTable chars: 6
  Max bucket size: default
  Max bucket size, sqrt multiplier: default
  Max bucket size, len divisor: 4
  Difference-cover sample period: 1024
  Endianness: little
  Actual local endianness: little
  Sanity checking: disabled
  Assertions: disabled
  Random seed: 0
  Sizeofs: void:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
  /home/vj/centifuge/input-sequences.fna
Reading reference sizes
  Time reading reference sizes: 00:18:06
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences

The build is stuck at the "Joining reference sequences" step. Any suggestions for speeding this up?

Thanks in advance,

Regards, vijay N

mourisl commented 6 years ago

Do you have the commands that you used for downloading the reference genomes?

narsapuramvijaykumar commented 6 years ago

The command line I used to download the reference genomes is below:

centrifuge-download -o library -m -d "bacteria" refseq > seqid2taxid.map

And the build command line is below:

centrifuge-build -p 10 --conversion-table /home/vj/centifuge/seqid2taxid.map --taxonomy-tree /home/vj/centifuge/taxonomy/nodes.dmp --name-table /home/vj/centifuge/taxonomy/names.dmp /home/vj/centifuge/input-sequences.fna /home/vj/centifuge/abv
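
For anyone comparing build times, it may help to report how many sequences and bases the input FASTA holds, and whether every sequence ID has an entry in the conversion table. The Python sketch below is not part of Centrifuge. The function names `fasta_summary` and `ids_missing_from_map` are hypothetical, and it assumes that the conversion table is tab-separated with the sequence ID in the first column and that a sequence ID is the first whitespace-delimited token of a FASTA header.

```python
import sys


def fasta_summary(fasta_path):
    """Return (number of records, total bases), streaming the file line by line."""
    records = 0
    bases = 0
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                records += 1
            else:
                bases += len(line.strip())
    return records, bases


def ids_missing_from_map(fasta_path, map_path):
    """Return FASTA sequence IDs that have no row in the conversion table.

    Assumption: the table is tab-separated with the sequence ID in column 1,
    and the ID is the first whitespace-delimited token after '>'.
    """
    mapped = set()
    with open(map_path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if fields[0]:
                mapped.add(fields[0])

    missing = []
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                tokens = line[1:].split()
                seq_id = tokens[0] if tokens else ""
                if seq_id not in mapped:
                    missing.append(seq_id)
    return missing


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python check_input.py input-sequences.fna seqid2taxid.map
    fasta_file, map_file = sys.argv[1], sys.argv[2]
    n_records, n_bases = fasta_summary(fasta_file)
    print(f"records: {n_records}, bases: {n_bases}")
    print(f"IDs without a taxid: {len(ids_missing_from_map(fasta_file, map_file))}")
```

Both functions read the files as streams, so memory use stays small even for an input of tens of gigabytes, although a full pass over such a file will still take a while.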

DaehwanKimLab / centrifuge

centrifuge Build issue #113
