DaehwanKimLab / centrifuge

Classifier for metagenomic sequences
GNU General Public License v3.0

centrifuge-build: Error: Encountered exception: 'No more suffixes' #136

Open Greblica opened 6 years ago

Greblica commented 6 years ago

Hi,

I'm running centrifuge-build on a relatively small (40 GB) custom database. The process ran almost to the end before I got this error:

Error: Encountered exception: 'No more suffixes'
Command: centrifuge-build-bin --wrapper basic-0 -p 50 --bmax 1342177280 --ftabchars 14 --conversion-table Centrifuge_refseq/refseq_GO.access2taxid.txt --taxonomy-tree TaxonomyNCBI20180619/nodes.dmp --name-table TaxonomyNCBI20180619/names.dmp Centrifuge_refseq/refseq_GenomesOrganelles_2018625_dustmasked.fna Centrifuge_refseq/refseqGO_20182506
Deleting "Centrifuge_refseq/refseqGO_20182506.1.cf" file written during aborted indexing attempt.
Deleting "Centrifuge_refseq/refseqGO_20182506.2.cf" file written during aborted indexing attempt.
Deleting "Centrifuge_refseq/refseqGO_20182506.3.cf" file written during aborted indexing attempt.

Do you have any idea where this error comes from, and what it means? I am totally stuck. I don't know if it helps, but at the beginning of the run the command creates the .cf files. However, 2.cf is always empty (unlike 1.cf and 3.cf), and I do not understand why.

I really want to build my own custom databases so that they are fully comparable across different classifiers.

Thanks a lot in advance for your help, G

samhunter commented 5 years ago

Hi Greblica, if you haven't solved this already:

I kept getting this behavior as well. I think centrifuge-build tries to create some very large temporary files and fails if the drive runs out of space. It took a while to figure this out because I seemed to have plenty of free space (~100 GB), but when I built the database on a different drive (with many TB free) it started working properly. I was building the database from 13 eukaryotic genomes. The build took 7 hours on a system with 1 TB of RAM, with access to 50 of the 64 cores.
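For anyone who wants to rule out the disk-space cause before starting a multi-hour build, here is a minimal sketch in Python that compares the free space on the output drive with the size of the input FASTA. The function name `has_enough_space` and the `safety_factor` default of 10 are assumptions made for illustration; how much temporary space centrifuge-build actually needs is not stated anywhere in this thread, so treat the threshold as a rough guess and adjust it to your own experience.

```python
import os
import shutil


def has_enough_space(input_fasta, output_dir, safety_factor=10.0):
    """Return True if the drive holding output_dir has at least
    safety_factor times the input FASTA size free.

    safety_factor is a hypothetical rule of thumb, not a documented
    centrifuge-build requirement.
    """
    input_bytes = os.path.getsize(input_fasta)
    # disk_usage reports total, used and free bytes for the filesystem
    # that contains output_dir.
    free_bytes = shutil.disk_usage(output_dir).free
    wanted_bytes = int(input_bytes * safety_factor)
    print(
        f"input: {input_bytes / 1e9:.1f} GB, "
        f"free: {free_bytes / 1e9:.1f} GB, "
        f"wanted: {wanted_bytes / 1e9:.1f} GB"
    )
    return free_bytes >= wanted_bytes
```

Calling it with the input FASTA and the directory where the index prefix will be written (for the command in the original post, the .fna file and `Centrifuge_refseq`) gives a quick yes/no before launching the build. It only checks the output drive, so if temporary files end up somewhere else, that location needs to be checked separately.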

In the interest of full disclosure, I also set --bmax higher (--bmax 3342177280 vs. the auto-detected --bmax 2310080734) because it seemed that centrifuge-build wasn't taking full advantage of the 1 TB of RAM I have available. With --bmax 3342177280 set, it uses 671.8 GB.

I hope it helps.

Sam