bioinformatics-centre / kaiju

Fast taxonomic classification of metagenomic sequencing reads using a protein reference database
http://kaiju.binf.ku.dk
GNU General Public License v3.0

Core Dump during makeDB.sh -e #7

Closed: bstamps closed this issue 8 years ago

bstamps commented 8 years ago

Hello, I'm having trouble creating the Kaiju DB including eukarya; makeDB.sh -e fails with the following error. Any help would be great!

# infilename= kaiju_db_nr_euk.faa
# outfilename= kaiju_db_nr_euk
# Alphabet= ACDEFGHIKLMNPQRSTVWY
# nThreads= 2
# length= 0.000000
# checkpoint= 5
# caseSens=OFF
# revComp=OFF
# term= *
# revsort=OFF
# help=OFF 
readFasta: Failed to alloc seq of length 26921682502
Sequences read time = 0.000400s
Segmentation fault (core dumped)

pmenzel commented 8 years ago

Hello,

The NR database including fungi and microbial eukaryotes is much larger (more than 74.5 million protein sequences) than the database that contains only the proteins from complete genomes. makeDB.sh therefore needs more RAM to create Kaiju's index. As of yesterday, makeDB.sh -t 1 -e (the option -t 1 makes it use only 1 thread) needs 68 GB of RAM.

If you don't have that much memory, you can download the index currently used by the web server: http://kaiju.binf.ku.dk/server (blue box on the left side).
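To put the numbers in this thread side by side, here is a minimal sketch that converts the allocation size from the readFasta error into gigabytes and compares a machine's physical RAM against the 68 GB figure quoted above. It makes several assumptions: that the reported length is a byte count, that "GB" means 10^9 bytes, that the 68 GB figure still holds (it was only stated "as of yesterday"), and that the machine is a Linux-like system where `os.sysconf` exposes `SC_PHYS_PAGES`. The helper names are hypothetical and are not part of Kaiju.

```python
import os

# Figure quoted in this thread for `makeDB.sh -t 1 -e`; it may have changed since.
REQUIRED_GB = 68
# Length reported by "readFasta: Failed to alloc seq of length ..." in the log above
# (assumed here to be a number of bytes).
FAILED_ALLOC_BYTES = 26921682502


def bytes_to_gb(n_bytes):
    """Convert bytes to decimal gigabytes (1 GB = 10**9 bytes)."""
    return n_bytes / 1e9


def total_ram_bytes():
    """Total physical memory in bytes (works on Linux and similar systems)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def has_enough_ram(total_bytes, required_gb=REQUIRED_GB):
    """True if the given amount of RAM meets the quoted requirement."""
    return bytes_to_gb(total_bytes) >= required_gb


if __name__ == "__main__":
    total = total_ram_bytes()
    print(f"Failed allocation: {bytes_to_gb(FAILED_ALLOC_BYTES):.1f} GB")
    print(f"Physical RAM:      {bytes_to_gb(total):.1f} GB")
    if has_enough_ram(total):
        print("RAM meets the quoted requirement for makeDB.sh -e.")
    else:
        print("RAM is below the quoted requirement; consider the pre-built index.")
```

Under these assumptions, the single failed allocation alone is roughly 27 GB, so a machine with less memory than that cannot get past readFasta regardless of the thread count.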