bioinformatics-centre / kaiju

Fast taxonomic classification of metagenomic sequencing reads using a protein reference database
http://kaiju.binf.ku.dk
GNU General Public License v3.0

makedb.sh -e failing #33

Closed taylorreiter closed 7 years ago

taylorreiter commented 7 years ago

I have attempted to build and use the makedb.sh -e kaiju database. Running kaiju against it fails with the following output:

$ ls -l ~/kaijudb_e/kaijudb_e/
total 167947636
-rw-rw-r-- 1 ubuntu ubuntu 33420490846 Mar 17 02:25 kaiju_db_nr_euk.bwt
-rw-rw-r-- 1 ubuntu ubuntu 35484801934 Mar 17 01:00 kaiju_db_nr_euk.faa
-rw-rw-r-- 1 ubuntu ubuntu 48369697476 Mar 17 02:37 kaiju_db_nr_euk.fmi
-rw-rw-r-- 1 ubuntu ubuntu  9380484230 Mar 17 02:25 kaiju_db_nr_euk.sa
-rw-r--r-- 1 ubuntu ubuntu      837101 Mar 16 23:20 merged.dmp
-rw-r--r-- 1 ubuntu ubuntu   138034822 Mar 16 23:20 names.dmp
-rw-r--r-- 1 ubuntu ubuntu   107384758 Mar 16 23:20 nodes.dmp
-rw-rw-r-- 1 ubuntu ubuntu 27940725654 Mar 16 07:45 nr.gz
-rw-rw-r-- 1 ubuntu ubuntu 14285367259 Mar 16 23:57 prot.accession2taxid
-rw-rw-r-- 1 ubuntu ubuntu  2811685571 Mar 12 08:38 prot.accession2taxid.gz
-rw-rw-r-- 1 ubuntu ubuntu    38816191 Mar 16 23:20 taxdump.tar.gz

$ ~/kaiju/bin/kaiju -t ~/kaijudb_e/kaijudb_e/nodes.dmp -f ~/kaijudb_e/kaijudb_e/kaiju_db.fmi -i /mnt/work/hisat/unaligned/unaligned_SRR926282qc.fq -v -o kaiju_e_test
03:20:10 Reading database
Reading taxonomic tree from file /home/ubuntu/kaijudb_e/kaijudb_e/nodes.dmp
Reading index from file /home/ubuntu/kaijudb_e/kaijudb_e/kaiju_db.fmi
Could not open file /home/ubuntu/kaijudb_e/kaijudb_e/kaiju_db.fmi
Kaiju 1.5.0
Copyright 2015,2016 Peter Menzel, Anders Krogh
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html

Usage: /home/ubuntu/kaiju/bin/kaiju -t nodes.dmp -f kaiju_db.fmi -i reads.fastq [-j reads2.fastq]

Mandatory arguments:
  -t FILENAME   Name of nodes.dmp file
  -f FILENAME   Name of database (.fmi) file
  -i FILENAME   Name of input file containing reads in FASTA or FASTQ format

Optional arguments:
  -j FILENAME   Name of second input file for paired-end reads
  -o FILENAME   Name of output file. If not specified, output will be printed to STDOUT
  -z INT        Number of parallel threads (default: 1)
  -a STRING     Run mode, either "mem" or "greedy" (default: mem)
  -e INT        Number of mismatches allowed in Greedy mode (default: 0)
  -m INT        Minimum match length (default: 11)
  -s INT        Minimum match score in Greedy mode (default: 65)
  -x            Enable SEG low complexity filter
  -p            Input sequences are protein sequences
  -v            Enable verbose output

I had ample RAM to build the database, and the hard drive has an extra ~43 GB of space after the makedb.sh -e command finishes. Additionally, the output of makedb.sh -e informs me that the building process has finished.

I ran makedb.sh -p and it worked fine.

Any guidance on this issue would be greatly appreciated.

pmenzel commented 7 years ago

Hi, when you run makeDB.sh -e, the resulting database file is called kaiju_db_nr_euk.fmi. When you call kaiju, you need to pass that file name to the -f option:

~/kaiju/bin/kaiju -t ~/kaijudb_e/kaijudb_e/nodes.dmp -f ~/kaijudb_e/kaijudb_e/kaiju_db_nr_euk.fmi ...

The reason is that this lets you keep different databases in the same directory without them overwriting each other. But I should probably remove the fixed kaiju_db.fmi from the help output of kaiju, because the file is only named like that for makeDB.sh -n or -p.
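
Since the index name depends on which makeDB.sh option was used, one way to avoid hard-coding it is to look up the .fmi file in the database directory before calling kaiju. The following is only a minimal sketch: find_fmi is a hypothetical helper and not part of Kaiju, and it assumes the directory is meant to hold exactly one .fmi index.

```shell
# Hypothetical helper (not part of Kaiju): print the path of the single
# .fmi index found in a directory, or fail if there is none or more than one.
find_fmi() {
    fmi_dir=$1
    fmi_count=0
    fmi_found=
    for fmi_file in "$fmi_dir"/*.fmi; do
        # An unmatched glob stays literal, so skip anything that does not exist.
        [ -e "$fmi_file" ] || continue
        fmi_count=$((fmi_count + 1))
        fmi_found=$fmi_file
    done
    if [ "$fmi_count" -ne 1 ]; then
        echo "expected exactly one .fmi file in $fmi_dir, found $fmi_count" >&2
        return 1
    fi
    printf '%s\n' "$fmi_found"
}

# Example use with the paths from this thread (commented out on purpose):
# db=$(find_fmi ~/kaijudb_e/kaijudb_e) &&
#     ~/kaiju/bin/kaiju -t ~/kaijudb_e/kaijudb_e/nodes.dmp -f "$db" -i reads.fastq
```

If several databases share one directory, which is what the distinct file names are meant to allow, the helper refuses to guess and the index has to be named explicitly with -f.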

taylorreiter commented 7 years ago

How silly of me! I can't believe I missed that. Thank you for your prompt reply.

A change in the help file would be helpful!