hudenise closed this issue 8 years ago.
I just tried it here and makeDB.sh went through smoothly. It seems that in your case, the last step (mkfmi) was skipped.
The full output should look like this:
$ makeDB.sh -v
Downloading taxonomy files from NCBI
2016-08-08 11:59:12 URL: ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz [2751] -> ".listing" [1]
2016-08-08 12:00:58 URL: ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz [36832801] -> "taxdump.tar.gz" [1]
Extracting nodes.dmp and names.dmp files
Creating directory genomes/
Downloading file list for full genomes...
2016-08-08 12:01:02 URL: ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/archaea/assembly_summary.txt [154723] -> "assembly_summary.archaea.txt" [1]
2016-08-08 12:01:52 URL: ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/bacteria/assembly_summary.txt [18037645] -> "assembly_summary.bacteria.txt" [1]
Downloading 5336 genome files from GenBank FTP server. This may take a while...
2016-08-08 12:01:55 URL: ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF_000007725.1_ASM772v1/GCF_000007725.1_ASM772v1_genomic.gbff.gz [410498] -> "genomes/GCF_000007725.1_ASM772v1_genomic.gbff.gz" [1]
...
2016-08-08 14:43:28 URL: ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF_000827835.1_ASM82783v1/GCF_000827835.1_ASM82783v1_genomic.gbff.gz [2769976] -> "genomes/GCF_000827835.1_ASM82783v1_genomic.gbff.gz" [1]
Downloading virus genomes from GenBank FTP server...
2016-08-08 14:50:12 URL: ftp://ftp.ncbi.nlm.nih.gov/refseq/release/viral/viral.1.genomic.gbff.gz [160469055] -> "genomes/viral.1.genomic.gbff.gz" [1]
2016-08-08 14:51:09 URL: ftp://ftp.ncbi.nlm.nih.gov/refseq/release/viral/viral.2.genomic.gbff.gz [18433279] -> "genomes/viral.2.genomic.gbff.gz" [1]
Extracting protein sequences from downloaded files...
Creating Borrows-Wheeler transform...
# infilename= kaiju_db.faa
# outfilename= kaiju_db
# Alphabet= ACDEFGHIKLMNPQRSTVWY
# nThreads= 5
# length= 0.000000
# checkpoint= 3
# caseSens=OFF
# revComp=OFF
# term= *
# revsort=OFF
# help=OFF
Sequences read time = 111.470166s
SLEN 5926791275
NSEQ 18269649
ALPH *ACDEFGHIKLMNPQRSTVWY
SA NCHECK=1
Sorting done, time = 8727.987023s
Creating FM-Index...
# filenm= kaiju_db
# removecmd= NULL (null)
# help=OFF
Reading BWT from file kaiju_db.bwt ... DONE
BWT of length 5762364424 has been read with 18269649 sequencs, alphabet=*ACDEFGHIKLMNPQRSTVWY
Reading suffix array from file kaiju_db.sa ... DONE
Writing BWT header and SA to file kaiju_db.fmi ... DONE
Constructing FM index
10% ... 20% ... 30% ... 40% ... 50% ... 60% ... 70% ... 80% ... 90% ... 100% ... index2 done ...
DONE
Writing FM index to file ... DONE
!! You can now delete files kaiju_db.bwt and kaiju_db.sa !!
Done!
You can delete the folder genomes/ as well as the files taxdump.tar.gz, kaiju_db.faa, kaiju_db.bwt, and kaiju_db.sa
Kaiju only needs the files kaiju_db.fmi, nodes.dmp, and names.dmp.
You could try to just repeat the last step with
$ ../bin/mkfmi kaiju_db
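Before rerunning only the mkfmi step, it is worth checking that the earlier mkbwt step actually left its output files behind. The sketch below is not part of Kaiju; it assumes you are inside the database directory and uses the `kaiju_db` prefix from the log above:

```shell
# Sketch: verify that mkbwt's outputs exist and are non-empty before
# rerunning only the mkfmi step. "kaiju_db" matches the file names in
# the log above; adjust if your database uses a different prefix.
check_mkbwt_outputs() {
  db=$1
  for f in "$db.bwt" "$db.sa"; do
    [ -s "$f" ] || return 1   # missing or empty file
  done
  return 0
}

if check_mkbwt_outputs kaiju_db; then
  echo "inputs look fine; rerun: ../bin/mkfmi kaiju_db"
else
  echo "mkbwt output incomplete; rerun makeDB.sh instead"
fi
```

If either file is missing or truncated, rerunning mkfmi alone cannot succeed and the whole build needs to be repeated.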
Thanks, indeed my first output looked quite short compared to yours. However, I just tested the command you sent me: still no failure message, but the files are still missing.
$ ../bin/mkfmi kaiju_db
Reading BWT from file kaiju_db.bwt ... DONE
BWT of length 5762359137 has been read with 18269638 sequencs, alphabet=*ACDEFGHIKLMNPQRSTVWY
Reading suffix array from file kaiju_db.sa ...
[hudenise@ebi6-209 kaijudb]$ ls -ltr
total 16199688
-rw-r--r-- 1 hudenise metagen  102157973 Aug 4 15:20 nodes.dmp
-rw-r--r-- 1 hudenise metagen  130600015 Aug 4 15:20 names.dmp
-rw-rw-r-- 1 hudenise metagen 5979563210 Aug 4 16:52 kaiju_db.faa
-rw-rw-r-- 1 hudenise metagen 4653666088 Aug 4 17:14 kaiju_db.sa
-rw-rw-r-- 1 hudenise metagen 5665780084 Aug 4 17:14 kaiju_db.bwt
-r--r--r-- 1 hudenise metagen   36813213 Aug 5 09:20 taxdump.tar.gz
-rw-rw-r-- 1 hudenise metagen     154723 Aug 5 11:12 assembly_summary.archaea.txt
-rw-rw-r-- 1 hudenise metagen     582300 Aug 5 11:12 downloadlist.txt
-rw-rw-r-- 1 hudenise metagen   18037645 Aug 5 11:12 assembly_summary.bacteria.txt
drwxrwxr-x 2 hudenise metagen    1093632 Aug 5 11:28 genomes
Hubert
Dr Hubert DENISE
Metagenomics, European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom. Tel: (+44) 01223 494102
Hm, so it means that mkfmi stops on
Reading suffix array from file kaiju_db.sa ...
without an explicit error message? And it doesn't reach the point
Reading suffix array from file kaiju_db.sa ... DONE
?
I don't see what could happen there, and I assume you have enough RAM (at least 16 GB?), since mkbwt completed successfully. Otherwise, maybe you could try a different machine.
In the meantime, you can use the index files that are used by the web server http://kaiju.binf.ku.dk/database/kaiju_index.tgz, which contains the kaiju_db.fmi and *.dmp files.
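A rough pre-flight check against the ~16 GB figure mentioned above can be done before starting mkfmi. This is only a sketch, not Kaiju functionality, and the parsing assumes a Linux /proc/meminfo (MemTotal in kB); on other systems the check is simply skipped:

```shell
# Sketch: warn if total RAM is below a threshold (~16 GB, per the
# discussion above). Assumes Linux's /proc/meminfo; elsewhere the
# check is skipped rather than failing.
need_gb=16
need_kb=$((need_gb * 1024 * 1024))
total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo 2>/dev/null)
if [ -z "$total_kb" ]; then
  echo "could not read /proc/meminfo; skipping RAM check"
elif [ "$total_kb" -lt "$need_kb" ]; then
  echo "warning: only $((total_kb / 1024 / 1024)) GB RAM; mkfmi may stall"
else
  echo "RAM check passed"
fi
```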
OK, I increased the memory to 16 GB and the .fmi file is now being created. Thanks, Hubert
Good to hear!
Sorry, I've got another question: running '../bin/kaiju -t nodes.dmp -f kaiju_db.fmi -i ../testsequence.fasta' gave me a list of classified and unclassified sequences, with a taxon ID associated with the classified ones. Fine. However, I should be able to save this output by adding the '-o output.txt' option to my call. Unfortunately, output.txt is not created and no data are saved. Would you be able to advise, please? Thanks, Hubert
Hi, that is again quite strange. Do you have enough free disk space and permission to create the file? You could try the -v option for slightly more verbose output, which should look like this:
./bin/kaiju -t nodes.dmp -f kaiju_db.fmi -i test.fa -v -o output.txt
10:21:26 Reading database
Reading taxonomic tree from file nodes.dmp
Reading index from file kaiju_db.fmi
Output file: output.txt
10:21:45 Start classification using 1 threads.
10:21:45 Finished.
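The disk-space and permission questions above can be checked directly. The sketch below is generic POSIX shell, not a kaiju feature; "output.txt" is just the name used with -o in the command above:

```shell
# Sketch: check that the current directory is writable and has free
# space before blaming kaiju for a missing output file. The touch/rm
# probe is a generic POSIX idiom.
out=output.txt
df -k .                        # free space in the current directory
probe="$out.writetest"
if touch "$probe" 2>/dev/null; then
  echo "directory is writable"
  rm -f "$probe"
else
  echo "cannot create files here; check permissions"
fi
```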
It seems that I've got it working by giving it 16 GB of memory instead of the 8 GB I used by default. It would be nice to get an error message for this kind of issue. Would you be able to let us know whether the memory requirement increases with the size of the query set, or stays more or less constant? Thanks, Hubert
Good that it works now. In principle, Kaiju needs a bit more memory than the size of the .fmi file (which grows over time with the increasing number of genomes in GenBank). The size of your input FASTA/FASTQ file does not affect the memory requirements.
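Since the memory need tracks the index size, a RAM budget can be derived from the .fmi file itself. A sketch, where kaiju_db.fmi is the file name used throughout this thread and the "headroom" wording is a rough rule of thumb rather than a documented figure:

```shell
# Sketch: derive a rough RAM budget from the index size, following the
# note above that Kaiju needs a bit more memory than the .fmi file.
estimate_ram_mb() {
  # prints the file size in MB, or -1 if the file is missing
  if [ -f "$1" ]; then
    echo $(( $(wc -c < "$1") / 1024 / 1024 ))
  else
    echo -1
  fi
}

mb=$(estimate_ram_mb kaiju_db.fmi)
if [ "$mb" -ge 0 ]; then
  echo "index is ${mb} MB; budget at least that much RAM plus headroom"
else
  echo "kaiju_db.fmi not found in the current directory"
fi
```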
Great, thanks for your timely responses. Much appreciated. Cheers, Hubert
Hi everyone,
I was creating the Kaiju database, but the .fmi file was not created, and my kaijudb directory looks like this:
merged.dmp names.dmp nodes.dmp refseq taxdump.tar.gz
and the refseq subdirectory contents are:
assembly_summary.archaea.txt assembly_summary.bacteria.txt downloadlist.txt kaiju_db_refseq.faa source
Should I run kaiju-mkfmi to create the .fmi? I'm unsure whether I should start over. I tried kaiju-mkfmi kaijudb, but I got:
filenm= kaijudb/
removecmd= NULL (null)
help=OFF
ERROR: File kaijudb/.bwt containing BWT could not be opened for reading
Thanks, TJ
I am also experiencing the same issue. What was the fix? Many thanks
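The error message suggests that mkfmi was given a directory rather than a database prefix: earlier in this thread the working invocation was `../bin/mkfmi kaiju_db`, i.e. the path of the .bwt/.sa pair without the extension. A hypothetical helper sketch, not part of Kaiju, that guesses the prefix from any .bwt file under a directory (the name "kaijudb" comes from the message above):

```shell
# Sketch: mkfmi expects the database *prefix* (path of the .bwt/.sa
# files without extension). Guess the prefix from any .bwt file found
# under the given directory; if none exists, the BWT step never ran.
dir=kaijudb
bwt=$(find "$dir" -name '*.bwt' 2>/dev/null | head -n 1)
if [ -n "$bwt" ]; then
  prefix=${bwt%.bwt}
  echo "try: kaiju-mkfmi $prefix"
else
  echo "no .bwt file under $dir; the BWT step has not run yet"
fi
```

In TJ's listing above there is no .bwt file at all (the refseq directory contains only the .faa), so the BWT step itself would need to be rerun first.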
Hi, I'm new to Kaiju and would like to test it for our purposes as an alternative to QIIME. The installation was straightforward (thanks!). However, when downloading the NCBI database, the kaiju_db.fmi file is not created, meaning that I'm unable to run kaiju. My call was:
$ ../bin/makeDB.sh -v
and the screen output looks OK:
Extracting protein sequences from downloaded files...
Creating Borrows-Wheeler transform...
infilename= kaiju_db.faa
outfilename= kaiju_db
Alphabet= ACDEFGHIKLMNPQRSTVWY
nThreads= 5
length= 0.000000
checkpoint= 3
caseSens=OFF
revComp=OFF
term= *
revsort=OFF
help=OFF
Sequences read time = 86.800000s
SLEN 5926785889
NSEQ 18269638
ALPH *ACDEFGHIKLMNPQRSTVWY
SA NCHECK=0
Sorting done, time = 5752.120000s
$ ls
assembly_summary.archaea.txt assembly_summary.bacteria.txt downloadlist.txt genomes kaiju_db.bwt kaiju_db.faa kaiju_db.sa names.dmp nodes.dmp taxdump.tar.gz