bioinformatics-centre / kaiju

Fast taxonomic classification of metagenomic sequencing reads using a protein reference database
http://kaiju.binf.ku.dk
GNU General Public License v3.0

makedb progenomes fails #273

Open Mewgia opened 1 year ago

Mewgia commented 1 year ago

Hello! I'm trying to download the proGenomes database and see this:

Downloading taxdump.tar.gz
.listing [ <=> ] 1.85K --.-KB/s in 0.01s
2023-11-23 13:11:56 URL: ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz [1890] -> ".listing" [1]
taxdump.tar.gz 100%[=========================================================================>] 60.88M 13.6MB/s in 5.5s
2023-11-23 13:12:03 URL: ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz [63835104] -> "taxdump.tar.gz" [1]
Extracting taxdump.tar.gz
Downloading proGenomes database
https://progenomes.embl.de/data/repGenomes/freeze12.proteins.representatives.fasta.gz:
2023-11-23 13:12:16 ERROR 404: Not Found.

What should I do? Thanks.

pmenzel commented 1 year ago

Update Kaiju to the latest version from GitHub. The URL for the proGenomes download changed since release v1.9.2.

I should make a new release including this fix.
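
For reference, a minimal sketch of building the latest version from source, following the build steps in the Kaiju README (no root needed; paths are illustrative):

git clone https://github.com/bioinformatics-centre/kaiju.git
cd kaiju/src
make
# the binaries end up in kaiju/bin; run the fixed download script from there
../bin/kaiju-makedb -s progenomes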

Mewgia commented 1 year ago

Thanks, done. But now there's a new error:

Downloading proGenomes database
progenomes3.proteins.representatives.f 100%[=========================================================================>] 23.78G 21.1MB/s in 17m 45s
2023-11-27 10:53:36 URL:https://progenomes.embl.de/data/repGenomes/progenomes3.proteins.representatives.fasta.bz2 [25536098227/25536098227] -> "progenomes/source/progenomes3.proteins.representatives.fasta.bz2" [1]
Downloading virus genomes from RefSeq
Extracting protein sequences from downloaded files
xargs: warning: options --max-args and --replace/-I/-i are mutually exclusive, ignoring previous --max-args value
Creating Borrows-Wheeler transform
infilename= progenomes/kaiju_db_progenomes.faa
outfilename= progenomes/kaiju_db_progenomes
Alphabet= ACDEFGHIKLMNPQRSTVWY
nThreads= 5
length= 0.000000
checkpoint= 3
caseSens=OFF
revComp=OFF
term=
revsort=OFF
help=OFF
Sequences read time = 2346.803643s
SLEN 46856907323 NSEQ 141847069 ALPH ACDEFGHIKLMNPQRSTVWY
Killed

96 GB RAM, Debian GNU/Linux 11 (bullseye). Previously, I used Kaiju with proGenomes on Debian Jessie without any problems.

pmenzel commented 1 year ago

Have a look at the table in the README for the memory requirements for building the Kaiju index for each available reference database. For proGenomes v3 it is 120 GB of RAM.
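
To confirm that the process was killed by the out-of-memory (OOM) killer rather than failing for another reason, a quick sketch using standard Linux tools (nothing Kaiju-specific; dmesg may need root on some systems):

free -h                                       # compare available RAM against the 120 GB requirement
dmesg | grep -iE 'out of memory|oom' | tail   # the kernel log usually records an OOM kill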

EorgeKit commented 1 year ago

> Update Kaiju to the latest version from GitHub. The URL for the proGenomes download changed since release v1.9.2.
>
> I should make a new release including this fix.

I would like to install the latest version because of the same issue, but unfortunately the conda package I installed still reports v1.9.2, even though the Bioconda page says it is 1.10.0, and kaiju-makedb -s progenomes still gets the same Not Found error. Can you please check? I can't install from GitHub because I work on a cluster without sudo permissions, so I have to wait until the admins resolve my installation ticket, which sometimes takes days. Transcript of the install attempt below:

conda create -n kaiju2 -y  -c bioconda kaiju=1.10.0
Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /home/maloo/.conda/envs/kaiju2

  added / updated specs:
    - kaiju=1.10.0

The following NEW packages will be INSTALLED:

  _libgcc_mutex      conda-forge/linux-64::_libgcc_mutex-0.1-conda_forge 
  _openmp_mutex      conda-forge/linux-64::_openmp_mutex-4.5-2_gnu 
  bzip2              conda-forge/linux-64::bzip2-1.0.8-hd590300_5 
  c-ares             conda-forge/linux-64::c-ares-1.23.0-hd590300_0 
  ca-certificates    conda-forge/linux-64::ca-certificates-2023.11.17-hbcca054_0 
  curl               conda-forge/linux-64::curl-8.4.0-hca28451_0 
  gettext            conda-forge/linux-64::gettext-0.21.1-h27087fc_0 
  kaiju              bioconda/linux-64::kaiju-1.10.0-h43eeafb_0 
  keyutils           conda-forge/linux-64::keyutils-1.6.1-h166bdaf_0 
  krb5               conda-forge/linux-64::krb5-1.21.2-h659d440_0 
  ld_impl_linux-64   conda-forge/linux-64::ld_impl_linux-64-2.40-h41732ed_0 
  libcurl            conda-forge/linux-64::libcurl-8.4.0-hca28451_0 
  libedit            conda-forge/linux-64::libedit-3.1.20191231-he28a2e2_2 
  libev              conda-forge/linux-64::libev-4.33-h516909a_1 
  libexpat           conda-forge/linux-64::libexpat-2.5.0-hcb278e6_1 
  libffi             conda-forge/linux-64::libffi-3.4.2-h7f98852_5 
  libgcc-ng          conda-forge/linux-64::libgcc-ng-13.2.0-h807b86a_3 
  libgomp            conda-forge/linux-64::libgomp-13.2.0-h807b86a_3 
  libidn2            conda-forge/linux-64::libidn2-2.3.4-h166bdaf_0 
  libnghttp2         conda-forge/linux-64::libnghttp2-1.58.0-h47da74e_0 
  libnsl             conda-forge/linux-64::libnsl-2.0.1-hd590300_0 
  libsqlite          conda-forge/linux-64::libsqlite-3.44.2-h2797004_0 
  libssh2            conda-forge/linux-64::libssh2-1.11.0-h0841786_0 
  libstdcxx-ng       conda-forge/linux-64::libstdcxx-ng-13.2.0-h7e041cc_3 
  libunistring       conda-forge/linux-64::libunistring-0.9.10-h7f98852_0 
  libuuid            conda-forge/linux-64::libuuid-2.38.1-h0b41bf4_0 
  libzlib            conda-forge/linux-64::libzlib-1.2.13-hd590300_5 
  ncurses            conda-forge/linux-64::ncurses-6.4-h59595ed_2 
  openssl            conda-forge/linux-64::openssl-3.2.0-hd590300_1 
  perl               conda-forge/linux-64::perl-5.32.1-4_hd590300_perl5 
  pip                conda-forge/noarch::pip-23.3.1-pyhd8ed1ab_0 
  python             conda-forge/linux-64::python-3.12.0-hab00c5b_0_cpython 
  readline           conda-forge/linux-64::readline-8.2-h8228510_1 
  setuptools         conda-forge/noarch::setuptools-68.2.2-pyhd8ed1ab_0 
  tk                 conda-forge/linux-64::tk-8.6.13-noxft_h4845f30_101 
  tzdata             conda-forge/noarch::tzdata-2023c-h71feb2d_0 
  wget               conda-forge/linux-64::wget-1.20.3-ha35d2d1_1 
  wheel              conda-forge/noarch::wheel-0.42.0-pyhd8ed1ab_0 
  xz                 conda-forge/linux-64::xz-5.2.6-h166bdaf_0 
  zlib               conda-forge/linux-64::zlib-1.2.13-hd590300_5 
  zstd               conda-forge/linux-64::zstd-1.5.5-hfc55251_0 

Downloading and Extracting Packages

Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
#     $ conda activate kaiju2
#
# To deactivate an active environment, use
#
#     $ conda deactivate
conda activate kaiju2
 kaiju
Error: Please specify the location of the nodes.dmp file, using the -t option.

Kaiju 1.9.2
Copyright 2015-2022 Peter Menzel, Anders Krogh
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>

pmenzel commented 1 year ago

I just tried to install v1.10.0 via conda and got the correct version. You can also compile it from source and run it without needing root.
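
If it helps to debug the version mismatch, a quick sketch for checking which binary the shell actually resolves (an older Kaiju earlier in your PATH could shadow the conda one; the environment name is taken from the transcript above):

conda activate kaiju2
which kaiju                   # should point into ~/.conda/envs/kaiju2/bin
conda list -n kaiju2 kaiju    # version of the installed conda package
kaiju 2>&1 | grep -m1 Kaiju   # version string printed by the binary itself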

Jeffery-Ni commented 7 months ago

I have encountered the same problem:

Extracting taxdump.tar.gz
Creating Borrows-Wheeler transform
infilename= refseq/kaiju_db_refseq.faa
outfilename= refseq/kaiju_db_refseq
Alphabet= ACDEFGHIKLMNPQRSTVWY
nThreads= 10
length= 0.000000
checkpoint= 3
caseSens=OFF
revComp=OFF
term= *
revsort=OFF
help=OFF
Sequences read time = 339.643674s
SLEN 50636639214 NSEQ 155772604 ALPH *ACDEFGHIKLMNPQRSTVWY
/home/Public/Anaconda3/ENTER/envs/kaiju/bin/kaiju-makedb: line 261: 1817838 Killed kaiju-mkbwt -n $threadsBWT -e $exponentSA -a ACDEFGHIKLMNPQRSTVWY -o $DB/kaijudb$DB $DB/kaijudb$DB.faa

Is this a problem with insufficient RAM?

pmenzel commented 7 months ago

Probably. See the README for the required RAM for each reference database. You can also download pre-made indexes.
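
For example, a sketch of downloading and using a pre-made index (the URL is illustrative; the filename changes with each release, so copy the current link for your database from the Kaiju website):

wget https://kaiju-idx.s3.eu-central-1.amazonaws.com/2023/kaiju_db_refseq_2023-05-10.tgz
tar xzf kaiju_db_refseq_2023-05-10.tgz
# the archive should contain the .fmi index plus nodes.dmp and names.dmp
kaiju -t nodes.dmp -f kaiju_db_refseq.fmi -i reads_R1.fastq.gz -j reads_R2.fastq.gz -o kaiju.out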

Jeffery-Ni commented 7 months ago

But the indexes offered on the Kaiju website are a little outdated. Is there a way to build the index locally without this much RAM? The server I work on has about 115 GB of free RAM, just short of what is needed to build the current RefSeq index.

pmenzel commented 7 months ago

You won't lose much by using the index file from last year. The memory requirements cannot be reduced.