fbreitwieser / krakenuniq

🐙 KrakenUniq: Metagenomics classifier with unique k-mer counting for more specific results
GNU General Public License v3.0

Building a microbial nt database #114

Open DesmondoDekker opened 2 years ago

DesmondoDekker commented 2 years ago

Dear staff,

I am trying to build a microbial-nt database. Following the tutorial, I ran this command:

krakenuniq-download --db DB --taxa "archaea,bacteria,viral,fungi,protozoa,helminths" --dust --exclude-environmental-taxa microbial-nt

Next, I ran this command:

krakenuniq-build --db microbial-nt/ --kmer-len 31 --taxids-for-genomes --taxids-for-sequences

But I got the following error message:

Kraken build set to minimize disk writes.
Found 5 sequence files (*.{fna,fa,ffn,fasta,fsa}) in the library directory.
Creating k-mer set (step 1 of 6)...
Using /home/clusterusers/lborruso/.conda/envs/krakenuniq/libexec/jellyfish-install/bin/jellyfish
count_unique: malformed fasta file - expected header char > not found
Hash size not specified, using '0'
/home/clusterusers/lborruso/.conda/envs/krakenuniq/libexec/builddb.sh: line 46: /home/clusterusers/lborruso/.conda/envs/krakenuniq/libexec/jellyfish-install/bin/jellyfish: No such file or directory
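For anyone hitting the same "malformed fasta file - expected header char > not found" line, one way to narrow it down is to check whether each sequence file in the library starts with a ">" header. The sketch below is only an illustration: `check_fasta_headers` is a hypothetical helper, not part of KrakenUniq, and it assumes uncompressed FASTA files passed as arguments.

```shell
#!/bin/bash
# Hypothetical helper (not part of KrakenUniq): report every file whose
# first non-empty line does not begin with '>'.
# Assumption: the files are plain, uncompressed FASTA.
check_fasta_headers() {
    cfh_bad=0
    for cfh_file in "$@"; do
        # Take the first line that contains at least one non-blank field.
        cfh_first=$(awk 'NF { print; exit }' "$cfh_file")
        case "$cfh_first" in
            '>'*) ;;  # looks like a FASTA header
            *) echo "no FASTA header at top of: $cfh_file"; cfh_bad=1 ;;
        esac
    done
    # Return 0 only if every file started with a header.
    return "$cfh_bad"
}
```

It could be called on the files the build step found, for example `check_fasta_headers microbial-nt/library/*/*.fna`, where the path layout is an assumption about how the download step arranged the library directory. An empty file is also reported, since it has no header line.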

Thanks a lot

Luigi

alekseyzimin commented 2 years ago

Please make sure you have installed krakenuniq with the option to install jellyfish (-j). Jellyfish v1.1 is mandatory for building databases.
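A quick way to confirm this before rerunning the build is to test whether the jellyfish binary exists at the path shown in the error log and reports a 1.x version. This is a minimal sketch, not something shipped with KrakenUniq: `check_jellyfish` is a hypothetical helper, and it assumes the binary answers `--version` with a line ending in the version number (such as "jellyfish 1.1.11").

```shell
#!/bin/bash
# Hypothetical helper (not part of KrakenUniq): check that a jellyfish
# binary exists, is executable, and reports a 1.x version.
check_jellyfish() {
    jf_bin="$1"
    if [ ! -x "$jf_bin" ]; then
        echo "missing or not executable: $jf_bin"
        return 1
    fi
    # Assumption: --version prints one line whose last field is the version.
    jf_version=$("$jf_bin" --version 2>/dev/null | awk '{ print $NF; exit }')
    case "$jf_version" in
        1.*) echo "ok: jellyfish $jf_version"; return 0 ;;
        *)   echo "unexpected version '$jf_version' (expected 1.x)"; return 1 ;;
    esac
}
```

With the conda environment from the log activated, the call would be something like `check_jellyfish "$CONDA_PREFIX/libexec/jellyfish-install/bin/jellyfish"`, which mirrors the path that builddb.sh failed to find.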

DesmondoDekker commented 2 years ago

Dear Aleksey Zimin,

Thanks. I have installed jellyfish, but unfortunately I get another error message, both with and without the "--work-on-disk" option. Any clue about how to reduce the RAM required?

Here is the script:

#!/bin/bash
#SBATCH --partition=bioinfo
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=12
#SBATCH --mem=128000

source /opt/modules/init/bash
module purge
module load anaconda3
conda activate krakenuniq
krakenuniq-build --db /data_agro/DB_microbe --kmer-len 31 --threads 12 --taxids-for-genomes --taxids-for-sequences --work-on-disk


StickHu commented 1 year ago

Hi, have you solved the problem?