WrightonLabCSU / DRAM

Distilled and Refined Annotation of Metabolism: A tool for the annotation and curation of function for microbial and viral genomes
GNU General Public License v3.0

database does not exist at path none #80

Closed yulong0827 closed 1 year ago

yulong0827 commented 3 years ago

Thanks first. This is the database setup. So, are UniRef, Pfam and dbCAN OK to use?

DRAM-setup.py prepare_databases --output_dir DRAM_data --threads 28
2021-04-27 21:32:56.852851: Database preparation started
9:55:06.911831: UniRef database processed
11:26:04.742593: PFAM database processed
11:26:24.184667: dbCAN database processed
Traceback (most recent call last):
  File "/home/liuyulong/miniconda3/envs/dram/bin/DRAM-setup.py", line 146, in <module>
    args.func(**args_dict)
  File "/home/liuyulong/miniconda3/envs/dram/lib/python3.6/site-packages/mag_annotator/database_processing.py", line 467, in prepare_databases
    verbose=verbose)
  File "/home/liuyulong/miniconda3/envs/dram/lib/python3.6/site-packages/mag_annotator/database_processing.py", line 214, in download_and_process_viral_refseq
    download_file(refseq_url, refseq_faa, verbose=verbose)
  File "/home/liuyulong/miniconda3/envs/dram/lib/python3.6/site-packages/mag_annotator/utils.py", line 27, in download_file
    run_process(['wget', '-O', output_file, url], verbose=verbose)
  File "/home/liuyulong/miniconda3/envs/dram/lib/python3.6/site-packages/mag_annotator/utils.py", line 39, in run_process
    stderr=stderr).stdout.decode(errors='ignore')
  File "/home/liuyulong/miniconda3/envs/dram/lib/python3.6/subprocess.py", line 418, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['wget', '-O', 'DRAM_data/database_files/viral.1.protein.faa.gz', 'ftp://ftp.ncbi.nlm.nih.gov/refseq/release/viral/viral.1.protein.faa.gz']' returned non-zero exit status 4.

When annotating:

DRAM.py annotate -i 1.fa -o 1dram --threads 24
1 fastas found
2021-04-28 09:57:16.859237: Annotation started
Traceback (most recent call last):
  File "/home/liuyulong/miniconda3/envs/dram/bin/DRAM.py", line 153, in <module>
    args.func(**args_dict)
  File "/home/liuyulong/miniconda3/envs/dram/lib/python3.6/site-packages/mag_annotator/annotate_bins.py", line 969, in annotate_bins_cmd
    checkm_quality, rename_bins, keep_tmp_dir, low_mem_mode, threads, verbose)
  File "/home/liuyulong/miniconda3/envs/dram/lib/python3.6/site-packages/mag_annotator/annotate_bins.py", line 1000, in annotate_bins
    db_handler = DatabaseHandler(db_locs['description_db'])
  File "/home/liuyulong/miniconda3/envs/dram/lib/python3.6/site-packages/mag_annotator/database_handler.py", line 17, in __init__
    raise ValueError('Database does not exist at path %s' % database_loc)
ValueError: Database does not exist at path None

Does this mean the database location is not correct? Thank you.

shafferm commented 3 years ago

This is failing because there was an error when you ran DRAM-setup.py. This may have happened because the NCBI RefSeq FTP server was down, or because the server you are running on lost its internet connection while DRAM-setup.py was running. I would recommend running the first command again to see whether it finishes without an error this time. If it does, then annotate should work.
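A minimal sketch of why the second traceback ends in `path None`, assuming the configuration is a mapping from database names to paths in which anything that setup never finished stays `None`. The `db_locs` contents and the `unusable_databases` helper are hypothetical and for illustration only; this is not DRAM's actual code.

```python
import os


def unusable_databases(db_locs):
    """Return the names of databases whose path is unset (None) or missing on disk."""
    problems = []
    for name, path in db_locs.items():
        if path is None or not os.path.exists(path):
            problems.append(name)
    return sorted(problems)


# Hypothetical configuration after a setup run that stopped partway through:
# entries that were never written remain None.
db_locs = {
    "description_db": None,
    "viral_db": None,
}
print(unusable_databases(db_locs))
```

Any entry reported here would need setup to complete before annotation can use it.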

yulong0827 commented 3 years ago

Thank you very much for your reply. However, I tried again and got the same error:

subprocess.CalledProcessError: Command '['wget', '-O', 'DRAM_data/database_files/viral.1.protein.faa.gz', 'ftp://ftp.ncbi.nlm.nih.gov/refseq/release/viral/viral.1.protein.faa.gz']' returned non-zero exit status 4.

So I wonder if there is an alternative way to prepare the databases? Perhaps the error is caused by the network?

shafferm commented 3 years ago

Can you try running that command directly and see if it works? The command is:

wget -O DRAM_data/database_files/viral.1.protein.faa.gz ftp://ftp.ncbi.nlm.nih.gov/refseq/release/viral/viral.1.protein.faa.gz
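For reference when running it by hand: the wget manual lists exit status 4 as a network failure, which is consistent with the connectivity explanation suggested earlier in this thread. Below is a small sketch that runs a command and looks up its exit status. The `describe_exit` helper is hypothetical, the table is copied from the exit statuses documented in the wget manual, and a stand-in command replaces the real download so the example runs without network access.

```python
import subprocess
import sys

# Exit statuses as documented in the GNU Wget manual.
WGET_EXIT_STATUS = {
    0: "no problems occurred",
    1: "generic error code",
    2: "parse error (command-line options or .wgetrc)",
    3: "file I/O error",
    4: "network failure",
    5: "SSL verification failure",
    6: "username/password authentication failure",
    7: "protocol errors",
    8: "server issued an error response",
}


def describe_exit(command):
    """Run a command and return (exit status, meaning according to the wget table)."""
    result = subprocess.run(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    meaning = WGET_EXIT_STATUS.get(result.returncode, "unknown status")
    return result.returncode, meaning


# Stand-in for the wget call: a child process that simply exits with status 4.
code, meaning = describe_exit([sys.executable, "-c", "import sys; sys.exit(4)"])
print(code, meaning)  # prints: 4 network failure
```

To check the real download, pass the wget command from this thread as a list of arguments instead of the stand-in.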

jamesck2 commented 2 years ago

Hi all, I'm having a similar issue that hasn't been resolved. I've tried running DRAM-setup.py twice with this batch command:

#!/bin/bash
#####  Constructed by HPC everywhere #####
#SBATCH --mail-user=********@********.***
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=16
#SBATCH --time=12:00:00
#SBATCH --mem=500gb
#SBATCH --partition=largememory
#SBATCH --mail-type=FAIL,BEGIN,END
#SBATCH --job-name=DRAM-database
#SBATCH --output=DRAM-database.out
#SBATCH --error=DRAM-database.err

######  Module commands #####
module load anaconda/python3.8/2020.07

######  Job commands go below this line #####
cd /N/slate/jakosmo

source activate DRAM

DRAM-setup.py prepare_databases --output_dir /N/slate/jakosmo/DRAM_data
DRAM-setup.py print_config

And the output text file displays this:

2022-01-23 10:34:07.057045: Database preparation started
Processed search databases
KEGG db: None
KOfam db: /N/slate/jakosmo/DRAM_data/kofam_profiles.hmm
KOfam KO list: /N/slate/jakosmo/DRAM_data/kofam_ko_list.tsv
UniRef db: /N/slate/jakosmo/DRAM_data/uniref90.20220122.mmsdb
Pfam db: /N/slate/jakosmo/DRAM_data/pfam.mmspro
dbCAN db: /N/slate/jakosmo/DRAM_data/dbCAN-HMMdb-V9.txt
RefSeq Viral db: /N/slate/jakosmo/DRAM_data/refseq_viral.20220123.mmsdb
MEROPS peptidase db: /N/slate/jakosmo/DRAM_data/peptidases.20220123.mmsdb
VOGDB db: /N/slate/jakosmo/DRAM_data/vog_latest_hmms.txt

Descriptions of search database entries
Pfam hmm dat: /N/slate/jakosmo/DRAM_data/Pfam-A.hmm.dat.gz
dbCAN family activities: /N/slate/jakosmo/DRAM_data/CAZyDB.07302020.fam-activities.txt
VOG annotations: /N/slate/jakosmo/DRAM_data/vog_annotations_latest.tsv.gz

Description db: /N/slate/jakosmo/DRAM_data/description_db.sqlite

DRAM distillation sheets
Genome summary form: /N/slate/jakosmo/DRAM_data/genome_summary_form.20220123.tsv
Module step form: /N/slate/jakosmo/DRAM_data/module_step_form.20220123.tsv
ETC module database: /N/slate/jakosmo/DRAM_data/etc_mdoule_database.20220123.tsv
Function heatmap form: /N/slate/jakosmo/DRAM_data/function_heatmap_form.20220123.tsv
AMG database: /N/slate/jakosmo/DRAM_data/amg_database.20220123.tsv

However, the error file contains the following error, which is preventing me from running DRAM-v.py:

Traceback (most recent call last):
  File "/N/u/jakosmo/Carbonate/.conda/envs/DRAM/bin/DRAM-setup.py", line 157, in <module>
    args.func(**args_dict)
  File "/N/u/jakosmo/Carbonate/.conda/envs/DRAM/lib/python3.9/site-packages/mag_annotator/database_processing.py", line 289, in prepare_databases
    output_dbs['uniref_db_loc'] = download_and_process_uniref(uniref_loc, temporary, uniref_version=uniref_version,
  File "/N/u/jakosmo/Carbonate/.conda/envs/DRAM/lib/python3.9/site-packages/mag_annotator/database_processing.py", line 91, in download_and_process_uniref
    make_mmseqs_db(uniref_fasta_zipped, uniref_mmseqs_db, create_index=True, threads=threads, verbose=verbose)
  File "/N/u/jakosmo/Carbonate/.conda/envs/DRAM/lib/python3.9/site-packages/mag_annotator/utils.py", line 38, in make_mmseqs_db
    run_process(['mmseqs', 'createindex', output_loc, tmp_dir, '--threads', str(threads)], verbose=verbose)
  File "/N/u/jakosmo/Carbonate/.conda/envs/DRAM/lib/python3.9/site-packages/mag_annotator/utils.py", line 27, in run_process
    return subprocess.run(command, check=check, shell=shell, stdout=subprocess.PIPE,
  File "/N/u/jakosmo/Carbonate/.conda/envs/DRAM/lib/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['mmseqs', 'createindex', '/N/slate/jakosmo/DRAM_data/database_files/uniref90.20220123.mmsdb', '/N/slate/jakosmo/DRAM_data/database_files/tmp', '--threads', '10']' returned non-zero exit status 1.

I don't know whether manually downloading the viral protein database from RefSeq, as mentioned above, helped @yulong0827, but would trying the same thing with UniRef help? How would I go about that? Or is there perhaps another solution? Thanks in advance; please advise.

rmFlynn commented 2 years ago

@jamesck2 In the same directory where you are running DRAM, try running this command:

mmseqs createindex /N/slate/jakosmo/DRAM_data/database_files/uniref90.20220123.mmsdb /N/slate/jakosmo/DRAM_data/database_files/tmp --threads 10

See what errors it spits out.
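The Python traceback above reports only the exit status, not the tool's own message, which is why running the command directly is useful. Here is a hedged sketch of getting the same information from Python. `run_and_report` is a hypothetical helper, not part of DRAM, and a stand-in command replaces the `mmseqs` call so the example runs anywhere.

```python
import subprocess
import sys


def run_and_report(command):
    """Run a command; on failure return (exit status, stderr text) instead of raising."""
    try:
        subprocess.run(command, check=True,
                       stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    except subprocess.CalledProcessError as err:
        # With stderr piped, the failed process's message is kept on the exception.
        return err.returncode, err.stderr.decode(errors="ignore").strip()
    return 0, ""


# Stand-in for the failing tool: writes a message to stderr and exits with status 1.
status, message = run_and_report(
    [sys.executable, "-c", "import sys; sys.exit('example error')"]
)
print(status, message)  # prints: 1 example error
```

Replacing the stand-in with the `mmseqs createindex` argument list from the traceback would return whatever message the tool wrote to stderr.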

rmFlynn commented 1 year ago

Closing because of inactivity