WrightonLabCSU / DRAM

Distilled and Refined Annotation of Metabolism: A tool for the annotation and curation of function for microbial and viral genomes
GNU General Public License v3.0

DRAM-setup.py prepare databases error "returned non-zero exit status 4." #189

Closed TX0814 closed 2 years ago

TX0814 commented 2 years ago

Hi Mike, I hit the same error as in https://github.com/WrightonLabCSU/DRAM/issues/112. I tried to resolve it by following your comments in issue 112, but it did not work. Dorothydu12 said they re-installed using the provided pipeline and that it worked. Do you know which pipeline they meant?

$ DRAM-setup.py prepare_databases --output_dir /home/cloudam/envs/DRAM/DRAM_data
2022-07-20 09:15:37.041710: Database preparation started
Downloading dbCAN family activities from : https://bcb.unl.edu/dbCAN2/download/Databases/V10/CAZyDB.07292021.fam-activities.txt
Traceback (most recent call last):
  File "/home/cloudam/.conda/envs/DRAM/bin/DRAM-setup.py", line 158, in <module>
    args.func(**args_dict)
  File "/home/cloudam/.conda/envs/DRAM/lib/python3.10/site-packages/mag_annotator/database_processing.py", line 296, in prepare_databases
    pfam_hmm_dat = download_pfam_descriptions(output_dir, verbose=verbose)
  File "/home/cloudam/.conda/envs/DRAM/lib/python3.10/site-packages/mag_annotator/database_processing.py", line 121, in download_pfam_descriptions
    download_file('ftp://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/Pfam-A.hmm.dat.gz', pfam_hmm_dat,
  File "/home/cloudam/.conda/envs/DRAM/lib/python3.10/site-packages/mag_annotator/utils.py", line 16, in download_file
    run_process(['wget', '-O', output_file, url], verbose=verbose)
  File "/home/cloudam/.conda/envs/DRAM/lib/python3.10/site-packages/mag_annotator/utils.py", line 27, in run_process
    return subprocess.run(command, check=check, shell=shell, stdout=subprocess.PIPE,
  File "/home/cloudam/.conda/envs/DRAM/lib/python3.10/subprocess.py", line 524, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['wget', '-O', '/home/cloudam/envs/DRAM/DRAM_data/Pfam-A.hmm.dat.gz', 'ftp://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/Pfam-A.hmm.dat.gz']' returned non-zero exit status 4.
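For readers landing on this thread: the "exit status 4" at the end of the traceback comes from wget itself, not from DRAM. The sketch below maps wget's exit statuses, as listed in the GNU Wget manual, to their meanings. The helper name `describe_wget_exit` is hypothetical and is not part of DRAM.

```python
# Exit statuses as documented in the "Exit Status" section of the GNU Wget manual.
WGET_EXIT_STATUS = {
    0: "No problems occurred",
    1: "Generic error code",
    2: "Parse error (command-line options, .wgetrc or .netrc)",
    3: "File I/O error",
    4: "Network failure",
    5: "SSL verification failure",
    6: "Username/password authentication failure",
    7: "Protocol errors",
    8: "Server issued an error response",
}


def describe_wget_exit(status: int) -> str:
    """Return a readable meaning for a wget exit status (hypothetical helper)."""
    return WGET_EXIT_STATUS.get(status, f"Unknown wget exit status {status}")


# The status reported in the traceback above.
print(describe_wget_exit(4))
```

Status 4 points at the network path between the machine and the server, which suggests checking connectivity before reinstalling anything.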

rmFlynn commented 2 years ago

I assume the pipeline included the Perl script that fixes the SQL error, but that is not related to this problem. Did you run the wget command on its own?

wget -O /home/cloudam/envs/DRAM/DRAM_data/Pfam-A.hmm.dat.gz ftp://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/Pfam-A.hmm.dat.gz

If it works, then it could have been a one-off server error.

TX0814 commented 2 years ago

I have run the wget command on its own, but it didn't work.

$ wget -O /home/cloudam/envs/DRAM/DRAM_data/Pfam-A.hmm.dat.gz ftp://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/Pfam-A.hmm.dat.gz
--2022-07-22 14:59:27-- ftp://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/Pfam-A.hmm.dat.gz
  => ‘/home/cloudam/envs/DRAM/DRAM_data/Pfam-A.hmm.dat.gz’
Resolving ftp.ebi.ac.uk (ftp.ebi.ac.uk)... 193.62.193.138
Connecting to ftp.ebi.ac.uk (ftp.ebi.ac.uk)|193.62.193.138|:21... failed: Connection timed out.
Retrying.
--2022-07-22 15:01:35-- ftp://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/Pfam-A.hmm.dat.gz (try: 2)
  => ‘/home/cloudam/envs/DRAM/DRAM_data/Pfam-A.hmm.dat.gz’
Connecting to ftp.ebi.ac.uk (ftp.ebi.ac.uk)|193.62.193.138|:21... failed: Connection timed out.
Retrying.
--2022-07-22 15:03:44-- ftp://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/Pfam-A.hmm.dat.gz (try: 3)
  => ‘/home/cloudam/envs/DRAM/DRAM_data/Pfam-A.hmm.dat.gz’
Connecting to ftp.ebi.ac.uk (ftp.ebi.ac.uk)|193.62.193.138|:21... failed: Connection timed out.
Retrying.
--2022-07-22 15:05:55-- ftp://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/Pfam-A.hmm.dat.gz (try: 4)
  => ‘/home/cloudam/envs/DRAM/DRAM_data/Pfam-A.hmm.dat.gz’
Connecting to ftp.ebi.ac.uk (ftp.ebi.ac.uk)|193.62.193.138|:21... failed: Connection timed out.
Retrying.
......
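The log above shows the name resolving but the TCP connection to port 21 timing out. A minimal sketch, using only the Python standard library, of how one might test whether a given port on a host is reachable at all. The function names `can_connect` and `check_ports` are illustrative, and the 5 second timeout is an arbitrary choice.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and DNS resolution failures.
        return False


def check_ports(host: str, ports: list[int], timeout: float = 5.0) -> dict[int, bool]:
    """Map each port to whether a TCP connection to it could be opened."""
    return {port: can_connect(host, port, timeout) for port in ports}
```

Calling `check_ports("ftp.ebi.ac.uk", [21, 80, 443])` would show whether only the FTP control port is unreachable. If 80 or 443 connect while 21 does not, a firewall rule on outgoing FTP is a plausible cause, though this check alone cannot prove it.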

rmFlynn commented 2 years ago

I don't know why you can't reach the Pfam server from your end, but I think the next step is to find another way to get Pfam. Perhaps you can download it in your browser from http://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/. Get Pfam-A.hmm.dat.gz and Pfam-A.full.gz, put them in the directory where you run your DRAM setup command, and run it like this:

DRAM-setup.py prepare_databases --output_dir /home/cloudam/envs/DRAM/DRAM_data --pfam_loc Pfam-A.full.gz --pfam_hmm_dat Pfam-A.hmm.dat.gz
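The suggestion above swaps the ftp:// scheme for http:// on the same host and path. A small sketch of that rewrite follows. The helper `ftp_to_http` is hypothetical, and it assumes the server exposes the same directory tree over HTTP, which this thread only suggests for ftp.ebi.ac.uk.

```python
from urllib.parse import urlsplit, urlunsplit


def ftp_to_http(url: str, scheme: str = "http") -> str:
    """Rewrite an ftp:// URL to the same host and path over HTTP(S).

    Assumption: the server mirrors its FTP tree over HTTP. Other hosts
    may not, so check each one before relying on the rewritten URL.
    """
    parts = urlsplit(url)
    if parts.scheme != "ftp":
        return url  # leave non-FTP URLs untouched
    return urlunsplit((scheme, parts.netloc, parts.path, parts.query, parts.fragment))


print(ftp_to_http("ftp://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/Pfam-A.hmm.dat.gz"))
# prints http://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/Pfam-A.hmm.dat.gz
```

The rewritten URL can then be fetched with a browser or any HTTP client, and the downloaded file passed to DRAM-setup.py with the flags shown above.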

TX0814 commented 2 years ago

Hi Mike, another error occurred, shown below:

$ DRAM-setup.py prepare_databases --output_dir /home/cloudam/envs/DRAM/DRAM_data/D --pfam_loc Pfam-A.full.gz --pfam_hmm_dat Pfam-A.hmm.dat.gz
2022-07-25 13:52:12.591614: Database preparation started
Downloading dbCAN family activities from : https://bcb.unl.edu/dbCAN2/download/Databases/V10/CAZyDB.07292021.fam-activities.txt
Downloading dbCAN from: http://bcb.unl.edu/dbCAN2/download/dbCAN-HMMdb-V10.txt
2:39:12.211926: dbCAN database processed
16:58:53.587779: UniRef database processed
17:53:47.701440: PFAM database processed
Traceback (most recent call last):
  File "/home/cloudam/.conda/envs/DRAM/bin/DRAM-setup.py", line 158, in <module>
    args.func(**args_dict)
  File "/home/cloudam/.conda/envs/DRAM/lib/python3.10/site-packages/mag_annotator/database_processing.py", line 317, in prepare_databases
    output_dbs['viral_db_loc'] = download_and_process_viral_refseq(viral_loc, temporary, threads=threads,
  File "/home/cloudam/.conda/envs/DRAM/lib/python3.10/site-packages/mag_annotator/database_processing.py", line 163, in download_and_process_viral_refseq
    download_file(refseq_url, refseq_faa, verbose=verbose)
  File "/home/cloudam/.conda/envs/DRAM/lib/python3.10/site-packages/mag_annotator/utils.py", line 16, in download_file
    run_process(['wget', '-O', output_file, url], verbose=verbose)
  File "/home/cloudam/.conda/envs/DRAM/lib/python3.10/site-packages/mag_annotator/utils.py", line 27, in run_process
    return subprocess.run(command, check=check, shell=shell, stdout=subprocess.PIPE,
  File "/home/cloudam/.conda/envs/DRAM/lib/python3.10/subprocess.py", line 524, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['wget', '-O', '/home/cloudam/envs/DRAM/DRAM_data/D/database_files/viral.1.protein.faa.gz', 'ftp://ftp.ncbi.nlm.nih.gov/refseq/release/viral/viral.1.protein.faa.gz']' returned non-zero exit status 4.

Should I download all the files in my browser and then run the DRAM setup command? If so, which files and commands are required?

rmFlynn commented 2 years ago

It looks like it only fails on FTP connections. You should talk to your admin and tell them about this problem; they should be able to allow you to download over FTP. If it is not your admin, it may be the country you are in, as some have been known to block FTP traffic, but this is unlikely. In either case, the DBs you may need to download are:

pfam_hmm_dat, from: ftp://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/Pfam-A.hmm.dat.gz
kofam, from: ftp://ftp.genome.jp/pub/db/kofam/profiles.tar.gz
kofam_ko_list, from: ftp://ftp.genome.jp/pub/db/kofam/ko_list.gz
pfam, from: ftp://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/Pfam-A.full.gz
viral, from: ftp://ftp.ncbi.nlm.nih.gov/refseq/release/viral/viral.%s.protein.faa.gz
uniref: use --skip_uniref if you can, or get it from: https://ftp.uniprot.org/pub/databases/uniprot/uniref/uniref%s/uniref%s.fasta.gz (this may work because it has an HTML front end)
peptidase, from: ftp://ftp.ebi.ac.uk/pub/databases/merops/current_release/pepunit.lib

DRAM-setup.py prepare_databases --help will give you all the arguments you need for these.
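Two of the URLs listed above contain `%s` placeholders, which are Python string-formatting slots rather than literal file names. A minimal sketch of how they expand follows. The function names are hypothetical; the value 1 for the viral file matches the `viral.1.protein.faa.gz` seen in the traceback earlier in this thread, while 90 for UniRef is only an illustrative value and should be checked against the `--help` output.

```python
# URL templates copied from the comment above.
VIRAL_TEMPLATE = "ftp://ftp.ncbi.nlm.nih.gov/refseq/release/viral/viral.%s.protein.faa.gz"
UNIREF_TEMPLATE = "https://ftp.uniprot.org/pub/databases/uniprot/uniref/uniref%s/uniref%s.fasta.gz"


def viral_url(part: int) -> str:
    """Fill the single %s slot with the RefSeq viral file number."""
    return VIRAL_TEMPLATE % part


def uniref_url(version: int) -> str:
    """Fill both %s slots with the same UniRef version number."""
    return UNIREF_TEMPLATE % (version, version)


print(viral_url(1))    # the file named in the traceback in this thread
print(uniref_url(90))  # 90 is an assumed, illustrative version
```

How many viral parts exist and which UniRef version DRAM expects are not stated in this thread, so those values need to be confirmed before downloading.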