WrightonLabCSU / DRAM

Distilled and Refined Annotation of Metabolism: A tool for the annotation and curation of function for microbial and viral genomes
GNU General Public License v3.0

MMSeqs Write Issue #187

Closed by joshuakirsch 2 years ago

joshuakirsch commented 2 years ago

Hi, I'm trying to use DRAM on our supercomputer, which should have enough memory, but I keep getting an error when installing the UniRef database. Here is the last bit of output from the command:

DRAM-setup.py prepare_databases --output_dir DRAM_data --threads 15 --verbose

2022-07-12 09:59:09 (1.19 MB/s) - ‘DRAM_data/CAZyDB.07292021.fam-activities.txt’ saved [68035/68035]

downloading ftp://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/Pfam-A.hmm.dat.gz
--2022-07-12 09:59:09-- ftp://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/Pfam-A.hmm.dat.gz
=> ‘DRAM_data/Pfam-A.hmm.dat.gz’
Resolving ftp.ebi.ac.uk (ftp.ebi.ac.uk)... 193.62.193.138
Connecting to ftp.ebi.ac.uk (ftp.ebi.ac.uk)|193.62.193.138|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done. ==> PWD ... done.
==> TYPE I ... done. ==> CWD (1) /pub/databases/Pfam/current_release ... done.
==> SIZE Pfam-A.hmm.dat.gz ... 514890
==> PASV ... done. ==> RETR Pfam-A.hmm.dat.gz ... done.
Length: 514890 (503K) (unauthoritative)

Pfam-A.hmm.dat.gz 100%[=============================================================================================================>] 502.82K 832KB/s in 0.6s

2022-07-12 09:59:11 (832 KB/s) - ‘DRAM_data/Pfam-A.hmm.dat.gz’ saved [514890]

Downloading dbCAN from: http://bcb.unl.edu/dbCAN2/download/dbCAN-HMMdb-V10.txt
downloading http://bcb.unl.edu/dbCAN2/download/dbCAN-HMMdb-V10.txt
--2022-07-12 09:59:11-- http://bcb.unl.edu/dbCAN2/download/dbCAN-HMMdb-V10.txt
Resolving bcb.unl.edu (bcb.unl.edu)... 129.93.162.49
Connecting to bcb.unl.edu (bcb.unl.edu)|129.93.162.49|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://bcb.unl.edu/dbCAN2/download/dbCAN-HMMdb-V10.txt [following]
--2022-07-12 09:59:11-- https://bcb.unl.edu/dbCAN2/download/dbCAN-HMMdb-V10.txt
Connecting to bcb.unl.edu (bcb.unl.edu)|129.93.162.49|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 100147232 (96M) [text/plain]
Saving to: ‘./dbCAN-HMMdb-V10.txt’

./dbCAN-HMMdb-V10.txt 100%[=============================================================================================================>] 95.51M 86.4MB/s in 1.1s

2022-07-12 09:59:12 (86.4 MB/s) - ‘./dbCAN-HMMdb-V10.txt’ saved [100147232/100147232]

0:00:07.134574: dbCAN database processed
downloading https://ftp.uniprot.org/pub/databases/uniprot/uniref/uniref90/uniref90.fasta.gz
--2022-07-12 09:59:15-- https://ftp.uniprot.org/pub/databases/uniprot/uniref/uniref90/uniref90.fasta.gz
Resolving ftp.uniprot.org (ftp.uniprot.org)... 128.175.240.195
Connecting to ftp.uniprot.org (ftp.uniprot.org)|128.175.240.195|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 35679552148 (33G) [application/x-gzip]
Saving to: ‘DRAM_data/database_files/uniref90.fasta.gz’

DRAM_data/database_files/uniref90.fasta.gz 100%[=============================================================================================================>] 33.23G 62.0MB/s in 9m 28s

2022-07-12 10:08:44 (59.9 MB/s) - ‘DRAM_data/database_files/uniref90.fasta.gz’ saved [35679552148/35679552148]

Can not write to data file DRAM_data/database_files/uniref90.20220712.mmsdb.14
Traceback (most recent call last):
  File "/opt/miniconda3/envs/DRAM/bin/DRAM-setup.py", line 158, in <module>
    args.func(**args_dict)
  File "/opt/miniconda3/envs/DRAM/lib/python3.10/site-packages/mag_annotator/database_processing.py", line 311, in prepare_databases
    output_dbs['uniref_db_loc'] = download_and_process_uniref(uniref_loc, temporary, uniref_version=uniref_version,
  File "/opt/miniconda3/envs/DRAM/lib/python3.10/site-packages/mag_annotator/database_processing.py", line 94, in download_and_process_uniref
    make_mmseqs_db(uniref_fasta_zipped, uniref_mmseqs_db, create_index=True, threads=threads, verbose=verbose)
  File "/opt/miniconda3/envs/DRAM/lib/python3.10/site-packages/mag_annotator/utils.py", line 35, in make_mmseqs_db
    run_process(['mmseqs', 'createdb', fasta_loc, output_loc], verbose=verbose)
  File "/opt/miniconda3/envs/DRAM/lib/python3.10/site-packages/mag_annotator/utils.py", line 27, in run_process
    return subprocess.run(command, check=check, shell=shell, stdout=subprocess.PIPE,
  File "/opt/miniconda3/envs/DRAM/lib/python3.10/subprocess.py", line 524, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['mmseqs', 'createdb', 'DRAM_data/database_files/uniref90.fasta.gz', 'DRAM_data/database_files/uniref90.20220712.mmsdb']' returned non-zero exit status 1.
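The CalledProcessError in the traceback reports only the exit status of the mmseqs command, so it can help to rerun that exact command outside DRAM and look at its own output. The following is a minimal sketch, not DRAM code: `run_and_report` is a hypothetical helper, and the command list is copied from the error message above.

```python
import shutil
import subprocess


def run_and_report(command):
    """Run `command` without raising on failure and return (exit code, stdout, stderr)."""
    result = subprocess.run(command, capture_output=True, text=True)
    return result.returncode, result.stdout, result.stderr


if __name__ == "__main__":
    # Command copied from the CalledProcessError message in the traceback.
    command = [
        "mmseqs",
        "createdb",
        "DRAM_data/database_files/uniref90.fasta.gz",
        "DRAM_data/database_files/uniref90.20220712.mmsdb",
    ]
    if shutil.which(command[0]) is None:
        print("mmseqs is not on PATH in this environment")
    else:
        code, out, err = run_and_report(command)
        print(f"exit status: {code}")
        print("stdout (tail):", out[-2000:])
        print("stderr (tail):", err[-2000:])
```

Capturing stdout and stderr separately makes it easier to see whether mmseqs printed anything besides the "Can not write to data file" line before it exited.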

joshuakirsch commented 2 years ago

I should mention that our computer has 7.7 TB of free disk space and approximately 510 GB of RAM.
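Since the error is a failed write, one way to double-check these figures is to ask about the specific directory that mmseqs writes to. This is a rough sketch under assumptions: `free_space_gib` and `can_write` are hypothetical helpers, the path is taken from the log above, and the number reported by the filesystem may not reflect per-user or per-project quotas on a shared cluster.

```python
import os
import shutil
import tempfile


def free_space_gib(path: str) -> float:
    """Return the free space, in GiB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / 1024**3


def can_write(path: str, size_bytes: int = 1024 * 1024) -> bool:
    """Try to write and sync `size_bytes` of data to a temporary file in `path`."""
    try:
        with tempfile.NamedTemporaryFile(dir=path) as handle:
            handle.write(b"\0" * size_bytes)
            handle.flush()
            os.fsync(handle.fileno())
        return True
    except OSError:
        # Covers a missing directory, missing permissions, a full disk, or an exceeded quota.
        return False


if __name__ == "__main__":
    target = "DRAM_data/database_files"  # output directory from the log above
    if os.path.isdir(target):
        print(f"free space: {free_space_gib(target):.1f} GiB")
        print(f"writable:   {can_write(target)}")
    else:
        print(f"{target} does not exist in the current working directory")
```

A small test write succeeding does not rule out a limit that is only hit after many gigabytes, so a cluster's own quota tools remain the more reliable check.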

rmFlynn commented 2 years ago

Yep, that makes no sense. My best guess is that your UniRef download was somehow corrupted. Does this happen if you run setup again? If you have not tried it yet, run:

mmseqs createdb DRAM_data/database_files/uniref90.fasta.gz DRAM_data/database_files/uniref90.20220712.mmsdb

If that does not work and re-downloading the database also fails, you may need to open an issue with MMseqs2, but that is unlikely.
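Before re-downloading 33 GB, the corrupted-download guess can be tested directly on the existing file. The sketch below is an illustration rather than part of DRAM or MMseqs2: `size_matches` and `gzip_is_intact` are hypothetical helpers, and the expected byte count is the "Length" value wget printed in the log above.

```python
import gzip
import os
import zlib

EXPECTED_BYTES = 35679552148  # "Length" reported by wget for uniref90.fasta.gz


def size_matches(path: str, expected: int = EXPECTED_BYTES) -> bool:
    """Compare the file size on disk with the size the server announced."""
    return os.path.getsize(path) == expected


def gzip_is_intact(path: str, chunk_size: int = 1 << 20) -> bool:
    """Decompress the whole file in chunks; return False on truncation or corruption."""
    try:
        with gzip.open(path, "rb") as handle:
            while handle.read(chunk_size):
                pass
        return True
    except (OSError, EOFError, zlib.error):
        return False


if __name__ == "__main__":
    fasta = "DRAM_data/database_files/uniref90.fasta.gz"
    if os.path.exists(fasta):
        print("size matches wget log:", size_matches(fasta))
        # Reading the full archive can take a long time for a file this large.
        print("gzip stream intact:   ", gzip_is_intact(fasta))
    else:
        print(f"{fasta} not found")
```

If both checks pass, the archive itself is probably fine and the write failure more likely comes from the output location than from the download.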