WrightonLabCSU / DRAM

Distilled and Refined Annotation of Metabolism: A tool for the annotation and curation of function for microbial and viral genomes
GNU General Public License v3.0

dbcan2 url gone ?? #331

Open EricDeveaud opened 4 months ago

EricDeveaud commented 4 months ago

Hello,

The dbCAN2 URL seems to have been out of service for a few days. I always get:

wget http://bcb.unl.edu/dbCAN2/download/Databases/dbCAN-HMMdb-V11.txt
--2024-02-19 09:06:12--  http://bcb.unl.edu/dbCAN2/download/Databases/dbCAN-HMMdb-V11.txt
Resolving bcb.unl.edu (bcb.unl.edu)... 129.93.162.49
Connecting to bcb.unl.edu (bcb.unl.edu)|129.93.162.49|:80... failed: Connection timed out.
Retrying.
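The wget failure above is a TCP connection timeout, not an HTTP error. A quick first check is therefore whether the host accepts connections at all. Below is a minimal Python sketch that uses only the standard library. The function name `host_reachable` is hypothetical and not part of DRAM.

```python
import socket


def host_reachable(host: str, port: int = 80, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the name and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # DNS failures, refused connections and timeouts (socket.timeout is an OSError) all end up here
        return False
```

For example, suppose `host_reachable("bcb.unl.edu", 80)` returns False while other sites are reachable from the same machine. That would point at the server or the network path rather than at DRAM.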

Is there any alternative way to get the dbCAN2 files that DRAM requires?

Regards,

Eric

carleton-envbiotech commented 4 months ago

I ran into this issue as well. I'm not sure whether the problem is on dbCAN's end or a connectivity issue. I tried pointing the --dbcan_loc argument at a previous copy of the dbCAN database I had stored, and it gave a slightly different error:

Input code: sudo /datastore/tools/gregoire/envs/DRAM/bin/DRAM-setup.py prepare_databases --output_dir /datastore/researchdata/DRAM_data --dbcan_loc /datastore/researchdata/dram/dbCAN-HMMdb-V11.txt --threads 40

Output error:
2024-02-19 13:11:25,566 - Starting the process of downloading data
2024-02-19 13:11:25,576 - The kegg_loc argument was not used to specify a downloaded kegg file, and dram can not download it its self. So it is assumed that the user wants to set up DRAM without it
2024-02-19 13:11:25,576 - The gene_ko_link_loc argument was not used to specify a downloaded gene_ko_link file, and dram can not download it its self. So it is assumed that the user wants to set up DRAM without it
2024-02-19 13:11:25,576 - Database preparation started
2024-02-19 13:11:25,577 - Downloading kofam_hmm
2024-02-19 13:13:58,359 - Downloading kofam_ko_list
2024-02-19 13:14:01,289 - Downloading uniref
2024-02-19 13:55:20,596 - Downloading pfam
2024-02-19 14:15:57,458 - Downloading pfam_hmm
2024-02-19 14:15:58,914 - Downloading dbcan_fam_activities
2024-02-19 14:15:58,915 - Downloading dbCAN family activities from : https://bcb.unl.edu/dbCAN2/download/Databases/V11/CAZyDB.08062022.fam-activities.txt
2024-02-19 14:18:10,246 - Something went wrong with the download of the url: https://bcb.unl.edu/dbCAN2/download/Databases/V11/CAZyDB.08062022.fam-activities.txt
2024-02-19 14:18:10,260 - <urlopen error [Errno 110] Connection timed out>
Traceback (most recent call last):
  File "/datastore/tools/gregoire/envs/DRAM/bin/DRAM-setup.py", line 187, in <module>
    args.func(**args_dict)
  File "/datastore/tools/gregoire/envs/DRAM/lib/python3.10/site-packages/mag_annotator/database_processing.py", line 595, in prepare_databases
    locs[i] = download_functions[i](
  File "/datastore/tools/gregoire/envs/DRAM/lib/python3.10/site-packages/mag_annotator/database_processing.py", line 178, in download_dbcan_fam_activities
    download_file(url, dbcan_fam_activities, logger, verbose=verbose)
  File "/datastore/tools/gregoire/envs/DRAM/lib/python3.10/site-packages/mag_annotator/utils.py", line 33, in download_file
    raise URLError("DRAM whas not able to download a key database, check the logg for details")
urllib.error.URLError: <urlopen error DRAM whas not able to download a key database, check the logg for details>

carleton-envbiotech commented 4 months ago

I then tried the following command, pointing DRAM at the dbCAN files I had downloaded for a prior DRAM installation:

sudo /datastore/tools/gregoire/envs/DRAM/bin/DRAM-setup.py prepare_databases --output_dir /datastore/researchdata/DRAM_data --dbcan_loc '/datastore/researchdata/dram/dbCAN-HMMdb-V11.txt' --dbcan_fam_activities '/datastore/researchdata/dram/CAZyDB.08062022.fam-activities.txt' --threads 40
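The sketch below assembles the same prepare_databases call from a mapping of local files. It checks every file first, so a missing file fails immediately rather than partway through a long setup. The helper name `build_prepare_cmd` is hypothetical. The option names (`--output_dir`, `--dbcan_loc`, `--dbcan_fam_activities`, `--threads`) are the ones used in the commands above.

```python
from pathlib import Path


def build_prepare_cmd(output_dir, local_files, threads=40, setup_script="DRAM-setup.py"):
    """Build an argv list for `DRAM-setup.py prepare_databases` that uses local files.

    `local_files` maps DRAM option names without the leading dashes
    (e.g. "dbcan_loc") to paths of previously downloaded files.
    """
    # Fail early if any local file is missing, before DRAM starts downloading anything else
    missing = [str(p) for p in local_files.values() if not Path(p).is_file()]
    if missing:
        raise FileNotFoundError(f"Local database files not found: {missing}")

    cmd = [setup_script, "prepare_databases", "--output_dir", str(output_dir)]
    for option, path in local_files.items():
        cmd += [f"--{option}", str(path)]
    cmd += ["--threads", str(threads)]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Passing a list avoids shell quoting issues with paths.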

It got past the original error, but then failed at the hmmpress step when it started processing dbCAN (excerpt below):

2024-02-19 16:23:31,744 - Processing dbcan
Traceback (most recent call last):
  File "/datastore/tools/gregoire/envs/DRAM/bin/DRAM-setup.py", line 187, in <module>
    args.func(**args_dict)
  File "/datastore/tools/gregoire/envs/DRAM/lib/python3.10/site-packages/mag_annotator/database_processing.py", line 615, in prepare_databases
    processed_locs = process_functions[i](locs[i], output_dir, LOGGER,
  File "/datastore/tools/gregoire/envs/DRAM/lib/python3.10/site-packages/mag_annotator/database_processing.py", line 350, in process_dbcan
    run_process(['hmmpress', '-f', output], logger, verbose=verbose)
  File "/datastore/tools/gregoire/envs/DRAM/lib/python3.10/site-packages/mag_annotator/utils.py", line 61, in run_process
    results = subprocess.run(command, check=check, shell=shell,
  File "/datastore/tools/gregoire/envs/DRAM/lib/python3.10/subprocess.py", line 503, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/datastore/tools/gregoire/envs/DRAM/lib/python3.10/subprocess.py", line 971, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/datastore/tools/gregoire/envs/DRAM/lib/python3.10/subprocess.py", line 1863, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'hmmpress'
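This FileNotFoundError means Python's subprocess could not find an executable named hmmpress on PATH. hmmpress is part of HMMER. One possibility, not confirmed in this thread, is that running the command through sudo replaced the PATH of the activated environment. sudo may reset PATH, for example via secure_path. The following is a minimal pre-flight check, assuming the required tools only need to be on PATH. The function name `missing_tools` is hypothetical.

```python
import shutil


def missing_tools(tools=("hmmpress",)):
    """Return the names in `tools` that shutil.which cannot locate on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

You could run this once as the normal user and once under sudo, with the same interpreter. That would show whether the two PATHs differ. Other binaries can be checked the same way, e.g. `missing_tools(("hmmpress", "mmseqs"))`.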

Following the original link posted by @EricDeveaud above, I get a 404 Not Found error. However, searching the web for the dbCAN download page leads to a slightly different address that does appear to serve the file (note that 'Databases/' is missing from the path): https://bcb.unl.edu/dbCAN2/download/dbCAN-HMMdb-V11.txt
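Because the file moved between paths, a download step can try several candidate URLs in order. Below is a minimal sketch with the standard library. The function name is hypothetical, and this is not how DRAM's own download_file works. The two URLs are the ones reported in this thread, and either may change again.

```python
import shutil
import urllib.request

# Candidate locations reported in this thread; neither is guaranteed to stay valid.
DBCAN_HMM_URLS = [
    "https://bcb.unl.edu/dbCAN2/download/dbCAN-HMMdb-V11.txt",
    "http://bcb.unl.edu/dbCAN2/download/Databases/dbCAN-HMMdb-V11.txt",
]


def download_first_available(urls, dest, timeout=60.0):
    """Try each URL in order and save the first one that downloads successfully to `dest`.

    Returns the URL that worked; raises RuntimeError listing every failure otherwise.
    """
    errors = []
    for url in urls:
        try:
            with urllib.request.urlopen(url, timeout=timeout) as response, open(dest, "wb") as out:
                shutil.copyfileobj(response, out)
            return url
        except OSError as exc:
            # HTTPError (e.g. 404) subclasses URLError, which subclasses OSError;
            # timeouts and local write errors are OSErrors too, so any I/O failure moves on
            errors.append(f"{url}: {exc}")
    raise RuntimeError("All candidate URLs failed:\n" + "\n".join(errors))
```

Usage: `download_first_available(DBCAN_HMM_URLS, "dbCAN-HMMdb-V11.txt")` returns whichever URL succeeded. The downloaded file can then be passed to DRAM with `--dbcan_loc`.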

carleton-envbiotech commented 4 months ago

This issue seems to have resolved itself, but DRAM appears to still download V11 even though V12 has been released.
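To notice when a stored dbCAN HMM file is behind the current release, you can read the version number from the filename. This sketch assumes two things. First, later releases keep the `dbCAN-HMMdb-V<N>.txt` naming seen here; only V11 appears in this thread. Second, the user supplies the latest release number rather than the code discovering it. Both function names are hypothetical.

```python
import re

# Assumed naming pattern, based on dbCAN-HMMdb-V11.txt from this thread
_VERSION_RE = re.compile(r"dbCAN-HMMdb-V(\d+)\.txt$")


def dbcan_hmm_version(path):
    """Return the integer release number from a filename like dbCAN-HMMdb-V11.txt, or None."""
    match = _VERSION_RE.search(str(path))
    return int(match.group(1)) if match else None


def is_outdated(path, latest):
    """True if the file's version can be parsed and is older than `latest`."""
    version = dbcan_hmm_version(path)
    return version is not None and version < latest
```

For example, `is_outdated("/datastore/researchdata/dram/dbCAN-HMMdb-V11.txt", 12)` flags the V11 file. An unrecognized filename returns False rather than raising.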