Closed spencerlong1 closed 1 year ago
Hello,
I am getting the same error, also with DRAM 1.4.5, which was recently made available on conda.
Looking at the FTP address (https://ftp.ncbi.nlm.nih.gov/refseq/release/viral/), there is no viral.2.protein.faa.gz, so it's not surprising DRAM doesn't find it. In case it helps to find out what's going on: all files in that folder are only a few days old (2023-01-13), and one day earlier (on the 12th) I could successfully download and prepare the databases, including viral (but with --skip_uniref). Now there is only a viral.1.protein.faa.gz at that FTP address.
Cheers, Nikolas
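To make the check described above repeatable, here is a minimal Python sketch that lists which viral.N.protein.faa.gz files appear in a directory listing. The function name and the sample listing are hypothetical and only illustrate the file naming pattern mentioned in this thread; the sample is not a copy of the real NCBI index.

```python
import re

# Matches names such as viral.1.protein.faa.gz and captures the file number.
VIRAL_PROTEIN_RE = re.compile(r"viral\.(\d+)\.protein\.faa\.gz")


def viral_protein_files(listing_text):
    """Return the sorted, de-duplicated viral protein file names in a listing."""
    numbers = {int(m.group(1)) for m in VIRAL_PROTEIN_RE.finditer(listing_text)}
    return [f"viral.{n}.protein.faa.gz" for n in sorted(numbers)]


# Hypothetical listing fragment, shaped like an HTML directory index.
sample_listing = """
<a href="viral.1.1.genomic.fna.gz">viral.1.1.genomic.fna.gz</a>
<a href="viral.1.protein.faa.gz">viral.1.protein.faa.gz</a>
<a href="viral.1.protein.gpff.gz">viral.1.protein.gpff.gz</a>
"""

print(viral_protein_files(sample_listing))
```

To run it against the live directory, the listing text could be fetched with urllib.request.urlopen on the URL above and decoded before being passed to the function.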
Hi Nikolas,
Good find, and I am seeing the same thing. I wonder if viral.2 is no longer needed; in that case, there may be a way to skip it, or alternatively a way to find the old versions. I will play around today.
Cheers, Spencer
Thanks, guys. It looks like we might have to update the path; I will get on it.
It looks like the change is real and also that it is here to stay. I am testing a fix to only pull one file now and will make a new point release when it is done. Or more likely, I will have @dmitrisvetlov do it.
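For readers who want to see what a download that tolerates a changing number of viral files might look like, here is a rough Python sketch. It is not the actual DRAM fix: fetch_url, fetch_viral_parts, and max_parts are hypothetical names, and the sketch assumes the files are numbered consecutively from 1 as in the URLs in this thread.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen

BASE_URL = "https://ftp.ncbi.nlm.nih.gov/refseq/release/viral/"


def fetch_url(url):
    """Download one URL and return its bytes (raises URLError on failure)."""
    with urlopen(url) as response:
        return response.read()


def fetch_viral_parts(fetch=fetch_url, max_parts=10):
    """Fetch viral.N.protein.faa.gz for N = 1, 2, ... until a part is missing."""
    parts = []
    for number in range(1, max_parts + 1):
        url = f"{BASE_URL}viral.{number}.protein.faa.gz"
        try:
            parts.append(fetch(url))
        except URLError:  # HTTPError (e.g. 404) is a subclass of URLError
            break
    if not parts:
        raise URLError("no viral protein files could be downloaded")
    return parts


# Offline demonstration: a fake fetcher that only knows about viral.1.
def fake_fetch(url):
    if url.endswith("viral.1.protein.faa.gz"):
        return b"fake gzip bytes"
    raise HTTPError(url, 404, "Not Found", None, None)


print(len(fetch_viral_parts(fetch=fake_fetch)))
```

One simplification to be aware of: any failure after the first file is treated as "no more parts", so a transient network error on a later file would be silently read as the end of the list.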
Hello, I am new to DRAM. I had the same issue with the latest conda version. How should we skip or solve this? If I run DRAM now, it seems it cannot find any database path (although the files are in "DRAM_data/database_files"), probably because the database download did not finish correctly due to the missing viral.2 file?
First, can you post the output of DRAM-setup.py version?
As in issue #236, you can download the file from https://ftp.ncbi.nlm.nih.gov/refseq/release/viral/. Put it in a folder on your server and point to it using DRAM-setup.py prepare_databases --viral_loc viral_file.faa.gz
There will only be one file, but if in the future there are separate viral files, you can cat them together to make the merged.faa.gz.
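The "cat them together" step works because a gzip file may contain several compressed members back to back, so concatenating the bytes of two .gz files gives a valid .gz file. Below is a minimal Python sketch of the same idea; merge_gzip_files is a hypothetical helper, and the two tiny FASTA records are made up for the demonstration.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def merge_gzip_files(parts, merged_path):
    """Concatenate gzip files byte-for-byte, like `cat a.gz b.gz > merged.gz`."""
    with open(merged_path, "wb") as merged:
        for part in parts:
            with open(part, "rb") as handle:
                shutil.copyfileobj(handle, merged)
    return merged_path


# Demonstration with two tiny made-up FASTA files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    part_paths = []
    for i, record in enumerate([">protein_a\nMKT\n", ">protein_b\nMSL\n"], start=1):
        path = tmp / f"viral.{i}.protein.faa.gz"
        with gzip.open(path, "wt") as out:
            out.write(record)
        part_paths.append(path)

    merged = merge_gzip_files(part_paths, tmp / "merged.faa.gz")
    # Reading the merged file yields the records of both parts in order.
    with gzip.open(merged, "rt") as handle:
        merged_text = handle.read()

print(merged_text)
```

The merged file could then be passed to --viral_loc in the same way as a single downloaded file.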
Thanks. The output of DRAM-setup.py version is 1.4.5.
All right, I downloaded the viral.1.protein.faa.gz file and put it in the DRAM_data folder, but when running DRAM-setup.py prepare_databases --viral_loc path_to_viral.1.protein.faa.gz this happens:
FileExistsError: [Errno 17] File exists: './database_files'
I guess that, as I already ran the DRAM-setup.py prepare_databases --output_dir DRAM_data step, it collides with the already downloaded files...
The latest version of DRAM in conda is 1.4.6 (https://anaconda.org/bioconda/dram); you will see in the release notes that it is the point release for single viral files. You may want to upgrade for future stability.
Yes, you must put it in a new location or delete the failed folder. For now, you must set up all the databases at the same time, at least if you want a reliable setup.
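As a rough illustration of that advice, the following Python sketch refuses to reuse an output location that already holds a database_files folder and otherwise builds the setup command from flags mentioned in this thread. build_setup_command is a hypothetical helper, and the assumption that the collision is on the database_files subfolder comes only from the FileExistsError message quoted above.

```python
import tempfile
from pathlib import Path


def build_setup_command(output_dir, viral_loc):
    """Build the prepare_databases command, refusing a leftover output folder."""
    output_dir = Path(output_dir)
    # Assumption: the failed run left a database_files folder, as in the error above.
    leftover = output_dir / "database_files"
    if leftover.exists():
        raise FileExistsError(
            f"{leftover} already exists; delete it or choose a new location"
        )
    return [
        "DRAM-setup.py", "prepare_databases",
        "--output_dir", str(output_dir),
        "--viral_loc", str(viral_loc),
    ]


with tempfile.TemporaryDirectory() as tmp:
    fresh = Path(tmp) / "DRAM_data_new"
    command = build_setup_command(fresh, "viral.1.protein.faa.gz")
    print(" ".join(command))

    # Simulate a failed earlier run that left database_files behind.
    (fresh / "database_files").mkdir(parents=True)
    try:
        build_setup_command(fresh, "viral.1.protein.faa.gz")
    except FileExistsError as err:
        print("refused:", err)
```

The returned list could be handed to subprocess.run once the location check passes.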
All right, but is there any way to use the already downloaded files, or must I remove the DRAM_data folder and download everything again with the new version installed? Thanks.
You can use already downloaded files via the _loc arguments; run DRAM-setup.py prepare_databases --help to see the many arguments. Then use a new location for the output. In my opinion, it would be more work than it is worth: the downloading is typically the fast part of the setup process.
I am using DRAM version 1.4.6 and also have the same error:
database_handler.py:123: UserWarning: Database does not exist at path None
  warnings.warn("Database does not exist at path %s" % description_loc)
Hi,
I left DRAM downloading the databases over the last few days and have run into the following error both times (which I know is common):
(DRAM) [sdl1u18@cyan51 ~]$ cd ../../scratch/sdl1u18/
(DRAM) [sdl1u18@cyan51 sdl1u18]$ DRAM-setup.py prepare_databases --output_dir ../../scratch/sdl1u18/
/home/sdl1u18/.conda/envs/DRAM/lib/python3.10/site-packages/mag_annotator/database_handler.py:123: UserWarning: Database does not exist at path None
  warnings.warn("Database does not exist at path %s" % description_loc)
2023-01-17 10:14:16,620 - Starting the process of downloading data
2023-01-17 10:14:16,620 - The kegg_loc argument was not used to specify a downloaded kegg file, and dram can not download it its self. So it is assumed that the user wants to set up DRAM without it
2023-01-17 10:14:16,620 - The gene_ko_link_loc argument was not used to specify a downloaded gene_ko_link file, and dram can not download it its self. So it is assumed that the user wants to set up DRAM without it
2023-01-17 10:14:16,620 - Database preparation started
2023-01-17 10:14:16,620 - Downloading kofam_hmm
2023-01-17 10:20:25,663 - Downloading kofam_ko_list
2023-01-17 10:20:30,338 - Downloading uniref
2023-01-17 18:47:01,406 - Downloading pfam
2023-01-17 18:48:11,888 - Downloading pfam_hmm
2023-01-17 18:48:12,088 - Downloading dbcan
2023-01-17 18:48:17,232 - Downloading dbcan_fam_activities
2023-01-17 18:48:17,232 - Downloading dbCAN family activities from : https://bcb.unl.edu/dbCAN2/download/Databases/V11/CAZyDB.08062022.fam-activities.txt
2023-01-17 18:48:17,878 - Downloading dbcan_subfam_ec
2023-01-17 18:48:17,879 - Downloading dbCAN sub-family encumber from : https://bcb.unl.edu/dbCAN2/download/Databases/V11/CAZyDB.08062022.fam.subfam.ec.txt
2023-01-17 18:48:18,887 - Downloading vogdb
2023-01-17 18:48:25,272 - Downloading vog_annotations
2023-01-17 18:48:25,593 - Downloading viral
2023-01-17 18:48:37,411 - Something went wrong with the download of the url: ftp://ftp.ncbi.nlm.nih.gov/refseq/release/viral/viral.2.protein.faa.gz
2023-01-17 18:48:37,411 - <urlopen error <urlopen error ftp error: error_perm('550 viral.2.protein.faa.gz: No such file or directory')>>
2023-01-17 18:48:37,840 - Something went wrong with the download of the url: https://ftp.ncbi.nlm.nih.gov/refseq/release/viral/viral.2.protein.faa.gz
2023-01-17 18:48:37,840 - HTTP Error 404: Not Found
Traceback (most recent call last):
  File "/home/sdl1u18/.conda/envs/DRAM/bin/DRAM-setup.py", line 184, in <module>
    args.func(**args_dict)
  File "/home/sdl1u18/.conda/envs/DRAM/lib/python3.10/site-packages/mag_annotator/database_processing.py", line 532, in prepare_databases
    locs[i] = download_functions[i](
  File "/home/sdl1u18/.conda/envs/DRAM/lib/python3.10/site-packages/mag_annotator/database_processing.py", line 218, in download_viral
    download_file(url, output_name, logger, alt_urls=[url_http], verbose=verbose)
  File "/home/sdl1u18/.conda/envs/DRAM/lib/python3.10/site-packages/mag_annotator/utils.py", line 33, in download_file
    raise URLError("DRAM whas not able to download a key database, check the logg for details")
urllib.error.URLError: <urlopen error DRAM whas not able to download a key database, check the logg for details>
It looks like viral.2.protein.faa.gz hasn't downloaded. I see in my database_files that viral.1.protein.faa.gz is present, so I am wondering why this might be. Those who run our HPC don't think it is the firewall (which has let everything else through so far), and FTP seems fine if viral.1 has made it through. I was also wondering how much more is required after this step, as I will just use the database_loc commands for what is already there (assuming uniref, pfam, etc. are fine at this stage?).
Apologies if this is basic; I am new to DRAM and annotation software as a whole!
Cheers! Spencer