Closed: yugen-miyahara closed this issue 1 year ago.
Hmm, it looks like you can't get the files because of FTP problems, which are common enough. A password prompt, though, is unheard of; perhaps something within your network is blocking you? In any case, I will just attach the files you need: viral.4.protein.faa.gz and ko_list.gz.
Thank you!
As in #189, I used "DRAM-setup.py prepare_databases --help" and tried to set up the databases again, but I got this error.
(DRAM) yugenuni@iMac-4 Yugen_HD % DRAM-setup.py prepare_databases --output_dir /Volumes/Yugen_HD/DRAM_data --pfam_loc /Volumes/Yugen_HD/DRAM_data/Pfam-A.full.gz --pfam_hmm_dat /Volumes/Yugen_HD/DRAM_data/Pfam-A.hmm.dat.gz --kofam_ko_list_loc /Volumes/Yugen_HD/DRAM_data/ko_list.gz --peptidase_loc /Volumes/Yugen_HD/DRAM_data/pepunit.lib --kofam_hmm_loc /Volumes/Yugen_HD/DRAM_data/profiles.tar.gz --uniref_loc /Volumes/Yugen_HD/DRAM_data/database_files/uniref90.fasta.gz --dbcan_loc /Volumes/Yugen_HD/DRAM_data/dbCAN-HMMdb-V10.txt --dbcan_fam_activities /Volumes/Yugen_HD/DRAM_data/CAZyDB.07292021.fam-activities.txt --dbcan_version 10 --viral_loc /Volumes/Yugen_HD/DRAM_data/database_files/viral.4.protein.faa.gz
2022-08-06 14:50:27.742421: Database preparation started
Traceback (most recent call last):
File "/Volumes/Yugen_HD/envs/DRAM/bin/DRAM-setup.py", line 158, in
Do I need to set up the databases again like this, or can I just import a text file with the locations of the databases? Something I have just tried is putting the two database files that couldn't be downloaded into a separate folder and specifying their locations when setting up the databases, so I don't get the error that database_files already exists. Just waiting for the rest of the databases to download.
This is my DRAM-setup.py print_config output:

/Volumes/Yugen_HD/envs/DRAM/lib/python3.10/site-packages/mag_annotator/database_handler.py:51: UserWarning: Database does not exist at path None
  warnings.warn('Database does not exist at path %s' % self.description_loc)
Processed search databases
KEGG db: None
KOfam db: None
KOfam KO list: None
UniRef db: None
Pfam db: None
dbCAN db: None
RefSeq Viral db: None
MEROPS peptidase db: None
VOGDB db: None
You just need to delete or move the old output folder from your failed setup:
/Volumes/Yugen_HD/DRAM_data
DRAM will not overwrite an existing folder or file
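A minimal sketch of that move. The real path in this thread is /Volumes/Yugen_HD/DRAM_data; the demo below uses a scratch directory so it runs anywhere, and renames rather than deletes so the already-downloaded files are not lost.

```shell
# Demo of moving a failed setup output folder aside instead of deleting it.
# A scratch directory stands in for /Volumes/Yugen_HD/DRAM_data here.
OLD_OUT="$(mktemp -d)/DRAM_data"
mkdir -p "$OLD_OUT"

# Rename rather than delete, so already-downloaded files can be reused later.
mv "$OLD_OUT" "${OLD_OUT}.failed"

ls -d "${OLD_OUT}.failed"
```

After the rename, DRAM-setup.py can recreate the output folder on the next run.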
I have another error where viral.1.protein.faa.gz couldn't be downloaded properly.
(DRAM) yugenuni@iMac-4 ~ % DRAM-setup.py prepare_databases --verbose --keep_database_files --output_dir /Volumes/Yugen_HD/DRAM_data --uniref_loc /Volumes/Yugen_HD/DRAM_data/uniref90.fasta.gz --pfam_loc /Volumes/Yugen_HD/DRAM_data/Pfam-A.full.gz --pfam_hmm_dat /Volumes/Yugen_HD/DRAM_data/Pfam-A.hmm.dat.gz --kofam_hmm_loc /Volumes/Yugen_HD/DRAM_data/profiles.tar.gz --dbcan_loc /Volumes/Yugen_HD/DRAM_data/dbCAN-HMMdb-V10.txt --dbcan_fam_activities /Volumes/Yugen_HD/DRAM_data/CAZyDB.07292021.fam-activities.txt --vogdb_loc /Volumes/Yugen_HD/DRAM_data/vog.hmm.tar.gz --vog_annotations /Volumes/Yugen_HD/DRAM_data/vog.annotations.tsv.gz --peptidase_loc /Volumes/Yugen_HD/DRAM_data/pepunit.lib --genome_summary_form_loc /Volumes/Yugen_HD/DRAM_data/genome_summary_form.tsv --module_step_form_loc /Volumes/Yugen_HD/DRAM_data/module_step_form.tsv --etc_module_database_loc /Volumes/Yugen_HD/DRAM_data/etc_module_database.tsv --function_heatmap_form_loc /Volumes/Yugen_HD/DRAM_data/function_heatmap_form.tsv
2022-08-08 11:41:04.520782: Database preparation started
0:00:33.625230: dbCAN database processed
6:27:27.480869: UniRef database processed
16:51:31.044183: PFAM database processed
downloading ftp://ftp.ncbi.nlm.nih.gov/refseq/release/viral/viral.1.protein.faa.gz
dyld[44001]: missing symbol called
Traceback (most recent call last):
File "/Volumes/Yugen_HD/envs/DRAM/bin/DRAM-setup.py", line 158, in
I already had the viral.1.protein.faa.gz file downloaded, so I specified its location with --viral_loc. I also deleted my "database_files" folder, so now I have to download the 700 GB again. Is there a way to specify where all the files in the database_files folder are, rather than deleting the folder and downloading another 700 GB? You did mention above that we have to delete or move the output files. There seem to be no options in "prepare_databases -h" to specify any of these files. Or is this done with set_database_locations?
It should be possible to specify every file in the database folder rather than download them. Sadly, and this is very sad, there is no all-in-one data file. If -h didn't show you the options, try --help; you will regrettably end up with one very long command, but in your situation it might be the only way.
I would honestly consider fixing this, but I have two weeks of business trips and some very big projects in the meantime, so I'm sorry this will not get as much attention. Tomorrow morning I'll try to get you an example command with all files specified; I have one sitting around somewhere.
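Pending that example, here is a hedged sketch of how such a long command could be assembled. The flag names are real prepare_databases options that appear elsewhere in this thread; the add_flag helper and the scratch directory with empty placeholder files are purely illustrative.

```shell
# Sketch: assemble the "very long command" programmatically instead of typing
# every flag by hand. A scratch directory with empty placeholder files stands
# in for /Volumes/Yugen_HD/DRAM_data; swap in your real paths.
DB_DIR="$(mktemp -d)"
touch "$DB_DIR/Pfam-A.full.gz" "$DB_DIR/ko_list.gz" "$DB_DIR/profiles.tar.gz"

cmd="DRAM-setup.py prepare_databases --output_dir $DB_DIR/DRAM_out"
add_flag() {                      # append "--flag path" only if the file exists
    [ -e "$2" ] && cmd="$cmd $1 $2"
}
add_flag --pfam_loc          "$DB_DIR/Pfam-A.full.gz"
add_flag --kofam_ko_list_loc "$DB_DIR/ko_list.gz"
add_flag --kofam_hmm_loc     "$DB_DIR/profiles.tar.gz"
add_flag --viral_loc         "$DB_DIR/viral.1.protein.faa.gz"  # absent: skipped

echo "$cmd"    # print for review before actually running it
```

Printing the command first lets you check every path before committing to a multi-hour run.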
That's no problem. I really appreciate the quick responses.
It got the furthest it's been so far.
/Volumes/Yugen_HD/DRAM_data/ko_list already exists -- do you wish to overwrite (y or n)? n
not overwriting
21:49:07.901437: KOfam ko list processed
21:49:07.901548: PFAM hmm dat processed
21:49:07.901562: dbCAN fam activities processed
21:49:07.901598: VOGdb annotations processed
downloading https://raw.githubusercontent.com/shafferm/DRAM/master/data/amg_database.tsv
--2022-08-10 12:13:14--  https://raw.githubusercontent.com/shafferm/DRAM/master/data/amg_database.tsv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.111.133, 185.199.108.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 21569 (21K) [text/plain]
Saving to: ‘/Volumes/Yugen_HD/DRAM_data/database_files/amg_database.20220810.tsv’
/Volumes/Yugen_HD/DRAM_da 100%[=====================================>] 21.06K --.-KB/s in 0s
2022-08-10 12:13:50 (62.3 MB/s) - ‘/Volumes/Yugen_HD/DRAM_data/database_files/amg_database.20220810.tsv’ saved [21569/21569]
21:49:46.581508: DRAM databases and forms downloaded
21:49:47.039261: Files moved to final destination
/Volumes/Yugen_HD/envs/DRAM/lib/python3.10/site-packages/mag_annotator/database_handler.py:51: UserWarning: Database does not exist at path None
warnings.warn('Database does not exist at path %s' % self.description_loc)
Traceback (most recent call last):
File "/Volumes/Yugen_HD/envs/DRAM/bin/DRAM-setup.py", line 158, in
I should've clicked yes. I'll have to try again with prepare_databases. I still only get the same instructions with --help and -h. Is the database_files folder what "--keep_database_files" refers to?
I was also wondering: when it's preparing the databases in the database_files folder, is it downloading from the internet or extracting from the database files I have already specified?
For the kofam_ko_list.tsv error where it can't find the database file, I tried what alisDRI did in #157, but it didn't work.
I figured out I just need to have the database input files in another folder. The link to the "etc_module_database.tsv" file is not working. Could someone please upload it here so I can put it in the databases folder?
Cheers, Yugen
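One possible shortcut while waiting for an upload (an assumption, not verified): a download log elsewhere in this thread fetches amg_database.tsv from the repo's raw data/ path, so etc_module_database.tsv may live beside it. The sketch just prints the candidate command rather than running it.

```shell
# Assumption: etc_module_database.tsv sits beside amg_database.tsv in the
# repo's data/ folder (that raw path appears in a download log elsewhere in
# this thread). Print the candidate command instead of running it here.
url="https://raw.githubusercontent.com/shafferm/DRAM/master/data/etc_module_database.tsv"
echo "wget -O /Volumes/Yugen_HD/DRAM_data/etc_module_database.tsv $url"
```

If the URL turns out to be wrong, browsing the repo's data/ folder on GitHub should show where the form files actually live.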
I was able to get the file. The database setup ran, but failed at the same point as in #133, where the uniref databases are incorrect.
The error:
21:12:44.261656: Files moved to final destination
/Volumes/Yugen_HD/envs/DRAM/lib/python3.10/site-packages/mag_annotator/database_handler.py:51: UserWarning: Database does not exist at path /Volumes/Yugen_HD/DRAM_databases/description_db.sqlite
warnings.warn('Database does not exist at path %s' % self.description_loc)
Traceback (most recent call last):
File "/Volumes/Yugen_HD/envs/DRAM/bin/DRAM-setup.py", line 158, in
I did what you said in #133 and changed the database name to the correct date, then reimported the config file. I was unsure what to do next. After reimporting the config file I tried to run annotate, and I got an error, likely because the databases aren't fully set up:
0:01:09.443686: Getting forward best hits from viral
Traceback (most recent call last):
File "/Volumes/Yugen_HD/envs/DRAM/bin/DRAM-v.py", line 153, in
I tried to run update_description_databases, but I get "zsh: killed", which I read in another issue is because I don't have enough RAM. However, my computer doesn't have any more RAM available. I was wondering how I can fix this.
Cheers, Yugen
Given that you have built this database before, you may want to reduce the number of threads and, of course, skip UniRef with --skip_uniref, which is the most important way to reduce memory use. If that does not work, you may want to try the minimal data set from issue #30. Sadly, at this point, there is no way around the memory issue except to have more memory.
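A sketch of that lower-memory rerun; --threads and --skip_uniref are the flags discussed in this thread, while the output path is a placeholder. The command is printed rather than executed so the paths can be checked first.

```shell
# Sketch of the lower-memory rerun suggested above: fewer threads, and skip
# UniRef, the biggest memory consumer. The output path is a placeholder.
cmd="DRAM-setup.py prepare_databases --output_dir /Volumes/Yugen_HD/DRAM_data_retry --threads 1 --skip_uniref"
echo "$cmd"    # review the paths, then run it manually
```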
I am now using a high-capacity storage (HCS) service with unlimited storage to build the databases. I again get to the point where uniref has the wrong dates and it can't find the correct file. I fixed the date issue by importing the updated config file. I tried set_database_locations and update_description_db, but I still get "zsh: killed", even though I should have unlimited capacity through the HCS?
Hi there,
I originally ran DRAM-v for annotations and then tried to run the distillation, but I only got output in one of the three distillation output files and couldn't get a heatmap. While trying to fix it, I updated DRAM to the new version, but then I lost the databases. After a lot of errors from following multiple other issues and telling DRAM where my database files are, I decided to just try redownloading the databases. Sorry, I feel like I'm going to be asking multiple other questions.
Now I have the same problem as in #189:

DRAM-setup.py prepare_databases --output_dir /Volumes/Yugen_HD/DRAM_data --pfam_loc /Volumes/Yugen_HD/DRAM_data/Pfam-A.full.gz --pfam_hmm_dat /Volumes/Yugen_HD/DRAM_data/Pfam-A.hmm.dat.gz
2022-08-04 12:02:28.699808: Database preparation started
Downloading dbCAN family activities from : https://bcb.unl.edu/dbCAN2/download/Databases/V10/CAZyDB.07292021.fam-activities.txt
Downloading dbCAN from: http://bcb.unl.edu/dbCAN2/download/dbCAN-HMMdb-V10.txt
0:00:32.365817: dbCAN database processed
7:24:02.188046: UniRef database processed
16:50:05.288329: PFAM database processed
Traceback (most recent call last):
  File "/Volumes/Yugen_HD/envs/DRAM/bin/DRAM-setup.py", line 158, in
    args.func(**args_dict)
  File "/Volumes/Yugen_HD/envs/DRAM/lib/python3.10/site-packages/mag_annotator/database_processing.py", line 317, in prepare_databases
    output_dbs['viral_db_loc'] = download_and_process_viral_refseq(viral_loc, temporary, threads=threads,
  File "/Volumes/Yugen_HD/envs/DRAM/lib/python3.10/site-packages/mag_annotator/database_processing.py", line 163, in download_and_process_viral_refseq
    download_file(refseq_url, refseq_faa, verbose=verbose)
  File "/Volumes/Yugen_HD/envs/DRAM/lib/python3.10/site-packages/mag_annotator/utils.py", line 16, in download_file
    run_process(['wget', '-O', output_file, url], verbose=verbose)
  File "/Volumes/Yugen_HD/envs/DRAM/lib/python3.10/site-packages/mag_annotator/utils.py", line 27, in run_process
    return subprocess.run(command, check=check, shell=shell, stdout=subprocess.PIPE,
  File "/Volumes/Yugen_HD/envs/DRAM/lib/python3.10/subprocess.py", line 524, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['wget', '-O', '/Volumes/Yugen_HD/DRAM_data/database_files/viral.1.protein.faa.gz', 'ftp://ftp.ncbi.nlm.nih.gov/refseq/release/viral/viral.1.protein.faa.gz']' died with <Signals.SIGABRT: 6>.
I tried downloading all of the database files from your links, through Google Chrome or with the wget command, but for kofam_ko_list (from ftp://ftp.genome.jp/pub/db/kofam/ko_list.gz) and viral (from ftp://ftp.ncbi.nlm.nih.gov/refseq/release/viral/viral.%s.protein.faa.gz) I am required to type in a username and password.
Is there another way to download those files?
Cheers, Yugen
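One workaround worth trying (an assumption about the mirrors, not verified): both servers also expose the same directory trees over HTTPS, which usually avoids the FTP login prompt. The sketch below only prints the candidate commands so the URLs can be checked first.

```shell
# Candidate HTTPS equivalents of the FTP URLs that asked for credentials.
# Assumption: NCBI serves its FTP tree at https://ftp.ncbi.nlm.nih.gov/ and
# GenomeNet mirrors ftp.genome.jp under https://www.genome.jp/ftp/.
ncbi_base="https://ftp.ncbi.nlm.nih.gov/refseq/release/viral"
for i in 1 2 3 4; do
    echo "wget ${ncbi_base}/viral.${i}.protein.faa.gz"
done
echo "wget https://www.genome.jp/ftp/db/kofam/ko_list.gz"
```

Note that wget normally logs in to FTP servers as "anonymous" on its own, so a password prompt in the browser does not necessarily mean wget will be blocked.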