cmkobel closed this issue 1 year ago
When I run DRAM on a simple E. faecium strain 116 isolate genome, I get the following KeyError when DRAM processes peptidase hit descriptions.
0:13:54.990374: Getting descriptions of hits from peptidase
/cluster/work/users/cmkobel/miniconda3/envs/DRAM/lib/python3.10/site-packages/mag_annotator/database_handler.py:81: UserWarning: No descriptions were found for your id's. Does this MER0389353 look like an id from peptidase_description
warnings.warn("No descriptions were found for your id's. Does this %s look like an id from %s" % (list(ids)[0],
Traceback (most recent call last):
File "/cluster/work/users/cmkobel/miniconda3/envs/DRAM/bin/DRAM.py", line 189, in <module>
args.func(**args_dict)
File "/cluster/work/users/cmkobel/miniconda3/envs/DRAM/lib/python3.10/site-packages/mag_annotator/annotate_bins.py", line 1040, in annotate_bins_cmd
annotate_bins(list(set(fasta_locs)), output_dir, min_contig_size, prodigal_mode, trans_table, bit_score_threshold,
File "/cluster/work/users/cmkobel/miniconda3/envs/DRAM/lib/python3.10/site-packages/mag_annotator/annotate_bins.py", line 1079, in annotate_bins
all_annotations = annotate_fastas(fasta_locs, output_dir, db_handler, min_contig_size, prodigal_mode, trans_table,
File "/cluster/work/users/cmkobel/miniconda3/envs/DRAM/lib/python3.10/site-packages/mag_annotator/annotate_bins.py", line 1013, in annotate_fastas
annotate_fasta(fasta_loc, fasta_name, fasta_dir, db_handler, min_contig_size, prodigal_mode, trans_table,
File "/cluster/work/users/cmkobel/miniconda3/envs/DRAM/lib/python3.10/site-packages/mag_annotator/annotate_bins.py", line 921, in annotate_fasta
annotations = annotate_orfs(gene_faa, db_handler, tmp_dir, start_time, custom_db_locs, custom_hmm_locs,
File "/cluster/work/users/cmkobel/miniconda3/envs/DRAM/lib/python3.10/site-packages/mag_annotator/annotate_bins.py", line 821, in annotate_orfs
annotation_list.append(do_blast_style_search(query_db, db_handler.db_locs['peptidase'], tmp_dir,
File "/cluster/work/users/cmkobel/miniconda3/envs/DRAM/lib/python3.10/site-packages/mag_annotator/annotate_bins.py", line 684, in do_blast_style_search
hits = formater(hits, header_dict)
File "/cluster/work/users/cmkobel/miniconda3/envs/DRAM/lib/python3.10/site-packages/mag_annotator/annotate_bins.py", line 187, in get_peptidase_description
header = header_dict[peptidase_hit]
KeyError: 'MER0389353'
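For context, the crash is an unguarded dictionary lookup (header = header_dict[peptidase_hit]). A tolerant sketch of that lookup — hypothetical, not DRAM's actual code — would substitute a placeholder when a hit id such as MER0389353 is missing from the description table:

```python
def describe_peptidase_hits(hits, header_dict):
    """Map each peptidase hit id to its description, substituting a
    placeholder instead of raising KeyError when an id is missing."""
    return {hit: header_dict.get(hit, "no description found") for hit in hits}

# MER0389353 is absent from the table, mirroring the KeyError above
header_dict = {"MER0000001": "some peptidase"}
result = describe_peptidase_hits(["MER0000001", "MER0389353"], header_dict)
```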
Though, I'm not sure whether this is related to the initial mag_annotator issue.
Sometimes the setup process exits early during the update-descriptions step; running DRAM-setup.py update_description_db
will complete the process. It happening on both systems is odd and something I will look into.
I'm having trouble allocating enough RAM to run update_description. Is it correct that more than 500 GB is needed?
Regarding the machines:
Both systems run CentOS 7 (Red Hat) with GCC 4.8.5-44:
cmkobel@fe-open-01:~$ cat /proc/version
Linux version 3.10.0-1160.53.1.el7.x86_64 (mockbuild@kbuilder.bsys.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-44) (GCC)
cmkobel@login-5 ~ $ cat /proc/version
Linux version 3.10.0-1160.62.1.el7.x86_64 (mockbuild@kbuilder.bsys.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-44) (GCC)
So that might be a confounding factor.
Yes, it's possible; this step is a known problem that takes far too much memory. You might want to skip uniref, which keeps getting bigger all the time; instructions for doing so are in the README. Possibly there has been an increase in its size. I may need to look into it; there have been a lot of problems like this lately.
OK. The machines (head nodes) are both limited to some 377 GB of RAM. I wonder what algorithm update_description uses? Does it really need to load the full uniref into RAM at once, or could we make an implementation that works on subset chunks instead?
Probably not; it's just putting the descriptions into an SQL database, so holding everything in memory is probably needless. Although it could be a quirk of the format of MMseqs files. It's been on my to-do list forever, but I don't think it will get done this month or next.
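The chunked idea floated above can be sketched quickly. Assuming the descriptions arrive as (id, description) pairs (e.g. streamed lazily from parsed fasta headers), they can be inserted into SQLite in fixed-size batches, so peak memory stays near one chunk rather than the whole of uniref. This is a hypothetical sketch, not DRAM's implementation:

```python
import sqlite3
from itertools import islice

def load_descriptions_chunked(rows, db_path, chunk_size=100_000):
    """Insert (id, description) pairs into SQLite one batch at a time,
    bounding memory use by chunk_size instead of the full table size."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS descriptions "
                 "(id TEXT PRIMARY KEY, description TEXT)")
    rows = iter(rows)
    total = 0
    while chunk := list(islice(rows, chunk_size)):
        conn.executemany("INSERT OR REPLACE INTO descriptions VALUES (?, ?)",
                         chunk)
        conn.commit()
        total += len(chunk)
    conn.close()
    return total

# a lazy generator stands in for streaming uniref header parsing
n = load_descriptions_chunked(
    ((f"UniRef90_{i}", f"description {i}") for i in range(250)),
    ":memory:", chunk_size=100)
```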
Hello, can you help me solve this problem? I have downloaded the DRAM data into /scratch/PI/boqianpy/App/DRAM_data/, but I don't have the old DRAM. Which command should I use to set up?
So you want to run DRAM-setup.py prepare_databases
but skip downloading the databases because they are already downloaded? You will need to run DRAM-setup.py prepare_databases --help
to see the arguments, and then build a long command pointing to each file with the --<name>_loc
arguments
Here is an example (each option is annotated with its description from --help; strip the # comments and join the options into a single command when running):
DRAM-setup.py prepare_databases --output_dir download_test
--kegg_loc /my/path/database_files/kegg-all-orgs_unique_reheader.pep # KEGG protein file, should be a single .pep, please merge all KEGG pep files (default: None)
--threads 30 # Number of threads to use building mmseqs2 databases (default: 10)
--kofam_hmm_loc /my/path/database_files/kofam_profiles.tar.gz # hmm file for KOfam (profiles.tar.gz) (default: None)
--kofam_ko_list_loc /my/path/database_files/kofam_ko_list.tsv.gz # KOfam ko list file (ko_list.gz) (default: None)
--uniref_loc /my/path/database_files/uniref90.fasta.gz # File path to uniref, if already downloaded (uniref90.fasta.gz) (default: None)
--pfam_loc /my/path/database_files/Pfam-A.full.gz # File path to pfam-A full file, if already downloaded (Pfam-A.full.gz) (default: None)
--pfam_hmm_dat /my/path/Pfam-A.hmm.dat.gz # pfam hmm .dat file to get PF descriptions, if already downloaded (Pfam-A.hmm.dat.gz) (default: None)
--dbcan_loc /my/path/database_files/dbCAN-HMMdb-V9.txt # File path to dbCAN, if already downloaded (dbCAN-HMMdb-V9.txt) (default: None)
--dbcan_fam_activities /my/path/CAZyDB.07292021.fam-activities.txt # CAZY family activities file, if already downloaded (CAZyDB.07302020.fam-activities.txt) (default: None)
--dbcan_sub_fam_activities /my/path/CAZyDB.07292021.fam.subfam.ec.txt # CAZY subfamily activities file, if already downloaded (CAZyDB.07292021.fam.subfam.ec.txt) (default: None)
--vogdb_loc /my/path/database_files/vog.hmm.tar.gz # hmm file for vogdb, if already downloaded (vog.hmm.tar.gz) (default: None)
--vog_annotations /my/path/vog_annotations_latest.tsv.gz # vogdb annotations file, if already downloaded (vog.annotations.tsv.gz) (default: None)
--camper_tar_gz_loc /my/path/database_files/CAMPER_v1.0.0-beta.1.tar.gz # File path to CAMPER tar.gz, if already downloaded (default: None)
--viral_loc /my/path/database_files/viral.merged.protein.faa.gz # File path to merged viral protein faa, if already downloaded (viral.x.protein.faa.gz) (default: None)
--peptidase_loc /my/path/database_files/merops_peptidases_nr.faa # File path to MEROPS peptidase fasta, if already downloaded (pepunit.lib) (default: None)
--genome_summary_form_loc /my/path/database_files/genome_summary_form.20220504.tsv # File path to genome summary form, if already downloaded (default: None)
--module_step_form_loc /my/path/database_files/module_step_form.20220504.tsv # File path to module step form, if already downloaded (default: None)
--etc_module_database_loc /my/path/database_files/etc_mdoule_database.20220504.tsv # File path to etc module database, if already downloaded (default: None)
--function_heatmap_form_loc /my/path/database_files/function_heatmap_form.20220504.tsv # File path to function heatmap form, if already downloaded (default: None)
--amg_database_loc /my/path/database_files/amg_database.20220504.tsv # File path to amg database, if already downloaded (default: None)
Thank you for your answers, I finished the setup.
--
DRAM-setup.py prepare_databases --output_dir DRAMdata --kegg_loc /Users/DRAMdatabases/kegg_all.pep --threads 30
/Users/opt/anaconda3/envs/DRAM/lib/python3.10/site-packages/mag_annotator/database_handler.py:103: UserWarning: Database does not exist at path None
warnings.warn('Database does not exist at path %s' % description_loc)
How do I resolve this issue?
Have you upgraded to the latest DRAM? I ask because I thought this was fixed. Let me know the output of DRAM-setup.py print_config
and DRAM-setup.py version
On second thought @SF-Dragon, the output you gave only contains a warning; where are the errors? This warning is expected and should not have stopped the setup.
Thank you very much for your help. I used the latest version of DRAM, ver. 1.4.0. DRAM-setup.py stopped at downloading vogdb, so I manually downloaded it.
Then I used the --<name>_loc options to skip downloading the already existing databases, but I was not able to finish the setup; it failed with the following messages. The output of print_config is attached: setup.config1.txt
DRAM-setup.py prepare_databases --output_dir DRAMdata --kegg_loc /Users/DRAMdatabases/kegg_all.pep --threads 30 \
--kofam_hmm_loc /Users/DRAMdatabases/kofam_profiles.tar.gz \
--kofam_ko_list_loc /Users/DRAMdatabases/kofam_ko_list.tsv.gz \
--uniref_loc /Users/DRAMdatabases/uniref90.fasta.gz \
--pfam_loc /Users/DRAMdatabases/Pfam-A.full.gz \
--pfam_hmm_loc /Users/DRAMdatabases/Pfam-A.hmm.dat.gz \
--dbcan_loc /Users/DRAMdatabases/dbCAN-HMMdb-V11.txt \
--dbcan_fam_activities /Users/DRAMdatabases/CAZyDB.08062022.fam-activities.txt \
--vogdb_loc /Users/DRAMdatabases/vog.hmm.tar.gz \
--vog_annotations /Users/DRAMdatabases/vog.annotations.tsv.gz
/Users/opt/anaconda3/envs/DRAM/lib/python3.10/site-packages/mag_annotator/database_handler.py:103: UserWarning: Database does not exist at path None
warnings.warn('Database does not exist at path %s' % description_loc)
2022-12-02 10:11:16,885 - Starting the process of downloading data
Traceback (most recent call last):
File "/Users/opt/anaconda3/envs/DRAM/bin/DRAM-setup.py", line 184, in <module>
args.func(**args_dict)
File "/Users/opt/anaconda3/envs/DRAM/lib/python3.10/site-packages/mag_annotator/database_processing.py", line 540, in prepare_databases
raise ValueError(f"The fallowing user provided paths don't seem to exist: {missing_user_inputs}")
ValueError: The fallowing user provided paths don't seem to exist: ['kegg', 'kofam_hmm', 'kofam_ko_list', 'uniref', 'pfam', 'pfam_hmm', 'dbcan', 'vogdb']
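For reference, the error above comes from a pre-flight existence check on the user-supplied database paths before any downloading starts. A minimal sketch of such a check (hypothetical names, not DRAM's actual code) shows how only paths that were actually provided get tested against the filesystem:

```python
from pathlib import Path

def find_missing_inputs(user_paths):
    """Return the names of user-provided database locations whose files
    do not exist on disk (None means 'not provided', so it is skipped)."""
    return [name for name, loc in user_paths.items()
            if loc is not None and not Path(loc).exists()]

# one bad path and one unset database: only the bad path is reported
missing = find_missing_inputs({
    "kegg": "/nonexistent/DRAMdatabases/kegg_all.pep",
    "uniref": None,
})
```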
Thanks for all the details you provided; I was able to find a rather obnoxious bug that has now been fixed. I haven't been able to check conda today, but the fix should have gone through their build system, so you should simply be able to update DRAM. You will probably still get the warning but not the error, and everything should work. Let me know if you have more problems and I'll address them quickly.
It works now! Thank you very much.
Hello,
I just installed DRAM with conda on a fresh miniconda3 install on two independent HPCs. After running the DRAM-setup.py prepare_databases --output_dir DRAM_data
step, I get the following error on both systems. I'm not sure how to fix this error, as all the databases seem to exist when checking with DRAM-setup.py print_config
Here is the full log of prepare_databases and print_config: log.txt
How do I go about fixing this?