WrightonLabCSU / DRAM

Distilled and Refined Annotation of Metabolism: A tool for the annotation and curation of function for microbial and viral genomes
GNU General Public License v3.0

Issues while updating databases using DRAM-setup.py set_database_locations #152

Closed avishekdutta14 closed 1 year ago

avishekdutta14 commented 2 years ago

Hi,

Thank you for this wonderful tool.

I am facing an issue while updating the description db using DRAM-setup.py set_database_locations --update_description_db.

After installing DRAM, everything was working fine except for a warning saying that a particular gene ID (database identifier) was missing from the description db. I followed the solution in https://github.com/WrightonLabCSU/DRAM/issues/86, but the problem persisted. I suspected that the database had not been downloaded correctly, so I removed it and prepared it again using: DRAM-setup.py prepare_databases --output_dir DRAM_data --skip_uniref. I also re-installed DRAM-bio. Since the names shown by DRAM-setup.py print_config did not match the new database and the DRAM distillation forms, I updated the paths using: DRAM-setup.py set_database_locations.

Everything looked fine except for KEGG db: and UniRef db:, which were showing database file names that were not present. I tried updating the description db using DRAM-setup.py set_database_locations --update_description_db, but it returned the error: No such file or directory: 'filename_h'. This error was related to the KEGG database. Since I did not have the KEGG database compiled for DRAM, I think that is why I was receiving this error. I then modified DRAM_CONFIG manually by replacing the filenames for KEGG and UniRef with "None" and imported the CONFIG, but the error persists: FileNotFoundError: [Errno 2] No such file or directory: 'None_h'.

I think that this error is due to the unavailability of the KEGG database, and the same will be true for UniRef. I must be missing something, since I was previously able to run DRAM and update the description db without the KEGG and UniRef databases. Is there a location that I should pass to DRAM-setup.py set_database_locations --kegg_db_loc and --uniref_db_loc to let DRAM know that I am not using the KEGG and UniRef databases?

It will be really helpful if you can guide me with this.

Thanks in advance.

rmFlynn commented 2 years ago

The solution may be as simple as DRAM-setup.py export_config --output_file fix_me.txt. Then open fix_me.txt, find "kofam": "<some path>/None_h" and "uniref": "<some path>/None_h", and make them "kofam": null and "uniref": null. Then save the file and import it with DRAM-setup.py import_config --config_loc fix_me.txt. This is fully on me and I apologize.
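The manual edit described above could also be scripted. This is a minimal sketch, assuming the exported file is a flat JSON object that maps database names to path strings (the layout shown later in this thread). `null_out_missing` is a hypothetical helper for illustration and is not part of DRAM.

```python
import json


def null_out_missing(config_text, bad_suffix="None_h"):
    """Return config JSON text with placeholder paths replaced by null.

    Assumption: the config is a flat JSON object of name -> path strings.
    """
    config = json.loads(config_text)
    for key, value in config.items():
        # Treat the literal string "None" and paths ending in "None_h"
        # as "this database is not installed".
        if isinstance(value, str) and (value == "None" or value.endswith(bad_suffix)):
            config[key] = None  # json.dumps writes Python None as JSON null
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    example = '{"kegg": "/some/path/None_h", "pfam": "/some/path/pfam.mmspro"}'
    print(null_out_missing(example))
```

The cleaned text would be written back to fix_me.txt and then loaded with DRAM-setup.py import_config --config_loc fix_me.txt, as described above.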

rmFlynn commented 2 years ago

Let me know if you have problems after that, and I will address them!

avishekdutta14 commented 2 years ago

Thank you for your response. I tried that, but running DRAM-setup.py print_config and DRAM-setup.py set_database_locations --update_description_db now gives the following error:

Traceback (most recent call last):
  File "/home/user/.local/bin/DRAM-setup.py", line 158, in <module>
    args.func(**args_dict)
  File "/home/user/.local/lib/python3.6/site-packages/mag_annotator/database_handler.py", line 323, in set_database_paths
    db_handler = DatabaseHandler()
  File "/home/user/.local/lib/python3.6/site-packages/mag_annotator/database_handler.py", line 40, in __init__
    config = json.loads(open(config_loc).read())
  File "/usr/lib64/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "/usr/lib64/python3.6/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib64/python3.6/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 10 (char 9)

rmFlynn commented 2 years ago

Sorry, there was a typo in the commands above. When you use DRAM-setup.py export_config --output_file fix_me.txt, what are the contents of the fix_me.txt file?

avishekdutta14 commented 2 years ago

{"kegg": None, "kofam": "/home/user/wmg_tools/annotation/DRAM_data/kofam_profiles.hmm", "kofam_ko_list": "/home/user/wmg_tools/annotation/DRAM_data/kofam_ko_list.tsv", "uniref": None, "pfam": "/home/user/wmg_tools/annotation/DRAM_data/pfam.mmspro", "dbcan": "/home/user/wmg_tools/annotation/DRAM_data/dbCAN-HMMdb-V10.txt", "viral": "/home/user/wmg_tools/annotation/DRAM_data/refseq_viral.20220221.mmsdb", "peptidase": "/home/user/wmg_tools/annotation/DRAM_data/peptidases.20220221.mmsdb", "vogdb": "/home/user/wmg_tools/annotation/DRAM_data/vog_latest_hmms.txt", "pfam_hmm_dat": "/home/user/wmg_tools/annotation/DRAM_data/Pfam-A.hmm.dat.gz", "dbcan_fam_activities": "/home/user/wmg_tools/annotation/DRAM_data/CAZyDB.07292021.fam-activities.txt", "vog_annotations": "/home/user/wmg_tools/annotation/DRAM_data/vog_annotations_latest.tsv.gz", "genome_summary_form": "/home/user/wmg_tools/annotation/DRAM_data/genome_summary_form.20220221.tsv", "module_step_form": "/home/user/wmg_tools/annotation/DRAM_data/module_step_form.20220221.tsv", "etc_module_database": "/home/user/wmg_tools/annotation/DRAM_data/etc_mdoule_database.20220221.tsv", "function_heatmap_form": "/home/user/wmg_tools/annotation/DRAM_data/function_heatmap_form.20220221.tsv", "amg_database": "/home/user/wmg_tools/annotation/DRAM_data/amg_database.20220221.tsv", "description_db": "/home/user/wmg_tools/annotation/DRAM_data/description_db.sqlite"}

rmFlynn commented 2 years ago

Try replacing it with this {"kegg": null, "kofam": "/home/user/wmg_tools/annotation/DRAM_data/kofam_profiles.hmm", "kofam_ko_list": "/home/user/wmg_tools/annotation/DRAM_data/kofam_ko_list.tsv", "uniref": null, "pfam": "/home/user/wmg_tools/annotation/DRAM_data/pfam.mmspro", "dbcan": "/home/user/wmg_tools/annotation/DRAM_data/dbCAN-HMMdb-V10.txt", "viral": "/home/user/wmg_tools/annotation/DRAM_data/refseq_viral.20220221.mmsdb", "peptidase": "/home/user/wmg_tools/annotation/DRAM_data/peptidases.20220221.mmsdb", "vogdb": "/home/user/wmg_tools/annotation/DRAM_data/vog_latest_hmms.txt", "pfam_hmm_dat": "/home/user/wmg_tools/annotation/DRAM_data/Pfam-A.hmm.dat.gz", "dbcan_fam_activities": "/home/user/wmg_tools/annotation/DRAM_data/CAZyDB.07292021.fam-activities.txt", "vog_annotations": "/home/user/wmg_tools/annotation/DRAM_data/vog_annotations_latest.tsv.gz", "genome_summary_form": "/home/user/wmg_tools/annotation/DRAM_data/genome_summary_form.20220221.tsv", "module_step_form": "/home/user/wmg_tools/annotation/DRAM_data/module_step_form.20220221.tsv", "etc_module_database": "/home/user/wmg_tools/annotation/DRAM_data/etc_mdoule_database.20220221.tsv", "function_heatmap_form": "/home/user/wmg_tools/annotation/DRAM_data/function_heatmap_form.20220221.tsv", "amg_database": "/home/user/wmg_tools/annotation/DRAM_data/amg_database.20220221.tsv", "description_db": "/home/user/wmg_tools/annotation/DRAM_data/description_db.sqlite"}
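Since the original symptom was a config pointing at files that were not on disk, it may help to check every configured path before importing the file. This is a rough sketch, assuming the config is a flat JSON object of name -> path (or null). `missing_paths` is a hypothetical helper, not a DRAM function.

```python
import json
import os


def missing_paths(config_text):
    """Return {name: path} for configured paths that do not exist on disk.

    Entries set to null (Python None) are skipped, since they mean
    "this database is not in use".
    """
    config = json.loads(config_text)
    return {
        name: path
        for name, path in config.items()
        if isinstance(path, str) and not os.path.exists(path)
    }


if __name__ == "__main__":
    example = json.dumps({"kegg": None, "pfam": "/no/such/dir/pfam.mmspro"})
    for name, path in missing_paths(example).items():
        print(f"{name}: {path} not found")
```

Note that this only tests the exact path stored in the config. Some database formats consist of several files that share a prefix, so a passing check does not guarantee that a database is complete.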

rmFlynn commented 2 years ago

It can't be None; it must be null. The file looks like a Python dict, but it is not: it is JSON.
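The difference can be reproduced with the standard library alone. The sketch below shows `json.loads` rejecting a Python-style `None` (the traceback earlier in this thread shows DRAM reading its config with `json.loads`), plus one possible way to convert Python-literal text to JSON. `python_literal_to_json` is a hypothetical helper, not part of DRAM.

```python
import ast
import json

# Python-style text: None is not a JSON token.
python_style = '{"kegg": None, "uniref": None}'

try:
    json.loads(python_style)
except json.JSONDecodeError as err:
    print("Not valid JSON:", err)


def python_literal_to_json(text):
    """Convert a Python-literal dict string to JSON text.

    ast.literal_eval only accepts Python literals, so this is for text
    that uses None/True/False; text that already uses null is rejected.
    """
    return json.dumps(ast.literal_eval(text))


print(python_literal_to_json(python_style))
# -> {"kegg": null, "uniref": null}
```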

avishekdutta14 commented 2 years ago

Thanks a lot! null worked. I am not getting any error for DRAM-setup.py print_config and DRAM-setup.py set_database_locations --update_description_db.

avishekdutta14 commented 2 years ago

Another question: the primary reason for re-downloading and updating the database was the following warning.

/home/user/.local/lib/python3.6/site-packages/mag_annotator/database_handler.py:81: UserWarning: No descriptions were found for your id's. Does this GT2_Glyco_tranf_2_3 look like an id from dbcan_description
  db_name))
/home/user/.local/lib/python3.6/site-packages/mag_annotator/database_handler.py:81: UserWarning: No descriptions were found for your id's. Does this GT2_Glyco_tranf_2_4 look like an id from dbcan_description
  db_name))

This is still persisting. Is there a way to fix this?

rmFlynn commented 2 years ago

Ok, that is actually a different problem that I only just now tracked down. dbCAN has changed its naming, and DRAM will need to catch up. I will put out a point release to fix this issue. Unfortunately, you can't fix it yourself unless you want to hack the hmm and replace all instances of GT2_* with GT2. I think I can get the fix out in the next 24h.
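For anyone considering the manual hack, a dry run that lists the affected profile names first may be useful. This is only a sketch, assuming the dbCAN file is a HMMER3 text file in which each profile has a `NAME` line. The exact names in a given dbCAN release (for example, whether they carry a suffix) have not been checked here, and `collapsed_names` is a hypothetical helper. It does not modify anything.

```python
import re

# In HMMER3 text files each profile record carries a line "NAME  <name>".
NAME_LINE = re.compile(r"^NAME\s+(\S+)")


def collapsed_names(hmm_lines, family="GT2"):
    """Map each profile name starting with '<family>_' to the bare family name."""
    mapping = {}
    for line in hmm_lines:
        match = NAME_LINE.match(line)
        if match and match.group(1).startswith(family + "_"):
            mapping[match.group(1)] = family
    return mapping


if __name__ == "__main__":
    sample = [
        "NAME  GT2_Glyco_tranf_2_3",
        "NAME  GT4",
        "NAME  GT2_Glyco_tranf_2_4",
    ]
    for old, new in collapsed_names(sample).items():
        print(f"{old} -> {new}")
```

Renaming several profiles to the same name may cause problems for tools that expect unique profile names, so waiting for the point release is probably the safer route.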

avishekdutta14 commented 2 years ago

Thanks a lot!

icemduru commented 2 years ago

I had the same problem. Exporting the config file and fixing it with a text editor solved the problem. Thanks.

icemduru commented 2 years ago

Hi,

I got a similar error:

UserWarning: No descriptions were found for your id's. Does this MER0232783 look like an id from peptidase_description

rmFlynn commented 1 year ago

Well, the solution, as always, is DRAM-setup.py set_database_locations --update_description_db. Sorry we missed this.

rmFlynn commented 1 year ago

I closed this because you can now specify the config file you want to use.