Closed dgittins closed 1 year ago
Yes, it looks like your dbcan is too old, so its descriptions do not contain the necessary sub-family EC numbers. If this is a new database, then we will need to look deeper. The official advice is to rebuild the database, but I will let you in on a secret: you may be able to use DRAM-setup.py prepare_databases --select_db dbcan to update just dbcan. It is worth a try at least. This eventuality was not covered in the release notes, and I will fix that now. Sorry for the frustration!
Hi, thanks for the secret. Could you take a look at the following:
DRAM-setup.py prepare_databases --select_db dbcan --output_dir /fs03/rp24/Database/DRAM --threads 2 --dbcan_fam_activities /fs03/rp24/Database/DRAM/CAZyDB.08062022.fam-activities.txt
2023-01-07 16:31:27,352 - Starting the process of downloading data
2023-01-07 16:31:27,353 - The kegg_loc argument was not used to specify a downloaded kegg file, and dram can not download it its self. So it is assumed that the user wants to set up DRAM without it
2023-01-07 16:31:27,353 - The gene_ko_link_loc argument was not used to specify a downloaded gene_ko_link file, and dram can not download it its self. So it is assumed that the user wants to set up DRAM without it
2023-01-07 16:31:27,354 - Database preparation started
2023-01-07 16:31:27,354 - Downloading dbcan
2023-01-07 16:31:36,143 - All raw data files were downloaded successfully
2023-01-07 16:31:36,144 - Processing dbcan
2023-01-07 16:31:38,503 - dbCAN database processed
2023-01-07 16:31:38,513 - Moved dbcan to final destination, configuration updated
2023-01-07 16:31:38,513 - Populating the description db, this may take some time
Traceback (most recent call last):
File "/home/gnii0001/rp24/gaofeng/tools/Miniconda3/envs/DRAM/bin/DRAM-setup.py", line 184, in <module>
args.func(**args_dict)
File "/home/gnii0001/rp24/gaofeng/tools/Miniconda3/envs/DRAM/lib/python3.10/site-packages/mag_annotator/database_processing.py", line 578, in prepare_databases
db_handler.populate_description_db(db_handler.config['description_db'], select_db, update_config=False)
File "/home/gnii0001/rp24/gaofeng/tools/Miniconda3/envs/DRAM/lib/python3.10/site-packages/mag_annotator/database_handler.py", line 505, in populate_description_db
check_db(i, k)
File "/home/gnii0001/rp24/gaofeng/tools/Miniconda3/envs/DRAM/lib/python3.10/site-packages/mag_annotator/database_handler.py", line 465, in check_db
db_function(), f"{db_name}_description", clear_table=True
File "/home/gnii0001/rp24/gaofeng/tools/Miniconda3/envs/DRAM/lib/python3.10/site-packages/mag_annotator/database_handler.py", line 400, in process_dbcan_descriptions
with open(dbcan_fam_activities) as f:
FileNotFoundError: [Errno 2] No such file or directory: '/home/projects-wrighton-2/DRAM/development_flynn/public_DRAM/sep_12_22_dram1.4_rc_setup_test/testoutput/DRAM1_4_pycallgraph_3/CAZyDB.08062022.fam-activities.txt'
PS. I first tried without --dbcan_fam_activities, and it returned the same error.
So sorry, I let my test environment get committed, so some of its paths got into your config. Simply edit your config to replace them with null, or post the output of DRAM-setup.py export_config here and I will fix it really fast. I already did this once, but now I have fixed it so that it is impossible. Sorry.
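For anyone who would rather script the edit than do it by hand, here is a minimal sketch of replacing the stale test-environment paths with null. It assumes the config is the JSON printed by DRAM-setup.py export_config, saved to a file, and that the path-holding sections are the three seen in the config posted in this thread. The function name and the idea of matching on a path prefix are illustrative and not part of DRAM.

```python
import json

# Sections of the exported config that map names to file paths. These names
# are taken from the config posted in this thread; adjust if yours differs.
PATH_SECTIONS = ("search_databases", "database_descriptions", "dram_sheets")


def null_stale_paths(config, stale_prefix):
    """Return a copy of config with every path under stale_prefix set to None.

    Hypothetical helper, not a DRAM function. None is written back out as
    JSON null by json.dump.
    """
    cleaned = json.loads(json.dumps(config))  # deep copy of plain JSON data
    for section in PATH_SECTIONS:
        for key, path in cleaned.get(section, {}).items():
            if isinstance(path, str) and path.startswith(stale_prefix):
                cleaned[section][key] = None
    return cleaned


# Example usage (file names are placeholders):
# with open("CONFIG") as handle:
#     config = json.load(handle)
# cleaned = null_stale_paths(config, "/home/projects-wrighton-2/")
# with open("CONFIG.fixed", "w") as handle:
#     json.dump(cleaned, handle, indent=2)
```

The cleaned file could then be loaded with DRAM-setup.py import_config --config_loc, the same command used elsewhere in this thread. Entries that already point at your own files, such as dbcan here, are left untouched.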
What I did was import an existing CONFIG using DRAM-setup.py import_config --config_loc /fs03/rp24/Database/DRAM/CONFIG, and DRAM (version 1.4.4) was installed through git clone followed by pip3 install.
Now I see the issue, here it is:
{
"search_databases": {
"kegg": "/home/projects-wrighton-2/DRAM/development_flynn/public_DRAM/sep_12_22_dram1.4_rc_setup_test/testoutput/DRAM1_4_pycallgraph_3/kegg.20221012.mmsdb",
"kofam_hmm": "/home/projects-wrighton-2/DRAM/development_flynn/public_DRAM/sep_12_22_dram1.4_rc_setup_test/testoutput/DRAM1_4_pycallgraph_3/kofam_profiles.hmm",
"kofam_ko_list": "/home/projects-wrighton-2/DRAM/development_flynn/public_DRAM/sep_12_22_dram1.4_rc_setup_test/testoutput/DRAM1_4_pycallgraph_3/kofam_ko_list.tsv",
"uniref": "/home/projects-wrighton-2/DRAM/development_flynn/public_DRAM/sep_12_22_dram1.4_rc_setup_test/testoutput/DRAM1_4_pycallgraph_3/uniref90.20220928.mmsdb",
"pfam": "/home/projects-wrighton-2/DRAM/development_flynn/public_DRAM/sep_12_22_dram1.4_rc_setup_test/testoutput/DRAM1_4_pycallgraph_3/pfam.mmspro",
"dbcan": "/fs03/rp24/Database/DRAM/dbCAN-HMMdb-V11.txt",
"viral": "/home/projects-wrighton-2/DRAM/development_flynn/public_DRAM/sep_12_22_dram1.4_rc_setup_test/testoutput/DRAM1_4_pycallgraph_3/refseq_viral.20220928.mmsdb",
"peptidase": "/home/projects-wrighton-2/DRAM/development_flynn/public_DRAM/sep_12_22_dram1.4_rc_setup_test/testoutput/DRAM1_4_pycallgraph_3/peptidases.20220928.mmsdb",
"vogdb": "/home/projects-wrighton-2/DRAM/development_flynn/public_DRAM/sep_12_22_dram1.4_rc_setup_test/testoutput/DRAM1_4_pycallgraph_3/vog_latest_hmms.txt"
},
"database_descriptions": {
"pfam_hmm": "/home/projects-wrighton-2/DRAM/development_flynn/public_DRAM/sep_12_22_dram1.4_rc_setup_test/testoutput/DRAM1_4_pycallgraph_3/Pfam-A.hmm.dat.gz",
"dbcan_fam_activities": "/home/projects-wrighton-2/DRAM/development_flynn/public_DRAM/sep_12_22_dram1.4_rc_setup_test/testoutput/DRAM1_4_pycallgraph_3/CAZyDB.08062022.fam-activities.txt",
"dbcan_subfam_ec": "/home/projects-wrighton-2/DRAM/development_flynn/public_DRAM/sep_12_22_dram1.4_rc_setup_test/testoutput/DRAM1_4_pycallgraph_3/CAZyDB.08062022.fam.subfam.ec.txt",
"vog_annotations": "/home/projects-wrighton-2/DRAM/development_flynn/public_DRAM/sep_12_22_dram1.4_rc_setup_test/testoutput/DRAM1_4_pycallgraph_3/vog_annotations_latest.tsv.gz"
},
"dram_sheets": {
"genome_summary_form": "/home/projects-wrighton-2/DRAM/development_flynn/public_DRAM/sep_12_22_dram1.4_rc_setup_test/testoutput/DRAM1_4_pycallgraph_3/genome_summary_form.20220928.tsv",
"module_step_form": "/home/projects-wrighton-2/DRAM/development_flynn/public_DRAM/sep_12_22_dram1.4_rc_setup_test/testoutput/DRAM1_4_pycallgraph_3/module_step_form.20220928.tsv",
"etc_module_database": "/home/projects-wrighton-2/DRAM/development_flynn/public_DRAM/sep_12_22_dram1.4_rc_setup_test/testoutput/DRAM1_4_pycallgraph_3/etc_mdoule_database.20220928.tsv",
"function_heatmap_form": "/home/projects-wrighton-2/DRAM/development_flynn/public_DRAM/sep_12_22_dram1.4_rc_setup_test/testoutput/DRAM1_4_pycallgraph_3/function_heatmap_form.20220928.tsv",
"amg_database": "/home/projects-wrighton-2/DRAM/development_flynn/public_DRAM/sep_12_22_dram1.4_rc_setup_test/testoutput/DRAM1_4_pycallgraph_3/amg_database.20220928.tsv"
},
"dram_version": "1.4.0rc1",
"description_db": "/fs03/rp24/Database/DRAM/description_db.sqlite",
"setup_info": {
"kegg": {
"name": "KEGG db",
"description_db_updated": "10/12/2022, 18:52:36",
"citation": " M. Kanehisa, M. Furumichi, Y. Sato, M. Ishiguro-Watanabe, and M. Tanabe, \"Kegg: integrating viruses and cellular organisms,\" Nucleic acids research, vol. 49, no. D1, pp. D545\u2013D551, 2021."
},
"kofam_hmm": {
"name": "KOfam db",
"citation": "T. Aramaki, R. Blanc-Mathieu, H. Endo, K. Ohkubo, M. Kanehisa, S. Goto, and H. Ogata, \"Kofamkoala: Kegg ortholog assignment based on profile hmm and adaptive score threshold,\" Bioinformatics, vol. 36, no. 7, pp. 2251\u20132252, 2020.",
"Download time": "09/28/2022, 11:00:09",
"Origin": "Downloaded by DRAM"
},
"kofam_ko_list": {
"name": "KOfam KO list",
"citation": "T. Aramaki, R. Blanc-Mathieu, H. Endo, K. Ohkubo, M. Kanehisa, S. Goto, and H. Ogata, \"Kofamkoala: Kegg ortholog assignment based on profile hmm and adaptive score threshold,\" Bioinformatics, vol. 36, no. 7, pp. 2251\u20132252, 2020.",
"Download time": "09/28/2022, 11:00:11",
"Origin": "Downloaded by DRAM"
},
"uniref": {
"name": "UniRef db",
"description_db_updated": "09/29/2022, 13:14:40",
"citation": "Y. Wang, Q. Wang, H. Huang, W. Huang, Y. Chen, P. B. McGarvey, C. H. Wu, C. N. Arighi, and U. Consortium, \"A crowdsourcing open platform for literature curation in uniprot,\" PLoS Biology, vol. 19, no. 12, p. e3001464, 2021.",
"version": "90",
"Download time": "09/28/2022, 11:15:01",
"Origin": "Downloaded by DRAM"
},
"pfam": {
"name": "Pfam db",
"citation": "J. Mistry, S. Chuguransky, L. Williams, M. Qureshi, G. A. Salazar, E. L. Sonnhammer, S. C. Tosatto, L. Paladin, S. Raj, L. J. Richardson et al., \"Pfam: The protein families database in 2021,\" Nucleic acids research, vol. 49, no. D1, pp. D412\u2013D419, 2021.",
"Download time": "09/28/2022, 11:49:29",
"Origin": "Downloaded by DRAM",
"description_db_updated": "09/29/2022, 13:23:47"
},
"pfam_hmm": {
"name": "Pfam hmm dat",
"description_db_updated": "Unknown, or Never",
"citation": "J. Mistry, S. Chuguransky, L. Williams, M. Qureshi, G. A. Salazar, E. L. Sonnhammer, S. C. Tosatto, L. Paladin, S. Raj, L. J. Richardson et al., \"Pfam: The protein families database in 2021,\" Nucleic acids research, vol. 49, no. D1, pp. D412\u2013D419, 2021.",
"Download time": "09/28/2022, 11:49:31",
"Origin": "Downloaded by DRAM"
},
"dbcan": {
"name": "dbCAN db",
"citation": "Y. Yin, X. Mao, J. Yang, X. Chen, F. Mao, and Y. Xu, \"dbcan: a web resource for automated carbohydrate-active enzyme annotation,\" Nucleic acids research, vol. 40, no. W1, pp. W445\u2013W451, 2012.",
"version": "11",
"Download time": "01/07/2023, 16:31:36",
"Origin": "Downloaded by DRAM"
},
"dbcan_fam_activities": {
"name": "dbCAN family activities",
"citation": "Y. Yin, X. Mao, J. Yang, X. Chen, F. Mao, and Y. Xu, \"dbcan: a web resource for automated carbohydrate-active enzyme annotation,\" Nucleic acids research, vol. 40, no. W1, pp. W445\u2013W451, 2012.",
"version": "11",
"upload_date": "08062022",
"Download time": "09/28/2022, 11:49:33",
"Origin": "Downloaded by DRAM"
},
"dbcan_subfam_ec": {
"name": "dbCAN subfamily EC numbers",
"citation": "Y. Yin, X. Mao, J. Yang, X. Chen, F. Mao, and Y. Xu, \"dbcan: a web resource for automated carbohydrate-active enzyme annotation,\" Nucleic acids research, vol. 40, no. W1, pp. W445\u2013W451, 2012.",
"version": "11",
"upload_date": "08062022",
"Download time": "09/28/2022, 11:49:33",
"Origin": "Downloaded by DRAM"
},
"vogdb": {
"name": "VOGDB db",
"citation": "J. Thannesberger, H.-J. Hellinger, I. Klymiuk, M.-T. Kastner, F. J. Rieder, M. Schneider, S. Fister, T. Lion, K. Kosulin, J. Laengle et al., \"Viruses comprise an extensive pool of mobile genetic elements in eukaryote cell cultures and human clinical samples,\" The FASEB Journal, vol. 31, no. 5, pp. 1987\u20132000, 2017.",
"version": "latest",
"Download time": "09/28/2022, 11:51:57",
"Origin": "Downloaded by DRAM",
"description_db_updated": "09/29/2022, 13:24:14"
},
"vog_annotations": {
"name": "VOG annotations",
"description_db_updated": "Unknown, or Never",
"citation": "J. Thannesberger, H.-J. Hellinger, I. Klymiuk, M.-T. Kastner, F. J. Rieder, M. Schneider, S. Fister, T. Lion, K. Kosulin, J. Laengle et al., \"Viruses comprise an extensive pool of mobile genetic elements in eukaryote cell cultures and human clinical samples,\" The FASEB Journal, vol. 31, no. 5, pp. 1987\u20132000, 2017.",
"version": "latest",
"Download time": "09/28/2022, 11:51:58",
"Origin": "Downloaded by DRAM"
},
"viral": {
"name": "RefSeq Viral db",
"description_db_updated": "09/29/2022, 13:16:15",
"citation": "J. R. Brister, D. Ako-Adjei, Y. Bao, and O. Blinkova, \"Ncbi viral genomes resource,\" Nucleic acids research, vol. 43, no. D1, pp. D571\u2013D577, 2015. [3] M. Kanehisa, M. Furumichi, Y. Sato, M. Ishiguro-Watanabe, and M. Tan-abe, \"Kegg: integrating viruses and cellular organisms,\" Nucleic acids research, vol. 49, no. D1, pp. D545\u2013D551, 2021.",
"viral_files": 2,
"Download time": "09/28/2022, 11:52:20",
"Origin": "Downloaded by DRAM"
},
"peptidase": {
"name": "MEROPS peptidase db",
"description_db_updated": "09/29/2022, 13:23:40",
"citation": "N. D. Rawlings, A. J. Barrett, P. D. Thomas, X. Huang, A. Bateman, and R. D. Finn, \"The merops database of proteolytic enzymes, their substrates and inhibitors in 2017 and a comparison with peptidases in the panther database,\" Nucleic acids research, vol. 46, no. D1, pp. D624\u2013D632, 2018.",
"Download time": "09/28/2022, 12:01:46",
"Origin": "Downloaded by DRAM"
},
"genome_summary_form": {
"name": "Genome summary form",
"branch": "master",
"Download time": "09/28/2022, 12:01:46",
"Origin": "Downloaded by DRAM"
},
"module_step_form": {
"name": "Module step form",
"branch": "master",
"Download time": "09/28/2022, 12:01:47",
"Origin": "Downloaded by DRAM"
},
"function_heatmap_form": {
"name": "Function heatmap form",
"branch": "master",
"Download time": "09/28/2022, 12:01:47",
"Origin": "Downloaded by DRAM"
},
"amg_database": {
"name": "AMG database",
"branch": "master",
"Download time": "09/28/2022, 12:01:47",
"Origin": "Downloaded by DRAM"
},
"etc_module_database": {
"name": "ETC module database",
"branch": "master",
"Download time": "09/28/2022, 12:01:47",
"Origin": "Downloaded by DRAM"
}
},
"log_path": null
}
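A quick way to see which entries of an exported config like the one above point at files that do not exist on the local machine is sketched below. It is only an illustration: the function name is made up, and the section names are simply the ones that appear in this config.

```python
import os

# Path-holding sections as they appear in the config posted in this thread.
PATH_SECTIONS = ("search_databases", "database_descriptions", "dram_sheets")


def missing_paths(config):
    """Map 'section.key' to each configured path that does not exist locally.

    Hypothetical helper, not a DRAM function. Entries set to null (None)
    are skipped, since they mean "not configured" rather than "missing".
    """
    missing = {}
    for section in PATH_SECTIONS:
        for key, path in config.get(section, {}).items():
            if path is not None and not os.path.exists(path):
                missing[f"{section}.{key}"] = path
    # description_db sits at the top level of the config, not in a section.
    desc_db = config.get("description_db")
    if desc_db is not None and not os.path.exists(desc_db):
        missing["description_db"] = desc_db
    return missing
```

Run against the config above on the reporter's machine, this would presumably flag every entry under the test-environment directory, which is the same conclusion reached by reading the JSON by eye.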
So looking over this, it seems that the only database that was updated is dbcan, and the rest are the defaults from my test environment. So maybe the import failed, or maybe the update came after the import. In any case, I would copy your original environment, find the dbcan entry, and replace it with:
"dbcan": "/fs03/rp24/Database/DRAM/dbCAN-HMMdb-V11.txt",
Also download these: CAZyDB.08062022.fam-activities.txt.gz and CAZyDB.08062022.fam.subfam.ec.txt.gz. Unzip them in the same place and replace dbcan_fam_activities and dbcan_subfam_ec with these lines:
"dbcan_fam_activities": "/fs03/rp24/Database/DRAM/CAZyDB.08062022.fam-activities.txt",
"dbcan_subfam_ec": "/fs03/rp24/Database/DRAM/CAZyDB.08062022.fam.subfam.ec.txt"
Of course, that will only work if you still have your old environment. If you lost it, you will need a full setup: get the empty config with wget https://raw.githubusercontent.com/shafferm/DRAM/master/mag_annotator/CONFIG, import it with DRAM-setup.py import_config --config_loc some/where/CONFIG, and then run setup again. I am sorry for the complications.
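The unzip-and-repoint step described above can also be scripted. The sketch below assumes the two .gz files have already been downloaded into the database directory and that DRAM should be pointed at the decompressed .txt files (the traceback earlier in the thread shows the activities file being read with a plain open). Both function names are illustrative and not part of DRAM.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_keep_name(gz_path):
    """Decompress foo.txt.gz next to itself as foo.txt and return the new path."""
    gz_path = Path(gz_path)
    if gz_path.suffix != ".gz":
        raise ValueError(f"expected a .gz file, got {gz_path}")
    out_path = gz_path.with_suffix("")  # drops only the trailing ".gz"
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path


def set_dbcan_descriptions(config, fam_activities, subfam_ec):
    """Point the two dbCAN description entries at the given files (in place)."""
    descriptions = config.setdefault("database_descriptions", {})
    descriptions["dbcan_fam_activities"] = str(fam_activities)
    descriptions["dbcan_subfam_ec"] = str(subfam_ec)
    return config


# Example usage (db_dir is wherever the .gz files were downloaded):
# db_dir = Path("/fs03/rp24/Database/DRAM")
# fam = gunzip_keep_name(db_dir / "CAZyDB.08062022.fam-activities.txt.gz")
# ec = gunzip_keep_name(db_dir / "CAZyDB.08062022.fam.subfam.ec.txt.gz")
# set_dbcan_descriptions(config, fam, ec)
```

The config dictionary here is the parsed JSON from DRAM-setup.py export_config; after editing it would still need to be written back to a file and imported with DRAM-setup.py import_config.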
Worked with v1.4.6, much appreciated!
Hello
I am running DRAM.py annotate in DRAM v1.4.3 with 1 TB of memory, but I get the following error:
Do you know what is causing this?
Thank you