WrightonLabCSU / DRAM

Distilled and Refined Annotation of Metabolism: A tool for the annotation and curation of function for microbial and viral genomes
GNU General Public License v3.0

Databases used in DRAM #308

Closed dcm9123 closed 10 months ago

dcm9123 commented 11 months ago

Hello! Thanks for this amazing program, I love how straightforward it is! I have a couple of questions for you, if that's OK. We downloaded the databases for DRAM in 2021 (see attached db.txt). However, when I run the annotation step, I get the following message:

2023-10-10 10:14:11,994 - The log file is created at /bulk/IMCshared_bulk/sycuro_shared_projects/Twist96/novaseq_run/analysis/plate1/unicycler/DRAM_annotation/unicycler_test_S_1/genome_summaries/distill.log
2023-10-10 10:14:12,016 - Note: the fallowing id fields were not in the annotations file and are not being used: ['kegg_genes_id', 'kegg_id', 'camper_id', 'fegenie_id', 'sulfur_id', 'methyl_id'], but these are ['ko_id', 'kegg_hit', 'peptidase_family', 'cazy_best_hit', 'pfam_hits']
2023-10-10 10:14:12,138 - Retrieved database locations and descriptions
2023-10-10 10:14:12,201 - Calculated genome statistics
2023-10-10 10:14:14,193 - Generated genome metabolism summary
/home/daniel.castanedamogo/anaconda3/envs/DRAM/lib/python3.10/site-packages/altair/utils/core.py:317: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
  for col_name, dtype in df.dtypes.iteritems():
2023-10-10 10:14:15,477 - Generated product heatmap and table
2023-10-10 10:14:15,478 - Completed distillation

To me it looks like some databases, such as UniRef, are not being used. Is that right?

Thank you for your time!

rmFlynn commented 10 months ago

This is an interesting quirk of distillation and the "raw" output. Databases do not map one-to-one onto columns in the raw output. In the case of UniRef, its data may be incorporated into other columns, such as ko_id via EC numbers, but not every raw column is used in the distillate. The raw output contains the unique information from every annotation, but it also contains more useful computed information, and that computed information is what the distillate looks for. So the message is not about which databases were used; it is about which columns are in the raw output. It is meant for users who work with the raw output frequently and may have raw data produced by an older version of DRAM.
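To see the distinction between databases and raw columns for yourself, you can check which id fields appear in the header of a raw annotations file. The following is a minimal, hypothetical Python sketch, not DRAM's own code: it assumes the raw file (here called annotations.tsv) is tab-separated with a header row, and it takes the id-field names from the distill log quoted in the question. It only reports which of those names are present as columns; it says nothing about which databases were run.

```python
import csv

# Id fields named in the distill log quoted in the question. This list is
# illustrative (copied from one log message), not DRAM's authoritative list.
ID_FIELDS = [
    "kegg_genes_id", "ko_id", "kegg_id", "kegg_hit", "peptidase_family",
    "cazy_best_hit", "pfam_hits", "camper_id", "fegenie_id",
    "sulfur_id", "methyl_id",
]


def read_columns(path):
    """Return the column names from the header row of a tab-separated file."""
    with open(path, newline="") as handle:
        return next(csv.reader(handle, delimiter="\t"))


def split_id_fields(columns, id_fields):
    """Split id_fields into (present, absent) lists, keeping id_fields order."""
    available = set(columns)
    present = [field for field in id_fields if field in available]
    absent = [field for field in id_fields if field not in available]
    return present, absent


# Hypothetical usage on a raw annotations file (file name is an assumption):
#   present, absent = split_id_fields(read_columns("annotations.tsv"), ID_FIELDS)
```

Comparing column names rather than database names follows the point above: a database such as UniRef may feed into other columns like ko_id, so an absent id field does not by itself mean a database was skipped.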

Let me know if that makes sense, and do re-open this issue if not.