WrightonLabCSU / DRAM

Distilled and Refined Annotation of Metabolism: A tool for the annotation and curation of function for microbial and viral genomes
GNU General Public License v3.0

DRAM-v.py --low_mem_mode breaks distill #286

Open. Opened by mlhoggard 1 year ago.

mlhoggard commented 1 year ago

Hi there,

We have DRAM (1.4.6) set up with the full KEGG database, and it has been working fine. Recently, though, I wanted to annotate some viral contigs with KOfam instead of full KEGG, using DRAM-v.py annotate --low_mem_mode. The annotate step completed, but DRAM-v.py distill then failed with the error: KeyError: 'vogdb_categories'

When I checked annotation.tsv, the vogdb_categories column was indeed missing. A similar run without --low_mem_mode did include vogdb_categories, and distill worked on that output.
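A quick way to catch this before running distill is to inspect the header row of the annotation table. The following is a minimal sketch, assuming the file is tab-separated with column names in its first row; the helper name `missing_columns` is hypothetical and not part of DRAM.

```python
import csv
from typing import Iterable, List


def missing_columns(lines: Iterable[str], required: Iterable[str]) -> List[str]:
    """Return the required column names that are absent from a TSV header row.

    `missing_columns` is a hypothetical helper for illustration, not a DRAM function.
    """
    reader = csv.reader(lines, delimiter="\t")
    # The first row holds the column names; an empty file yields an empty header.
    header = next(reader, [])
    return [col for col in required if col not in header]
```

For example, `with open("annotation.tsv", newline="") as fh: print(missing_columns(fh, ["vogdb_categories"]))` should print `['vogdb_categories']` for a --low_mem_mode run like the one described here, and `[]` for a run where VOGDB was searched.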

I tracked down this section in database_handler.py, where the --low_mem_mode settings are applied:

if low_mem_mode:
    if ("kofam_hmm" not in self.config.get("search_databases")) or (
        "kofam_ko_list" not in self.config.get("search_databases")
    ):
        raise ValueError(
            "To run in low memory mode KOfam must be configured for use in DRAM"
        )
    dbs_to_use = [i for i in dbs_to_use if i not in ("uniref", "kegg", "vogdb")]

So, judging by the last line above, --low_mem_mode always excludes vogdb in addition to uniref and kegg (although the help message only mentions the latter two), yet vogdb is required by DRAM-v's version of distill.
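To see the effect in isolation, the quoted branch can be lifted into a standalone function. This is a sketch that mirrors the database_handler.py snippet quoted in this issue; the function name `low_mem_filter` and the example database names in the usage note are illustrative assumptions, not DRAM's actual API or internal identifiers.

```python
from typing import Iterable, List

# Databases dropped by the quoted low_mem_mode branch (DRAM 1.4.6, database_handler.py).
LOW_MEM_EXCLUDED = ("uniref", "kegg", "vogdb")


def low_mem_filter(dbs_to_use: Iterable[str], search_databases: Iterable[str]) -> List[str]:
    """Hypothetical standalone version of the low_mem_mode branch.

    `search_databases` plays the role of self.config.get("search_databases");
    if it is a dict, membership is checked against its keys, as in the original.
    """
    configured = set(search_databases)
    if "kofam_hmm" not in configured or "kofam_ko_list" not in configured:
        raise ValueError("To run in low memory mode KOfam must be configured for use in DRAM")
    # vogdb is removed together with uniref and kegg, which is what breaks DRAM-v distill.
    return [db for db in dbs_to_use if db not in LOW_MEM_EXCLUDED]
```

With an illustrative list such as `["kegg", "kofam", "uniref", "pfam", "vogdb"]` and a config containing both KOfam entries, the result is `["kofam", "pfam"]`: VOGDB never gets searched, so no vogdb_categories column is written.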

Would it be possible for --low_mem_mode to behave differently for DRAM.py and DRAM-v.py, so that the latter doesn't exclude vogdb? Alternatively, vogdb could be removed from the --low_mem_mode exclusion list above, with a separate --exclude_vogdb option added instead.
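The second suggestion could look roughly like the sketch below. It assumes a hypothetical `exclude_vogdb` parameter (standing in for the proposed --exclude_vogdb option, which does not exist in DRAM 1.4.6), and `select_databases` is an illustrative name, not a DRAM function. Here low_mem_mode drops only the two databases the help message mentions, and VOGDB is excluded only on explicit request.

```python
from typing import Iterable, List


def select_databases(
    dbs_to_use: Iterable[str],
    search_databases: Iterable[str],
    low_mem_mode: bool = False,
    exclude_vogdb: bool = False,  # hypothetical option, proposed in this issue
) -> List[str]:
    """Sketch of the proposed split between --low_mem_mode and --exclude_vogdb."""
    excluded = set()
    if low_mem_mode:
        configured = set(search_databases)
        if "kofam_hmm" not in configured or "kofam_ko_list" not in configured:
            raise ValueError("To run in low memory mode KOfam must be configured for use in DRAM")
        # Only uniref and kegg are dropped, matching the current help message.
        excluded.update(("uniref", "kegg"))
    if exclude_vogdb:
        excluded.add("vogdb")
    return [db for db in dbs_to_use if db not in excluded]
```

Under this design, DRAM-v.py annotate --low_mem_mode would still search VOGDB and write vogdb_categories for distill, while users who need to save further memory could opt out of VOGDB explicitly.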

Cheers! Mike.