WrightonLabCSU / DRAM

Distilled and Refined Annotation of Metabolism: A tool for the annotation and curation of function for microbial and viral genomes
GNU General Public License v3.0

KeyError vogdb categories #176

Closed ZarulHanifah closed 1 year ago

ZarulHanifah commented 2 years ago

Hello,

I ran dram_v on seven genomes. All but one passed; the one that failed gave this error message:

/fs02/ie79/Zarul/prophage_detection/.snakemake/conda/a41f0976fb8154f5853379be34959793/lib/python3.10/site-packages/mag_annotator/annotate_bins.py:603: UserWarning: No rRNAs were detected, no rrnas.tsv file will be created.
  warnings.warn('No rRNAs were detected, no rrnas.tsv file will be created.')
2022-05-07 17:54:36.669004: Viral annotation started
0:00:00.263488: Retrieved database locations and descriptions
0:00:00.263541: Annotating final-viral-combined-for-dramv
0:00:00.422184: Turning genes from prodigal to mmseqs2 db
0:00:02.355397: Getting hits from kofam
0:01:01.380347: Getting forward best hits from viral
0:01:04.454533: Getting forward best hits from peptidase
0:01:08.947004: Getting hits from pfam
0:01:22.763530: Getting hits from dbCAN
0:01:29.084671: Getting hits from VOGDB
0:02:08.138741: Merging ORF annotations
0:02:09.722065: Annotations complete, processing annotations
0:02:09.738307: Annotations complete, assigning auxiliary scores and flags
0:02:09.831099: Completed annotations
0:00:00.026330: Retrieved database locations and descriptions
0:00:00.027495: Determined potential amgs
Traceback (most recent call last):
  File "/fs02/ie79/Zarul/prophage_detection/.snakemake/conda/a41f0976fb8154f5853379be34959793/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3621, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 136, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 163, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'vogdb_categories'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/fs02/ie79/Zarul/prophage_detection/.snakemake/conda/a41f0976fb8154f5853379be34959793/bin/DRAM-v.py", line 153, in <module>
    args.func(**args_dict)
  File "/fs02/ie79/Zarul/prophage_detection/.snakemake/conda/a41f0976fb8154f5853379be34959793/lib/python3.10/site-packages/mag_annotator/summarize_vgfs.py", line 240, in summarize_vgfs
    viral_genome_stats = make_viral_stats_table(annotations, potential_amgs, groupby_column)
  File "/fs02/ie79/Zarul/prophage_detection/.snakemake/conda/a41f0976fb8154f5853379be34959793/lib/python3.10/site-packages/mag_annotator/summarize_vgfs.py", line 92, in make_viral_stats_table
    for i in frame['vogdb_categories']]) / frame.shape[0]
  File "/fs02/ie79/Zarul/prophage_detection/.snakemake/conda/a41f0976fb8154f5853379be34959793/lib/python3.10/site-packages/pandas/core/frame.py", line 3505, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/fs02/ie79/Zarul/prophage_detection/.snakemake/conda/a41f0976fb8154f5853379be34959793/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3623, in get_loc
    raise KeyError(key) from err
KeyError: 'vogdb_categories'

Thank you.

rmFlynn commented 2 years ago

Hmm, it looks like your vogdb is missing its description file. Can you run DRAM-setup.py print_config and post the output?

ZarulHanifah commented 2 years ago

Here you go:

Processed search databases
KEGG db: None
KOfam db: /fs03/ie79/db/DRAM_data/kofam_profiles.hmm
KOfam KO list: /fs03/ie79/db/DRAM_data/kofam_ko_list.tsv
UniRef db: None
Pfam db: /fs03/ie79/db/DRAM_data/pfam.mmspro
dbCAN db: /fs03/ie79/db/DRAM_data/dbCAN-HMMdb-V9.txt
RefSeq Viral db: /fs03/ie79/db/DRAM_data/refseq_viral.20210917.mmsdb
MEROPS peptidase db: /fs03/ie79/db/DRAM_data/peptidases.20210917.mmsdb
VOGDB db: /fs03/ie79/db/DRAM_data/vog_latest_hmms.txt

Descriptions of search database entries
Pfam hmm dat: /fs03/ie79/db/DRAM_data/Pfam-A.hmm.dat.gz
dbCAN family activities: /fs03/ie79/db/DRAM_data/CAZyDB.07302020.fam-activities.txt
VOG annotations: /fs03/ie79/db/DRAM_data/vog_annotations_latest.tsv.gz

Description db: /fs03/ie79/db/DRAM_data/description_db.sqlite

DRAM distillation sheets
Genome summary form: /fs03/ie79/db/DRAM_data/genome_summary_form.20210917.tsv
Module step form: /fs03/ie79/db/DRAM_data/module_step_form.20210917.tsv
ETC module database: /fs03/ie79/db/DRAM_data/etc_mdoule_database.20210917.tsv
Function heatmap form: /fs03/ie79/db/DRAM_data/function_heatmap_form.20210917.tsv
AMG database: /fs03/ie79/db/DRAM_data/amg_database.20210917.tsv

I have also checked, and all of the files appear to be present:

for i in /fs03/ie79/db/DRAM_data/kofam_profiles.hmm /fs03/ie79/db/DRAM_data/kofam_ko_list.tsv /fs03/ie79/db/DRAM_data/pfam.mmspro /fs03/ie79/db/DRAM_data/dbCAN-HMMdb-V9.txt /fs03/ie79/db/DRAM_data/refseq_viral.20210917.mmsdb /fs03/ie79/db/DRAM_data/peptidases.20210917.mmsdb /fs03/ie79/db/DRAM_data/vog_latest_hmms.txt /fs03/ie79/db/DRAM_data/Pfam-A.hmm.dat.gz /fs03/ie79/db/DRAM_data/CAZyDB.07302020.fam-activities.txt /fs03/ie79/db/DRAM_data/vog_annotations_latest.tsv.gz /fs03/ie79/db/DRAM_data/description_db.sqlite /fs03/ie79/db/DRAM_data/genome_summary_form.20210917.tsv /fs03/ie79/db/DRAM_data/module_step_form.20210917.tsv /fs03/ie79/db/DRAM_data/etc_mdoule_database.20210917.tsv /fs03/ie79/db/DRAM_data/function_heatmap_form.20210917.tsv /fs03/ie79/db/DRAM_data/amg_database.20210917.tsv ; do ls $i ; done
/fs03/ie79/db/DRAM_data/kofam_profiles.hmm
/fs03/ie79/db/DRAM_data/kofam_ko_list.tsv
/fs03/ie79/db/DRAM_data/pfam.mmspro
/fs03/ie79/db/DRAM_data/dbCAN-HMMdb-V9.txt
/fs03/ie79/db/DRAM_data/refseq_viral.20210917.mmsdb
/fs03/ie79/db/DRAM_data/peptidases.20210917.mmsdb
/fs03/ie79/db/DRAM_data/vog_latest_hmms.txt
/fs03/ie79/db/DRAM_data/Pfam-A.hmm.dat.gz
/fs03/ie79/db/DRAM_data/CAZyDB.07302020.fam-activities.txt
/fs03/ie79/db/DRAM_data/vog_annotations_latest.tsv.gz
/fs03/ie79/db/DRAM_data/description_db.sqlite
/fs03/ie79/db/DRAM_data/genome_summary_form.20210917.tsv
/fs03/ie79/db/DRAM_data/module_step_form.20210917.tsv
/fs03/ie79/db/DRAM_data/etc_mdoule_database.20210917.tsv
/fs03/ie79/db/DRAM_data/function_heatmap_form.20210917.tsv
/fs03/ie79/db/DRAM_data/amg_database.20210917.tsv
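Note that `ls` only confirms that each path exists; a zero-byte file from an interrupted download would still be listed. A minimal sketch in Python that also flags empty files is shown here. The function name `missing_or_empty` is hypothetical, and the two example paths are copied from the config output above; this is not part of DRAM.

```python
import os


def missing_or_empty(paths):
    """Return the paths that are not regular files or are zero bytes long."""
    return [p for p in paths if not os.path.isfile(p) or os.path.getsize(p) == 0]


if __name__ == "__main__":
    # Example paths taken from the print_config output; adjust to your setup.
    vog_paths = [
        "/fs03/ie79/db/DRAM_data/vog_latest_hmms.txt",
        "/fs03/ie79/db/DRAM_data/vog_annotations_latest.tsv.gz",
    ]
    for path in missing_or_empty(vog_paths):
        print("missing or empty:", path)
```

An empty result only shows that the files exist and are non-empty; it does not show that their contents are valid.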

rmFlynn commented 2 years ago

Unfortunately, the problem is most likely that VOGDB had 0 hits for that category. I will work on pushing a fix, but in the meantime you can work around the problem by adding an empty vogdb_categories column to your annotations file.
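One way to apply that workaround is sketched below, under some assumptions: the annotations file is tab-separated with a single header row, no field contains an embedded newline, and the missing column is the `vogdb_categories` named in the traceback. The function name `add_empty_column` and the file names in the usage comment are hypothetical. Whether the downstream summary step accepts empty values in this column has not been verified here.

```python
def add_empty_column(src_path, dst_path, column="vogdb_categories"):
    """Copy a TSV to dst_path, appending an empty `column` if the header lacks it.

    Returns True if the column was added, False if it was already present.
    """
    with open(src_path) as src:
        lines = [line.rstrip("\n") for line in src]
    if not lines:
        raise ValueError("input file has no header row")

    header = lines[0].split("\t")
    missing = column not in header
    if missing:
        # Extend the header with the new name and every data row with an empty field.
        lines = [lines[0] + "\t" + column] + [line + "\t" for line in lines[1:]]

    with open(dst_path, "w") as dst:
        dst.write("\n".join(lines) + "\n")
    return missing


# Usage (hypothetical file names):
# add_empty_column("annotations.tsv", "annotations.fixed.tsv")
```

Writing to a separate output file keeps the original annotations untouched in case the patched copy does not behave as expected.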

rmFlynn commented 1 year ago

If the latest update, DRAM 1.4, does not fix your issue, let me know and re-open this ticket. I am closing it for now because I believe it is solved.