Knowledge-Graph-Hub / universalizer

The KG-Hub Universalizer provides functions for knowledge graph cleanup and identifier normalization.
BSD 3-Clause "New" or "Revised" License

KeyError due to node IDs missing from `update_cats` dict #34

Closed: caufieldjh closed this issue 1 year ago

caufieldjh commented 1 year ago

In the most recent KG-Phenio build:

15:58:41  Normalizing nodes and categories...
15:58:41  Found these graph files:['data/merged/merged-kg_edges.tsv', 'data/merged/merged-kg_nodes.tsv']
15:58:41  Retrieving entity names in data/merged/merged-kg_nodes.tsv...
15:58:41  Found 25441 unexpected identifiers.
15:58:41  Will normalize 24004 identifiers.
15:58:41  Wrote IRI maps to data/merged/update_id_maps.tsv.
15:58:41  Retrieving categories in data/merged/merged-kg_nodes.tsv...
15:58:41  Traceback (most recent call last):
15:58:41    File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/run.py", line 98, in <module>
15:58:41      cli()
15:58:41    File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/venv/lib/python3.9/site-packages/click/core.py", line 1157, in __call__
15:58:41      return self.main(*args, **kwargs)
15:58:41    File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/venv/lib/python3.9/site-packages/click/core.py", line 1078, in main
15:58:41      rv = self.invoke(ctx)
15:58:41    File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/venv/lib/python3.9/site-packages/click/core.py", line 1688, in invoke
15:58:41      return _process_result(sub_ctx.command.invoke(sub_ctx))
15:58:41    File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/venv/lib/python3.9/site-packages/click/core.py", line 1434, in invoke
15:58:41      return ctx.invoke(self.callback, **ctx.params)
15:58:41    File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/venv/lib/python3.9/site-packages/click/core.py", line 783, in invoke
15:58:41      return __callback(*args, **kwargs)
15:58:41    File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/run.py", line 94, in merge
15:58:41      normalize()
15:58:41    File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/kg_phenio/normalize.py", line 16, in normalize
15:58:41      clean_and_normalize_graph(filepath="data/merged/",
15:58:41    File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/venv/lib/python3.9/site-packages/universalizer/norm.py", line 108, in clean_and_normalize_graph
15:58:41      remap_these_categories, remove_these_edges = make_cat_maps(
15:58:41    File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/venv/lib/python3.9/site-packages/universalizer/norm.py", line 361, in make_cat_maps
15:58:41      len(update_cats[subj_node_id]) > 0
15:58:41  KeyError: 'CHEBI:100147'

The subject node ID `CHEBI:100147` was never added to `update_cats`, presumably because its existing category did not require any changes. That could mean the node does not need to be checked at all, or it could point to a separate problem with its category assignment.
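One way to avoid the crash is to treat a node that is absent from `update_cats` as having no pending category changes, instead of indexing the dict directly as `make_cat_maps` does at `len(update_cats[subj_node_id]) > 0`. The snippet below is a minimal sketch, not the actual universalizer fix. It assumes `update_cats` maps node IDs to lists of category strings. The helper name `needs_category_update` and the example data are hypothetical.

```python
from typing import Dict, List


def needs_category_update(update_cats: Dict[str, List[str]], node_id: str) -> bool:
    """Return True only if node_id has pending category updates.

    Hypothetical helper (not part of universalizer): a node missing from
    update_cats is treated as needing no changes rather than raising KeyError.
    """
    # dict.get returns the default without inserting the key,
    # so update_cats is left unmodified for missing nodes.
    return len(update_cats.get(node_id, [])) > 0


# Hypothetical example data; CHEBI:100147 is deliberately absent,
# mirroring the node that triggered the KeyError in the log above.
update_cats = {"CHEBI:15377": ["biolink:ChemicalEntity"]}

print(needs_category_update(update_cats, "CHEBI:15377"))   # True
print(needs_category_update(update_cats, "CHEBI:100147"))  # False
```

Using `dict.get` rather than a `collections.defaultdict` keeps lookups from silently adding empty entries for every node checked. If the missing key turns out to indicate a real problem with the node's category assignment, logging these IDs would be safer than skipping them quietly.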