Knowledge-Graph-Hub / universalizer

The KG-Hub Universalizer provides functions for knowledge graph cleanup and identifier normalization.
BSD 3-Clause "New" or "Revised" License

Run on KG-Phenio corrects OntologyClass but doesn't update to more specific category #10

Closed: caufieldjh closed this issue 2 years ago

caufieldjh commented 2 years ago

The issue: nodes with the biolink:OntologyClass category get remapped to biolink:NamedThing, even though more specific categories are available for them in the edges of the input graph.

A run of universalizer on kg-phenio like this:

$ universalizer run ~/kg-phenio/data/merged/ -u

produces this output:

Input path: /home/harry/kg-phenio/data/merged/
Will update categories.
Found these graph files:['/home/harry/kg-phenio/data/merged/merged-kg_nodes.tsv', '/home/harry/kg-phenio/data/merged/merged-kg_edges.tsv']
Retrieving entity names in /home/harry/kg-phenio/data/merged/merged-kg_nodes.tsv...
Found 28758 unexpected identifiers.
Will normalize 27106 identifiers.
Wrote IRI maps to /home/harry/kg-phenio/data/merged/update_id_maps.tsv.
Retrieving categories in /home/harry/kg-phenio/data/merged/merged-kg_nodes.tsv...
Found 430307 unexpected categories.
Will normalize 171124 categories.
Wrote category maps to /home/harry/kg-phenio/data/merged/update_category_maps.tsv.
Updated 171124 nodes.
Complete.

As expected, the processed edgefile does not contain the biolink:category edges. However, the category mapfile (update_category_maps.tsv) doesn't include any mappings to categories other than biolink:NamedThing. Correspondingly, the processed nodefile doesn't include any category more specific than that. One node as an example:

ENVO:01001570   biolink:NamedThing      terrestrial ecoregion   An ecoregion which is located on a landmass.    Graph

That should be biolink:EnvironmentalFeature.
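
To illustrate the expected behavior, here is a minimal sketch of how a more specific category could be taken from biolink:category edges. This is not the universalizer implementation: the function names, the subject/predicate/object column layout, the set of categories treated as generic, and the sample edge row are all assumptions made for illustration.

```python
import csv
import io

# Categories treated as uninformative in this sketch (an assumption).
GENERIC = {"biolink:NamedThing", "biolink:OntologyClass"}


def categories_from_edges(edge_rows):
    """Collect specific categories asserted by biolink:category edges."""
    found = {}
    for row in edge_rows:
        if row["predicate"] != "biolink:category":
            continue
        if row["object"] in GENERIC:
            continue
        # Keep the first specific category seen for each node.
        found.setdefault(row["subject"], row["object"])
    return found


def update_category(node_id, current, edge_categories):
    """Replace a generic category with an edge-derived one, if available."""
    if current in GENERIC:
        # Fall back to biolink:NamedThing only when no edge says otherwise.
        return edge_categories.get(node_id, "biolink:NamedThing")
    return current


# Hypothetical edge rows; the second row uses a placeholder object ID.
edges_tsv = (
    "subject\tpredicate\tobject\n"
    "ENVO:01001570\tbiolink:category\tbiolink:EnvironmentalFeature\n"
    "ENVO:01001570\tbiolink:subclass_of\tEXAMPLE:parent\n"
)
edge_rows = list(csv.DictReader(io.StringIO(edges_tsv), delimiter="\t"))
edge_categories = categories_from_edges(edge_rows)

print(update_category("ENVO:01001570", "biolink:OntologyClass", edge_categories))
```

With these sample rows, the node's category resolves to biolink:EnvironmentalFeature instead of falling back to biolink:NamedThing. A node with several specific category edges would need a rule for choosing among them, which this sketch does not attempt.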

caufieldjh commented 2 years ago

Fixed in 5c5eb03a0ba857e7d22315a1577a79dba19eccef (should have been a PR, but these things happen)