Knowledge-Graph-Hub / universalizer

The KG-Hub Universalizer provides functions for knowledge graph cleanup and identifier normalization.
BSD 3-Clause "New" or "Revised" License

Node categories: if starting with OntologyClass, don't always replace with the last assigned category by edges #32

Status: Closed (caufieldjh closed 1 year ago)

caufieldjh commented 1 year ago

Node category assignment in KG-Phenio is still not quite right. The fix in #31 prevents a node's category from being overwritten if that category is in RETAINED_CAT_LIST and is already defined in a nodefile. So if HP:0000123 has biolink:PhenotypicFeature before normalization, it will still have that category after normalization. But if a node comes in with biolink:OntologyClass, we still try to rewrite its category, and that means accepting whichever category we encountered last in the edges (on the assumption that it's the most specific one). For example, for MP:0000018:

$ grep MP:0000018 phenio_edge_sources_edges.tsv
urn:uuid:3ac59c2d-f036-4d58-8fa6-5d3e3dab5936   MP:0000018      biolink:subclass_of     MP:0002177                      infores:phenio  infores:mp
urn:uuid:12e28a25-26dc-4907-a66b-a619bce16690   MP:0000018      biolink:subclass_of     UPHENO:0069196                  infores:phenio  infores:upheno
urn:uuid:b5b423a3-358a-45ee-9802-6bea72dc0a6b   MP:0000018      biolink:category        biolink:NamedThing                      infores:phenio  infores:mp
urn:uuid:bb446966-587e-4403-b7ca-e4a9640ef201   MP:0000018      biolink:category        biolink:OntologyClass                   infores:phenio  infores:mp
urn:uuid:9a717007-7dca-411e-bf17-5ccd1fc9e4f3   MP:0000018      biolink:category        biolink:PhenotypicFeature                       infores:phenio  infores:mp
urn:uuid:9c87d961-4e56-49e9-b816-6eba0ef9daa8   MP:0000018      biolink:category        biolink:PhenotypicQuality                       infores:phenio  infores:mp
urn:uuid:fee39504-b41f-4ba0-84c7-1142e065e510   MP:0030297      biolink:subclass_of     MP:0000018                      infores:phenio  infores:mp

The last biolink:category edge for MP:0000018 has the object biolink:PhenotypicQuality, so the node category for MP:0000018 becomes biolink:PhenotypicQuality. That's not correct in this case, though it may be correct in others.
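To make the problem concrete, here is a minimal sketch of the "last category edge wins" behavior described above, alongside one possible alternative. It is not the universalizer code: the function names, the tuple layout, and the PREFERRED_CATEGORIES list are all hypothetical, and the sketch assumes biolink:PhenotypicFeature is the category we actually want for MP:0000018.

```python
# Category edges for MP:0000018, in file order, copied from the grep output above.
CATEGORY_EDGES = [
    ("MP:0000018", "biolink:category", "biolink:NamedThing"),
    ("MP:0000018", "biolink:category", "biolink:OntologyClass"),
    ("MP:0000018", "biolink:category", "biolink:PhenotypicFeature"),
    ("MP:0000018", "biolink:category", "biolink:PhenotypicQuality"),
]

# Hypothetical preference order, most preferred first (an assumption for
# illustration, not a list defined in universalizer).
PREFERRED_CATEGORIES = ["biolink:PhenotypicFeature", "biolink:PhenotypicQuality"]


def last_category_wins(edges):
    """Mimic the behavior described above: each category edge overwrites the last."""
    assigned = {}
    for subject, predicate, obj in edges:
        if predicate == "biolink:category":
            assigned[subject] = obj
    return assigned


def preferred_category(edges, preferred=PREFERRED_CATEGORIES):
    """Hypothetical alternative: collect every category, then pick by preference."""
    seen = {}
    for subject, predicate, obj in edges:
        if predicate == "biolink:category":
            seen.setdefault(subject, []).append(obj)
    chosen = {}
    for node, categories in seen.items():
        ranked = [c for c in preferred if c in categories]
        # Fall back to the last category seen if none is in the preference list.
        chosen[node] = ranked[0] if ranked else categories[-1]
    return chosen


print(last_category_wins(CATEGORY_EDGES))   # {'MP:0000018': 'biolink:PhenotypicQuality'}
print(preferred_category(CATEGORY_EDGES))   # {'MP:0000018': 'biolink:PhenotypicFeature'}
```

With an explicit preference order the result no longer depends on the order of rows in the edge file, though the order itself would have to be chosen carefully for cases where the last-assigned category really is the right one.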