Knowledge-Graph-Hub / universalizer

The KG-Hub Universalizer provides functions for knowledge graph cleanup and identifier normalization.
BSD 3-Clause "New" or "Revised" License

Category edges persist, with incorrect objects #26

Closed: caufieldjh closed this issue 1 year ago

caufieldjh commented 1 year ago

Processing the most recent KG-Phenio build (20221215) assigns a correct entity type to nodes (or at least one more specific than OntologyClass) but retains the old biolink:category edges, which have incorrect objects. Example:

$ grep CHEBI:88187 merged-kg_nodes.tsv
CHEBI:88187     biolink:MolecularEntity penicillin allergen     Any penicillin which causes the onset of an allergic reaction.  Graph
$ grep CHEBI:88187 merged-kg_edges.tsv
urn:uuid:9a2bcaff-9657-41e5-a21b-2a66edc97751   CHEBI:18208     biolink:subclass_of     CHEBI:88187             rdfs:subClassOf Graph
urn:uuid:65b4fc60-f935-4d7a-9fd0-ded719863534   CHEBI:2676      biolink:subclass_of     CHEBI:88187             rdfs:subClassOf Graph
urn:uuid:44064677-5081-4383-96db-e6d87fd01dae   CHEBI:28971     biolink:subclass_of     CHEBI:88187             rdfs:subClassOf Graph
urn:uuid:8f01c165-65dc-441a-8179-01a12dcf849b   CHEBI:3393      biolink:subclass_of     CHEBI:88187             rdfs:subClassOf Graph
urn:uuid:1fbb62fe-1544-455e-a57e-7402e378af75   CHEBI:6827      biolink:subclass_of     CHEBI:88187             rdfs:subClassOf Graph
urn:uuid:30c69fb3-6b78-4712-942d-6e5d0c98e5e1   CHEBI:8232      biolink:subclass_of     CHEBI:88187             rdfs:subClassOf Graph
urn:uuid:60a63d19-dc79-4637-8198-f065871d4618   CHEBI:88187     biolink:subclass_of     CHEBI:17334             rdfs:subClassOf Graph
urn:uuid:7ca9bf59-8aad-4385-8ed8-25a3bd274d69   CHEBI:88187     biolink:has_attribute   CHEBI:50904             RO:0000087      Graph
urn:uuid:78732a9c-510f-4024-b135-3bcd74cabb1a   CHEBI:88187     biolink:category        biolink:NamedThing              biolink:category        Graph
urn:uuid:16689390-e28e-4323-b8d5-0da39f584a1a   CHEBI:88187     biolink:category        biolink:OntologyClass           biolink:category        Graph
urn:uuid:e1d7ce72-246e-43bc-b655-cac55c17f4e8   CHEBI:9587      biolink:subclass_of     CHEBI:88187             rdfs:subClassOf Graph

Or:

$ grep PATO:0001447 merged-kg_nodes.tsv
PATO:0001447    biolink:PhenotypicQuality       calcified       A composition quality inhering in an bearer by virtue of the bearer's being encrusted or impregnated with calcium carbonate (CHEBI:3311).       Graph
$ grep PATO:0001447 merged-kg_edges.tsv
urn:uuid:b2279890-4492-4f94-9fdb-d854117902dc   PATO:0001447    biolink:subclass_of     PATO:0000025            rdfs:subClassOf Graph
urn:uuid:c0adaddd-4723-4965-8d3c-0b1b4ce7f183   PATO:0001447    biolink:category        biolink:NamedThing              biolink:category        Graph
urn:uuid:dc3b4c34-688c-4f37-85c0-fe582732e697   PATO:0001447    biolink:category        biolink:OntologyClass           biolink:category        Graph
urn:uuid:c2f809d8-47c0-41e8-ada2-5d55fca526b7   UPHENO:0067765  biolink:has_part        PATO:0001447            BFO:0000051     Graph

The biolink:category edges shouldn't be present at all, and they definitely shouldn't refer to incorrect categories.
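
To look for this problem across a whole build rather than one grep at a time, each biolink:category edge can be compared against the category recorded for its subject in the nodes file. The sketch below is only an illustration, not how the Universalizer handles it. It assumes both files are tab-separated with a header row naming `id` and `category` (nodes) and `subject`, `predicate`, and `object` (edges), and that multiple categories on a node are separated by `|`. The function name `find_mismatched_category_edges` is hypothetical.

```python
import csv
import io


def find_mismatched_category_edges(nodes_file, edges_file):
    """Return (subject, edge_object, node_category) for every biolink:category
    edge whose object is not among the categories listed for that node."""
    # Assumption: header rows provide 'id'/'category' and
    # 'subject'/'predicate'/'object'; QUOTE_NONE keeps quotes in
    # free-text descriptions from being interpreted as field quoting.
    node_category = {
        row["id"]: row["category"]
        for row in csv.DictReader(nodes_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    }
    mismatches = []
    for row in csv.DictReader(edges_file, delimiter="\t", quoting=csv.QUOTE_NONE):
        if row["predicate"] != "biolink:category":
            continue
        recorded = node_category.get(row["subject"], "")
        # Assumption: several categories on one node are joined with '|'.
        if row["object"] not in recorded.split("|"):
            mismatches.append((row["subject"], row["object"], recorded))
    return mismatches


# Small in-memory sample modeled on the PATO:0001447 rows above.
nodes = io.StringIO(
    "id\tcategory\tname\n"
    "PATO:0001447\tbiolink:PhenotypicQuality\tcalcified\n"
)
edges = io.StringIO(
    "id\tsubject\tpredicate\tobject\n"
    "e1\tPATO:0001447\tbiolink:subclass_of\tPATO:0000025\n"
    "e2\tPATO:0001447\tbiolink:category\tbiolink:NamedThing\n"
    "e3\tPATO:0001447\tbiolink:category\tbiolink:OntologyClass\n"
)
for subject, found, recorded in find_mismatched_category_edges(nodes, edges):
    print(f"{subject}: edge says {found}, node says {recorded}")
```

With real files, the two `io.StringIO` objects would be replaced by open handles on merged-kg_nodes.tsv and merged-kg_edges.tsv.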

caufieldjh commented 1 year ago

This does not appear to be fixed in the most recent KG-Phenio build (20230302):

$ grep PATO:0001447 merged-kg_nodes.tsv
PATO:0001447    biolink:PhenotypicQuality       calcified       A composition quality inhering in an bearer by virtue of the bearer's being encrusted or impregnated with calcium carbonate (CHEBI:3311).       Graph
~/kg-phenio/data/merged/20230302$ grep PATO:0001447 merged-kg_edges.tsv
urn:uuid:bc483055-65da-4abd-ba42-47ab00248bef   PATO:0001447    biolink:subclass_of     PATO:0000025                    infores:phenio  Graph   infores:pato
urn:uuid:9dcef768-d736-4c31-a695-fce57c28ac9c   PATO:0001447    biolink:category        biolink:NamedThing                      infores:phenio  Graph   infores:pato
urn:uuid:4e176049-fdc2-4577-b498-dee0a6d212b2   PATO:0001447    biolink:category        biolink:OntologyClass                   infores:phenio  Graph   infores:pato
urn:uuid:9874c639-a33a-4ea9-9e5a-b1ab797730cb   UPHENO:0067765  biolink:related_to      PATO:0001447                    infores:phenio  Graph   infores:upheno

caufieldjh commented 1 year ago

This does appear to be fixed in the most recent KG-Phenio (20230808):

$ grep PATO:0001447 merged-kg_nodes.tsv
PATO:0001447    biolink:PhenotypicQuality       calcified       A composition quality inhering in an bearer by virtue of the bearer's being encrusted or impregnated with calcium carbonate (CHEBI:3311).       Graph   calcareous|calcification
$ grep PATO:0001447 merged-kg_edges.tsv
urn:uuid:134278ea-d0ce-4b74-a7b6-ec683d1451dc   PATO:0001447    biolink:subclass_of     PATO:0000025                    infores:phenio  Graph   infores:pato
urn:uuid:18d1f573-1671-41df-ac45-98162d07e94e   UPHENO:0067765  biolink:related_to      PATO:0001447                    infores:phenio  Graph   infores:upheno
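
The single-node check above could be widened to the whole edges file, for instance before publishing a new build. The following is a rough sketch under the same assumption that the edges TSV has a header row with a `predicate` column. `count_predicates` is a hypothetical helper, not part of the Universalizer.

```python
import csv
import io
from collections import Counter


def count_predicates(edges_file):
    """Tally how many edges use each predicate in a tab-separated edges file."""
    # Assumption: the header row names a 'predicate' column.
    reader = csv.DictReader(edges_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    return Counter(row["predicate"] for row in reader)


# In-memory sample modeled on the 20230808 rows above.
sample = io.StringIO(
    "id\tsubject\tpredicate\tobject\n"
    "e1\tPATO:0001447\tbiolink:subclass_of\tPATO:0000025\n"
    "e2\tUPHENO:0067765\tbiolink:related_to\tPATO:0001447\n"
)
counts = count_predicates(sample)
# A Counter returns 0 for predicates it never saw.
assert counts["biolink:category"] == 0, "category edges are still present"
print(counts)
```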