Knowledge-Graph-Hub / kg-phenio

A Graph for experiments doing ML on ontologies.
BSD 3-Clause "New" or "Revised" License

Re-add missing node xrefs #141

Closed: caufieldjh closed this issue 3 months ago

caufieldjh commented 3 months ago

In reference to https://github.com/monarch-initiative/monarch-app/issues/652 -

MONDO identifiers don't appear to have their corresponding OMIM IDs, or at least they don't end up in the Monarch KG. They appear to be populated in PHENIO as expected (see https://github.com/monarch-initiative/phenio/issues/67). Are they going missing in the KG-Phenio build?

caufieldjh commented 3 months ago

Stats for the 20240313 KG build show only 3 OMIM nodes:

OMIM:has_inheritance_type       biolink:has_attribute                   Graph                   http://purl.obolibrary.org/obo/OMIM_has_inheritance_type
OMIM:has_manifestation  biolink:superclass_of                   Graph                   http://purl.obolibrary.org/obo/OMIM_has_manifestation
OMIM:manifestation_of   biolink:manifestation_of                        Graph                   http://purl.obolibrary.org/obo/OMIM_manifestation_of

Not even real nodes! These are relation terms, not OMIM disease entries.

Strangely enough, a build from the same time last year (20230313) contains the same OMIM nodes and no more.

This is what MONDO:0026765 looks like in the most recent KG-Phenio:

MONDO:0026765   biolink:Disease congenital disorder of glycosylation, type IIr          Graph   CDG IIr|CDG2R|CONGENITAL DISORDER OF GLYCOSYLATION, TYPE IIr|congenital disorder of glycosylation, type IIr, X-linked recessive         http://purl.obolibrary.org/obo/MONDO_0026765

If I look at a build from way back in 2022, the xrefs are there:

MONDO:0026765   biolink:NamedThing      congenital disorder of glycosylation, type IIr                  Graph   CONGENITAL DISORDER OF GLYCOSYLATION, TYPE IIr|CDG2R|CDG IIr                                                               OMIM:301045                                                                                                                                                                                                                                 owl:Class

For comparison, this is what that node looks like in the most recent KG-OBO version of MONDO:

MONDO:0026765   biolink:Disease congenital disorder of glycosylation, type IIr          OMIM:301045     mondo.json      CONGENITAL DISORDER OF GLYCOSYLATION, TYPE IIr|CDG IIr|CDG2R|congenital disorder of glycosylation, type IIr, X-linked recessive     http://purl.obolibrary.org/obo/MONDO_0026765    https://omim.org/entry/301045   rare|inferred_rare

So somewhere along the way, the xrefs (and possibly the SKOS mappings too?) started being ignored.
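The build-to-build comparison above can be done by column rather than by eye. This is a minimal sketch, not part of kg-phenio: it assumes the node files are KGX-style TSVs with a header row, an `id` column, and a pipe-delimited `xref` column. The function name `node_xrefs` and the two in-memory "builds" are hypothetical stand-ins, not real build output.

```python
import csv
import io


def node_xrefs(tsv_file, node_id, xref_column="xref", delimiter="|"):
    """Return the xrefs listed for node_id, or None if the node is absent.

    Assumes a tab-separated file with a header row that includes an
    `id` column and (optionally) an xref column with `|`-joined values.
    """
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if row.get("id") == node_id:
            raw = row.get(xref_column) or ""
            return [x for x in raw.split(delimiter) if x]
    return None


# Illustrative stand-ins for an older and a newer node file.
old_build = io.StringIO(
    "id\tcategory\tname\txref\n"
    "MONDO:0026765\tbiolink:NamedThing\t"
    "congenital disorder of glycosylation, type IIr\tOMIM:301045\n"
)
new_build = io.StringIO(
    "id\tcategory\tname\txref\n"
    "MONDO:0026765\tbiolink:Disease\t"
    "congenital disorder of glycosylation, type IIr\t\n"
)

print("old:", node_xrefs(old_build, "MONDO:0026765"))
print("new:", node_xrefs(new_build, "MONDO:0026765"))
```

An empty list means the node exists but carries no xrefs, while `None` means the node itself is missing, which separates the two failure modes.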

caufieldjh commented 3 months ago

I think the transform step in which node sources are added is not including xrefs. See https://github.com/Knowledge-Graph-Hub/kg-phenio/blob/master/kg_phenio/transform_utils/phenio/phenio_node_sources.yaml

caufieldjh commented 3 months ago

OMIM xrefs (and others) are back in kg-phenio:

$ grep MONDO merged-kg_nodes.tsv | grep OMIM | wc -l
10359
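The grep above matches any line that mentions both MONDO and OMIM anywhere, so a stricter, column-aware count may be useful as a cross-check. This is a sketch under the same assumptions as before (KGX-style TSV, `id` column, pipe-delimited `xref` column); `count_nodes_with_xref` is a hypothetical helper, and the sample rows are made up for illustration.

```python
import csv
import io


def count_nodes_with_xref(tsv_file, id_prefix="MONDO:", xref_prefix="OMIM:",
                          xref_column="xref", delimiter="|"):
    """Count nodes whose id starts with id_prefix and that have at least
    one xref starting with xref_prefix."""
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    count = 0
    for row in reader:
        if not (row.get("id") or "").startswith(id_prefix):
            continue
        xrefs = (row.get(xref_column) or "").split(delimiter)
        if any(x.startswith(xref_prefix) for x in xrefs):
            count += 1
    return count


# Made-up rows: one MONDO node with an OMIM xref, one without any xref,
# one non-MONDO node, and one MONDO node with only a non-OMIM xref.
sample = io.StringIO(
    "id\tcategory\txref\n"
    "MONDO:1\tbiolink:Disease\tOMIM:1|DOID:2\n"
    "MONDO:2\tbiolink:Disease\t\n"
    "HP:1\tbiolink:PhenotypicFeature\tOMIM:9\n"
    "MONDO:3\tbiolink:Disease\tDOID:3\n"
)
print(count_nodes_with_xref(sample))
```

To run it on the real build, pass an open handle to merged-kg_nodes.tsv instead of the sample. The result may differ from the grep count, since grep also counts lines where OMIM appears outside the xref column.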