INCATools / ubergraph

Integrated OBO ontology store
BSD 3-Clause "New" or "Revised" License

adding knowledge graph links with different semantics #117

Open balhoff opened 1 year ago

balhoff commented 1 year ago

Currently the semantics of the edges in the two relation graphs is:

If a user is querying the "nonredundant" relation graph, they can propagate these relations in a consistent way down and up the class hierarchy within SPARQL, e.g.:

?x results_in_development_of:/rdfs:subClassOf* ?y
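For readers less familiar with SPARQL property paths, the following is a minimal Python sketch of what a path of the form relation followed by `rdfs:subClassOf*` computes: follow one relation edge, then walk zero or more subclass edges upward. The class names and edges are hypothetical toy data, and the function names are made up for illustration; none of this is Ubergraph code.

```python
def superclasses_reflexive(cls, subclass_of):
    """Return cls plus every class reachable via subClassOf (the `*` path)."""
    seen = {cls}
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in subclass_of.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def relation_then_superclasses(x, relation_edges, subclass_of):
    """Emulate `?x rel:/rdfs:subClassOf* ?y` for a fixed ?x."""
    results = set()
    for target in relation_edges.get(x, ()):
        results |= superclasses_reflexive(target, subclass_of)
    return results


# Hypothetical toy data, not taken from any real ontology.
SUBCLASS_OF = {
    "heart": ["organ"],
    "organ": ["anatomical structure"],
}
RESULTS_IN_DEVELOPMENT_OF = {
    "heart development": ["heart"],
}

if __name__ == "__main__":
    found = relation_then_superclasses(
        "heart development", RESULTS_IN_DEVELOPMENT_OF, SUBCLASS_OF
    )
    print(sorted(found))
```

Because the subclass step is reflexive, the directly related class is included in the result alongside its superclasses.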

It could be useful to add more kinds of links that would have different semantics from those:

All of those have different propagation characteristics with regard to the class hierarchy, compared to the existing types of edges.

We can compute the complete closure and a nonredundant version of each of those links (maybe as new features in relation-graph). Should those triples go in different graphs? Or just the same two relation graphs and add the expansion semantics to the docs?

cmungall commented 1 year ago

There are two ways to get these kinds of links in:

  1. shadow annotation properties, with the upstream ontology responsible for injecting these
  2. mapping rules from owl axioms to simple triples in the ubergraph loader

Note we do 1 to a certain extent with never-in-taxon and present-in-taxon anyway, and we have a very partial framework for one-way conversion of these based on SPARQL (need to document this better). So 1 is the most conservative. There is already a mechanism for filtering annotation assertions vs "true edges".

1 is also a little ad-hoc, in that not every upstream materializes these, and it's somewhat arbitrary which patterns we have defined shortcuts for.

I think with route 2, there is the potential for a lot of confusion. If we go this route, I am very strongly in favor of making this a separate standard that is decoupled from ubergraph, and of keeping the OWL interpretation completely transparent.

I made a start on a proposed standard for this:

It uses plain RDF reification to store the interpretation on the triple (rdfstar gives syntactic sugar to query this, but it can all be done with plain reification, and rdfstar has issues anyway). This makes everything transparent, and in theory the user can choose what to filter (in practice there are performance implications).
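A rough sketch of the reification idea, in plain Python with triples as tuples: each edge becomes an `rdf:Statement` node carrying `rdf:subject`, `rdf:predicate`, and `rdf:object`, plus one extra property naming the interpretation. The `ex:interpretation` property and the `ex:AllSome` / `ex:AllOnly` values are hypothetical placeholders, not the vocabulary of the proposed standard.

```python
# Hypothetical property name; the proposed standard may use a different one.
INTERPRETATION = "ex:interpretation"


def reify(statement_id, subject, predicate, obj, interpretation):
    """Return the reification triples for one edge plus its interpretation."""
    return [
        (statement_id, "rdf:type", "rdf:Statement"),
        (statement_id, "rdf:subject", subject),
        (statement_id, "rdf:predicate", predicate),
        (statement_id, "rdf:object", obj),
        (statement_id, INTERPRETATION, interpretation),
    ]


def edges_with_interpretation(triples, interpretation):
    """Return the (s, p, o) edges whose reified statement has this interpretation."""
    by_statement = {}
    for s, p, o in triples:
        by_statement.setdefault(s, {})[p] = o
    return [
        (d["rdf:subject"], d["rdf:predicate"], d["rdf:object"])
        for d in by_statement.values()
        if d.get("rdf:type") == "rdf:Statement"
        and d.get(INTERPRETATION) == interpretation
    ]


# Hypothetical toy edges with made-up interpretation labels.
TRIPLES = (
    reify("_:s1", "ex:finger", "ex:part_of", "ex:hand", "ex:AllSome")
    + reify("_:s2", "ex:nucleus", "ex:part_of", "ex:cell", "ex:AllOnly")
)

if __name__ == "__main__":
    print(edges_with_interpretation(TRIPLES, "ex:AllSome"))
```

The filter has to group five triples per edge before it can answer, which hints at the performance cost of this layout on a large store.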

We can imagine a variant or extension of owlstar where we use named graphs (NGs) to store the interpretation. This is more in keeping with how ubergraph handles things, but IMO there are severe limitations here, in that we are overloading the quad argument with both source and interpretation. Even now in ubergraph it's impossible to query for relation-graph (RG) triples asserted in a particular ontology.

> All of those have different propagation characteristics with regard to the class hierarchy, compared to the existing types of edges.

Are you also considering materializing these? Won't this explode for anything involving ComplementOf / "downward" propagation?

If we do implement this I think it's really important to follow a standard here.

But overall I am not convinced as to the need. But it really depends on who the consumers are. My own perspective is that the vast majority of common operations can be done with materialized standard RG triples plus joins.

I am guessing taxon constraints (TCs) are a driver here. I have not yet watched your tutorial on this, but I have been meaning to demonstrate how OAK can do most useful TC inferences by using the plain taxon RG table plus the source ontology RG table and a join between the two.
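To make the join idea concrete, here is a small sketch in plain Python. It is not OAK code; the table contents are hypothetical, and it assumes the usual reading of never_in_taxon, namely that the constraint is inherited by descendants of the term and by descendants of the taxon.

```python
# Hypothetical materialized closure tables as (descendant, ancestor) pairs,
# reflexive, standing in for the source ontology RG table and the taxon RG table.
ONTOLOGY_RG = {
    ("feather", "feather"),
    ("flight feather", "flight feather"),
    ("flight feather", "feather"),
}
TAXON_RG = {
    ("Mammalia", "Mammalia"),
    ("Homo sapiens", "Homo sapiens"),
    ("Homo sapiens", "Mammalia"),
    ("Aves", "Aves"),
}

# Hypothetical asserted constraint: (term, taxon).
NEVER_IN_TAXON = {("feather", "Mammalia")}


def never_in(term, taxon):
    """Join both closures against the asserted never_in_taxon pairs."""
    term_ancestors = {a for (t, a) in ONTOLOGY_RG if t == term}
    taxon_ancestors = {a for (t, a) in TAXON_RG if t == taxon}
    return any(
        (term_anc, taxon_anc) in NEVER_IN_TAXON
        for term_anc in term_ancestors
        for taxon_anc in taxon_ancestors
    )


if __name__ == "__main__":
    print(never_in("flight feather", "Homo sapiens"))
    print(never_in("flight feather", "Aves"))
```

Nothing new is materialized here: the answer for a given term and taxon pair is computed at query time from the two existing closure tables and the asserted constraints.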

balhoff commented 1 year ago

Thanks! My main question was about whether it would be confusing to users to add more kinds of edges that have their own background semantics. For the taxon constraints I would have some dedicated software that just computes the complete materialization using a reasoner, with special knowledge of mapping never_in_taxon and present_in_taxon to OWL (code like this already exists in the OBO taxon constraints plugin for Protégé and in gaferencer).

> are you also considering materializing these? Won't this explode for anything involving ComplementOf / "downward" propagation.

Yes... Ubergraph already has a theme of exhaustive materialization. :-) But maybe for never_in_taxon it would get too big.

cmungall commented 1 year ago

> But maybe for never_in_taxon it would get too big

Yep, it would be almost saturated: most anatomy, most diseases, most terms x the majority of taxa.