INCATools / dead_simple_owl_design_patterns

A simple system for specifying OWL class design patterns for OBO-ish ontologies.
http://incatools.github.io/dead_simple_owl_design_patterns/
GNU General Public License v3.0

Explore relationship between templates and RDF Shapes/ShEx #51

Open cmungall opened 5 years ago

cmungall commented 5 years ago

There are similarities and differences in semantics and use cases between templates (DOSDPs, ROBOT, OTTR) and shapes (ShEx, SHACL).

We should explore these, formalize the linkages, and possibly even explore whether a subsuming framework is possible.

Some background: this is being driven in part by the go-shapes schema, which is used to validate GO-CAMs but is increasingly becoming a general source of truth about GO. Originally we had shapes only for obo-core level classes such as BiologicalProcess and CellComponent. But we are seeing the need for deeper subclasses, e.g. a transport subclass that we can parameterize with start-location and end-location.

This obviously partly duplicates the DOSDP templates for GO, which is not super-satisfying. Aside from duplication of effort, the worst effect is duplication of mindshare and confusion over not having one source of truth.

A current very rough proposal:

E.g.

<Transport> @<BiologicalProcess> AND EXTRA a {
  :has-start-location @<CellComponent> // dosdp:var "start" ;
  :has-end-location @<CellComponent> // dosdp:var "end"
} // rdfs:comment "this is for transport"
  // dosdp:labelGen "transport [from {{start}}] [to {{end}}]"
  // dosdp:textdefGen "..."
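The `dosdp:labelGen` annotation above uses `{{var}}` placeholders, with square brackets apparently marking optional segments. A minimal Python sketch of how such a template might be expanded, assuming (hypothetically) that a bracketed segment is dropped whenever any variable inside it has no filler; `gen_label` is an illustrative name, not part of DOSDP or any ShEx tooling:

```python
import re

VAR = re.compile(r"\{\{(\w+)\}\}")        # matches {{start}}, {{end}}, ...
OPTIONAL = re.compile(r"\[([^\[\]]*)\]")  # matches [from {{start}}], ...


def gen_label(template: str, fillers: dict) -> str:
    """Expand a labelGen-style template (hypothetical semantics).

    A bracketed segment is kept (without its brackets) only if every
    {{var}} inside it has a filler; otherwise the whole segment is dropped.
    Variables outside brackets are treated as required.
    """
    def expand_optional(match: re.Match) -> str:
        segment = match.group(1)
        if all(fillers.get(name) for name in VAR.findall(segment)):
            return segment
        return ""

    text = OPTIONAL.sub(expand_optional, template)
    text = VAR.sub(lambda m: fillers[m.group(1)], text)
    return " ".join(text.split())  # collapse spaces left by dropped segments


print(gen_label("transport [from {{start}}] [to {{end}}]", {"end": "cytosol"}))
```

With only an end location the label becomes "transport to cytosol"; with no fillers it collapses to "transport".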

No need for an equivalence axiom generator: all the information is in the ABox pattern.

To do class generation, you could feed this either tuples (with optional fillers) or actual subgraphs.
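To make the subgraph route concrete, here is a minimal Python sketch that reduces an ABox subgraph to a tuple with optional fillers, driven by the shape's `dosdp:var` annotations. The triple representation, the `SHAPE` dictionary and `subgraph_to_tuple` are illustrative assumptions; a real implementation would more likely reuse the results of a ShEx validator than this hand-rolled matching.

```python
from typing import Optional

# Hypothetical, hand-written stand-in for the parsed Transport shape:
# dosdp variable name -> predicate that the annotation is attached to.
SHAPE = {
    "type": "BiologicalProcess",
    "vars": {"start": "has-start-location", "end": "has-end-location"},
}

Triple = tuple[str, str, str]


def subgraph_to_tuple(focus: str, triples: list[Triple]) -> dict[str, Optional[str]]:
    """Extract one filler per dosdp variable for `focus`; None if absent.

    Simplifications: only a direct rdf:type to the shape's parent is
    checked, and each predicate is assumed to have at most one value.
    """
    if (focus, "a", SHAPE["type"]) not in triples:
        raise ValueError(f"{focus} is not typed as {SHAPE['type']}")
    row: dict[str, Optional[str]] = {}
    for var, predicate in SHAPE["vars"].items():
        values = [o for s, p, o in triples if s == focus and p == predicate]
        row[var] = values[0] if values else None
    return row
```

The resulting dict is the same kind of tuple-with-optional-fillers that could be fed in directly, so both input routes converge on one class-generation step.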

I am also assuming that in the future there will be many tools for things like driving form interfaces from ShEx/SHACL (which are partly interconvertible).

I think there are many advantages to doing this for GO. We are becoming more ABox-based. A lot of the standard tooling in ShEx is really nice, and it's a widely adopted standard.

On the other hand, this could just create busy work for other uses of DOSDPs; e.g. they have been phenomenally successful for phenotype reconciliation.

The counterpoint to all of this is skepticism about finding the One True Framework to bind them all (biolinkml?)

See Also

cc

@vanaukenk @dosumis @matentzn @balhoff @goodb @ukemi @jamesaoverton @beckyjackson

matentzn commented 5 years ago

We will make this the topic of our next ODK call. I must admit that I lack the background to really understand what you are proposing here, but I generally want to start using shapes for the phenotype reconciliation effort soon, so it makes sense to coordinate with GO and DOSDP.

dosumis commented 5 years ago

Makes sense. This was, of course, one of the motivating use cases for DOSDPs in the first place; see the instance_graph spec in the DOSDP schema.

cmungall commented 5 years ago

Interesting, I didn't know you were already considering using shapes.


dosumis commented 5 years ago

This may only be of historical interest, but the spec is here:

https://github.com/INCATools/dead_simple_owl_design_patterns/blob/master/spec/DOSDP_schema_full.yaml#L411

& here:

https://github.com/INCATools/dead_simple_owl_design_patterns/blob/master/spec/DOSDP_schema_full.yaml#L153

@balhoff, did you ever get around to writing code for this? I think we discussed it at the time.

wdduncan commented 5 years ago

This is quite interesting. I'm a little lost on the details. I think you are proposing t-shex to be the ground truth... right? That is, dosdp would be transformed to t-shex? Or is it the other way round: t-shex would be transformed to dosdp?

cmungall commented 5 years ago

I think you are proposing t-shex to be the ground truth ... right?

Correct

t-shex would be transformed to dosdp?

Correct

(of course there may be a bootstrapping and synchronization step where we iterate with the reverse)

And to be clear, "t-shex" is nothing more than standard ShEx with some conventions for how it is annotated (hmm, can we model that in ShEx itself? That's the kind of meta question @hsolbrig loves).
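As an illustration of the t-shex → dosdp direction, the sketch below takes the annotated Transport shape, already parsed into a plain Python dict (hypothetical; no ShEx parser is involved), and compiles its annotations into a structure modelled loosely on DOSDP's `vars`, `name` and `equivalentTo` fields, where templates are `text` with `%s` slots plus an ordered `vars` list. The optional `[...]` segments are simply un-bracketed, an assumption made to keep the sketch short; `compile_to_dosdp` is an illustrative name.

```python
import re

# Hypothetical parsed form of the annotated Transport shape.
shape = {
    "label": "Transport",
    "parent": "BiologicalProcess",
    "constraints": [  # (predicate, value shape, dosdp:var annotation)
        ("has-start-location", "CellComponent", "start"),
        ("has-end-location", "CellComponent", "end"),
    ],
    "labelGen": "transport [from {{start}}] [to {{end}}]",
}


def compile_to_dosdp(shape: dict) -> dict:
    """Compile annotated-shape fields into a DOSDP-like dict (sketch)."""
    var_order = [var for _, _, var in shape["constraints"]]
    # Drop optional-segment brackets, then turn {{var}} into printf %s slots.
    label_text = shape["labelGen"].replace("[", "").replace("]", "")
    label_vars = re.findall(r"\{\{(\w+)\}\}", label_text)
    label_text = re.sub(r"\{\{\w+\}\}", "%s", label_text)
    # The logical definition comes straight from the shape: parent plus one
    # existential restriction per triple constraint.
    restrictions = " and ".join(
        f"('{pred}' some %s)" for pred, _, _ in shape["constraints"])
    return {
        "pattern_name": shape["label"].lower(),
        "vars": {var: f"'{rng}'" for _, rng, var in shape["constraints"]},
        "name": {"text": label_text, "vars": label_vars},
        "equivalentTo": {
            "text": f"'{shape['parent']}' and {restrictions}",
            "vars": var_order,
        },
    }
```

Note that no separate equivalence template has to be written by hand here: it is read off the triple constraints.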

wdduncan commented 5 years ago

Ok. So you are proposing to use t-shex to generate data by translating the t-shex into dosdp, and then the dosdp to OWL/RDF?

cmungall commented 5 years ago

This is possibly the most expedient path.

But note that t-shex->dosdp is a compilation/translation step.

There isn't really a dosdp->owl translation as such. The dosdp specifies how to translate tuples/rows to OWL.
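A minimal sketch of what "translate tuples/rows to OWL" means in practice, in the printf style DOSDP templates use (each `%s` is replaced, in order, by the filler of the corresponding variable). The pattern dict, row format and the `fill`/`apply_pattern` helpers are hypothetical; real tooling such as dosdp-tools also handles IRIs, label lookup and quoting, all of which are ignored here.

```python
def fill(template: dict, values: dict) -> str:
    """Replace each %s in template["text"], in order, with values[var]."""
    return template["text"] % tuple(values[v] for v in template["vars"])


# Hypothetical pattern fragment, modelled on DOSDP's name/equivalentTo fields.
pattern = {
    "name": {"text": "transport from %s to %s", "vars": ["start", "end"]},
    "equivalentTo": {
        "text": "'transport' and ('has start location' some %s) "
                "and ('has end location' some %s)",
        "vars": ["start", "end"],
    },
}


def apply_pattern(pattern: dict, row: dict) -> tuple[str, str]:
    """Turn one row of fillers into (label, Manchester-syntax definition)."""
    label = fill(pattern["name"], row)
    # Manchester syntax refers to classes by label in single quotes.
    quoted = {var: f"'{value}'" for var, value in row.items()}
    return label, fill(pattern["equivalentTo"], quoted)


for row in [{"start": "nucleus", "end": "cytosol"}]:
    print(apply_pattern(pattern, row))
```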


dosumis commented 5 years ago

I think this approach is fine if you're willing to limit design pattern expressivity: patterns consisting entirely of EquivalentClasses axioms with no nested class expressions. The one case where I think this would be a loss for GO is GCIs used to align branches. E.g. I still think patterns with GCIs are the best way to align CC organization/assembly/disassembly in BP with the CC hierarchy. IIRC, I even wrote patterns for this.

dosumis commented 5 years ago

I think this approach has the advantage that it should be reasonably transparent to those used to building GO-CAM models, in a way that perhaps DOSDPs have failed to be. OTOH, isn't there a danger that it will result in unsafe patterns that apply to some broad subset of cases but cause misclassification outside of these? To prevent this, I think you'd still need a strong editorial step between deriving DOSDPs from ShEx patterns and implementing them in the ontology.

cmungall commented 5 years ago

Do you still have those GCI examples? I don't see in the current ones: https://github.com/geneontology/go-ontology/blob/master/src/design_patterns/cc_disassembly.yaml

My thoughts so far are vague, but we can always bring across any aspect of dosdps into t-shex annotations, and just treat t-shex as an alternate syntax for dosdps.

But this isn't ideal if we want to embrace the ABox shape as the 'source of truth': we end up mixing the two in a slightly redundant way.

I think the GCIs might be expressible in a more abox-centric way that can then be autogeneralized to tboxes, but this remains to be determined.

isn't there a danger that it will result in unsafe patterns

Would this be in the TBox generalization step? Quite possibly; I need to think of some examples.

dosumis commented 4 years ago

Do you still have those GCI examples?

See https://github.com/geneontology/go-ontology/blob/master/src/design_patterns/cc_organization.yaml#L49

cmungall commented 4 years ago

Another possibility here is to build this in to biolinkml yaml, cc @hsolbrig

https://github.com/biolink/biolinkml -- note: this is completely independent of biolink itself

A related ticket: https://github.com/biolink/biolinkml/issues/128

classes:
  transport:
    is_a: biological process
    slots:
      - start location
      - end location
    templates:
      name:
        as string value: "transport from {start location} to {end location}"
      definition:
        as string value: "...."
...

with the equivalence/logical definition pattern inferred automatically
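A rough Python sketch of what "inferred automatically" could mean for the YAML above, working on the already-parsed structure (a plain dict, so no YAML library is needed). It assumes, hypothetically, that a class's logical definition is its `is_a` parent intersected with one existential restriction per filled slot, using the slot name as the property, and that unfilled slots are simply omitted. `render` and `infer_equivalence` are illustrative names, not biolinkml functions.

```python
import re

# Parsed form of the `transport` class from the YAML above.
transport = {
    "is_a": "biological process",
    "slots": ["start location", "end location"],
    "templates": {
        "name": {
            "as string value": "transport from {start location} to {end location}",
        },
    },
}


def render(template: str, values: dict) -> str:
    """Replace {slot name} placeholders (slot names may contain spaces)."""
    return re.sub(r"\{([^{}]+)\}", lambda m: values[m.group(1)], template)


def infer_equivalence(cls: dict, values: dict) -> str:
    """Parent AND (slot some filler) for each filled slot, Manchester-style."""
    parts = [f"'{cls['is_a']}'"]
    parts += [f"('{slot}' some '{values[slot]}')"
              for slot in cls["slots"] if values.get(slot)]
    return " and ".join(parts)


fillers = {"start location": "nucleus", "end location": "cytosol"}
print(render(transport["templates"]["name"]["as string value"], fillers))
print(infer_equivalence(transport, fillers))
```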

For GCIs, how about just specifying these directly as ABox rules and inferring a SPARQL update?

e.g.

?cp results-in-org-of ?c1, ?c1 part-of ?c
->
exists: ?p
?cp part-of ?p
?p a :organization, ?p has-input ?c

There is a deterministic translation of this structure into an ugly SPARQL TBox update command.
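Read literally, the rule says: whenever `?cp results-in-org-of ?c1` and `?c1 part-of ?c` hold, there exists some `?p` such that `?cp part-of ?p`, `?p a :organization` and `?p has-input ?c`. One possible (unofficial) version of the deterministic translation is sketched below in Python: each body triple `?x P ?y` is lifted to the class-level pattern `?x rdfs:subClassOf (P some ?y)` using the OWL RDF mapping for restrictions, and the existential `?p` becomes an anonymous `owl:intersectionOf` class in the INSERT template. `compile_rule`, `some` and the `http://example.org/` prefix are hypothetical; this is not an existing GO or DOSDP tool.

```python
PREFIXES = """PREFIX : <http://example.org/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
"""


def some(prop: str, filler: str) -> str:
    """SPARQL blank-node syntax for the OWL restriction `prop some filler`."""
    return (f"[ a owl:Restriction ; owl:onProperty {prop} ; "
            f"owl:someValuesFrom {filler} ]")


def compile_rule(body, head, exists: str) -> str:
    """Return a SPARQL INSERT ... WHERE update for an ABox-level rule.

    body   -- (subject, property, object) triples, all variables
    head   -- triples; those whose subject is `exists` describe the
              existential individual, those whose object is `exists` link to it
    exists -- the single existential variable, e.g. "?p"
    """
    where = [f"  {s} rdfs:subClassOf {some(p, o)} ." for s, p, o in body]
    # `?p a C` contributes the named class C; `?p P ?y` contributes P some ?y.
    conjuncts = [o if p == "a" else some(p, o)
                 for s, p, o in head if s == exists]
    anon = f"[ a owl:Class ; owl:intersectionOf ( {' '.join(conjuncts)} ) ]"
    # `?x P ?p` becomes `?x rdfs:subClassOf (P some <anonymous class>)`.
    insert = [f"  {s} rdfs:subClassOf {some(p, anon)} ."
              for s, p, o in head if o == exists]
    return (PREFIXES + "INSERT {\n" + "\n".join(insert) + "\n}\n"
            + "WHERE {\n" + "\n".join(where) + "\n}\n")


RULE_BODY = [("?cp", ":results-in-org-of", "?c1"), ("?c1", ":part-of", "?c")]
RULE_HEAD = [("?cp", ":part-of", "?p"), ("?p", "a", ":organization"),
             ("?p", ":has-input", "?c")]

if __name__ == "__main__":
    print(compile_rule(RULE_BODY, RULE_HEAD, exists="?p"))
```

Blank nodes in an INSERT template are minted fresh for each solution, so every matched `?cp` gets its own anonymous class expression.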

cmungall commented 4 years ago

I have been thinking more about using the ABox representation as primary (via something like UML, biolinkml or ShEx) and deriving TBox equivalence axioms from it. @matentzn posed the question of what to do about complex patterns where the desired TBox expression employs nesting.

I would do this through simple composition of standard class definitions

E.g. for the subq case, we may have:

classes:
  phenotype:
    slot_usage:
      has part:
        range: atomic phenotype
    to_str: "{atomic phenotype}"
  atomic phenotype:
    slots: [inheres in, type, qualifier]
  morphology phenotype:
    is_a: atomic phenotype
    slot_usage:
      type:
        range: morphology class
      inheres in:
        range: anatomical structure
    to_str: "{inheres in} morphology"
  abnormal morphology phenotype:
    is_a: morphology phenotype
    slot_usage:
      qualifier:
        range: abnormal class
    to_str: "abnormal {inheres in} morphology"
etc

This constrains the shape of ABoxes and gives string generation/parsing, e.g. "morphology of patient123's left femur".
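A `to_str` template can be used in both directions. A minimal Python sketch, assuming (hypothetically) that each `{slot}` placeholder matches a non-empty run of characters when parsing; the non-greedy regex built by `to_parser` is illustrative and would be ambiguous for templates with adjacent placeholders. None of these helpers are biolinkml functions.

```python
import re
from typing import Optional

PLACEHOLDER = re.compile(r"\{([^{}]+)\}")  # {inheres in}, {atomic phenotype}, ...


def generate(to_str: str, values: dict) -> str:
    """Fill {slot} placeholders in a to_str template."""
    return PLACEHOLDER.sub(lambda m: values[m.group(1)], to_str)


def to_parser(to_str: str) -> tuple[re.Pattern, list[str]]:
    """Compile a to_str template into a regex with one group per slot.

    Slot names may contain spaces, so groups are named g0, g1, ... and
    the real slot names are returned alongside the regex.
    """
    pattern, pos, names = "", 0, []
    for m in PLACEHOLDER.finditer(to_str):
        pattern += re.escape(to_str[pos:m.start()])
        pattern += f"(?P<g{len(names)}>.+?)"
        names.append(m.group(1))
        pos = m.end()
    pattern += re.escape(to_str[pos:])
    return re.compile(pattern), names


def parse(to_str: str, text: str) -> Optional[dict]:
    """Recover slot values from a string, or None if it does not fit."""
    regex, names = to_parser(to_str)
    match = regex.fullmatch(text)
    if match is None:
        return None
    return {name: match.group(f"g{i}") for i, name in enumerate(names)}


t = "abnormal {inheres in} morphology"
print(parse(t, generate(t, {"inheres in": "left femur"})))
```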

The shape of TBoxes follows directly from this, together with patterns for equivalence axioms; there is no need to write OWL in macros.
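To illustrate how a nested TBox expression could fall out of composing flat class definitions, here is a hedged Python sketch. It assumes a hypothetical mapping: the `type` slot contributes its filler class directly, every other filled slot becomes `'slot' some filler`, and `phenotype`'s `has part` slot wraps the atomic phenotype's flat expression in a `'has part' some (...)` restriction. These mapping rules are illustrative, not part of biolinkml.

```python
def atomic_expression(fillers: dict) -> str:
    """Flat definition of an atomic phenotype from its slot fillers."""
    parts = []
    if fillers.get("type"):
        parts.append(f"'{fillers['type']}'")  # the quality class itself
    for slot in ("inheres in", "qualifier"):
        if fillers.get(slot):
            parts.append(f"('{slot}' some '{fillers[slot]}')")
    return " and ".join(parts)


def phenotype_expression(atomic_fillers: dict) -> str:
    """Composition: phenotype = 'has part' some (<atomic phenotype>)."""
    return f"'has part' some ({atomic_expression(atomic_fillers)})"


print(phenotype_expression(
    {"type": "morphology", "inheres in": "femur", "qualifier": "abnormal"}))
```

Each class definition stays flat; the nesting appears only because one slot's range is another class whose definition is substituted in.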

cmungall commented 4 years ago

Here is an example of using biolinkml as a template language for a chemical ontology: https://github.com/cmungall/chemistry-ontology