Closed dosumis closed 7 years ago
Just to clarify, the actual yaml will not have $s. However, there will be %s interpolated in the standard way for dosdps?
So once interpolated we end up with a yaml representation of an abox, with the built in assumption that the keys in the nodes dictionary map to blank nodes.
minor comment, we now have 3 semantically more or less equivalent ways to specify an abox fragment:
Just to clarify, the actual yaml will not have $s.
Correct. Just using this as a compact way to specify. (Also working on JSON-LD spec)
However, there will be %s interpolated in the standard way for dosdps?
No need for string interpolation as there are no strings to be interpolated. This is the key bit of spec:
$type is either a quoted name that is a key in the pattern's class dictionary OR a var name
It's up to maintainers of the Design pattern / template to avoid any name key clashes between owl entities and variable names.
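A minimal Python sketch of that lookup rule, assuming the pattern has already been parsed from YAML (so the quotes around class names are gone) and assuming that a name found in both dictionaries should be rejected. The function name `resolve_type` and the placeholder IDs are illustrative only and are not part of the spec.

```python
def resolve_type(name, classes, variables):
    """Classify a node's type as a class-dictionary key or a var name."""
    in_classes = name in classes
    in_vars = name in variables
    if in_classes and in_vars:
        # The clash the pattern maintainer is expected to avoid.
        raise ValueError(f"{name!r} is both a class name and a var name")
    if in_classes:
        return ("class", classes[name])
    if in_vars:
        return ("var", name)
    raise KeyError(f"{name!r} is neither a class name nor a var name")


# Placeholder IDs: the real GO IDs are elided in the thread.
classes = {"receptor activity": "GO:PLACEHOLDER_1", "binding": "GO:PLACEHOLDER_2"}
variables = {"ligand_binding": "'binding'", "effector": "'biochemical activity'"}

print(resolve_type("receptor activity", classes, variables))
print(resolve_type("effector", classes, variables))
```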
So once interpolated we end up with a yaml representation of an abox, with the built in assumption that the keys in the nodes dictionary map to blank nodes.
Don't see any need for blank nodes. nodes in the dict are individuals in the LEGO model.
minor comment, we now have 3 semantically more or less equivalent ways to specify an abox fragment:
I did wonder about re-using an existing standard, but want to keep the easily human-readable pattern. I could make the edges look more like OBO graphs though - having explicit keys subj, pred & obj to represent triples rather than just using tuples. These might be slightly intimidating to non-geeks though.
So once interpolated we end up with a yaml representation of an abox, with the built in assumption that the keys in the nodes dictionary map to blank nodes. Don't see any need for blank nodes. nodes in the dict are individuals in the LEGO model.
Sorry, I wasn't very clear. I think formalizing in terms of blank nodes/existentials has some advantages. There is no need to invent any new syntactic elements. The entire interpolated pattern is representable in any other RDF syntax. Of course, we choose to replace blank nodes with unique fresh IRIs on the server but this can be separated as an implementation decision.
MinervaJSON also ends up re-inventing blank nodes too. In my mind it would have been cleaner to map directly to RDF, not sure if @balhoff agrees.
Can you give an example - perhaps transducer from MF refactor?
Will post transducer example shortly. Still don't follow re: blank nodes. If everything in LEGO is an individual, we don't need to represent existentials.
Of course, another alternative is to continue to use Manchester Syntax strings + interpolation. Just seems more efficient to represent simple OPAs using a data structure.
Still don't follow re: blank nodes. If everything in LEGO is an individual, we don't need to represent existentials
You have to name your individuals (assuming you have co-references)
You can't name them using URIs (well you could, but we want the server to mint URIs)
blank nodes provide this ability
Did you also ask about the minerva JSON format? @kltm or @balhoff do you have the link?
You have to name your individuals (assuming you have co-references)
They are named internally for the purpose of specifying their use in the pattern:
$name is a readable name assigned a node solely for the purpose of specifying this pattern
You can't name them using URIs (well you could, but we want the server to mint URIs) blank nodes provide this ability
Isn't this just an implementation issue? It should be up to the software using a template (e.g. noctua/minerva) to generate URIs for individuals when a template is inserted into a model.
Am I missing something?
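Both positions agree that the consuming software mints the identifiers. A rough Python sketch of that step, assuming the list-of-tuples edge form from the proposal. The base IRI, the use of `uuid4`, and the function names are assumptions made for illustration, not how noctua/minerva actually mints IRIs.

```python
import uuid

EXAMPLE_BASE = "http://example.org/individual/"  # hypothetical namespace


def mint_iris(node_names, base=EXAMPLE_BASE):
    """Map each pattern-local node name to a fresh IRI."""
    return {name: base + uuid.uuid4().hex for name in node_names}


def instantiate(instance_graph, base=EXAMPLE_BASE):
    """Replace local names with minted IRIs, keeping co-references intact."""
    iri_for = mint_iris(instance_graph["nodes"], base)
    nodes = {iri_for[n]: t for n, t in instance_graph["nodes"].items()}
    edges = [[iri_for[s], rel, iri_for[o]] for s, rel, o in instance_graph["edges"]]
    return {"nodes": nodes, "edges": edges}


graph = {
    "nodes": {
        "receptor1": "receptor activity",
        "effector1": "effector",
        "sensor1": "ligand_binding",
    },
    "edges": [
        ["receptor1", "has effector", "effector1"],
        ["receptor1", "has sensor", "sensor1"],
        ["sensor1", "internally regulates", "effector1"],
    ],
}

model = instantiate(graph)
```

Whether the local names are called blank nodes or internal identifiers, the same name always maps to the same minted IRI within one insertion, which is all that co-reference needs.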
@balhoff may have a better memory for what's in there, but I don't believe that the format had a "formal" specification for the graph parts; rather, it grew out of the boundaries we imposed on the nested response format. Copious tests and examples (at least for the client) are currently held at https://github.com/berkeleybop/bbop-graph-noctua/tree/master/tests. The library is a subclass of the graph library that we otherwise use.
@cmungall
MinervaJSON also ends up re-inventing blank nodes too. In my mind it would have been cleaner to map directly to RDF, not sure if @balhoff agrees.
What blank nodes are currently in use?
Also, we made the specific design decision to not model it on RDF.
This is what I had in mind: https://github.com/berkeleybop/bbop-manager-minerva/wiki/MinervaRequestAPI
The request part of this specifies how a client constructs an ABox graph on the server.
Ah! I thought you were thinking of the response, rather than the request--nevermind.
From this: https://github.com/berkeleybop/bbop-graph-noctua/blob/master/tests/minerva-01.json#L475
instance_graph:
  nodes:
    $name : $type
    ...
  edges:
    [ [$subj_node, $rel, $obj_node], ... ]
In minerva json (as yaml) would be something like:
individuals:
  -
    id: $name
    type: $type
  -
    ...
facts:
  -
    subject: name1
    property: rel1
    object: name2
  -
    ...
Not too bad, but slightly more complicated than my suggestion.
I'll expand more on my concern but maybe lets start with a concrete example.
I'm still not fully grokking:
No need for string interpolation as there are no strings to be interpolated. This is the key bit of spec: $type is either a quoted name that is a key in the pattern's class dictionary OR a var name
Using a potential pattern for receptor activity as an example
pattern: receptor_activity
relations:
  'has sensor': RO_...
  'has effector': RO_...
  'internally regulates': RO_...
classes:
  'receptor activity': GO_...
  'biochemical activity' : GO_...
  'binding': GO_...
vars:
  ligand_binding : "'binding'"
  effector: "'biochemical activity'"
EquivalentTo:
  text: "'receptor activity' that 'has sensor' some %s and 'has effector' some %s"
  vars:
    - ligand_binding
    - effector
instance_graph:
  nodes:
    receptor1: 'receptor activity'
    effector1: effector
    sensor1: ligand_binding
  edges:
    - ['receptor1', 'has effector', 'effector1']
    - ['receptor1', 'has sensor', 'sensor1']
    - ['sensor1', 'internally regulates', 'effector1']
In the minerva version: nodes would become individuals, with value a list of objects with keys id and type. edges would become facts, with value a list of objects with keys subject, property, object.
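That mapping is mechanical. A small Python sketch, assuming the key names used in the snippets in this thread (`individuals`/`facts`, `id`/`type`, `subject`/`property`/`object`). It is not the actual Minerva request format, and the function name is made up.

```python
def to_individuals_and_facts(instance_graph):
    """Rewrite the nodes/edges form into the individuals/facts form."""
    individuals = [
        {"id": name, "type": type_name}
        for name, type_name in instance_graph["nodes"].items()
    ]
    facts = [
        {"subject": subj, "property": rel, "object": obj}
        for subj, rel, obj in instance_graph["edges"]
    ]
    return {"individuals": individuals, "facts": facts}


graph = {
    "nodes": {"receptor1": "receptor activity", "effector1": "effector"},
    "edges": [["receptor1", "has effector", "effector1"]],
}

converted = to_individuals_and_facts(graph)
```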
OK, good.
Btw for anyone following, this is the structure:
What you have is pleasing in that it's quite yaml introspectable. And it does away with the need for geeky %s placeholders. However, there may be some disadvantages.
so I was thinking something like:
pattern: receptor_activity
relations:
  'has sensor': RO_...
  'has effector': RO_...
  'internally regulates': RO_...
classes:
  'receptor activity': GO_...
  'biochemical activity' : GO_...
  'binding': GO_...
vars:
  ligand_binding : "'binding'"
  effector: "'biochemical activity'"
EquivalentTo:
  text: "'receptor activity' that 'has sensor' some %s and 'has effector' some %s"
  vars:
    - ligand_binding
    - effector
instance_graph:
  text: |
    _:t a 'transducer activity' .
    _:e a %s .
    _:s a %s .
    _:t 'has effector' _:e .
    _:t 'has sensor' _:s .
    _:s 'internally regulates' _:e .
  vars:
    - effector
    - ligand_binding
Note that blank nodes are used. It's up to the generator to decide how to handle these. We'd mint IRIs.
This has the advantage of requiring virtually no extension to the dosdp spec. The only change is that ttl is the format (we could do omn but not really important).
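For anyone following, the interpolation step for the text form is just positional printf substitution. A hedged Python sketch follows: the function name and the bound class labels are invented for illustration, and the output is still label-based pseudo-turtle, so a real generator would also need to swap quoted labels for IRIs via the pattern dictionaries.

```python
def fill_template(text, var_names, bindings):
    """Substitute bound values into a printf-style template, in order."""
    values = tuple(bindings[name] for name in var_names)
    # Note: any literal '%' in the template would need escaping as '%%'.
    return text % values


TEMPLATE = (
    "_:t a 'receptor activity' .\n"
    "_:e a %s .\n"
    "_:s a %s .\n"
    "_:t 'has effector' _:e .\n"
    "_:t 'has sensor' _:s .\n"
    "_:s 'internally regulates' _:e .\n"
)

# Hypothetical fillers chosen by a curator for the two slots.
bindings = {"effector": "'kinase activity'", "ligand_binding": "'hormone binding'"}

filled = fill_template(TEMPLATE, ["effector", "ligand_binding"], bindings)
print(filled)
```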
OK the interpolation with %s is a bit geeky but it's the same for the rest of dosdps. E.g. if we do something for multiple slots as for #16 it will work equally well here (e.g. protein complexes).
And I think it will deal with edge cases better. What if we want to inject an existing IRI into the pattern (e.g. PMID)? What about evidence? Do you reinvent reification with your list model? What about annotations on individuals? Those with literals vs IRI annotations? Negative property assertions (OK I can't think of a use case for that in a pattern right now, but you never know).
It's tempting to use the more yamly format, but I worry it's overly specific to today's use case.
I feel we went this path with the create subset of minervaJSON, and now it's harder to extend for other things. Now we'll have two mappings to a subset of RDF with their own special features, it's just additional cognitive burden.
(sorry, change transducer in mine to receptor, that was an unintentional change)
And just to expand on the protein complex example. We already have this in noctua:
And I love it.. but we can imagine this being easily genericized and driven by a dp (@kltm's 'super grebe'). However, for that we need to solve the cardinality of >1 problem. Shouldn't be hard, we have a ticket #16 for that. If we treat aboxes the same as tboxes, we only have to solve this once..
I liked the original proposal but started feeling more convinced by the generality of @cmungall's turtle. But a consumer would need to write a custom turtle parser for it. Or do some careful string replacement. And is turtle really any different than the lists of triples that @dosumis had? In either case you have to go to the same effort to handle reification.
I guess the rdf:type declarations could be handled as additional triples in the list-of-list form. So I guess if there was an optional interpolation option for the LoL form they'd be interchangeable.
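A quick Python sketch of that idea, folding the nodes dictionary into the triple list as `rdf:type` entries. The function name is hypothetical and the types are left as the pattern-local names rather than resolved IRIs.

```python
def as_triple_list(instance_graph):
    """Flatten nodes + edges into one list of [subject, predicate, object]."""
    type_triples = [
        [name, "rdf:type", type_name]
        for name, type_name in instance_graph["nodes"].items()
    ]
    return type_triples + [list(edge) for edge in instance_graph["edges"]]


graph = {
    "nodes": {
        "receptor1": "receptor activity",
        "effector1": "effector",
        "sensor1": "ligand_binding",
    },
    "edges": [
        ["receptor1", "has effector", "effector1"],
        ["receptor1", "has sensor", "sensor1"],
        ["sensor1", "internally regulates", "effector1"],
    ],
}

triples = as_triple_list(graph)
```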
But I'm not really worried about the authoring. It'll be us geeky sorts doing most of it for the core patterns. And most patterns won't have co-references (ie graphs) which means you can nest the ttl to reflect the tree structure.
We will want to make it easier to author later. One thing I always wanted for TG was a "make me one like this" button, i.e. the prototyping approach. In an existing model, click on a node, and use the graph as a model (you can even generate the subsuming class expression provided it's a tree).
Reification - not nice in any format. But if dosdps can support different W3 syntaxes then you can take your pick of tradeoffs.
most patterns won't have co-references (ie graphs)
From last week's discussions, I think co-references may be the rule rather than the exception for compound functions.
The blank node business still feels like an implementation issue to me (the internal identifiers in my proposal work just as well as the turtle _t etc). In your proposal for the individual level, we'd specify chunks of turtle containing many axioms + annotations on them. This is appealing in that it doesn't require much more spec to be designed and written in order to be completely expressive. It's worth noting that this is quite different from the current DOSDP spec, in which each axiom is specified separately and there is an optional field for specifying annotations on an axiom (see below for example). This could potentially be re-used in the instance graph:
instance_graph:
  nodes:
    receptor1: 'receptor activity'
    effector1: effector
    sensor1: ligand_binding
  edges:
    -
      edge: ['receptor1', 'has effector', 'effector1']
      annotations:
        -
          annotationProperty: database_crossreference
          text: "template:fu" # Example lacks vars
    -
      ...
This has the advantage of avoiding annoying axiom reification patterns. Add in an extra boolean for negation and I think we have full expressiveness.
Annotation on an axiom - example in dosdp core (bit verbose - need to spec a more compact OBO pragma version).
data_var:
  ref: xsd_string
annotation_axioms:
  -
    annotation_property: 'definition'
    text: "Any %s that has a %s as a part"
    vars:
      - fu
      - bar
    annotations:
      -
        annotation_property: database_crossreference
        text: '%s'
        vars:
          - ref # spec needs some more work here. Better to allow data_vars to take lists.
@cmungall can you post an example of axiom annotation in turtle?
turtle owl-reification is ugly (see for example https://github.com/geneontology/noctua-models/blob/master/models/586fc17a00000961.ttl#L88-L94); the main point is we don't have to reinvent.
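For reference, this is roughly what a generator has to emit for one annotated edge under the standard OWL 2 axiom-annotation mapping to RDF (`owl:Axiom`, `owl:annotatedSource`, `owl:annotatedProperty`, `owl:annotatedTarget`). A Python sketch with a made-up function name, prefixed names instead of full IRIs, and a fixed blank-node label as simplifying assumptions.

```python
def reify_annotated_edge(subj, pred, obj, annotations, axiom_id="_:ax1"):
    """Return the asserted triple plus its OWL axiom-annotation triples."""
    triples = [
        (subj, pred, obj),
        (axiom_id, "rdf:type", "owl:Axiom"),
        (axiom_id, "owl:annotatedSource", subj),
        (axiom_id, "owl:annotatedProperty", pred),
        (axiom_id, "owl:annotatedTarget", obj),
    ]
    # Each annotation hangs off the axiom node, not off the subject.
    triples += [(axiom_id, prop, value) for prop, value in annotations]
    return triples


triples = reify_annotated_edge(
    "_:receptor1",
    "'has effector'",
    "_:effector1",
    [("database_crossreference", '"template:fu"')],
)
for triple in triples:
    print(" ".join(triple), ".")
```

One edge with one annotation turns into six triples, which is the verbosity the edge/annotations structure hides from pattern authors.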
OK, you may have convinced me. We should still be sure to work out the edge cases (we are re-inventing rdf syntax...). I think we can probably just piggy back off of JSON-LD conventions. I'm thinking of cases such as where the AnnotationValue is something other than plain literal (rare for us, but you never know). And it has the advantage of being more easily introspectable.
The asymmetry between ABox and TBox seems a little inelegant. I don't want to block us on this but I'm wondering if we were to do the whole thing from scratch we may have gone for a direct YAML representation of class axioms too. The asymmetry may stop us using the same approaches for aboxes and tboxes (e.g. protein complex multi-cardinality slot example).
I like @dosumis's example. I do think we should consider the case you mention where the annotation value is not text. We're already mashing resources into literals in LEGO, e.g. <http://purl.org/dc/elements/1.1/contributor> "http://orcid.org/0000-0003-2689-5511"^^<http://www.w3.org/2001/XMLSchema#string>. This makes it awkward to query people's contributions with SPARQL.
We could replace the text property with value. But we need to allow both literals and resources, so we can't use JSON-LD to say that the value of value is always a resource. So a resource value would always need to be an object like {"@id": "http://orcid.org/0000-0003-2689-5511"}. Kind of verbose.
Alternatively we could keep text and add another property like resource (is there a better term?). People could use one or the other.
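A sketch of how a consumer might handle the text-or-resource option, in Python. The `resource` key is only the name floated above, and the function name and the exactly-one-of rule are assumptions.

```python
def annotation_object(annotation):
    """Render an annotation value as a Turtle-style literal or IRI term."""
    has_text = "text" in annotation
    has_resource = "resource" in annotation
    if has_text == has_resource:
        raise ValueError("annotation needs exactly one of 'text' or 'resource'")
    if has_resource:
        # Emitted as an IRI, so SPARQL can match it as a resource.
        return "<" + annotation["resource"] + ">"
    escaped = annotation["text"].replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


as_resource = annotation_object({"resource": "http://orcid.org/0000-0003-2689-5511"})
as_literal = annotation_object({"text": "http://orcid.org/0000-0003-2689-5511"})
print(as_resource)
print(as_literal)
```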
draft json schema spec (in YAML)
instance_graph:
  type: object
  additionalProperties: False
  required: [nodes, edges]
  properties:
    nodes:
      type: object
    edges:
      type: object
      additionalProperties: False
      required: [edge]
      properties:
        edge:
          type: array
          items: { type: string }
        annotations:
          { $ref: '#/definitions/printf_annotation' }
        not:
          type: boolean
$ref refers back to core field type def.
The one thing missing in this proposal is a way to specify types as anonymous classes. This is (I think) beyond the scope of LEGO, but might be useful expressiveness elsewhere. (I don't have a use for these in VFB yet, but we type using anonymous classes very extensively). Supporting this would require the nodes field to revert to Manchester syntax sprintf.
I do think we should consider the case you mention where the annotation value is not text.
Here's the current spec for the annotation field. It assumes printf + var sub. When used in regular annotations (e.g. for a label or a def) the assumption is that if an OWL entity is specified (by a var) then the readable identifier (label in our case) will be used in the sub. This field can also take strings specified in data_vars.
printf_annotation:
  type: object
  additionalProperties: False
  required: [annotationProperty, text, vars]
  properties:
    annotationProperty:
      description: >
        A string corresponding to the rdfs:label
        of an owl annotation property. If the annotation property has no label,
        the shortForm ID should be used. The annotation property must be listed
        in the annotation property dictionary.
      type: string
    annotations:
      items: {$ref: '#/definitions/printf_annotation'}
      type: array
    text:
      description: A print format string.
      type: string
    vars:
      description: >
        An ordered list of variables for substitution into the accompanying
        print format string. Each entry must correspond to the name of a variable
        specified in either the 'vars' field or the data_var field of the pattern.
        Where an OWL entity is specified, the label for the OWL entity should be
        used in the substitution.
      items: {type: string}
      type: array
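A small Python sketch of the substitution this field describes: vars bound to OWL entities are replaced by their labels, while data_var strings pass through unchanged. The function name, the `EX:` IDs and the labels are all hypothetical.

```python
def render_printf_annotation(annotation, bindings, labels):
    """Fill the print format string, using labels for OWL entities."""
    values = []
    for var in annotation["vars"]:
        value = bindings[var]
        # Entities are looked up by ID; plain data values have no label.
        values.append(labels.get(value, value))
    return annotation["text"] % tuple(values)


annotation = {
    "annotationProperty": "definition",
    "text": "Any %s that has a %s as a part",
    "vars": ["fu", "bar"],
}
bindings = {"fu": "EX:0000001", "bar": "EX:0000002"}  # made-up IDs
labels = {"EX:0000001": "cell", "EX:0000002": "nucleus"}

rendered = render_printf_annotation(annotation, bindings, labels)
print(rendered)
```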
We'd need a new type of annotation field in order to support the value of an annotation being an OWL entity (e.g. as in subset declarations). Presumably this would be passed as a URI string?
value_annotation:
  type: object
  additionalProperties: False
  required: [annotationProperty, value]
  properties:
    annotationProperty:
      type: string
    value:
      type: string # a string in JSON but taking var specifying a URI...
annotations:
  type: array
  items: { oneOf: [{ $ref: '#/definitions/printf_annotation' },
                   { $ref: '#/definitions/value_annotation' }] }
The annotations field on printf_annotation should be updated to take both types of annotation too.
On 14 Feb 2017, at 11:05, Jim Balhoff wrote:
I like @dosumis's example. I do think we should consider the case you mention where the annotation value is not text. We're already mashing resources into literals in LEGO, e.g. <http://purl.org/dc/elements/1.1/contributor> "http://orcid.org/0000-0003-2689-5511"^^<http://www.w3.org/2001/XMLSchema#string>. This makes it awkward to query people's contributions with SPARQL.
ugh, yes.
another thing to watch for, when translating directly to OWL, we need to know if the predicate is an AP or an OP or a DP. Of course this is also the case in going from any RDF translation too.
We could replace the text property with value. But we need to allow both literals and resources, so we can't use JSON-LD to say that the value of value is always a resource. So a resource value would always need to be an object like {"@id": "http://orcid.org/0000-0003-2689-5511"}. Kind of verbose.
Alternatively we could keep text and add another property like resource (is there a better term?). People could use one or the other.
that's probably simplest.
I've implemented the following solution in spec/DOSDP_schema_full.yaml. (I'm planning to split this file and use JSON schema imports later. Need to switch to JSON for that to work.)
Instance graph:
instance_graph:
  type: object
  additionalProperties: False
  required: [nodes, edges]
  properties:
    nodes:
      description: >
        Key = name of individual within this pattern doc
        Value = Type of individual specified using either
        the quoted name of a class in the class dictionary of this pattern
        or a var name. This field does not support typing via
        anonymous class expressions
      type: object
    edges:
      type: object
      additionalProperties: False
      required: [edge]
      properties:
        edge:
          description: >
            A triple specified as an ordered array with 3 elements
            [subject, rel, object]
            * rel must be the quoted name of a relation from the relations
              (object property) dictionary.
            * subject and object must be the name of an individual
              specified in the nodes field.
          type: array
          items: { type: string }
          minItems: 3
          maxItems: 3
        annotations:
          type: array
          items: { $ref: '#/definitions/annotation' }
        not:
          description: "Optional field for negated OPAs"
          type: boolean
This uses a generic solution for annotating axioms:
printf_annotation:
  type: object
  additionalProperties: False
  required: [annotationProperty, text, vars]
  properties:
    annotationProperty:
      description: >
        A string corresponding to the rdfs:label
        of an owl annotation property. If the annotation property has no label,
        the shortForm ID should be used. The annotation property must be listed
        in the annotation property dictionary.
      type: string
    annotations:
      items: { $ref: "#/definitions/annotation" }
      type: array
    text:
      description: A print format string.
      type: string
    vars:
      description: >
        An ordered list of variables for substitution into the accompanying
        print format string. Each entry must correspond to the name of a variable
        specified in either the 'vars' field or the data_var field of the pattern.
        Where an OWL entity is specified, the label for the OWL entity should be
        used in the substitution.
      items: {type: string}
      type: array
list_annotation:
  type: object
  additionalProperties: False
  required: [annotationProperty, value]
  properties:
    annotationProperty:
      description: >
        A string corresponding to the rdfs:label
        of an owl annotation property. If the annotation property has no label,
        the shortForm ID should be used. The annotation property must be listed
        in the annotation property dictionary.
      type: string
    value:
      description: >
        A single list variable (list_var or data_list_var). Each item in this list
        should be used to generate a separate annotation axiom.
      type: string
annotation:
  oneOf:
    - { $ref: "#/definitions/printf_annotation" }
    - { $ref: "#/definitions/list_annotation" }
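The list_annotation rule (one annotation axiom per list item) could be sketched like this in Python. The function name, the `xrefs` variable and the placeholder cross-references are invented for the example.

```python
def expand_list_annotation(annotation, bindings):
    """Return one (annotationProperty, value) pair per item in the list var."""
    items = bindings[annotation["value"]]
    if not isinstance(items, list):
        raise TypeError(f"{annotation['value']!r} must be bound to a list")
    return [(annotation["annotationProperty"], item) for item in items]


annotation = {"annotationProperty": "database_crossreference", "value": "xrefs"}
bindings = {"xrefs": ["EX:ref_1", "EX:ref_2"]}  # placeholder values

axioms = expand_list_annotation(annotation, bindings)
print(axioms)
```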
var specification is getting a bit complicated:
vars:
  type: object
  description: >
    A dictionary of variables ranging over OWL classes.
    Key = variable name, value = variable range as manchester syntax string.
list_vars:
  type: object
  description: >
    A dictionary of variables referring to lists of owl classes.
    Key = variable name, value = variable range of items in list specified as a valid OWL
    data-type.
data_vars:
  type: object
  description: >
    A dictionary of variables ranging over OWL data-types.
    Key = variable name, value = variable range specified as a valid OWL
    data-type.
data_list_vars:
  description: >
    A dictionary of variables referring to lists of some specified OWL data-types.
    Key = variable name, value = variable range of all items in list,
    specified as a valid OWL data-type.
This could potentially be simplified to just vars and list with the specification of range for each variable working to distinguish types, but I think this is probably too much of a burden on development. I've also designed some OBO convenience fields for axiom annotation, but these are not (so far) permitted in the instance graph.
@dosumis shouldn't edges: be of type array? I think you need to define an Edge object type to go in the array.
@dosumis shouldn't edges: be of type array? I think you need to define an Edge object type to go in the array.
Ooops. You're right.
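With edges as an array of edge objects, the constraints written in the schema descriptions can be checked in a few lines. A hedged Python sketch follows: the function name and error wording are made up, the dictionaries are assumed to be already parsed from YAML, and it checks only the rules stated in the descriptions rather than the full JSON schema.

```python
def validate_instance_graph(graph, relations, classes, variables):
    """Return a list of problems; an empty list means the graph looks valid."""
    errors = []
    nodes = graph.get("nodes", {})
    for name, type_name in nodes.items():
        if type_name not in classes and type_name not in variables:
            errors.append(f"node {name}: unknown type {type_name!r}")
    edges = graph.get("edges")
    if not isinstance(edges, list):
        return errors + ["edges must be an array of edge objects"]
    for i, entry in enumerate(edges):
        edge = entry.get("edge") if isinstance(entry, dict) else None
        if not isinstance(edge, list) or len(edge) != 3:
            errors.append(f"edges[{i}]: edge must be [subject, rel, object]")
            continue
        subj, rel, obj = edge
        if rel not in relations:
            errors.append(f"edges[{i}]: unknown relation {rel!r}")
        for node in (subj, obj):
            if node not in nodes:
                errors.append(f"edges[{i}]: unknown node {node!r}")
    return errors


relations = {"enabled by": "RO:0002333", "occurs in": "BFO:0000066"}
classes = {"molecular function": "GO:0003674"}
variables = {"gene product": "'gene product'"}

good = {
    "nodes": {"gp": "gene product", "mf": "molecular function"},
    "edges": [{"edge": ["mf", "enabled by", "gp"]}],
}
bad = {
    "nodes": {"mf": "molecular function"},
    "edges": [{"edge": ["mf", "regulates", "gp"]}, {"edge": ["mf", "occurs in"]}],
}

print(validate_instance_graph(good, relations, classes, variables))
print(validate_instance_graph(bad, relations, classes, variables))
```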
I would like to give @DoctorBud something to work with for implementing a generic annoton pattern
pattern: basic_annoton
relations:
  enabled by: RO:0002333
  occurs in: BFO:0000066
  part of: BFO:0000050
classes:
  gene product or complex: TODO
  molecular function : GO:0003674
  biological process: GO:0008150
  cellular component: GO:0005575
vars:
  gene product: "'gene product'"
  molecular function : "'molecular function'"
  biological process: "'biological process'"
  cellular component: "'cellular component'"
instance_graph:
  nodes:
    gp: gene product
    mf: molecular function
    bp: biological process
    cc: cellular component
  edges:
    - edge: [mf, 'enabled by', gp]
      annotations: ?
    - edge: [mf, 'occurs in', cc]
      annotations: ?
    - edge: [mf, 'part of', bp]
      annotations: ?
This is for: https://github.com/geneontology/noctua/issues/461
I wonder if it's necessary to specify the full evidence model every time. Can we just have a generic placeholder for 'insert evidence here'?
We need to co-ordinate the semantics of LEGO models with each other and with the semantics of the various ontologies used. To support this, curators need access to templates which they can insert into their models and which include variable slots with specified constraints. These templates should be specified in the same documents as design patterns, allowing them to share dictionaries and variables. As for design patterns, the aim here is to maximise ease of reading and editing while maintaining ease of parsing. To this end, the spec uses lookup dicts to allow quoted, human readable names to be used to specify axioms.
The following draft specification extends DOS-DP schema core:
Where:
CC @cmungall @balhoff @thomaspd