Open helenp opened 9 years ago
seems good to me. +1.
versioning: in OWL the correct way to do this is to use versionIRI. Not all OWL ontologies do this, but the ones that do not are becoming outliers.
The versionIRI can be any IRI. In the context of the OBO Library, the versionIRI follows a standard pattern (the base PURL of the ontology, followed by an optional "releases" segment, followed by the ISO 8601 date as YYYY-MM-DD). Furthermore, any versionIRI that follows this standard pattern has a standard reduced form in obo format.
For example:
http://purl.obolibrary.org/obo/pato/releases/2015-04-09/pato.owl
Resolves to OWL with the following in the header:
<owl:versionIRI rdf:resource="http://purl.obolibrary.org/obo/pato/releases/2015-04-09/pato.owl"/>
The header of the corresponding obo file contains the reduced form:
data-version: releases/2015-04-09
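To make the reduction concrete, here is a minimal Python sketch. The regular expression and the function name `obo_data_version` are assumptions made for this illustration, not part of any OBO tooling, and the sketch only recognises IRIs shaped like the PATO example above.

```python
import re
from typing import Optional

# Assumed shape, inferred from the PATO example:
#   <OBO base PURL>/<ontology>/[releases/]<YYYY-MM-DD>/<file>
_VERSION_IRI = re.compile(
    r"^http://purl\.obolibrary\.org/obo/"
    r"(?P<ontology>[^/]+)/"
    r"(?P<version>(?:releases/)?\d{4}-\d{2}-\d{2})/"
    r"[^/]+$"
)


def obo_data_version(version_iri: str) -> Optional[str]:
    """Return the reduced obo-format data-version, or None if the IRI
    does not follow the assumed standard pattern."""
    match = _VERSION_IRI.match(version_iri)
    return match.group("version") if match else None
```

Called on the PATO versionIRI above, this returns `releases/2015-04-09`; an IRI that does not fit the pattern yields `None` rather than a guess.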
Not all ontologies of relevance to GA4GH provide resolvable versionIRIs. However, I have been working on a mechanism that makes this easy for any ontology managed in GitHub, and I have been successful in migrating many ontologies to GitHub. Is there a list or registry, e.g. in YAML format, that shows all ontologies of interest? I can annotate this list with the version policy for each ontology and, with the help of folks here like @helenp and @mellybelly, encourage movement to a well-defined system. Comments on the OBO version system are also more than welcome.
There is a potential problem with the current ontologies.avdl.
Currently the OntologyTerm has an identifier which might reasonably be expected to be a primary key (or, in programmatic terms, can be used as a key in a lookup table; or, if we were to use JSON-LD, this would be the @id that denotes the RDF resource).
The record also has a version field. Regardless of the format of this field, we have a potentially major problem: the same identifier may denote different versions of an OntologyTerm within a single GA4GH-compliant source. It will be difficult to define coherent services this way.
Some options:
i.e. using the same ID would always return the same OntologyTerm object.
This has some nice properties, but the boat has long sailed on this one.
We could implement this by folding the version into the identifier... but this would be highly impractical.
An OntologyTerm would be uniquely identified by an (ID, ontologyVersion) tuple. But this would be unintuitive and undesirable for various reasons; the JSON-LD would be fairly impractical if we go that route.
Introduce an extra layer, like this:
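/* The term itself: the id can serve as a primary key. */
record OntologyTerm {
  string id;
  string label; /* mutable */
}
/* The act of referencing a term carries the ontology version. */
record OntologyReference {
  string ontologyVersion;
  OntologyTerm ontologyTerm;
}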
Other parts of the schema would use OntologyReference records to make annotations.
In other words, it is the act of referencing that is associated with an ontology version. "I used the version of HP:1234 from 2015-01-01".
(it may be the case that OntologyReference will be extended into a generic oban-style annotation object, but I'd like to separate that discussion for now)
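A minimal Python sketch of the extra-layer idea, under the assumption that the two records translate directly into plain data classes. The `terms` lookup table, the `make_reference` helper and the label text are hypothetical names for this illustration and do not come from ontologies.avdl.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class OntologyTerm:
    id: str     # usable as a primary key
    label: str  # may change between ontology releases


@dataclass(frozen=True)
class OntologyReference:
    ontology_version: str  # version in use when the reference was made
    ontology_term: OntologyTerm


# One entry per identifier, so lookups by id stay unambiguous.
terms: Dict[str, OntologyTerm] = {}


def make_reference(term_id: str, label: str, version: str) -> OntologyReference:
    """Reuse the stored term for this id and attach the version to the reference."""
    term = terms.setdefault(term_id, OntologyTerm(term_id, label))
    return OntologyReference(version, term)
```

Two references to HP:1234 made against different ontology versions then share a single OntologyTerm entry, and only the references differ.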
@cmungall Regarding version conflicts: While this could not be enforced through the schema, one could just describe the recommended order of precedence (i.e. idWithVersion overrides separate version).
One problem with the reference is that we would be nesting even deeper:
The experimental working version of ontologies.avdl for metadata, which is open to fast edits, resides on the metadata branch: https://github.com/ga4gh/schemas/blob/metadata/src/main/resources/avro/ontologies.avdl
Questions:
@cmungall Thanks for the comments. @mbaudis Suggest we change the spec to indicate where version info can be found; EBI will collate these with the CURIEs as proposed by Chris.
@cmungall @helenp So, could you please provide an example (pseudo code is fine)?
It would be easier to do this with a picture, but I'll try here (keep in mind this is overly simplified and lots of things are being left out):
Patient-object:id1 ---exhibitsphenotype--> Association-object:(includes date/version)---classifiedby--> OntologyTerm:idX
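The same chain can be written as a small Python sketch. The class and field names below simply mirror the labels in the line above and are assumptions, not part of the GA4GH schema; the date is a made-up value.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class OntologyTerm:
    id: str


@dataclass(frozen=True)
class Association:
    classified_by: OntologyTerm
    ontology_version: str  # date or version recorded on the association


@dataclass
class Patient:
    id: str
    exhibits_phenotype: List[Association] = field(default_factory=list)


# Patient id1 exhibits a phenotype classified by term idX.
patient = Patient(id="id1")
patient.exhibits_phenotype.append(
    Association(classified_by=OntologyTerm(id="idX"), ontology_version="2015-01-01")
)
```

The version sits on the association object, so the OntologyTerm itself carries only its identifier.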
Nesting per se is not a problem. If it is needed, then it is needed.
Following discussion on the MTT, here is a proposal refining the definition and representation of ontologies and annotations in ontologies.avdl.
Intent: The GA4GH Ontology schema provides structures for unambiguous references to ontological concepts and/or controlled vocabularies within Avro. The structures provided are not intended for de novo modeling of ontologies or for representing complete ontologies within Avro. References to, e.g., classes from external ontologies or controlled vocabularies should be interpreted only in their original context, i.e. the source ontology.
Usage: Multiple ontology terms can be supplied, e.g. to describe a series of phenotypes for a specific sample. ontologies.avdl is not intended to model relationships between terms or to provide mappings between ontologies for the same concept. Should an OntologyTerm be unavailable, or terms unmapped, an 'annotation' can be provided, which can later be mapped to an ontology term using a service designed for this. Using OntologyTerm is preferred to using Annotation, though annotations can be supplied with related ontology terms if desired. A use case could be when a free-text annotation is very specific and a more general OntologyTerm is supplied.
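As a rough sketch of the usage described here, the following Python models a free-text annotation that starts unmapped and later receives a more general ontology term. The field names, the `is_mapped` property and the free-text value are assumptions for illustration; the proposal does not fix them.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class OntologyTerm:
    id: str    # CURIE (preferred) or PURL
    term: str  # preferred term for the class


@dataclass
class Annotation:
    text: str  # free text captured at annotation time
    ontology_terms: List[OntologyTerm] = field(default_factory=list)

    @property
    def is_mapped(self) -> bool:
        """True once at least one ontology term has been associated."""
        return bool(self.ontology_terms)


# Captured as free text first, with no ontology term available yet.
annotation = Annotation(text="very short digit")

# Later, e.g. by a mapping service, a more general term is attached.
annotation.ontology_terms.append(OntologyTerm("HP:0011927", "short digit"))
```

Keeping the free text alongside the term preserves the more specific original wording even after mapping.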
New: Annotation - a free-text annotation, not an ontology term, that describes some attribute. Annotations have associations with OntologyTerms to allow these to be added after annotations are captured. OntologyTerms are preferred over Annotations in all cases. Annotations can be used in conjunction with OntologyTerms.
Newly defined OntologyTerm - the preferred term for the class in question. For example, for http://purl.obolibrary.org/obo/HP_0011927 the preferred term is 'short digit' and a synonym is 'VERY SHORT DIGIT'; 'short digit' is the term that should be used.
Newly defined OntologyTerm identifier - an identifier for a single ontology term from a single ontology source, specified as a CURIE (preferred) or PURL.
Newly defined OntologySource - the name of the ontology from which the term is obtained, e.g. 'Human Phenotype Ontology'.
Newly defined OntologySource identifier - the identifier, a CURIE (preferred) or PURL, for an ontology source, e.g. http://purl.obolibrary.org/obo/hp.obo
Newly defined OntologySource version - the version of the ontology from which the OntologyTerm is obtained, e.g. 2.6.1. There is no standard for ontology versioning, and some frequently released ontologies may use a datestamp or build number.
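Since term identifiers may arrive as either a CURIE or a PURL, here is a minimal Python sketch of converting between the two for OBO Library terms such as the HP example above. The function names are hypothetical, and the sketch assumes the OBO convention that the PURL is the base followed by the prefix, an underscore and the local ID; it does not handle ontology-level PURLs such as hp.obo.

```python
OBO_BASE = "http://purl.obolibrary.org/obo/"


def curie_to_purl(curie: str) -> str:
    """Expand an OBO CURIE such as 'HP:0011927' into its term PURL."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or not prefix or not local_id:
        raise ValueError(f"not a CURIE: {curie}")
    return f"{OBO_BASE}{prefix}_{local_id}"


def purl_to_curie(purl: str) -> str:
    """Contract an OBO term PURL back into its CURIE."""
    if not purl.startswith(OBO_BASE):
        raise ValueError(f"not an OBO PURL: {purl}")
    prefix, sep, local_id = purl[len(OBO_BASE):].partition("_")
    if not sep or not prefix or not local_id:
        raise ValueError(f"not an OBO term PURL: {purl}")
    return f"{prefix}:{local_id}"
```

Storing the CURIE and expanding it on demand keeps records compact while still allowing the term to be resolved.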