OBOFoundry / OBOFoundry.github.io

Metadata and website for the Open Bio Ontologies Foundry Ontology Registry
http://obofoundry.org

What are the minimal ontology-level metadata fields we would expect to see in an OBO ontology? #1365

Open cmungall opened 3 years ago

cmungall commented 3 years ago

We have a good schema for the ontology metadata we collect centrally in this repo.

But what are the minimum fields we expect to see in an ontology header for a good ontology? I know we have various checks for this in the dashboard, but do we have a more declarative specification (e.g. ShEx)?

Use case: The Alliance wants to show information about each ontology they have loaded in the database. While they could just show this in an open-ended manner (e.g. as OLS does it https://www.ebi.ac.uk/ols/ontologies/go), it is better if there is a predictable structure.

I feel there should be a doc we can point groups like this to! Is this in the realm of OMO?

I will have a go at my answer here, but I think this should be doc'd outside a ticket

One caveat to the above is that the two IRI fields are not very user friendly. Clicking on them renders a giant OWL file. This is not something we would want to show on a portal aimed at biologists.

Surprisingly, there is not a field that yields the ontology prefix (GO, OBI). This has to be done programmatically by munging the ontology IRI. This seems far from ideal.

Same for version. At least here we have the more informative versionInfo (this is what bioportal uses). However, this is not populated for many ontologies #771 -- including GO, oops.
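To make the "munging" concrete, here is a minimal sketch of what a portal currently has to do to recover a prefix and a version. It assumes the ontology IRI follows the usual `http://purl.obolibrary.org/obo/<id>.owl` pattern and that the version IRI uses a `/releases/YYYY-MM-DD/` path segment; both function names are hypothetical, and upper-casing the ID will be wrong for mixed-case prefixes.

```python
import re
from typing import Optional

OBO_BASE = "http://purl.obolibrary.org/obo/"


def prefix_from_ontology_iri(iri: str) -> Optional[str]:
    """Guess the OBO prefix (e.g. GO) from an ontology IRI like .../obo/go.owl."""
    if not iri.startswith(OBO_BASE):
        return None
    local = iri[len(OBO_BASE):]
    # First path segment, minus any file extension, is the lower-case ontology ID.
    first_segment = local.split("/", 1)[0]
    stem = first_segment.split(".", 1)[0]
    # Assumption: the prefix is simply the upper-cased ID.
    return stem.upper() or None


def version_from_version_iri(iri: str) -> Optional[str]:
    """Extract a release date, assuming a .../releases/YYYY-MM-DD/... layout."""
    match = re.search(r"/releases/(\d{4}-\d{2}-\d{2})/", iri)
    return match.group(1) if match else None
```

Both functions return `None` when the IRI does not fit the assumed pattern, which is exactly the fragility that a dedicated prefix field and a reliably populated versionInfo would avoid.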

Another caveat is that even foundry ontologies may not have all of the above populated. For example, ZFA (which is used by the Alliance) lacks a title and a description. It also has two values for license (consistent with each other: one is "CC-BY", the other is the URL). I would stress this is not ZFA's fault - they have simply not been asked to do this, nor provided with tools to check for presence/absence/cardinality of ontology metadata.

I think we need a clear computable schema that can be used (1) by portal developers, e.g. in the Alliance as well as OLS and BioPortal, and (2) for checking, e.g. in the dashboard but also in ROBOT.

In addition to the above fields, there are other fields that are useful to display but are not consistently populated:

matentzn commented 3 years ago

Amen. I will try to work out how to do that with ShEx; in the meantime, let's collect thoughts on good metadata fields.

I could prepare a basic ShEx profile to cover this.

matentzn commented 3 years ago

Ha, this was great fun! The current version of the shape is here:

:OBOOntologyShape CLOSED {
  a [owl:Ontology];
  owl:versionIRI IRI;
  dc:creator xsd:string*;
  dc:contributor xsd:string*;
  dc:title xsd:string;
  dc:date xsd:dateTime?;
  dc:description xsd:string;
  dcterms:license IRI;
  owl:versionInfo xsd:string;
  protege:defaultLanguage xsd:string?;
  rdfs:comment xsd:string*;
  dc:subject xsd:string?;
  obo:IAO_0000700 IRI*;
  dc:type IRI?
}

I have implemented it in an example notebook here, running it against envo, wbphenotype and cl as examples. I have never worked with shapes seriously but now that I see them.. loving it (of course we are here only scratching the very outer surface). Keep more such tickets coming @cmungall (and everyone else).
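For readers who do not use ShEx, the cardinalities in the shape above can be restated as a plain-Python check. This is only a sketch, not the notebook's implementation and not a ShEx validator: it counts predicates and ignores value types (IRI vs. string, and the required `owl:Ontology` type value). The `check_header` function and the `(predicate, value)` input format are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple

# (min, max) per property, copied from the shape; max=None means unbounded.
CARDINALITIES: Dict[str, Tuple[int, Optional[int]]] = {
    "rdf:type": (1, 1),
    "owl:versionIRI": (1, 1),
    "dc:creator": (0, None),
    "dc:contributor": (0, None),
    "dc:title": (1, 1),
    "dc:date": (0, 1),
    "dc:description": (1, 1),
    "dcterms:license": (1, 1),
    "owl:versionInfo": (1, 1),
    "protege:defaultLanguage": (0, 1),
    "rdfs:comment": (0, None),
    "dc:subject": (0, 1),
    "obo:IAO_0000700": (0, None),
    "dc:type": (0, 1),
}


def check_header(annotations: Iterable[Tuple[str, str]]) -> List[str]:
    """Return a list of cardinality problems for (predicate, value) pairs."""
    counts = Counter(pred for pred, _ in annotations)
    problems = []
    for pred, (lo, hi) in CARDINALITIES.items():
        n = counts.get(pred, 0)
        if n < lo:
            problems.append(f"{pred}: expected at least {lo}, found {n}")
        elif hi is not None and n > hi:
            problems.append(f"{pred}: expected at most {hi}, found {n}")
    # CLOSED shape: predicates not listed in the shape are not allowed.
    for pred in sorted(set(counts) - set(CARDINALITIES)):
        problems.append(f"{pred}: not allowed by the closed shape")
    return problems


# Made-up header (not real ZFA data): no title or description, two licenses.
example_header = [
    ("rdf:type", "owl:Ontology"),
    ("owl:versionIRI", "http://example.org/releases/2021-01-01/example.owl"),
    ("owl:versionInfo", "2021-01-01"),
    ("dcterms:license", "CC-BY"),
    ("dcterms:license", "https://creativecommons.org/licenses/by/4.0/"),
]
example_problems = check_header(example_header)
```

For the made-up header, `example_problems` flags the missing `dc:title` and `dc:description` and the doubled `dcterms:license`. A real ShEx processor would additionally reject the string-valued license, since the shape requires an IRI.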

jamesaoverton commented 3 years ago

This is a worthwhile goal, and @matentzn's schema seems like a good start. I don't have anything to add to that, but I had a few related thoughts:

cmungall commented 3 years ago

Yes, I love having banks of SPARQL queries, but I like having this abstracted to a structure like ShEx, especially with complex constraints that link objects to other objects.

Some ShEx validation frameworks such as @hsolbrig's PyShEx can work by crawling a triplestore, recursively executing SPARQL queries. I think it would also be possible to translate from ShEx to SPARQL, which might be nice too.
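As a rough illustration of what a ShEx-to-SPARQL translation could produce, the sketch below turns a single cardinality constraint into a SPARQL query that lists the ontologies violating it. This is not how PyShEx works and is nowhere near a general translator; the `cardinality_query` function and the small prefix map are assumptions made for the example.

```python
from typing import Dict, Optional

# Assumed prefix map; extend as needed for other properties.
PREFIXES: Dict[str, str] = {
    "owl": "http://www.w3.org/2002/07/owl#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}


def cardinality_query(curie: str, min_count: int, max_count: Optional[int]) -> str:
    """Build a SPARQL query for ontologies whose count of `curie` is out of range."""
    conditions = []
    if min_count > 0:
        conditions.append(f"COUNT(DISTINCT ?value) < {min_count}")
    if max_count is not None:
        conditions.append(f"COUNT(DISTINCT ?value) > {max_count}")
    if not conditions:
        raise ValueError("unbounded constraint: nothing to check")
    prefix_lines = "\n".join(f"PREFIX {p}: <{ns}>" for p, ns in PREFIXES.items())
    having = " || ".join(conditions)
    return (
        f"{prefix_lines}\n"
        "SELECT ?ontology (COUNT(DISTINCT ?value) AS ?n)\n"
        "WHERE {\n"
        "  ?ontology a owl:Ontology .\n"
        # OPTIONAL keeps ontologies with zero values in the result set.
        f"  OPTIONAL {{ ?ontology {curie} ?value }}\n"
        "}\n"
        "GROUP BY ?ontology\n"
        f"HAVING ({having})\n"
    )


# Query for ontologies that do not have exactly one dc:title.
title_query = cardinality_query("dc:title", 1, 1)
```

Value-type constraints, CLOSED semantics and references between shapes would each need their own translation, which is where a declarative shape starts to pay off over a bank of hand-written queries.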

The redundancy between this and json-schema based checks over the registry yaml/json is mildly dissatisfying, but I feel there is a path to unification (our yaml is actually yaml-ld, and has an rdf form...)

I just realized I abandoned this a while ago, not much there, but it gives an idea of how shapes could be used to check classes too. This is especially useful for certain profiles of ontologies: https://github.com/cmungall/obo-shapes

We are using ShEx heavily for ABoxes in GO, e.g. here is what a GO MF instance looks like:

https://github.com/geneontology/go-shapes/blob/e05a415d8b5178c4ac2b4662d42171d14f19a1cf/shapes/go-cam-shapes.shex#L364-L386

One nice feature is that every aspect of the shape can be annotated arbitrarily, e.g. with a seeAlso linking to a ticket