OBOFoundry / OBOFoundry.github.io

Metadata and website for the Open Bio Ontologies Foundry Ontology Registry
http://obofoundry.org

What are the minimal ontology-level metadata fields we would expect to see in an OBO ontology? #1365

Open: cmungall opened this issue 4 years ago

cmungall commented 4 years ago

We have a good schema for the ontology metadata we collect centrally in this repo.

But what are the minimum fields we expect to see in the ontology header of a good ontology? I know we have various checks for this in the dashboard, but do we have a more declarative specification (e.g. ShEx)?

Use case: The Alliance wants to show information about each ontology they have loaded into their database. While they could just show this in an open-ended manner (e.g. as OLS does at https://www.ebi.ac.uk/ols/ontologies/go), it is better if there is a predictable structure.

I feel there should be a doc we can point groups like this to! Is this in the realm of OMO?

I will have a go at my answer here, but I think this should be doc'd outside a ticket

One caveat to the above is that the two IRI fields are not very user-friendly: clicking on them loads a giant OWL file. This is not something we would want to show on a portal aimed at biologists.

Surprisingly, there is no field that yields the ontology prefix (e.g. GO, OBI). It has to be derived programmatically by munging the ontology IRI, which is far from ideal.
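
A minimal sketch of that munging in Python, assuming the OBO PURL convention `http://purl.obolibrary.org/obo/{id}.owl`. The helper name `guess_prefix` is hypothetical. Uppercasing the ontology ID is only a heuristic, and the last example shows why this is far from ideal: it recovers GO and OBI, but turns `wbphenotype` into WBPHENOTYPE rather than the actual prefix WBPhenotype.

```python
import re
from typing import Optional

# Assumed OBO PURL pattern for ontology IRIs: http://purl.obolibrary.org/obo/{id}.owl
OBO_ONTOLOGY_IRI = re.compile(
    r"^https?://purl\.obolibrary\.org/obo/([A-Za-z0-9_]+)\.(?:owl|obo|json)$"
)


def guess_prefix(ontology_iri: str) -> Optional[str]:
    """Heuristically derive a CURIE prefix from an OBO ontology IRI.

    Returns None if the IRI does not follow the OBO PURL pattern.
    Uppercasing is a guess and is wrong for mixed-case prefixes.
    """
    match = OBO_ONTOLOGY_IRI.match(ontology_iri)
    if match is None:
        return None
    return match.group(1).upper()


if __name__ == "__main__":
    for iri in [
        "http://purl.obolibrary.org/obo/go.owl",
        "http://purl.obolibrary.org/obo/obi.owl",
        "http://purl.obolibrary.org/obo/wbphenotype.owl",
    ]:
        print(iri, "->", guess_prefix(iri))
```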

The same goes for the version. Here, at least, we have the more informative owl:versionInfo (this is what BioPortal uses). However, it is not populated for many ontologies (#771), including GO, oops.
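
Until owl:versionInfo is populated more widely, a portal could fall back to the release segment of owl:versionIRI. A minimal sketch, assuming version IRIs follow the OBO release pattern `http://purl.obolibrary.org/obo/{id}/releases/{version}/{id}.owl`; `display_version` is a hypothetical helper and the dates below are illustrative only.

```python
import re
from typing import Optional

# Assumed OBO release IRI pattern: .../obo/{id}/releases/{version}/{id}.owl
RELEASE_IRI = re.compile(r"/obo/[A-Za-z0-9_]+/releases/([^/]+)/[^/]+$")


def display_version(version_info: Optional[str], version_iri: Optional[str]) -> Optional[str]:
    """Prefer owl:versionInfo; otherwise use the release segment of owl:versionIRI."""
    if version_info:
        return version_info
    if version_iri:
        match = RELEASE_IRI.search(version_iri)
        if match:
            return match.group(1)
    return None  # nothing usable to show


if __name__ == "__main__":
    # Illustrative values, not real release metadata.
    print(display_version(None, "http://purl.obolibrary.org/obo/go/releases/2021-01-01/go.owl"))
    print(display_version("2021-01-01", None))
```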

Another caveat is that even Foundry ontologies may not have all of the above populated. For example, ZFA (which is used by the Alliance) lacks a title and a description. It also has two values for license (they are consistent: one is "CC-BY", the other is the URL). I would stress that this is not ZFA's fault: they have simply not been asked to do this, nor been provided with tools to check the presence/absence/cardinality of ontology metadata.

I think we need a clear, computable schema that can be used (1) by portal developers, e.g. at the Alliance, as well as OLS and BioPortal, and (2) for validation, e.g. in the dashboard but also in ROBOT.

In addition to the above fields, there are other fields that are useful to display but are not consistently populated:

matentzn commented 4 years ago

Amen. I will try to work out how to do that with ShEx; in the meantime, let's collect thoughts on good metadata fields.

I could prepare a basic ShEx profile to cover this.

matentzn commented 4 years ago

Ha, this was great fun! The current version of the shape is here:

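# Prefix declarations: ":" is a placeholder namespace, and dc: is assumed to be Dublin Core elements 1.1
PREFIX : <http://example.org/obo-shapes#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX protege: <http://protege.stanford.edu/plugins/owl/protege#>
PREFIX obo: <http://purl.obolibrary.org/obo/>

:OBOOntologyShape CLOSED {
  a [owl:Ontology];
  owl:versionIRI IRI;
  dc:creator xsd:string*;
  dc:contributor xsd:string*;
  dc:title xsd:string;
  dc:date xsd:dateTime?;
  dc:description xsd:string;
  dcterms:license IRI;
  owl:versionInfo xsd:string;
  protege:defaultLanguage xsd:string?;
  rdfs:comment xsd:string*;
  dc:subject xsd:string?;
  obo:IAO_0000700 IRI*;
  dc:type IRI?
}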

I have implemented it in an example notebook here, running it against ENVO, WBPhenotype and CL as examples. I had never worked with shapes seriously, but now that I see them, I'm loving it (of course, we are only scratching the very outer surface here). Keep more such tickets coming, @cmungall (and everyone else).
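
The notebook itself is not reproduced here. As a rough, dependency-free illustration of what the shape enforces, the sketch below transcribes its cardinalities into Python (no suffix means exactly one, `?` zero or one, `*` zero or more) and also applies the CLOSED rule. It is not a ShEx validator: it ignores value types (IRI vs. xsd:string), it assumes the header has already been extracted into a dict mapping property CURIEs to values (with rdf:type left out), and `check_header` is a hypothetical name. The example header is hypothetical, loosely modelled on the ZFA situation described earlier (no title or description, two license values).

```python
from typing import Dict, List, Optional, Tuple

Cardinality = Tuple[int, Optional[int]]  # (min, max); max None means unbounded
ONE, OPT, MANY = (1, 1), (0, 1), (0, None)

# Cardinalities transcribed from :OBOOntologyShape (rdf:type handled separately).
OBO_ONTOLOGY_SHAPE: Dict[str, Cardinality] = {
    "owl:versionIRI": ONE,
    "dc:creator": MANY,
    "dc:contributor": MANY,
    "dc:title": ONE,
    "dc:date": OPT,
    "dc:description": ONE,
    "dcterms:license": ONE,
    "owl:versionInfo": ONE,
    "protege:defaultLanguage": OPT,
    "rdfs:comment": MANY,
    "dc:subject": OPT,
    "obo:IAO_0000700": MANY,
    "dc:type": OPT,
}


def check_header(
    header: Dict[str, List[str]],
    shape: Dict[str, Cardinality] = OBO_ONTOLOGY_SHAPE,
    closed: bool = True,
) -> List[str]:
    """Return human-readable cardinality (and closedness) violations."""
    problems = []
    for prop, (low, high) in shape.items():
        count = len(header.get(prop, []))
        if count < low:
            problems.append(f"{prop}: expected at least {low}, found {count}")
        elif high is not None and count > high:
            problems.append(f"{prop}: expected at most {high}, found {count}")
    if closed:
        # A CLOSED shape forbids properties that the shape does not mention.
        for prop in header:
            if prop not in shape:
                problems.append(f"{prop}: not allowed by CLOSED shape")
    return problems


if __name__ == "__main__":
    header = {  # hypothetical header values
        "owl:versionIRI": ["http://purl.obolibrary.org/obo/zfa/releases/2020-01-01/zfa.owl"],
        "owl:versionInfo": ["2020-01-01"],
        "dcterms:license": ["CC-BY", "https://creativecommons.org/licenses/by/3.0/"],
    }
    for problem in check_header(header):
        print(problem)
    # dc:title: expected at least 1, found 0
    # dc:description: expected at least 1, found 0
    # dcterms:license: expected at most 1, found 2
```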

jamesaoverton commented 4 years ago

This is a worthwhile goal, and @matentzn's schema seems like a good start. I don't have anything to add to that, but I had a few related thoughts:

cmungall commented 4 years ago

Yes, I love having banks of SPARQL queries, but I like having this abstracted into a structure like ShEx, especially for complex constraints that link objects to other objects.

Some ShEx validation frameworks, such as @hsolbrig's PyShEx, can work by crawling a triplestore and recursively executing SPARQL queries. I think it would also be possible to translate from ShEx to SPARQL, which might be nice too.
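
To make the ShEx-to-SPARQL idea concrete for the simplest case, cardinality, here is a hand-rolled sketch (not an existing tool; `cardinality_query` is a hypothetical name). It turns one constraint such as `dcterms:license IRI` (exactly one value) into a SPARQL query that returns the owl:Ontology nodes violating it. It assumes the dcterms namespace `http://purl.org/dc/terms/` and ignores value types, CLOSED and nested shapes, all of which a real translation would need to cover.

```python
from typing import Optional


def cardinality_query(prop_iri: str, low: int, high: Optional[int] = None) -> str:
    """Build a SPARQL SELECT returning owl:Ontology nodes whose number of distinct
    prop_iri values lies outside [low, high] (high=None means unbounded)."""
    count = "COUNT(DISTINCT ?v)"
    # Repeat the aggregate in HAVING rather than relying on the ?n alias,
    # which not every engine accepts there.
    conditions = [f"{count} < {low}"]
    if high is not None:
        conditions.append(f"{count} > {high}")
    return (
        "PREFIX owl: <http://www.w3.org/2002/07/owl#>\n"
        f"SELECT ?ontology ({count} AS ?n) WHERE {{\n"
        "  ?ontology a owl:Ontology .\n"
        # OPTIONAL keeps ontologies with zero values (COUNT of unbound ?v is 0).
        f"  OPTIONAL {{ ?ontology <{prop_iri}> ?v }}\n"
        "}\n"
        "GROUP BY ?ontology\n"
        f"HAVING ({' || '.join(conditions)})\n"
    )


if __name__ == "__main__":
    # dcterms:license IRI  ->  exactly one value
    print(cardinality_query("http://purl.org/dc/terms/license", 1, 1))
```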

The redundancy between this and the JSON Schema-based checks over the registry YAML/JSON is mildly dissatisfying, but I feel there is a path to unification (our YAML is actually YAML-LD and has an RDF form...)

I just realized I abandoned this a while ago; there is not much there, but it gives an idea of how shapes could be used to check classes too, which is especially useful for certain profiles of ontologies: https://github.com/cmungall/obo-shapes

We are using ShEx heavily for ABoxes in GO; e.g. here is what a GO MF (molecular function) instance looks like:

https://github.com/geneontology/go-shapes/blob/e05a415d8b5178c4ac2b4662d42171d14f19a1cf/shapes/go-cam-shapes.shex#L364-L386

One nice feature is that every aspect of a shape can be arbitrarily annotated, e.g. with a seeAlso linking to a ticket.

nlharris commented 4 days ago

Should this stay open?

matentzn commented 3 days ago

I think this is important, but we do not have an appropriate role for handling issues like this. It's too unspecific to be handled by a project like Monarch, and too work-intensive to just do as a side project for someone uninitiated in OBO.

Looking back at this, I think the validation aspect (all but comment numero uno) is a distraction.

If someone on the EWG wants to tackle this issue, maybe:

  1. Create a Google Doc with a table of all important ontology metadata properties, maybe inspired by an Ubergraph query and Chris's brain dump above
  2. Have columns for property, requirement level (Optional/MUST/SHOULD) and data type (IRI, string, etc.)
  3. Share it here for comments
  4. Create a documentation page with that table, and add it wherever @nataled thinks such recommendations belong
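
If the agreed table were also kept as data, the same source could generate the documentation page and feed checks such as the dashboard ones, which would reduce the redundancy mentioned earlier in the thread. A minimal sketch, assuming the three columns proposed in step 2; `MetadataField` and `to_markdown` are hypothetical names, and the example rows are placeholders, not agreed recommendations.

```python
from dataclasses import dataclass
from typing import List

REQUIREMENT_LEVELS = ("MUST", "SHOULD", "Optional")


@dataclass(frozen=True)
class MetadataField:
    prop: str         # property CURIE, e.g. "dcterms:title"
    requirement: str  # one of REQUIREMENT_LEVELS
    datatype: str     # e.g. "IRI" or "string"

    def __post_init__(self) -> None:
        # Reject requirement levels outside the three proposed columns values.
        if self.requirement not in REQUIREMENT_LEVELS:
            raise ValueError(f"unknown requirement level: {self.requirement!r}")


def to_markdown(fields: List[MetadataField]) -> str:
    """Render the fields as a Markdown table for a documentation page."""
    rows = ["| Property | Requirement | Data type |", "| --- | --- | --- |"]
    for entry in fields:
        rows.append(f"| {entry.prop} | {entry.requirement} | {entry.datatype} |")
    return "\n".join(rows)


if __name__ == "__main__":
    placeholder_rows = [  # placeholders only
        MetadataField("dcterms:title", "MUST", "string"),
        MetadataField("dcterms:license", "MUST", "IRI"),
        MetadataField("owl:versionInfo", "SHOULD", "string"),
    ]
    print(to_markdown(placeholder_rows))
```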

nataled commented 3 days ago

I have a non-implemented idea of where this could go, but we could/should push this out more quickly than that ultimate page will be developed (it could go into an FAQ for now?). For the EWG to work on this, we'd need a list of what should be included (@matentzn, should it be all the bulleted points in the first comment, only the top set, or the list in your ShEx shape above?)

matentzn commented 1 day ago

I started a table here:

https://docs.google.com/document/d/1fFGeLjRTEBPUXLDGMtvq8lZD2fRBVus4X6VmmNm4Se4/edit?tab=t.0

Maybe you and I do the first round, then we start looping in others?