ga4gh / ga4gh-schemas

Models and APIs for Genomic data. RETIRED 2018-01-24
http://ga4gh.org
Apache License 2.0

Format of ids for GAOntologyTerm #165

Closed cmungall closed 7 years ago

cmungall commented 9 years ago

Current docs state:

  /**
  The ID defined by the external onotology source.
  (e.g. `http://purl.obolibrary.org/obo/OBI_0001271`)
  */
  string id;

This is fairly open ended and we can imagine confusion and inconsistent usage here.

For the ontologies currently referenced in the metadata schema, terms are typically referenced in two ways.

URIs/IRIs

For many biological ontologies these are typically obolibrary purls, which follow:

http://purl.obolibrary.org/obo/<IDSPACE>_<NUMERICFRAGMENT>

See: http://www.obofoundry.org/id-policy.shtml

OBO-Style identifiers

Typically follow the form

<IDSPACE>:<NUMERICFRAGMENT>
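To make the relationship between the two forms concrete, here is a minimal Python sketch of converting between them. The function names and regular expressions are illustrative assumptions, not part of any GA4GH or OBO tooling, and the patterns only cover the strict `<IDSPACE>` plus numeric fragment shape described above; real ontologies may deviate from it.

```python
import re

OBO_PURL_BASE = "http://purl.obolibrary.org/obo/"

# Assumed shapes: an alphanumeric ID space followed by a purely numeric fragment.
_OBO_ID = re.compile(r"^(?P<idspace>[A-Za-z][A-Za-z0-9]*):(?P<frag>\d+)$")
_OBO_PURL = re.compile(
    "^" + re.escape(OBO_PURL_BASE) + r"(?P<idspace>[A-Za-z][A-Za-z0-9]*)_(?P<frag>\d+)$"
)


def obo_id_to_purl(obo_id):
    """Turn an OBO-style ID such as 'OBI:0001271' into its obolibrary PURL."""
    m = _OBO_ID.match(obo_id)
    if m is None:
        raise ValueError(f"not an OBO-style identifier: {obo_id!r}")
    return f"{OBO_PURL_BASE}{m['idspace']}_{m['frag']}"


def purl_to_obo_id(purl):
    """Turn an obolibrary PURL back into the OBO-style ID."""
    m = _OBO_PURL.match(purl)
    if m is None:
        raise ValueError(f"not an obolibrary PURL: {purl!r}")
    return f"{m['idspace']}:{m['frag']}"
```

Under these assumptions, the conversion is lossless in both directions, which is what makes either form usable as the canonical one.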

Options

  1. The schema should mandate URIs only (using the URI form recommended by the source ontology)
  2. The schema should mandate OBO-Style IDs
  3. The schema should have separate 'id' and 'iri' fields
  4. The schema should have a flexible field

Option 1 is probably the conceptually simplest. Option 2 is not very future-proof, as it doesn't allow open-ended expansion to arbitrary ontologies on the semantic web. Option 3 is probably overkill.

I would advocate option 4. To elaborate, we allow the field to contain either a URI or a CURIE (https://en.wikipedia.org/wiki/CURIE; see also http://www.w3.org/TR/curie/), written without the square brackets. We then assume the existence of a number of implicit qname prefixes, e.g.:

@prefix UBERON http://purl.obolibrary.org/obo/UBERON_
@prefix CL http://purl.obolibrary.org/obo/CL_
@prefix OBI http://purl.obolibrary.org/obo/OBI_
@prefix NCBITaxon http://purl.obolibrary.org/obo/NCBITaxon_

This could potentially live in a separate JSON-LD context file.

This is also consistent with the translation in the OBO-Format spec: http://oboformat.googlecode.com/svn/trunk/doc/obo-syntax.html#5.9.1
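As a rough illustration of how a consumer might handle such a flexible field, the following Python sketch normalizes a value to a full URI. The prefix table is copied from the examples above; the function name `expand_term_id` and the rule "anything of the form scheme://... is already a URI" are assumptions made for this sketch, not an agreed part of the schema. URIs without `//` (such as URNs) would be rejected by this simple heuristic.

```python
# Implicit prefixes taken from the examples in this issue. A real deployment
# would need an agreed, shared list; this table is only illustrative.
IMPLICIT_PREFIXES = {
    "UBERON": "http://purl.obolibrary.org/obo/UBERON_",
    "CL": "http://purl.obolibrary.org/obo/CL_",
    "OBI": "http://purl.obolibrary.org/obo/OBI_",
    "NCBITaxon": "http://purl.obolibrary.org/obo/NCBITaxon_",
}


def expand_term_id(value, prefixes=IMPLICIT_PREFIXES):
    """Return a full URI for a value that is either a URI or a bare CURIE."""
    prefix, sep, local = value.partition(":")
    if not sep:
        raise ValueError(f"neither a URI nor a CURIE: {value!r}")
    # Assumption: "scheme://..." means the value is already an absolute URI.
    if local.startswith("//"):
        return value
    if prefix in prefixes:
        # CURIE expansion is plain concatenation of prefix URI and local part.
        return prefixes[prefix] + local
    raise ValueError(f"unknown prefix {prefix!r} in {value!r}")
```

With this approach, `UBERON:0000955` and `http://purl.obolibrary.org/obo/UBERON_0000955` resolve to the same URI, so servers could compare terms regardless of which form a client supplied. An unknown prefix is treated as an error rather than passed through, which keeps inconsistent usage visible.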

I would be happy to branch and make a pull request, but I thought it worthwhile polling for opinions first. This needs to be future-proof and consistent, but also not over-engineered.

antbro commented 9 years ago

Was it really so off topic (any more than other posts in the thread, e.g., questions of how to refer to multiple ontology terms per individual)? To be clear, I was not seeking a discussion, just an answer to a question centrally related to ontology IDs. So can I take it from your response ("create a new issue") that the answer to my question is "we have not talked about that aspect yet"? Cheers, Tony

@antbro Can we keep to the issue of the topic please? =] Do by all means create a new issue!


mbaudis commented 9 years ago

@antbro What you refer to is not part of the ontologyTerm object itself, but could be defined through some kind of "evidence" object. This is under development in G2P, I think, but should be moved "mainline".

Can we maybe start this over, through a PR against https://github.com/ga4gh/schemas/blob/master/src/main/resources/avro/ontologies.avdl ? I have moved the metadata implementations to this branch.

mellybelly commented 9 years ago

Agree with @mbaudis and @Relequestual. This ticket has meandered (and I am partially to blame for this above, reporting on the Leiden discussion). Can everyone please make new tickets for these individual items and keep this one only to how avro references IDs for OntologyTerm?

@antbro please review the G2P schema that was recently accepted and see if it addresses your questions sufficiently: https://github.com/ga4gh/schemas/blob/master/src/main/resources/avro/genotypephenotype.avdl#L77

Please make tickets there for gaps/issues, much appreciated.

david4096 commented 8 years ago

After having worked with the existing Ontology model for a while, we've proposed some small changes that should close this issue. https://github.com/ga4gh/schemas/pull/694

david4096 commented 7 years ago

Closed with https://github.com/ga4gh/schemas/pull/694, where we now clearly state term_id instead of id.

Discussion regarding the use of ontology terms continues here.