geneontology / go-ontology

Source ontology files for the Gene Ontology
http://geneontology.org/page/download-ontology
Creative Commons Attribution 4.0 International

Implement a system for keeping partonomies in sync (Uberon & CC -> BP) #12658

Open dosumis opened 8 years ago

dosumis commented 8 years ago

We have many cases where we manually maintain partonomies between Uberon and BP (most notably under the development branch) and between CC and BP (most notably in the organization branch). Ideally we'd have some way to keep these part hierarchies in sync automatically.

Chris has been on record in the past as suggesting a script-based approach. I still like the GCI approach, as it can dynamically adapt to changes in Uberon. I fear a script will run only periodically and will inevitably cause problems when it lags.
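For comparison, the script-based approach could start out as a simple diff of the two partonomies. The Python sketch below is hypothetical: it assumes both partonomies have already been extracted as (child, parent) label pairs and that there is a lookup from each Uberon structure to its BP development term. The function name and labels are illustrative, not taken from the real files.

```python
from typing import Dict, Set, Tuple

# A 'part of' edge as a (child, parent) pair of labels.
Edge = Tuple[str, str]


def missing_bp_part_ofs(
    uberon_part_of: Set[Edge],
    dev_term_for: Dict[str, str],
    bp_part_of: Set[Edge],
) -> Set[Edge]:
    """Return BP 'part of' edges implied by Uberon but missing from BP.

    For each Uberon edge X part_of Y where both X and Y map to a BP
    development term, expect dev(X) part_of dev(Y) to be asserted in BP.
    Only asserted edges are compared; transitivity is ignored.
    """
    expected = {
        (dev_term_for[child], dev_term_for[parent])
        for child, parent in uberon_part_of
        if child in dev_term_for and parent in dev_term_for
    }
    return expected - bp_part_of


if __name__ == "__main__":
    # Illustrative data only.
    uberon = {
        ("heart", "circulatory system"),
        ("blood vessel", "circulatory system"),
    }
    dev = {
        "heart": "heart development",
        "blood vessel": "blood vessel development",
        "circulatory system": "circulatory system development",
    }
    bp = {("heart development", "circulatory system development")}
    for child, parent in sorted(missing_bp_part_ofs(uberon, dev, bp)):
        print(f"missing: '{child}' part_of '{parent}'")
```

Because a check like this only compares edges at the moment it runs, it reports drift after the fact rather than preventing it, which is the lag problem described above.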

Here's an example of the GCI approach (the example comes from a case where the difference in partonomy caused an inconsistency: https://github.com/geneontology/go-ontology/issues/12655):

Current pattern:
'circulatory system development' EquivalentTo: 'anatomical structure development' that 'results in development of' some ('part of' some 'circulatory system')

Proposal - add GCI:
'anatomical structure development' that 'results in development of' some ('part of' some 'circulatory system') SubClassOf part_of some 'circulatory system development' (*)

This is wordy enough that it would need to be implemented as part of a pattern system, i.e. it could be added as part of the pattern used to add any development term.

End users will, of course, want to see the part relationships directly, so we'd need a way to materialize these (non-redundantly) in the release files. @cmungall @balhoff - is this possible with existing tools?

CC @cmungall @ukemi @balhoff - comments please.

* or fully denormalised: 'anatomical structure development' that 'results in development of' some ('part of' some 'circulatory system') SubClassOf part_of some 'anatomical structure development' that 'results in development of' some 'circulatory system' ?
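Since the proposal relies on a pattern system, here is a minimal Python sketch of what the template step might look like for both forms: the GCI in the proposal and the fully denormalised variant (*). The function names and the '<structure> development' label convention are assumptions for illustration, and plain strings stand in for real OWL objects.

```python
"""Hypothetical template for the development-branch GCIs proposed above.

Quoted labels stand in for IRIs and the output mirrors the Manchester-style
notation used in this issue; a real pattern system (e.g. DOSDP) would fill
variables with IRIs and emit proper OWL axioms.
"""


def _lhs(structure: str) -> str:
    # Anonymous class: development of anything that is part of `structure`.
    return (
        "'anatomical structure development' that 'results in development of' "
        f"some ('part of' some '{structure}')"
    )


def development_gci(structure: str) -> str:
    """GCI from the proposal, pointing at the named development term.

    Assumes (hypothetically) that the GO label is '<structure> development'.
    """
    return f"{_lhs(structure)} SubClassOf part_of some '{structure} development'"


def development_gci_denormalised(structure: str) -> str:
    """Fully denormalised variant (*): no named GO term on the right-hand side."""
    return (
        f"{_lhs(structure)} SubClassOf part_of some "
        "('anatomical structure development' that 'results in development of' "
        f"some '{structure}')"
    )


if __name__ == "__main__":
    print(development_gci("circulatory system"))
    print(development_gci_denormalised("circulatory system"))
```

The denormalised form needs no lookup of existing GO labels, so it could be generated purely from the imported Uberon classes.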

cmungall commented 8 years ago

My initial dislike of the GCI system was based on:

  1. difficulties keeping them in sync with ontology
  2. inferences are not materialized / easily visible in Protege

The first objection can be taken care of by DOSDPs. Currently I have a rogue extension illustrated here: https://github.com/obophenotype/bio-attribute-ontology/blob/ecca8208294ecbb0bcfea5449a91bea975bc21ae/src/ontology/patterns/entity_attribute.yaml#L42

This uses functional syntax, but I think it would be cleaner to have two Manchester expressions for the left and right sides (I think you suggested something like this).

The second is now taken care of by the obscure --materialize-gci option in owltools, illustrated here: https://github.com/obophenotype/bio-attribute-ontology/blob/ecca8208294ecbb0bcfea5449a91bea975bc21ae/src/ontology/Makefile#L144

It uses a materialized-expression reasoner to find the direct parents of the form 'R some Y' and materializes them. We should be able to add this to ROBOT easily (mexr is distributed on Maven).
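One way to read "non-redundantly" is as a reduction over the inferred 'R some Y' parents: keep a target only if no more specific target already implies it. The Python sketch below is a hypothetical illustration of that filtering step, assuming the reasoner's entailments and an is_a/part_of ancestor closure have already been extracted. It is not how owltools or mexr are implemented, and the function name is made up.

```python
from typing import Dict, Set


def direct_part_of_targets(
    targets: Set[str], ancestors: Dict[str, Set[str]]
) -> Set[str]:
    """Keep only the most specific Y among inferred 'part_of some Y' parents.

    `targets` holds every named Y such that C SubClassOf part_of some Y is
    entailed. `ancestors[y]` holds every class reachable from y via is_a
    and/or part_of (y itself excluded). Since part_of is transitive, a
    target that is an ancestor of another target is redundant.
    """
    return {
        y
        for y in targets
        if not any(y in ancestors.get(other, set()) for other in targets if other != y)
    }


if __name__ == "__main__":
    # Illustrative hierarchy, not the real GO structure.
    ancestors = {
        "circulatory system development": {
            "system development",
            "multicellular organism development",
        },
        "system development": {"multicellular organism development"},
        "multicellular organism development": set(),
    }
    inferred = set(ancestors)  # all three are entailed part_of targets here
    print(direct_part_of_targets(inferred, ancestors))
```

If two targets are equivalent (each in the other's ancestor set), this filter drops both, so a real implementation would first collapse equivalent classes.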

We could run this as part of the per-commit inference pipeline and add this to what OE imports, so OE users would see their part-ofs. Protege users would not see these, but then partonomy sucks in Protege anyway (the entailments would of course be there, just not obviously visible). Alternatively, we could have something like the old pipeline that injects the inferences (tagged) back into the edit file, but I don't think we want to go back to that.

dosumis commented 8 years ago

Provisional plan

TBD

The former has the advantage that the GCI will be visible under the relevant GO class ('circulatory system development') in Protege 5. This may be better for editing. But perhaps this doesn't matter - the latter can never be wrong and could be entirely generated from imported Uberon classes.

To implement:

* May not be necessary if we use the second GCI pattern above, as we could simply generate relevant GCIs during import of Uberon terms, following some standard set of patterns.

pgaudet commented 6 years ago

@ukemi @dougli1sqrd @cmungall what's the status on this?

dosumis commented 5 years ago

AFAIK all the tools are in place to do this.

TODO:

  1. Update the GO repo to the latest ODK, which supports pattern-based development.
  2. Write two sets of patterns for each branch that needs fixing - one that matches current usage, and one with additional GCIs.
  3. For each branch, use DOSDP tools with the first pattern to generate TSVs, then substitute the second pattern to generate the axioms with GCIs.
  4. Add a step to the Makefile that generates a new file with materialised 'part of' relationships, plus an import statement in the editors' file to pull these axioms in.
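Step 4 could produce something along these lines. The Python sketch below is hypothetical: it assumes the materialised, already non-redundant part_of edges are available as (child, parent) OBO IDs, and serialises them as OWL functional syntax together with the Import line the edit file would need. The ontology IRI and IDs are placeholders; BFO_0000050 is the 'part of' relation used in GO. In practice the Makefile would call the reasoning tools discussed above rather than hand-written code, so this only shows the shape of the output.

```python
from typing import Iterable, Tuple

OBO = "http://purl.obolibrary.org/obo/"
PART_OF = OBO + "BFO_0000050"  # 'part of'


def materialized_part_of_ofn(ontology_iri: str, edges: Iterable[Tuple[str, str]]) -> str:
    """Serialise (child, parent) OBO IDs as part_of axioms in OWL functional syntax."""
    lines = [f"Ontology(<{ontology_iri}>"]
    for child, parent in sorted(edges):
        lines.append(
            f"SubClassOf(<{OBO}{child}> ObjectSomeValuesFrom(<{PART_OF}> <{OBO}{parent}>))"
        )
    lines.append(")")
    return "\n".join(lines)


def import_declaration(ontology_iri: str) -> str:
    """The line the edit file's Ontology(...) header would need to pull these axioms in."""
    return f"Import(<{ontology_iri}>)"


if __name__ == "__main__":
    # Placeholder IRI and IDs, not real GO identifiers.
    iri = "http://example.org/go-part-of-materialized.owl"
    print(materialized_part_of_ofn(iri, [("GO_CHILD", "GO_PARENT")]))
    print(import_declaration(iri))
```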

Maybe worth having a brief meeting with @matentzn & myself if there are any questions about how to do this?

cmungall commented 5 years ago

Thanks for the summary David, v useful.

We can try putting this into practice in Geneva. Maybe we can call you and Nico then (will be nearly the same timezone).