geneontology / amigo

AmiGO is the public interface for the Gene Ontology.
http://amigo.geneontology.org
BSD 3-Clause "New" or "Revised" License

Refactor annotation extension facet / make more parts of annotation extensions searchable #201

Open cmungall opened 9 years ago

cmungall commented 9 years ago

Currently we expose an 'annotation extensions' facet, which is useful for curators but does not directly relate to a meaningful biological question. It also assumes that the extension fillers are classes (see http://jira.geneontology.org/browse/GO-838).

We will refactor this to expose facets that are biological categories. We will start with 'participant' and 'location' (TBD: how should the hierarchy of facets be handled in the UI?)

The majority of the work here is in the golr loader; the only change in this codebase is a trivial addition/replacement in the yaml. This would be a standard 4-tuple (id,label,closure,closure-label), though in fact only the closure parts would be used.

Population of closure

Each field is associated with one or more object properties (OPs). For example

Call this specified set P_f. Call the union of P_f and inferred subproperties P_f*.
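A minimal sketch of how P_f* could be derived from P_f, assuming the subproperty assertions are available as a simple child-to-parents mapping. The function name `subproperty_closure` and the toy hierarchy below are illustrative only; they are not taken from the loader.

```python
from collections import deque


def subproperty_closure(specified, subproperty_of):
    """Return P_f*: the specified properties plus all transitive subproperties.

    `subproperty_of` maps each property to its direct superproperties.
    """
    # Invert child -> parents into parent -> children so we can walk downwards.
    children = {}
    for child, parents in subproperty_of.items():
        for parent in parents:
            children.setdefault(parent, set()).add(child)

    closure = set(specified)
    queue = deque(specified)
    while queue:
        prop = queue.popleft()
        for child in children.get(prop, ()):
            if child not in closure:
                closure.add(child)
                queue.append(child)
    return closure


# Toy hierarchy (an assumption for illustration, not the real relation ontology).
TOY_SUBPROPERTY_OF = {
    "has_input": ["has_participant"],
    "has_output": ["has_participant"],
    "has_direct_input": ["has_input"],
    "part_of": [],
}

P_F = {"has_participant"}  # hypothetical specified set for a 'participant' field
P_F_STAR = subproperty_closure(P_F, TOY_SUBPROPERTY_OF)
```

With this toy input, P_f* picks up has_input, has_output and has_direct_input, while part_of is left out because it is not a subproperty of anything in P_f.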

For each gene association, we walk the graph

  1. From the assigned GO class
  2. From the extension class, if the extension relation is in P_f*.

Note that in either case, the walking should only follow OPs in P_f*.
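The two rules above can be sketched roughly as follows. This is one possible reading, not the OGW implementation: it assumes subClassOf (`is_a`) edges are always followed, and that a class only enters the closure once the path to it has crossed at least one relation in P_f* (for rule 2 the extension relation itself counts as that first crossing). The graph, the class labels and the function name `facet_closure` are all made up for illustration.

```python
from collections import deque


def facet_closure(go_class, extensions, edges, p_f_star):
    """Collect the closure for one facet field of a single gene association.

    go_class   -- the directly assigned class (rule 1 start point)
    extensions -- iterable of (relation, filler) pairs from the extension
    edges      -- dict: class -> list of (relation, target) outgoing edges
    p_f_star   -- set of relations the walk may follow, besides is_a
    """
    # Each queue entry is (node, crossed); crossed is True once the path
    # has used a relation from P_f*.
    queue = deque([(go_class, False)])
    # Rule 2: only start from fillers whose extension relation is in P_f*.
    queue.extend((filler, True) for rel, filler in extensions if rel in p_f_star)

    seen = set()
    closure = set()
    while queue:
        node, crossed = queue.popleft()
        if (node, crossed) in seen:
            continue
        seen.add((node, crossed))
        if crossed:
            closure.add(node)
        for rel, target in edges.get(node, ()):
            if rel == "is_a":
                queue.append((target, crossed))
            elif rel in p_f_star:
                queue.append((target, True))
            # Any other relation is ignored for this facet.
    return closure


# Toy graph using labels instead of real IDs (assumed edges, for illustration).
TOY_EDGES = {
    "toy process": [("is_a", "biological_process"), ("regulates", "other process")],
    "interneuron": [("is_a", "neuron")],
    "neuron": [("is_a", "cell"), ("part_of", "nervous system")],
}

location_closure = facet_closure(
    go_class="toy process",
    extensions=[("occurs_in", "interneuron"), ("has_input", "some protein")],
    edges=TOY_EDGES,
    p_f_star={"occurs_in", "part_of"},
)
```

In this toy case the has_input extension and the regulates edge are ignored, and the is_a ancestors of the assigned class stay out of the location closure because no P_f* relation was crossed on the way to them. Whether the directly assigned class should itself appear in the closure depends on how the annotation relationship is handled, which this sketch leaves out.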

The PoorMansReasoning (PMR) strategy is to use the OGW graph walking code, specifying P_f.

A cleaner approach is to replace steps 1 and 2 with the following:

  1. Translate the combo of the class and extension to an OWL anon class, e.g. C and R some Y, as is done for GAF validation and as specified in the extensions paper. Call this C'
  2. Use the materialized expression reasoner to find all reflexive ancestors of C' over every P in P_f.

This is guaranteed complete for EL. The PMR strategy may miss some edge cases but may be good enough. @hdietze to investigate.

Note that in all cases we treat the filler in the annotation extension as a class. For completeness we should ensure that we include SubClassOf some SO:gene | PR:protein etc to allow the GO-838 query in a seamless way.

Examples for 'location' facet

location_closure should contain interneuron, neuron, etc. as well as 'nervous system', since the initial relation R is in P_f* and the classes are on the subClass + partOf path (rule 2).

Note also that we expect the same thing if a precomposed term is used

So long as go-plus is loaded the path will be the same (rule 1)

TBD

For completeness, the (implicit or explicit) annotation relationship must be considered. E.g. consider the location facet and a direct annotation to 'axon'; here the location closure blends into the existing isa-partof one.

cc @dosumis

cmungall commented 9 years ago

It should be noted that these fields should provide the obvious anchor points for the AmiGO pages for CL, CHEBI classes etc., which are somewhat pointless at present.

kltm commented 9 years ago

From my initial comments:

Currently in AmiGO, annotation extensions are stored as 1) a blob (annotation_extension_json) and 2) annotation extension classes (including class, class_label, the closures, searchables, etc.).

However, this means that you are unable to do things like find the target of an activity (e.g. has_input(UniProtKB:O89046)) by normal searching methods. Moreover, even power tools like gannet are unable to do much here since the blob is not searchable. (The only way would be CLI tools, like the bbop libs, and parsing the output yourself.)
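To make the "parse it yourself" workaround concrete, here is a rough sketch that pulls targets out of a GAF-style extension string, where `|` separates alternatives and `,` separates conjoined `relation(ID)` units. It works on the plain string form only; the layout of annotation_extension_json is not shown in this thread, so it is not assumed here. The helper names `parse_extension` and `targets_of` are hypothetical.

```python
import re

# One unit looks like relation(ID), e.g. has_input(UniProtKB:O89046).
_UNIT = re.compile(r"^\s*([\w\-]+)\(([^()]+)\)\s*$")


def parse_extension(expr):
    """Parse 'rel(ID),rel(ID)|rel(ID)' into a list of alternatives.

    Each alternative is a list of (relation, filler) pairs.
    """
    if not expr.strip():
        return []
    alternatives = []
    for alternative in expr.split("|"):
        units = []
        for part in alternative.split(","):
            match = _UNIT.match(part)
            if match is None:
                raise ValueError(f"cannot parse extension unit: {part!r}")
            units.append((match.group(1), match.group(2).strip()))
        alternatives.append(units)
    return alternatives


def targets_of(expr, relation):
    """Return every filler attached to `relation` anywhere in the expression."""
    return [
        filler
        for alternative in parse_extension(expr)
        for rel, filler in alternative
        if rel == relation
    ]


# Only has_input(UniProtKB:O89046) comes from the comment above; the other
# units are added purely to exercise the ',' and '|' handling.
example = "has_input(UniProtKB:O89046),occurs_in(CL:0000540)|part_of(GO:0030424)"
inputs = targets_of(example, "has_input")
```

Having to do this client-side for every query is exactly the problem: once the relation and filler are loaded as their own fields, the same question becomes an ordinary facet or filter query.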

The goal here would be to make a satisfying amount of the annotation extension structure available to the standard AmiGO interface.

This also means that we'll need to add more fields to the load.

cmungall commented 9 years ago

@kltm - yes this will require additional closure fields such as participant as specified above.

cmungall commented 8 years ago

Note that the specification here can be generalized to include population of the current isa_partof_closure and regulates_closure fields. This provides the natural solution to https://github.com/geneontology/amigo/issues/267