dosumis closed this issue 5 years ago.
I'm not loving it...
I'll look at the proposal for annotation grouping later (offline right now).
Not too surprised. One big negative is that it screws with our ability to add disjoints.
(The proposal at the top of this ticket should be ignored. It's dumb.)
MF design patterns need to cope with the compound nature of many molecular functions while (a) keeping classification that is intuitive to biologists; (b) supporting the curation of LEGO (GO-CAM) models with unbroken chains of causal relations; (c) being easy for LEGO (GO-CAM) curators to use; (d) keeping LEGO (GO-CAM) models as simple as possible: curators should be able to choose whether or not to include subfunctions with as little loss of information as possible.
Earlier attempts at defining compound functions used has_part (or some subproperty of it) for all components of a compound function, including the effector function. This approach is particularly bad at supporting unbroken causal chains (see this comment for an illustration: https://github.com/geneontology/molecular_function_refactoring/issues/31#issuecomment-282778313). These attempts also produced somewhat unintuitive classification. Biologists typically expect classification under the effector function: an RTK is_a kinase; PKA is_a kinase (not a transducer whose parts are a cAMP sensor and a kinase); an ATPase-coupled K+ transporter is_a K+ transporter; and a transcription factor is_a transcription regulator.
(In reading this proposal, please bear in mind that all inverses are assumed to be automatically inferred.)
With this in place, we still get continuous chains of causal relations (and resulting inference) whether the regulatory edge points to the effector function or a subfunction that is causally related to it. We can also eliminate additional regulatory edges internal to compound functions in GO-CAM.
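A toy sketch of the chain argument, under loudly stated assumptions: it supposes (hypothetically) that each compound function implies an internal edge from its regulatory component to its effector function, so curators never add that edge by hand. The activity names are invented, and plain graph reachability stands in for the OWL reasoning a real GO-CAM pipeline would use.

```python
from collections import defaultdict

def internal_edges(compound_functions):
    """Hypothetical pattern: each compound MF implies an edge from its
    regulatory component to its effector function."""
    return [(cf["regulatory_component"], cf["effector"]) for cf in compound_functions]

def causally_upstream(edges):
    """Map each source node to every node reachable from it (transitive closure)."""
    graph = defaultdict(set)
    for source, target in edges:
        graph[source].add(target)
    closure = {}
    for start in list(graph):
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            for nxt in graph.get(node, ()):  # .get avoids inserting new keys
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        closure[start] = seen
    return closure

# Hypothetical compound function: a PKA-like activity with a cAMP-sensing subfunction.
pka = {"regulatory_component": "PKA cAMP sensor", "effector": "PKA kinase"}

# Curated edges: one upstream activity regulates the sensor subfunction,
# another regulates the effector directly; the effector acts on a target.
curated = [
    ("upstream activity A", "PKA cAMP sensor"),
    ("upstream activity B", "PKA kinase"),
    ("PKA kinase", "target activity"),
]

closure = causally_upstream(curated + internal_edges([pka]))
```

Both upstream activities end up causally upstream of the target, whichever part of the compound function they were attached to. Running `causally_upstream(curated)` without the implied internal edge leaves the sensor route dangling, which is the break the pattern is meant to avoid.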
Any additional classification under Paul's new upper level classes (e.g. molecular transducer) should be inferable based on these design patterns.
Sketch:
'molecular transducer activity' EquivalentTo: molecular_function that has_regulatory_component some molecular_function ?
'phosphorylation sensing molecular transducer activity' EquivalentTo: molecular_function that has_regulatory_component some 'phosphorylation sensor activity'
(Whether we want such high-level classes is something we can decide separately; this just illustrates how we could get inference to them.)
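To make the intended inference concrete, here is a toy closed-world stand-in for what an OWL reasoner would conclude from the two equivalence axioms above; it is not a reasoner. The tiny is_a hierarchy and the label 'cAMP sensor activity' are hypothetical. A function keeps its effector-based classification (its genus, e.g. kinase) and additionally falls under every defined class whose has_regulatory_component filler subsumes one of its regulatory components.

```python
# Toy is_a hierarchy (child -> parent). Labels are illustrative only;
# 'cAMP sensor activity' is a hypothetical term.
IS_A = {
    "kinase activity": "molecular_function",
    "phosphorylation sensor activity": "molecular_function",
    "cAMP sensor activity": "molecular_function",
}

# Defined classes from the sketch: label -> required has_regulatory_component filler.
DEFINED = {
    "molecular transducer activity": "molecular_function",
    "phosphorylation sensing molecular transducer activity": "phosphorylation sensor activity",
}

def ancestors(term):
    """Return the term together with all of its is_a ancestors."""
    result = {term}
    while term in IS_A:
        term = IS_A[term]
        result.add(term)
    return result

def classify(genus, regulatory_components):
    """Superclasses of a function defined by a genus plus regulatory components:
    the genus lineage (effector-based classification) plus every defined class
    whose filler subsumes at least one regulatory component."""
    inferred = ancestors(genus)
    for defined, filler in DEFINED.items():
        if any(filler in ancestors(c) for c in regulatory_components):
            inferred.add(defined)
    return inferred
```

For example, `classify("kinase activity", ["cAMP sensor activity"])` yields the kinase lineage plus 'molecular transducer activity', so a PKA-like function stays a kinase while still being grouped as a transducer.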
We might want an explicit has_component_function relation, with its domain and range restricted to MFs.
We could flesh this out with more relations, e.g. has_energy_source for the ATPase-coupled transporter example?
TF activity, modified from one of Astrid's test models. Note that we get complete regulatory chains whether going via the DNA-binding component or via the effector (genus).
(All inverses are inferred in GO-CAM models, so we could replace has_necessary_component with necessary_component_of, pointing in the opposite direction, if that is clearer.)
Inferred classic GO annotations:
Can we also get these? (Depends on #49.)
Inferred upper level MFs:
(* Assuming we want this class.)
Use of these patterns would, of course, be eased by defining GO-CAM template patterns to go along with the ontology design patterns. In this case, a design pattern would drive a table something like the following, allowing input of DP components:
| DNA binding | transcription type | regulatory effect |
|---|---|---|
| {Sequence specific DNA binding} | {transcription, DNA templated} | {directly_regulates} |
| RNA pol II regulatory region sequence specific DNA binding | transcription from RNA pol II promoter | directly_positively_regulates |
The first data row shows the range class/relation for each column; the second shows example fillers.
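A minimal sketch of how such a table row might be validated and turned into a class expression. Everything beyond the table itself is an assumption: the genus 'transcription regulator activity', the use of has_necessary_component for the DNA-binding column, the template shape, and the toy is_a hierarchy used for range checks are all hypothetical, and the real DP may look different.

```python
# Hypothetical toy hierarchy (child -> parent), used only to check that each
# filler falls within its column's range class/relation.
IS_A = {
    "RNA pol II regulatory region sequence specific DNA binding": "Sequence specific DNA binding",
    "transcription from RNA pol II promoter": "transcription, DNA templated",
    "directly_positively_regulates": "directly_regulates",
}

# Column -> range class/relation, taken from the first data row of the table.
RANGES = {
    "dna_binding": "Sequence specific DNA binding",
    "transcription_type": "transcription, DNA templated",
    "regulatory_effect": "directly_regulates",
}

# Hypothetical class-expression template; the real DP may be shaped differently.
TEMPLATE = (
    "'transcription regulator activity'"
    " and has_necessary_component some '{dna_binding}'"
    " and {regulatory_effect} some '{transcription_type}'"
)

def is_within(term, range_term):
    """True if term is range_term or one of its descendants in IS_A."""
    while term is not None:
        if term == range_term:
            return True
        term = IS_A.get(term)
    return False

def fill_row(row):
    """Check each filler against its column's range, then fill the template."""
    for column, value in row.items():
        if not is_within(value, RANGES[column]):
            raise ValueError(
                f"{value!r} is outside the range {RANGES[column]!r} of column {column!r}"
            )
    return TEMPLATE.format(**row)

example = fill_row({
    "dna_binding": "RNA pol II regulatory region sequence specific DNA binding",
    "transcription_type": "transcription from RNA pol II promoter",
    "regulatory_effect": "directly_positively_regulates",
})
```

The range check is what lets the table header constrain curator input: a filler such as `has_part` in the regulatory-effect column is rejected with a `ValueError` rather than silently producing an ill-formed pattern.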
There has been some discussion of how we might represent logic gates in LEGO. This may be beyond the expressiveness of OWL, but at some point in the future we could add support for internal component nodes representing logic gates, sitting between the regulatory component and the effector function.
@cmungall @thomaspd @ukemi Comments please.
Add to agenda for tomorrow?
@vanaukenk
In absence of objections, I'm starting implementation.
[ ] Redraft other patterns to fit new schema:
[x] transducer
[ ] receptors
[ ] ATPase coupled transporters
[ ] TFs
This issue was moved to geneontology/go-ontology#16972
We have many compound functions in GO. Sometimes this is reflected in multiple axes of classification. For example, 'ATPase activity, coupled to transmembrane movement of substances' is classified both under transmembrane transporter activity and under 'ATPase activity'. In other cases, one component of the compound function is used for classification, while another is linked to the compound function by a has_part relationship.
For automated classification, using multiple has_part relationships would work well. has_part also works well for LEGO templates, as it exposes the individual components so that regulation edges can be linked directly to them. Unfortunately, has_part is useless for grouping annotations (although see this proposal: ). Also, switching entirely to has_part would break parts of the existing hierarchy where it feels intuitively right. For example, receptor tyrosine kinases are classified as kinases as well as receptors.
Is there a logical way we can get around this conundrum?
Would this GCI be crazy?
'molecular function that has_component some X' SubClassOf X
?
This could be added programmatically for MFs that are components of other MFs. has_component is potentially less damaging, as it is non-transitive.
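A sketch of what "added programmatically" might look like, assuming the has_component assertions are already available as (compound MF, component MF) pairs. The identifiers are hypothetical local names, not GO IDs, and the output is an OWL functional-syntax fragment that would still need prefix declarations. A second helper previews the direct superclasses the GCI would add; a reasoner would then also close these under is_a, so non-transitivity of has_component alone does not stop classification from propagating through subclass links.

```python
def component_gcis(assertions):
    """Emit one GCI per MF that appears as a component of another MF.

    assertions: iterable of (compound_mf, component_mf) pairs, read as
    'compound_mf SubClassOf has_component some component_mf'.
    Output is OWL functional syntax using hypothetical ':' identifiers.
    """
    components = sorted({component for _, component in assertions})
    return [
        "SubClassOf(ObjectIntersectionOf(:molecular_function "
        f"ObjectSomeValuesFrom(:has_component :{c})) :{c})"
        for c in components
    ]

def implied_superclasses(assertions):
    """Closed-world preview of the direct superclasses the GCIs would add:
    each compound MF becomes a subclass of each of its components."""
    implied = {}
    for compound, component in assertions:
        implied.setdefault(compound, set()).add(component)
    return implied

# The RTK example from above: kinase and receptor as components of one MF.
assertions = [
    ("receptor_tyrosine_kinase_activity", "kinase_activity"),
    ("receptor_tyrosine_kinase_activity", "transmembrane_receptor_activity"),
]
gcis = component_gcis(assertions)
```

With these two GCIs in place, the RTK function is classified under both kinase and receptor activity without asserting either is_a link by hand.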
CC @cmungall