molecular transducer - Githubissues

dosumis commented 7 years ago

For more details see refactoring proposal in WebProtege

Proposed class:

molecular transducer activity (we already have this)

Proposed text def:

The molecular function that accepts an input of one form and creates an output of a different form.

DOS comment: This def sucks! Surely covers all enzymes. Maybe define in terms of information or energy transfer?

Proposed classification(s) / example subclasses

This could be the basis for a general design pattern that covers receptor

Usefulness for biologists:

Does this provide a useful grouping of molecular functions that cannot equally be provided by an existing biological process? If so, please describe.

Usefulness in ontology construction:

Is the class useful for error checking (via disjointness axioms)?

Is this class useful for design patterns?

Is this class useful for curators - e.g. used in templates or guiding template usage?

Sustainability

Does this class require multiple inheritance classification?

If yes please provide details/examples.

Can we use design patterns & reasoning to populate subclasses?

If yes, please provide details

Paul proposes:

This could be a generalization of the design-pattern/template for receptor if we can successfully pin down molecular sensor and specific molecular function reg.

Probably best to return to this once we've worked on the various subclasses.

dosumis commented 7 years ago

New generic pattern for molecular sensor. This uses specialized subproperties of has_part which, by convention, are intended to record upstream (sensor) regulating downstream (effector). As with the original proposal for defining receptors using has_ligand, the internal regulation edge is only implicit on the class level, but is made explicit in the corresponding LEGO template.

The following patterns are from a test model. (atomic function is just the name for a simple (non-compound) function in this test ontology.)

(Note that photoreceptor is not classified as a receptor. Receptors receive molecular signals according to our definition.)

LEGO templates

dosumis commented 7 years ago

Signal transducer.

signal transducer activity is currently defined as: molecular_function that part_of some 'signal transduction'

This term is entirely redundant with BP, so we should consider obsoleting it. If not, the class is still not a molecular transducer as defined here, because not all signal transducers have a sensing subfunction. For example, MAPK is a signal transducer but is activated by covalent modification rather than by sensing some signal via binding (unless anyone wants an MF: 'subfunction phosphorylation state detector activity'?...).

dosumis commented 7 years ago

Relationship to sensor activity

We get rid of this grouping class. Nothing is a sensor without an effector. But we need some detector classes (for pH, redox; membrane potential (AKA voltage)). This allows us to have classes like:

calcium sensing transducer activity EquivalentTo: molecular transducer that has_sensor some calcium binding

redox sensing transducer activity EquivalentTo: molecular transducer that has_sensor some redox state detector

A step too far?

We may be able to safely link to regulation of X concentration in Y terms.

For example:

'regulation of cytosolic calcium levels' (SubClassOf) regulates some (calcium sensing transducer activity that occurs_in some cytosol)

dosumis commented 7 years ago

Relationship to ligands: see https://github.com/geneontology/molecular_function_refactoring/issues/30

dosumis commented 7 years ago

Transcription factor design/template pattern is likely to use a compound function in the effector slot, so the general pattern should be broadened to 'molecular function'. I'm slightly worried about letting curators loose with such a powerful pattern though. And we need to think carefully about rules for inference of annotation in GPAD. Do we need to follow part chains transitively? If we do so, will this be opaque/confusing to curators?

thomaspd commented 7 years ago

Regarding sensing classes, here's the latest suggestion: [screenshot, 2017-02-20]

Importantly it includes terms for sensing a small molecule second messenger (so we can handle indirect regulation via small molecules) and post-translational modification states (so multiple detection events can ultimately be integrated)

dosumis commented 7 years ago

Sensors, transducers, direct regulation and LEGO - discussion doc

dosumis commented 7 years ago

Is it possible for a sensor to exist outside of a transducer? Isn't this like one hand clapping?

dosumis commented 7 years ago

A simpler proposal:

Continue to classify transducers by effector function (as most are now). Record sensor function via directly_regulated_by.

How this could work:

LEGO is an instance graph, so all inverses will be inferred:

directly positively regulated by inverseOf directly positively regulates
enabled by inverseOf enables

Inference of inputs: directly positively regulates o enabled by -> has input

'transducer activity': EquivalentTo ('biochemical activity' that 'directly regulated by' some 'biochemical activity')

'phosphorylation state sensing transducer activity': EquivalentTo ('biochemical activity' that 'directly positively regulated by' some 'kinase activity')
inferred: SubClassOf 'transducer activity'

'kinase activated kinase activity': EquivalentTo ('kinase activity' that 'directly positively regulated by' some 'kinase activity')
inferred: SubClassOf 'phosphorylation state sensing transducer activity'

'kinase activator activity': EquivalentTo ('molecular_function' that 'directly positively regulates' some 'kinase activity')

=> inferences:

GP1 enables (kinase activity 'has input' GP2)
GP1 enables 'kinase activator activity'
GP2 enables 'kinase activated kinase activity'
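
A minimal Python sketch of the inferences above, treating the LEGO model as a set of triples. This is not how a reasoner would do it; the instance names (mf1, mf2) and both helper functions are made up for illustration, and the classification step hard-codes the two class definitions rather than reading them from OWL.

```python
# Toy LEGO-style instance graph as (subject, relation, object) triples.
# mf1 and mf2 are hypothetical activity instances; GP1 and GP2 are gene products.
triples = {
    ("mf1", "directly_positively_regulates", "mf2"),
    ("mf1", "enabled_by", "GP1"),
    ("mf2", "enabled_by", "GP2"),
}
types = {"mf1": "kinase activity", "mf2": "kinase activity"}


def infer_has_input(triples):
    """Apply the chain: directly_positively_regulates o enabled_by -> has_input."""
    inferred = set()
    for s, p, o in triples:
        if p != "directly_positively_regulates":
            continue
        for s2, p2, o2 in triples:
            if s2 == o and p2 == "enabled_by":
                inferred.add((s, "has_input", o2))
    return inferred


def classify(triples, types):
    """Assign the two kinase-related defined classes to activity instances."""
    labels = {}
    for s, p, o in triples:
        if p == "directly_positively_regulates" and types.get(o) == "kinase activity":
            # The regulator of a kinase activity is a kinase activator.
            labels.setdefault(s, set()).add("kinase activator activity")
            if types.get(s) == "kinase activity":
                # A kinase activity activated by a kinase activity.
                labels.setdefault(o, set()).add("kinase activated kinase activity")
    return labels


print(infer_has_input(triples))
print(classify(triples, types))
```

Under these assumptions, mf1 (enabled by GP1) picks up GP2 as its input and is classified as a kinase activator, while mf2 (enabled by GP2) is classified as a kinase activated kinase.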

dosumis commented 7 years ago

simpler proposal example using binding

LEGO is an instance graph, so all inverses will be inferred:

directly positively regulated by inverseOf directly positively regulates
enabled by inverseOf enables

Inference of inputs: directly positively regulates o enabled by -> has input

'transducer activity': EquivalentTo ('biochemical activity' that 'directly regulated by' some 'biochemical activity')

'protein binding sensor activity': EquivalentTo ('biochemical activity' that 'directly positively regulated by' some 'protein binding activity')
SubClassOf 'has part' some 'protein binding activity' ## Hidden GCI / could use 'has sensor'
inferred: SubClassOf 'transducer activity' # Could use

'protein binding activated kinase activity': EquivalentTo ('kinase activity' that 'directly positively regulated by' some 'protein binding activity')
SubClassOf 'has part' some 'protein binding activity'
inferred: SubClassOf 'protein binding sensor activity'

'kinase activator activity': EquivalentTo ('molecular_function' that 'directly positively regulates' some 'kinase activity')

=> inferences:

GP1 enables 'kinase activator activity'
GP2 enables 'protein binding activated kinase activity'

Missing inference: This infers that the activity enabled by GP2 has part 'protein binding', but it does not infer that GP1 is the input to this. Does this matter?

dosumis commented 7 years ago

Pros:

Simple for LEGO curators to use. Less need for templates, and where they are used they can be simpler.
Simple ontology design patterns
Little or no churn in the ontology. (The alternative - going the full route of splitting out compound functions into their own, disjoint hierarchy - is a huge change and requires lots of careful re-engineering of MF. It also leads us to drop classifications that users expect: nuclear receptor activity is_a transcription factor activity; cAMP regulated GEF activity is_a GEF activity; RTK activity is_a kinase activity.)

Cons:

Lack of clean distinction between compound and simple functions feels like a slightly dangerous route to take, ontology engineering-wise. It rules out at least some disjointness axioms.
If there are some cases where we can't avoid explicitly representing compound functions in LEGO then we need a way to bridge between these and the simple representation.
Some limited, high level manually maintained dual inheritance is needed for receptors - classes under effector function AND receptor activity.
It's not obvious how we would model integrator functions in this context (but it's not obvious how we are going to model these in OWL anyway).

dosumis commented 7 years ago

Small molecule regulated example - transducer

https://www.ncbi.nlm.nih.gov/pubmed/9856955

"a family of cAMP-binding proteins ... that exhibit both cAMP-binding and guanine nucleotide exchange factor (GEF) domains is reported. These cAMP-regulated GEFs (cAMP-GEFs) bind cAMP and selectively activate the Ras superfamily guanine nucleotide binding protein Rap1A in a cAMP-dependent but PKA-independent manner."

property chains:

internally_regulates o enabled_by -> enabled_by

'cAMP binding' EquivalentTo ('binding' that 'has input' some cAMP)

'cAMP sensing transducer activity': EquivalentTo ('molecular_function' that 'internally positively regulated by' some 'cAMP binding')

'cAMP activated GEF activity': EquivalentTo ('GEF' that 'internally positively regulated by' some 'cAMP binding')

'intracellular cAMP activated GEF activity' EquivalentTo ('GEF' that 'internally positively regulated by' some 'cAMP binding' and occurs in some 'intracellular')

inferences: GP1 enables 'intracellular cAMP activated GEF activity'

(Note - we need 'internally positively regulated by' to tie the activity to one gene product.)
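
A rough Python sketch of how the property chain above ties the sensor to the same gene product as the effector. The instance names (camp_binding_1, gef_1) and the helper function are hypothetical, and it assumes the positive relation is a subproperty of internally_regulates so that the chain applies to it.

```python
def propagate_enabler(triples):
    """Apply the chain: internally_positively_regulates o enabled_by -> enabled_by."""
    inferred = set()
    for s, p, o in triples:
        if p != "internally_positively_regulates":
            continue
        for s2, p2, gp in triples:
            if s2 == o and p2 == "enabled_by":
                # The sensor inherits the enabler of the effector it regulates.
                inferred.add((s, "enabled_by", gp))
    return inferred


# Hypothetical instance graph for a cAMP-GEF: only the effector is annotated.
triples = {
    ("camp_binding_1", "internally_positively_regulates", "gef_1"),
    ("gef_1", "enabled_by", "GP1"),
}

print(propagate_enabler(triples))
```

With only the GEF activity annotated to GP1, the chain yields that the cAMP binding sensor is also enabled by GP1, which is what makes the compound function belong to a single gene product.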

small molecule regulated example - receptor

G-Protein coupled cAMP receptor (see cAR1-4 in dicty)

Note: has_ligand is needed as not everything that binds and regulates a receptor's activity is necessarily considered a ligand. has_ligand is also very handy for classifying receptors in the ontology.

GPCR activity:
subClassOf: transmembrane (signaling) receptor activity
subClassOf: GEF activity

*(This pattern does require manually managing some dual classifications)

'G-Protein coupled cAMP receptor activity' EquivalentTo: 'GPCR activity' that has_ligand some cAMP
inferred: SubClassOf 'cAMP activated GEF activity'

inference: GP1 enables 'G-Protein coupled cAMP receptor activity'

Template versions:

dosumis commented 7 years ago

Receptor example - GP ligand

Pattern 1 - ligand perspective (more consistent with patterns above)

GCI has_ligand => has_part some (binding and has input some X)
has_ligand range: (enables some 'receptor agonist activity')

Pattern 2 - receptor perspective

dosumis commented 7 years ago

CC @thomaspd @cmungall @ukemi: New, simpler proposal for transducers starting from https://github.com/geneontology/molecular_function_refactoring/issues/31#issuecomment-282256732 - Please review and comment.

thomaspd commented 7 years ago

It looks to me like these are ways to avoid the dualistic nature of protein binding-- that a single "event" can be represented as two functions, one for each binding partner. I agree the dualistic representation is not necessarily intuitive.

1. I do like the "reduced representation" model where a curator can skip the sensor function because it's implied by the binding function of the upstream gene product plus the directly_regulates relation. So this could be a useful curation pattern that we could automatically expand into the "dualistic representation" under the hood.
2. But of course we can't use this same reduced representation when the sensor function detects anything other than gene product binding, so we'd still need both functions for those examples. Because you suggested using internally regulates for these cases, they are still compound.
3. I agree we should aim to get the expected is_a relations for compound functions. But to do that, I think that even if we use a has_part relation to simple functions, it's proper to consider a compound function to be a subclass of each of its component simple functions. For example, a ligand-activated protein kinase receptor activity is a type of protein-binding activity, and a type of protein kinase activity. It is also true on the set level: all gene products that enable ligand-activated protein kinase receptor activity are both (a) members of the set of all protein binding gene products, and (b) members of the set of all gene products that enable protein kinase activity.
4. I like your labels with the pattern "X-activated Y activity."
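
The set-level claim about compound functions can be illustrated with a small Python sketch. The gene product names (RTK_A, KIN_B, BIND_C), the annotation table and the enablers helper are all invented for the example; the only thing taken from the thread is that the compound function has protein binding and protein kinase activity as components.

```python
# Hypothetical annotations: gene product -> functions it is annotated to.
annotations = {
    "RTK_A": {"ligand-activated protein kinase receptor activity"},
    "KIN_B": {"protein kinase activity"},
    "BIND_C": {"protein binding"},
}

# Component simple functions of each compound function (class-level has_part).
components = {
    "ligand-activated protein kinase receptor activity": {
        "protein binding",
        "protein kinase activity",
    },
}


def enablers(function):
    """Gene products that enable `function` directly or via a compound function."""
    result = set()
    for gp, functions in annotations.items():
        for f in functions:
            if f == function or function in components.get(f, set()):
                result.add(gp)
    return result


compound = enablers("ligand-activated protein kinase receptor activity")
print(compound <= enablers("protein binding"))
print(compound <= enablers("protein kinase activity"))
```

The sketch gets the set inclusion from the has_part components alone, without asserting any is_a between the compound function and its parts.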

cmungall commented 7 years ago

Regarding your point 3:

The part about gene products should be uncontroversial. There are a number of clean and straightforward ways to get the inference.

For classification in the ontology: it's less clear this is desirable. In general, overloading of is-a is a bad idea, and can often come back to bite us. Note this means that if we want to implement some kind of model check that involves extracting all the binding nodes, this would pull back the compounds as well as their parts. Of course we can get around this by qualifying the query to remove these, but this has a knock-on effect on complexity. This would also mean that we wouldn't have a clean divide between disjoint branches in the ontology.
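
To make the model-check concern concrete, here is a small Python sketch of a "give me everything under binding" query over two toy is_a hierarchies. The hierarchies and the descendants helper are assumptions for illustration only, not the actual GO structure.

```python
def descendants(is_a, root):
    """All classes reachable from `root` by following is_a edges downward."""
    children = {}
    for child, parents in is_a.items():
        for parent in parents:
            children.setdefault(parent, set()).add(child)
    found, stack = set(), [root]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


# Toy hierarchy with simple functions only: class -> is_a parents.
clean = {
    "protein binding": {"binding"},
    "cAMP binding": {"binding"},
}

# Same hierarchy with is_a overloaded to place a compound function under its parts.
overloaded = dict(clean)
overloaded["ligand-activated protein kinase receptor activity"] = {
    "protein binding",
    "protein kinase activity",
}

print(descendants(clean, "binding"))
print(descendants(overloaded, "binding"))
```

In the overloaded hierarchy the query for binding nodes also returns the compound receptor activity, so any check built on it would need an extra filter to exclude compounds.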

thomaspd commented 7 years ago

Yes, the important part is about gene products. I can see how is-a relations could lead to downstream issues. But if we don't have them, then most of the external software for inferring annotations to "indirectly" annotated classes won't work, as they rely on tracing relations in the ontology. This seems like a much bigger issue, and one that is beyond our control.

thomaspd commented 7 years ago

I also think David OS and I are in agreement that we should make annotation as simple as possible for curators. Along those lines, we should aim when possible to have one "overall" function for each gene product/complex, that can be expanded into a more detailed view showing subfunctions. For example, here's my latest on the canonical wnt signaling pathway: [screenshot, 2017-02-24]

One could then (optionally) expand any overall functions to see the subfunctions: [screenshot, 2017-02-24]

ukemi commented 7 years ago

I agree with Paul. Annotators should only see the necessary detail for an annotation and should be given the ability to drill down to more detail as is shown above. In all of David's models, I like the idea of a receptor agonist activity reflecting the binding of the ligand rather than just binding. It seems clearer for an annotator. Binding to receptors can be other than just agonist activity because lots of things bind to receptors that don't activate them. I think more explicit terminology is always better than less explicit terminology. I don't care for internally positively regulates. It is not clear to me how it differs from directly activates.

dosumis commented 7 years ago

"I don't care for internally positively regulates. It is not clear to me how it differs from directly activates."

It is necessary to use different relations for regulation within a molecular function. If we use directly_positively_regulates to link sub-functions of a compound MF, it is automatically classified as its own activator. So, RTKs and PKA would both be classified, incorrectly, under 'protein kinase activator activity', when actually we want to reserve that for annotating proteins that activate the protein kinase activity of other proteins.
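
A short Python sketch of the self-activation problem, under the assumption that 'protein kinase activator activity' is populated from directly_positively_regulates edges into a kinase activity. The node names (rtk_binding, rtk_kinase) and the kinase_activators helper are hypothetical.

```python
def kinase_activators(triples, kinase_nodes):
    """Gene products whose activity directly positively regulates a kinase activity."""
    enabled_by = {s: o for s, p, o in triples if p == "enabled_by"}
    return {
        enabled_by[s]
        for s, p, o in triples
        if p == "directly_positively_regulates" and o in kinase_nodes and s in enabled_by
    }


kinase_nodes = {"rtk_kinase"}

# Both sub-functions of the compound RTK function are enabled by the same gene product.
shared = {
    ("rtk_binding", "enabled_by", "RTK"),
    ("rtk_kinase", "enabled_by", "RTK"),
}

# Linking the sub-functions with the general relation vs. the internal one.
with_direct = shared | {("rtk_binding", "directly_positively_regulates", "rtk_kinase")}
with_internal = shared | {("rtk_binding", "internally_positively_regulates", "rtk_kinase")}

print(kinase_activators(with_direct, kinase_nodes))
print(kinase_activators(with_internal, kinase_nodes))
```

With the general relation the RTK comes back as its own kinase activator; with the internal relation it does not, which leaves the activator class free for proteins that activate the kinase activity of other proteins.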

ukemi commented 7 years ago

Yup, you're right. So we either have to go this route or think of another way to represent activators. But now that you point this out, the label makes more sense.

dosumis commented 7 years ago

"For classification in the ontology: it's less clear this is desirable. In general, overloading of is-a is a bad idea, and can often come back to bite us."

I agree, but the more I look into compound functions, the more I realize how tricky they are to work with in LEGO.

Focussing on how we expect these things to be used in LEGO.

A major aim of LEGO is to capture chains of MFs where each MF in the chain directly regulates the activity of the next. An unbroken chain of regulates relations allows for inference of regulation along the chain. Each link in the same chain allows us to capture inputs for biochemical activities.

With only simple functions this is straightforward:

GP1 enables MF1 directly_regulates MF2
GP2 enables MF2 directly_regulates MF3
GP3 enables MF3
=>
MF1 regulates MF3
MF1 has_input GP2
MF2 has_input GP3

And from this we can infer activator/inhibitor activities and transducer functions as above.
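
The simple-function chain can be sketched in Python as follows. It assumes regulates is the transitive closure of directly_regulates and that has_input comes from the chain directly_regulates o enabled_by; the function names are made up for the example.

```python
def regulates_closure(direct):
    """Transitive closure of directly_regulates edges, read as `regulates`."""
    closure = set(direct)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def inputs(direct, enabled_by):
    """has_input: the gene product enabling the directly regulated function."""
    return {(mf, enabled_by[target]) for mf, target in direct if target in enabled_by}


# The three-step chain from the thread.
direct = {("MF1", "MF2"), ("MF2", "MF3")}
enabled_by = {"MF1": "GP1", "MF2": "GP2", "MF3": "GP3"}

print(regulates_closure(direct))
print(inputs(direct, enabled_by))
```

An unbroken chain gives MF1 regulates MF3, and each direct link contributes one has_input edge (MF1 to GP2, MF2 to GP3).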

Compound, transducer functions have their own internal regulatory edge. This allows us (potentially) to generate chains of MFs by relating the effector function of one to the sensor function of another.

GP1 enables MF1
MF1 has_part effector(e1)
GP2 enables MF2
GP2 has_part sensor(s2)
GP2 has_part effector(e2)
s2 internally_positively_regulates e2
e1 ---????---->s2

In the case of binding, one possibility (as we've discussed) is to have only one node for e1 and s2.
(But then what about endogenous receptor antagonists?)

Paul has made the radical suggestion that we have terms for modification state sensors. In that case we'd have:

GP1 enables MF1
MF1 has_part kinase (k)
GP2 enables MF2
GP2 has_part phosphorylation state sensor(pss)
GP2 has_part effector(e1)
s2 internally_positively_regulates e2
k ---????---->pss

Perhaps we shouldn't be displaying this as an edge, but as two nodes that can snap/plug together.

And with compound functions, how do/should we expect curators to use directly_regulates?

Being able to record that something directly regulates a binding sensor function is useful (e.g. for annotating a co-receptor).

What about a directly_regulates to the compound function itself (which curators will surely do)? Would we want inference to regulation of the effector function (directly_regulates o has_part -> directly_regulates)? This wouldn't work because it wouldn't target only the effector function. We'd need directly_regulates o has_effector -> directly_regulates.
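
A small Python sketch of why the has_part chain over-reaches. It assumes has_sensor and has_effector are both subproperties of has_part (as in the earlier sensor pattern); the regulated_parts helper and the node names are illustrative only.

```python
def regulated_parts(regulated, parts, via):
    """Apply the chain: directly_regulates o <via> -> directly_regulates."""
    return {
        (regulator, part)
        for regulator, target in regulated
        for whole, relation, part in parts
        if whole == target and relation in via
    }


# Compound function MF2 with a sensor part (s2) and an effector part (e2).
parts = {
    ("MF2", "has_sensor", "s2"),
    ("MF2", "has_effector", "e2"),
}
# A curator asserts MF1 directly_regulates the compound function MF2.
regulated = {("MF1", "MF2")}

# Assumption: both specialized relations are subproperties of has_part.
HAS_PART = {"has_sensor", "has_effector"}

print(regulated_parts(regulated, parts, HAS_PART))
print(regulated_parts(regulated, parts, {"has_effector"}))
```

Chaining over has_part infers regulation of both the sensor and the effector, whereas chaining over has_effector alone targets only e2.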

The more I look at it, the more I think that displaying compound functions should be avoided as far as possible. The convention should be to use directly_regulates wherever possible and infer the rest. Requiring explicit compound functions feels like a big burden on curators and I can't see an obvious way to do compound functions under the hood.

The is_a overloading is worrying, but it only really affects protein binding.

dosumis commented 7 years ago

New pattern - now in MF refactoring branch:

dosumis commented 7 years ago

CC @thomaspd

pgaudet commented 5 years ago

This is implemented.

geneontology / molecular_function_refactoring

molecular transducer #31
