geneontology / go-ontology

Source ontology files for the Gene Ontology
http://geneontology.org/page/download-ontology
Creative Commons Attribution 4.0 International

Representing the same MFs as 2 activities (from Noctua workshop) #20413

Closed. ValWood closed this issue 6 months ago

ValWood commented 3 years ago

This issue is described in a number of tickets, but here is a more fully fledged description of the issue and possible solutions.

From Noctua modelling yesterday: I modelled the MAPK pathway as MAPKKK -> MAPKK -> MAPK (based on existing annotations), but the reasoner infers "activator kinase activities" based on the term and the relationship describing the direction of regulation:

[Screenshot 2020-11-18 at 09.08.25]

These terms are not related in the ontology, and therefore generate redundant annotations describing the same activities.

[Screenshot 2020-11-18 at 09.12.55]

This is problematic for a number of reasons. Using fission yeast Cdc2 (https://www.pombase.org/gene/SPBC11B10.09) as an example:

Cdc2 is a cyclin-dependent protein serine/threonine kinase activity (the major CDK). It has >200 substrates.

For example, it phosphorylates and inhibits clp1 (a phosphatase), involved in negative regulation of exit from mitosis during mitotic metaphase. So here it is acting as a “phosphatase inhibitor activity”.

It would also be annotated as: cyclin-dependent protein serine/threonine kinase activity has_input clp1, involved in negative regulation of exit from mitosis during mitotic metaphase (directly inhibits)

and the reasoner would generate another version of the same annotation: phosphatase inhibitor activity has_input clp1, involved in negative regulation of exit from mitosis during mitotic metaphase (directly inhibits)

Cdc2 also phosphorylates pom1 kinase, so it is a "protein kinase activator or inhibitor"; this would generate an additional annotation.

plo1 (activator?)
wis1 MAP kinase (inhibitor?)
Map kinase (inhibitor?)
cmk2 MAPK activated protein kinase (inhibitor?)
rum1 a CDK inhibitor (so it is a cyclin dependent kinase inhibitor inhibitor?)

We would need the actual activity (cyclin-dependent protein serine/threonine kinase activity) AND the “effect” (activating or inhibitory) every time. Since cdc2 has a couple of hundred substrates, display is already challenging without additional redundant annotations.
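To make the duplication concrete, here is a toy sketch of how one kinase annotation per substrate turns into two rows whenever a "regulator" term can be inferred from the substrate class and the direction of regulation. It is not how the Noctua reasoner is implemented: the lookup table `INFERRED_TERM`, the function `annotations_for`, and the substrate class labels are all hypothetical and only mirror the clp1 and cdc18 examples in this comment.

```python
KINASE_TERM = "cyclin-dependent protein serine/threonine kinase activity"

# Hypothetical lookup: (substrate class, direction of regulation) -> inferred term.
# Only the clp1 case from this comment is encoded here.
INFERRED_TERM = {
    ("phosphatase", "directly inhibits"): "phosphatase inhibitor activity",
}


def annotations_for(substrates):
    """Return (term, has_input) rows: the curated kinase annotation for every
    substrate, plus an inferred regulator annotation where the lookup has one."""
    rows = []
    for gene, substrate_class, relation in substrates:
        rows.append((KINASE_TERM, gene))
        inferred = INFERRED_TERM.get((substrate_class, relation))
        if inferred is not None:
            rows.append((inferred, gene))
    return rows


# clp1 is a phosphatase that Cdc2 directly inhibits; cdc18 is neither a kinase
# nor a phosphatase, so the toy lookup infers nothing extra for it.
substrates = [
    ("clp1", "phosphatase", "directly inhibits"),
    ("cdc18", "other", "phosphorylates"),
]
rows = annotations_for(substrates)
for term, gene in rows:
    print(f"{term} has_input {gene}")
```

In this toy case clp1 ends up with two rows describing the same event, while cdc18 has one. With a couple of hundred substrates, every kinase or phosphatase substrate would add a row of this kind.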

Moreover, it often isn’t even known if the phosphorylation is activating or inhibitory, because all we know is: A phosphorylates B and Z occurs. This might be because A activates B. It could also be because A inactivates B, B no longer inhibits C, and Z (the same outcome) occurs. In most cases we don’t really know, because the data usually come from genetics initially, not from biochemistry. Either way, activation or inhibition of a substrate does not indicate the pathway output.
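The ambiguity can be shown with a small sign calculation. This is only an illustrative sketch, assuming each step in a pathway can be scored +1 (activates/promotes) or -1 (inhibits); the function name `net_effect` and the two example paths are made up for this comment.

```python
from math import prod


def net_effect(signs):
    """Multiply the per-step signs: +1 means the pathway output is promoted,
    -1 means it is repressed."""
    return prod(signs)


# Scenario 1: A activates B, and B promotes Z.
direct_activation = [+1, +1]

# Scenario 2: A inactivates B, B normally inhibits C, and C promotes Z.
double_negative = [-1, -1, +1]

print(net_effect(direct_activation), net_effect(double_negative))
```

Both scenarios give the same net sign for Z, so observing "A phosphorylates B and Z occurs" cannot tell us whether A is an activator or an inhibitor of B.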

These are possible solutions:

Option 1

Restrict the definitions of kinase activators and inhibitors to catalytically inactive regulatory subunits like cyclins

ADVANTAGES

DISADVANTAGES

Option 2

Keep duplicate annotation and display

ADVANTAGES

But this is not related to the pathway outcome (i.e. activator kinases can both positively and negatively regulate a pathway), so knowing that a kinase activates or inactivates a specific substrate is not useful outside of a pathway context. Also, some kinases are both activators and inhibitors (cdc2, for example), so is this information actually useful in the context of GO use for retrieving gene lists or for enrichment?

DISADVANTAGES

Option 3

Add ancestors for terms which are always “activator” or “inhibitory”, for example:
MAP kinase phosphatase -> serine/threonine kinase inhibitor activity
MAP kinase kinase activity -> protein serine/threonine kinase activator activity
MAP kinase kinase kinase activity -> protein kinase activator activity

AND

Create specific terms as children covering both parents:
protein phosphatase inhibitor cyclin-dependent protein serine/threonine kinase activity
protein kinase activator cyclin-dependent protein serine/threonine kinase activity

DISADVANTAGES

Crazy (I’m not suggesting we do this!)

cyclin-dependent protein serine/threonine kinase activity phosphorylates cdc18, orc2 involved in negative regulation of mitotic DNA replication initiation during mitotic G2 phase

cyclin-dependent protein serine/threonine kinase activity phosphorylates ase1, klp9 involved in negative regulation of mitotic spindle elongation during mitotic anaphase A

None of these substrates are kinases or phosphatases, but you can imagine a situation where you could not group things regulating the same process easily because they are annotated to different activities.

ADVANTAGES

ValWood commented 3 years ago

@RLovering @pgaudet @colinlog @thomaspd @cmungall @vanaukenk @hattrill @Antonialock I did not tag everyone, but can post a link to the chat later if anyone is interested.

ValWood commented 3 years ago

Also, some of the existing terms are quite difficult to parse, e.g. cyclin-dependent protein kinase activating kinase activity. This is a good illustration:

https://www.pombase.org/gene/SPAC1D4.06c Csk1 can be described in multiple ways depending on the substrate

[Screenshot 2020-11-18 at 10.30.39]

It is a cyclin-dependent protein serine/threonine kinase activity. But because it phosphorylates other cyclin-dependent protein serine/threonine kinases, it is also a cyclin-dependent protein kinase activating kinase activity, and because it phosphorylates RNA polymerase heptad repeats, it's an RNA polymerase II CTD heptapeptide repeat kinase activity.

We might not need to remove all axes of classification, but what we have now is overly complicated for curators and annoying for end users.

vanaukenk commented 3 years ago

Thanks @ValWood

Since starting to make GO-CAMs, I've increasingly felt that wherever possible we should try to more accurately describe the MFs of gene products described as 'regulators'.

For example, is a GP a positive regulator because it acts as a scaffold to bring an enzyme in closer proximity to its substrate?

In another case, might a GP be a positive regulator of a channel because it acts as a 'chaperone' to properly transport and insert the channel into the membrane?

If we can select a more descriptive term, then we can use the GO-CAM framework to ask questions about what activities regulate other activities without necessarily having to directly, or by inference, annotate to a 'regulator' MF term, as well.

I suggest we form a smallish working group to systematically look at the consequences of the different options we have here to then make recommendations to the ontology editors and curators.

pgaudet commented 6 months ago

I think this is not an issue anymore, since the logical definition of regulator activity was changed. The annotations are not in the Annotation Preview: http://noctua.geneontology.org/workbench/annpreview/?model_id=gomodel:5fadbcf000000632

@ValWood can we close?

ValWood commented 6 months ago

agreed