geneontology / annotation_extensions

Documentation, tickets & usage reports for annotation extension relations.

Why are 'regulates' relations not allowed in annotation extensions #63

Open dosumis opened 8 years ago

dosumis commented 8 years ago

If any process can regulate another, and this is very often context dependent, why can we not record this using regulates relations in annotation extensions?

As far as I can see, this would fit nicely with LEGO.

CC @rachhuntley @RLovering @ValWood @rebeccafoulger @ukemi

(Discussion arising from work on SynGO project)

rebeccafoulger commented 8 years ago

There were recent discussions on the annotation calls about how 'causally_upstream_of' should be translated into annotations; I think 'regulates' was one option. E.g. if a translation factor results in more of protein X, and protein X is involved in cholesterol metabolism, then the translation step is causally_upstream_of 'cholesterol metabolism' (and the translation factor is regulating cholesterol metabolism).

ValWood commented 8 years ago

I have some STRONG opinions on this as I've been thinking about it for nearly a decade. I think I can now describe what we do and why, but we haven't documented this yet... will post something in a few days.

dosumis commented 8 years ago

Hi all,

Opening up this discussion again as we need a resolution for the synapse project. It is also important for alignment of annotation extensions with the ontology and LEGO.

Here's the current relations hierarchy:

[image: relations hierarchy]

So annotating to regulates => inference to 'causally upstream of' (not the other way round).
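
The direction of that inference can be illustrated with a minimal sketch. The table below is a toy stand-in for the hierarchy in the image, not a copy of it: only the regulates to 'causally upstream of' edge comes from this comment, the positive/negative edges are an assumption added for illustration, and the function name `implied_relations` is hypothetical.

```python
# Toy sub-relation table (child -> parent). Illustrative only; the
# authoritative hierarchy lives in the relations ontology itself.
SUPER_RELATION = {
    "positively_regulates": "regulates",   # assumed for illustration
    "negatively_regulates": "regulates",   # assumed for illustration
    "regulates": "causally_upstream_of",   # stated in this thread
}


def implied_relations(relation):
    """Return the relation plus every ancestor, most specific first."""
    chain = [relation]
    while chain[-1] in SUPER_RELATION:
        chain.append(SUPER_RELATION[chain[-1]])
    return chain


# Annotating with 'regulates' implies 'causally_upstream_of' ...
print(implied_relations("regulates"))
# ... but annotating with 'causally_upstream_of' implies nothing more specific.
print(implied_relations("causally_upstream_of"))
```

Walking only upwards is what makes the inference one-way: a more general relation never yields a more specific one.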

These relations are all available for use in LEGO and the ontology.

We also have this term in GO:

[image: GO term]

In line with the textual definition, I plan to switch the logical definition to: biological_process that regulates some 'biological process'.

Making this change =>

The latter case (use of regulates in AE) would be very useful for inference from annotation extensions in Synapse Project annotation. It also sets the stage for translation of synapse annotations using AEs to LEGO.

Are there any objections to this?

If annotators are OK with this, I'll edit gorel-edit.owl to allow the use of regulates relations in annotation extensions.

(Braces himself for STRONG Val opinions....)

CC @pdthomas @cmungall

ValWood commented 8 years ago

Nope, I'm good with this. I would like one extra thing. Could 'directly inhibits' and 'directly activates' be made descendants of "has_direct_input" (for use in LEGO and extensions)?

This would resolve the issues I talked to Kimberly about:

[image: cdc2 f-p]

So here, all of these annotations are describing the same 'activity' (different substrates). The splitting of 'protein kinase' into different 'types' (annoying) can be discussed elsewhere.

What I want to be able to do, is capture the fact that: cdc2(CDK1) is a protein kinase, which phosphorylates and inhibits the phosphatase clp1 in a single annotation.

See here we have:

protein kinase cdc2 has_substrate (our translation for has_direct_input) clp1

AND

protein serine/threonine phosphatase inhibitor has_substrate (our translation for has_direct_input) clp1

Instead I'd like to be able to say "protein kinase cdc2 has_substrate(directly_inhibits) clp1", and you can figure out that, because clp1 is a phosphatase, it's inhibiting the phosphatase activity.

What we really want to know is how this affects the process which clp1 is involved in (positive or negative regulation). Gradually we are linking up the MF and BP in this way to make them "LEGO ready". You can see that we have already done this for a number of cdc2 substrates; sometimes we don't know, or it's different processes in different places at different times.

I'm still a bit confused about how far upstream we would use "causally upstream of", but that's probably going to be a parallel discussion. For instance, in Becky's example above, I wouldn't annotate anything in "translation" as "causally upstream", because it's causally upstream of everything.

The PomBase rule of thumb is that we need to know that a gene product is involved in 'real biological regulation' in the normal cell.

ukemi commented 8 years ago

But activities activate and inhibit other activities. An activity has direct input of a thing.

ValWood commented 8 years ago

Hi David I don't understand ;) This would capture that....

dosumis commented 8 years ago

Could 'directly inhibits' and 'directly activates' be made descendants of "has_direct_input" (for use in LEGO and extensions)?

No. The general relation covering both providing input AND regulation is 'causally upstream of'. It looks like we need to reflect this in the direct versions of the relations:

'directly provides input for' should also be under 'immediately causally upstream of'.

Agree it would be nice to be able to group activities 'immediately causally upstream of' some specified MF. Not sure we want to roll classes for these though, so maybe an interface issue.

ValWood commented 8 years ago

But if we use "causally upstream of" then we lose the fact that it is also a substrate.

dosumis commented 8 years ago

What I want to be able to do, is capture the fact that: cdc2(CDK1) is a protein kinase, which phosphorylates and inhibits the phosphatase clp1 in a single annotation.

The kind of nested annotation LEGO is designed to cope with...

protein kinase cdc2 has_substrate(directly_inhibits) clp1 and you can figure out that, because clp1 is a phosphatase, it's inhibiting the phosphatase activity.

  • What about a GP with multiple activities? In that case you wouldn't know which activity was being inhibited. Certainly the OWL model doesn't 'know'.
  • The range of directly_inhibits is process (in the GO context, MF). So it can't point at a protein.

In LEGO: (protein kinase activity, enabled_by CDK1, has_substrate CLP1) directly_inhibits (phosphatase activity, enabled_by CLP1)

(Note: no need for 'causally upstream of' in this case. 'provides input for' is for cases where an enzyme catalyses production of some product that is consumed by some other process. This is the relationship between steps in a metabolic pathway.)
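
The nested LEGO statement above can be pictured as a small data structure. This is only a sketch under assumptions: the `Activity` class and its field names are hypothetical and are not the Noctua or OWL representation. The point it shows is that directly_inhibits points at another activity (which carries its own enabled_by), not at a protein.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Activity:
    """Hypothetical stand-in for one molecular function instance in a model."""
    function: str                                   # molecular function label
    enabled_by: str                                 # gene product enabling it
    has_substrate: Optional[str] = None             # a gene product, if known
    directly_inhibits: Optional["Activity"] = None  # range is an activity


# The inhibited activity is stated explicitly, so there is no ambiguity even
# if CLP1 had several activities.
clp1_phosphatase = Activity("phosphatase activity", enabled_by="CLP1")
cdk1_kinase = Activity(
    "protein kinase activity",
    enabled_by="CDK1",
    has_substrate="CLP1",
    directly_inhibits=clp1_phosphatase,
)

target = cdk1_kinase.directly_inhibits
print(target.function, "enabled by", target.enabled_by)
```

Because the target activity is a nested object, the question raised in the first bullet (which activity of a multi-activity gene product is inhibited) is answered by the model rather than left to the reader.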

ValWood commented 8 years ago

I still don't see why we can't do this in the GAF (note that we need to be able to represent this on the gene pages, not only in the LEGO diagrams, because this is how our users mainly consume this data).

I still don't understand why a single relationship cannot be available to capture this in extensions.

We will not be curating in LEGO immediately, we will do this AT THE END of the normal GO curation. It isn't practical for us to work in LEGO. Take cdc2 as an example, it has around 200 substrates, it has over 20 annotated processes so far (non-redundant), and will have many, many more. It has different substrates at different times. We won't be able to curate in LEGO because the information is fragmented and spread across nearly 700 publications.

The ONLY practical way we will be able to do LEGO curation is once we have processed all of the papers for a specific gene or process. It would be really helpful if we could capture at annotation time whether a particular activity was directly inhibiting or activating its substrate in a single annotation.

ValWood commented 8 years ago

Ah I see. That's a shame.

dosumis commented 8 years ago

There are limits to what you can feasibly squeeze into annotation extensions. Specifically, nesting is not allowed in annotation extensions, but is in LEGO. I can't see how you can say what you want to say without nesting. Shortcut relations can't get around this.

ValWood commented 8 years ago

I can do it like this:

[image: tmp]

We need to be able to use extensions in this way, to group together the annotations for a specific target. Our users aren't going to be able to go to the 100 or so lego diagrams that will eventually be required to describe all of the processes cdc2 is involved in, so we need a non-redundant representation of each target on the gene pages.

(corrected image, to be directly inhibits protein phosphatase activity)

ukemi commented 8 years ago

core annotation: CDK1 protein kinase activity

annotation extensions: has_input CLP1, directly_inhibits phosphatase activity, regulates_o_has_agent CLP1 (should probably be switched to regulates_o_enabled_by)

ValWood commented 8 years ago

Remind me what this means again: regulates_o_has_agent?

ukemi commented 8 years ago

"A regulates_o_has_agent B" means that a process A is regulating another process that is carried out by (a gene product) B. In the past we have always used the chains for regulation terms, but I think it might still work here. It still would be better if we could nest annotations.
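
As a rough illustration of how such a chain relation behaves, the sketch below composes two relations over a set of triples. It assumes a plain (subject, relation, object) representation; the function name `compose` and the example labels are hypothetical, and this is not how the ontology tooling computes it.

```python
def compose(triples, first, second, shortcut):
    """Derive (a, shortcut, b) wherever a -first-> p and p -second-> b."""
    derived = set()
    for a, rel1, p in triples:
        if rel1 != first:
            continue
        for p2, rel2, b in triples:
            if rel2 == second and p2 == p:
                derived.add((a, shortcut, b))
    return derived


# Illustrative triples: the intermediate process is explicit here, which is
# exactly what the shortcut relation lets an annotation leave out.
triples = {
    ("cdc2 kinase activity", "regulates", "clp1 phosphatase activity"),
    ("clp1 phosphatase activity", "has_agent", "clp1"),
}

print(compose(triples, "regulates", "has_agent", "regulates_o_has_agent"))
```

Note that the derivation runs from the two explicit links to the shortcut. Going the other way, from the shortcut alone, does not tell you which intermediate process was meant.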

dosumis commented 8 years ago

David's solution works.

Without nesting, it's still, strictly speaking, a set of unlinked assertions:

dosumis commented 8 years ago

Interesting point on 'has agent/enabled by'. Need to think about this link some more.

ValWood commented 8 years ago

Why are they unlinked? They are comma separated which means they are linked/dependent?

cmungall commented 8 years ago

they are linked in that they all refer to the same instance of cdc2 kinase activity

however, formally, the fillers (clp1 and phosphatase activity) are not connected to each other. They are only connected via the shared cdc2 kinase activity.

It requires either human biological intuition or some kind of rules or inferences to say: you see the phosphatase activity that is being inhibited, and you see the clp1 that the kinase is targeting? We're talking about the same thing; the phosphatase activity belongs to the clp1.

That may sound obvious or dumb, but if you consider cases with some other relations, it's not always safe to assume we're talking about the same thing.

So I think what we're converging on is that we need to implement some kind of rule (probably a heuristic as it's hard to encode this in OWL) along the lines of

a has-input g, g type G
a directly-inhibits a2, a2 type A2
-->
a2 enabled-by g

ukemi commented 8 years ago

Yup. They still aren't nested, just linked at the CDC2 kinase hub, but that's the best I can come up with given the limitations of the old format.
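
The heuristic rule proposed above can be sketched as follows. This is a minimal illustration, not an agreed implementation: the function name `infer_enabled_by` and the list-of-pairs input are assumptions, relation names are written with underscores as in the extension example earlier in the thread, and the restriction to exactly one input and one inhibited activity is a deliberately cautious choice reflecting the warning that it is not always safe to assume the fillers refer to the same thing.

```python
def infer_enabled_by(extensions):
    """Apply the heuristic to one annotation's flat extension list.

    `extensions` is a list of (relation, filler) pairs that all hang off the
    same core activity. A link is inferred only in the unambiguous case of
    exactly one has_input filler and exactly one directly_inhibits filler.
    """
    inputs = [filler for rel, filler in extensions if rel == "has_input"]
    targets = [filler for rel, filler in extensions if rel == "directly_inhibits"]
    if len(inputs) == 1 and len(targets) == 1:
        return [(targets[0], "enabled_by", inputs[0])]
    return []  # ambiguous or incomplete: infer nothing


# Extensions on the cdc2 protein kinase activity annotation.
cdc2_extensions = [
    ("has_input", "clp1"),
    ("directly_inhibits", "phosphatase activity"),
]

print(infer_enabled_by(cdc2_extensions))
```

With two inputs, or with no inhibited activity, the function returns an empty list rather than guessing which filler goes with which.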

Just for fun I made this model in Noctua. We really need to be moving to this, so I may as well throw out something for us to edit. Sooner or later this is all going to come up in discussion.

Spombe-cdc2-extensions

dosumis commented 8 years ago

OK. So the main action item seems to be to add regulates relations to the set of permitted relations for annotation extension.

I can start out by adding the old regulates relations, but it seems to me that everything under regulates in this graph should be valid:

[image: relations graph]

Is it OK to go ahead with changing for the basic regulation relations now? I could present this on an annotation call before adding the rest.

ValWood commented 8 years ago

Hi David,

We just looked at your models. We don't know what the difference is between them....are they all equivalent?

val

ukemi commented 8 years ago

I think we should discuss this on a call. I would like to get feedback on which of the three people would have come up with and whether they can think of another way. They are not all the same.

Antonialock commented 8 years ago

is the middle statement like what you get with the AE strings?

cmungall commented 8 years ago

Not quite. With AEs without any special collapsing rules you would get another configuration. It would look like the middle one, but without the enabled-by link in the lower right.

ukemi commented 8 years ago

Shall we also discuss these on the call on Monday? I think it would be very useful to go over what information is and isn't in each one and how they map on to conventional annotations. I think that it would be reasonable to assume that a curator might come up with any of these, they are all correct. Can we tease out what is the same and what is different? How does this affect our ideas of consistency?

ValWood commented 8 years ago

Could you send PomBase curators call details? We intended to have some of us participate in the LEGO calls soon. Antonia and I will join Monday for this one. Val

http://noctua.berkeleybop.org/editor/graph/gomodel:5716c41300000003

ukemi commented 8 years ago

Hi Val, The details of the calls are on the wiki. http://wiki.geneontology.org/index.php/Annotation_Advocacy_and_Coordination#LEGO_calls I will try to put up an agenda by the end of the day, but I have to do the consistency exercise paper first!

-D

ValWood commented 8 years ago

Are there instructions on what to do? I don't see them......

On 22/04/2016 17:50, ukemi wrote:

Hi Val, The details of the calls are on the wiki. http://wiki.geneontology.org/index.php/Annotation_Advocacy_and_Coordination#LEGO_calls I will try to put up an agenda by the end of the day, but I have to do the consistency exercise paper first!

-D


Cambridge University PomBase http://www.pombase.org/ Cambridge Systems Biology Centre http://www.sysbiol.cam.ac.uk/Investigators/val-wood

ValWood commented 8 years ago

This is for the LEGO call. Where do the annotations go for the consistency exercise? I don't see any links for previous consistency exercises for this. What is the usual procedure? Val


rachhuntley commented 8 years ago

Hi David, @dosumis We can now use "regulates" in the extensions in Protein2GO, which is great, but I was wondering if you are planning to also add the negative and positive versions too? Thanks, Rachael.

dosumis commented 8 years ago

We can now use "regulates" in the extensions in Protein2GO, which is great, but I was wondering if you are planning to also add the negative and positive versions too?

I thought they were already in. I certainly attempted to add them. Checking now.

dosumis commented 8 years ago

The good news is that I think I've fixed the problem: the relations needed an extra tag used in the Jenkins build job. The bad news is that this and other builds on Jenkins are broken due to refactoring of the general GO release pipeline. If this gets fixed today, the fix should be visible in P2G tomorrow. Might take until next week though.

rachhuntley commented 8 years ago

Great, thanks. I'll look out for it.

ValWood commented 7 years ago

We can probably close this based on https://github.com/geneontology/go-ontology/issues/12811 ?