dosumis opened this issue 8 years ago
There were recent discussions on the annotation calls about how 'causally_upstream_of' should be translated into annotations; I think 'regulates' was one option. For example, if a translation factor results in more of protein X, and protein X is involved in cholesterol metabolism, then the translation step is causally_upstream_of 'cholesterol metabolism' (and the translation factor is regulating cholesterol metabolism).
I have some STRONG opinions on this as I've been thinking about it for nearly a decade. I think I can now describe what we do and why, but we haven't documented this yet... will post something in a few days.
Hi all,
Opening up this discussion again as we need a resolution for the synapse project. This is also important for aligning annotation extensions with the ontology and LEGO.
Here's the current relations hierarchy:
So annotating to regulates => inference to 'causally upstream of' (not the other way round).
These relations are all available for use in LEGO and the ontology.
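The direction of that inference can be sketched as a walk up a sub-relation map. This is a minimal illustration assuming a simplified hierarchy built only from relation names used in this thread (it is not the authoritative Relation Ontology hierarchy): asserting a relation entails every ancestor relation, but asserting an ancestor entails nothing about its descendants.

```python
# Hypothetical, simplified sub-relation map (child -> parents) using relation
# names from this thread; the real Relation Ontology hierarchy has more levels.
PARENTS = {
    "causally_upstream_of": [],
    "regulates": ["causally_upstream_of"],
    "positively_regulates": ["regulates"],
    "negatively_regulates": ["regulates"],
}


def entailed_relations(rel: str) -> set:
    """Return rel together with every ancestor relation it entails."""
    seen = set()
    stack = [rel]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, []))
    return seen


# Annotating with a specific relation entails its ancestors...
print(sorted(entailed_relations("negatively_regulates")))
# ['causally_upstream_of', 'negatively_regulates', 'regulates']

# ...but the general relation entails nothing more specific.
print(sorted(entailed_relations("causally_upstream_of")))
# ['causally_upstream_of']
```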
We also have this term in GO:
In line with the textual definition, I plan to switch the logical definition to: biological_process that regulates some 'biological process'.
Making this change =>
The latter case (use of regulates in AEs) would be very useful for inference from annotation extensions in Synapse Project annotation. It also sets the stage for translating synapse annotations that use AEs to LEGO.
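To make the intended inference concrete, here is a toy structural classifier. It assumes (hypothetically) that an annotated process plus its extension can be read as the class expression `genus and relation some filler`, and checks that expression against the logical definition proposed above, `biological_process that regulates some 'biological process'`. A real OWL reasoner does this properly; the class and relation tables are illustrative placeholders, not GO IDs.

```python
# Illustrative single-parent is_a tables; names are placeholders.
CLASS_PARENTS = {
    "translation": "biological_process",
    "cholesterol metabolic process": "biological_process",
    "biological_process": None,
}
REL_PARENTS = {
    "positively_regulates": "regulates",
    "negatively_regulates": "regulates",
    "regulates": "causally_upstream_of",
    "causally_upstream_of": None,
}


def is_a(term, target, parents):
    """Reflexive subsumption test by walking up a single-parent chain."""
    while term is not None:
        if term == target:
            return True
        term = parents.get(term)
    return False


def matches_proposed_definition(genus, extensions):
    """True if 'genus and (rel some filler)' satisfies the hypothetical
    definition: biological_process that regulates some biological_process."""
    if not is_a(genus, "biological_process", CLASS_PARENTS):
        return False
    return any(
        is_a(rel, "regulates", REL_PARENTS)
        and is_a(filler, "biological_process", CLASS_PARENTS)
        for rel, filler in extensions
    )


print(matches_proposed_definition(
    "translation", [("positively_regulates", "cholesterol metabolic process")]))  # True
print(matches_proposed_definition(
    "translation", [("causally_upstream_of", "cholesterol metabolic process")]))  # False
```

The second call returns False because 'causally upstream of' does not entail 'regulates'; only the more specific relation triggers the classification.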
Are there any objections to this?
If annotators are OK with this, I'll edit gorel-edit.owl to allow the use of regulates relations in annotation extensions.
(Braces himself for STRONG Val opinions....)
CC @pdthomas @cmungall
Nope, I'm good with this. I would like one extra thing: could 'directly inhibits' and 'directly activates' be made descendants of "has_direct_input" (for use in LEGO and extensions)?
This would resolve the issues I talked to Kimberly about:
So here, all of these annotations are describing the same 'activity' (different substrates). The splitting of 'protein kinase' into different 'types' (annoying) can be discussed elsewhere.
What I want to be able to do is capture the fact that cdc2 (CDK1) is a protein kinase, which phosphorylates and inhibits the phosphatase clp1, in a single annotation.
See, here we have: protein kinase cdc2 has_substrate (our translation for has_direct_input) clp1 AND protein serine/threonine phosphatase inhibitor has_substrate (our translation for has_direct_input) clp1.
Instead, I'd like to be able to say protein kinase cdc2 has_substrate(directly_inhibits) clp1, and you can figure out that, because clp1 is a phosphatase, it's inhibiting the phosphatase activity.
What we really want to know is how this affects the process which clp1 is involved in (positive or negative regulation). Gradually we are linking up the MF and BP in this way to make them "LEGO ready". You can see that we have already done this for a number of cdc2 substrates; sometimes we don't know, or it's different processes in different places at different times.
I'm still a bit confused about how far upstream we would use "causally upstream of", but that's probably going to be a parallel discussion. For instance, in Becky's example above, I wouldn't annotate anything in "translation" as "causally upstream", because it's causally upstream of everything.
The PomBase rule of thumb is that we need to know that a gene product is involved in 'real biological regulation' in the normal cell.
But activities activate and inhibit other activities. An activity has direct input of a thing.
Hi David, I don't understand ;) This would capture that...
Could 'directly inhibits' and 'directly activates' be made descendants of "has_direct_input" (for use in LEGO and extensions).
No. The general relation for covering providing input AND regulation is 'causally upstream of'. It looks like we need to reflect this in the direct versions of the relations:
'directly provides input for' should also be under 'immediately causally upstream of'.
Agree it would be nice to be able to group activities 'immediately causally upstream of' some specified MF. Not sure we want to roll classes for these though, so maybe an interface issue.
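Grouping at query time, rather than rolling new classes, might look like the sketch below. It assumes the hierarchy proposed just above (the 'direct' activity-to-activity relations under 'immediately causally upstream of'); the relation table and the activities "activity X" and "activity Y" are hypothetical.

```python
# Hypothetical parent map following the proposal in this thread.
REL_PARENTS = {
    "directly_inhibits": "immediately_causally_upstream_of",
    "directly_activates": "immediately_causally_upstream_of",
    "directly_provides_input_for": "immediately_causally_upstream_of",
    "immediately_causally_upstream_of": "causally_upstream_of",
    "causally_upstream_of": None,
}


def sub_relation_of(rel, ancestor):
    """Reflexive test: is rel equal to, or below, ancestor?"""
    while rel is not None:
        if rel == ancestor:
            return True
        rel = REL_PARENTS.get(rel)
    return False


def immediately_upstream_of(target, edges):
    """Activities linked to target by any sub-relation of immediately_causally_upstream_of."""
    return sorted({
        src for src, rel, dst in edges
        if dst == target and sub_relation_of(rel, "immediately_causally_upstream_of")
    })


edges = [
    ("cdc2 protein kinase activity", "directly_inhibits", "clp1 phosphatase activity"),
    ("activity X", "directly_provides_input_for", "clp1 phosphatase activity"),
    ("activity Y", "causally_upstream_of", "clp1 phosphatase activity"),
]
print(immediately_upstream_of("clp1 phosphatase activity", edges))
# ['activity X', 'cdc2 protein kinase activity']
```

"activity Y" is excluded because plain 'causally upstream of' sits above, not below, the grouping relation.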
But if we use "causally upstream of", then we lose the fact that it is also a substrate.
What I want to be able to do is capture the fact that cdc2 (CDK1) is a protein kinase, which phosphorylates and inhibits the phosphatase clp1, in a single annotation.
The kind of nested annotation LEGO is designed to cope with...
protein kinase cdc2 has_substrate(directly_inhibits) clp1, and you can figure out that, because clp1 is a phosphatase, it's inhibiting the phosphatase activity.
- What about a GP with multiple activities? In that case you wouldn't know which activity was being inhibited. Certainly the OWL model doesn't 'know'.
- The range of directly_inhibits is process (in the GO context, MF). So it can't point at a protein.
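Both objections can be illustrated with a toy model in which each gene product maps to the set of activities it enables. The data are hypothetical ("GPX" is a made-up gene product with two activities); the point is only that a naive "the target is a phosphatase, so its phosphatase activity is inhibited" lookup works when there is exactly one candidate, and that a gene product is not a valid filler for an activity-to-activity relation.

```python
# Toy data; "GPX" is a made-up gene product with two activities.
ACTIVITY_TYPES = {"protein kinase activity", "phosphatase activity", "DNA binding"}
ENABLES = {
    "CLP1": {"phosphatase activity"},
    "GPX": {"phosphatase activity", "DNA binding"},
}


def check_directly_inhibits_filler(filler: str) -> str:
    """Range check: directly_inhibits must point at an activity, not a gene product."""
    if filler not in ACTIVITY_TYPES:
        raise ValueError(f"{filler!r} is not an activity")
    return filler


def guess_inhibited_activity(gene_product: str) -> str:
    """Naively infer which activity of a target gene product is inhibited.
    Only unambiguous when the gene product enables exactly one activity."""
    activities = ENABLES.get(gene_product, set())
    if len(activities) != 1:
        raise LookupError(f"{gene_product}: cannot choose among {sorted(activities)}")
    return next(iter(activities))


print(guess_inhibited_activity("CLP1"))  # phosphatase activity
try:
    guess_inhibited_activity("GPX")
except LookupError as err:
    print(err)  # GPX: cannot choose among ['DNA binding', 'phosphatase activity']
try:
    check_directly_inhibits_filler("CLP1")
except ValueError as err:
    print(err)  # 'CLP1' is not an activity
```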
In LEGO: (protein kinase activity, enabled_by CDK1, has_substrate CLP1) directly_inhibits (phosphatase activity, enabled_by CLP1)
(Note: no need for 'causally upstream of' in this case. 'provides input for' is for cases where an enzyme catalyses production of some product that is consumed by some other process. This is the relationship between steps in a metabolic pathway.)
I still don't see why we can't do this in the GAF (note that we need to be able to represent this on the gene pages, not only in the LEGO diagrams, because this is how our users mainly consume this data).
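Because LEGO links activity instances rather than classes, the inhibited phosphatase activity is an explicit node that can itself carry enabled_by CLP1. A minimal triple-list sketch of the LEGO statement above; the individual IDs `k1` and `p1` are made up for illustration.

```python
# Individuals (instances) and their types; IDs are hypothetical.
TYPES = {
    "k1": "protein kinase activity",
    "p1": "phosphatase activity",
    "CDK1": "gene product",
    "CLP1": "gene product",
}
# Edges between individuals, following the LEGO statement.
EDGES = [
    ("k1", "enabled_by", "CDK1"),
    ("k1", "has_substrate", "CLP1"),
    ("k1", "directly_inhibits", "p1"),
    ("p1", "enabled_by", "CLP1"),
]


def objects(subject, relation):
    """All objects linked from subject by relation."""
    return [o for s, r, o in EDGES if s == subject and r == relation]


# Follow the graph: what does the CDK1 kinase activity inhibit, and what enables it?
for inhibited in objects("k1", "directly_inhibits"):
    print(TYPES[inhibited], "enabled by", objects(inhibited, "enabled_by"))
# phosphatase activity enabled by ['CLP1']
```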
I still don't understand why a single relationship cannot be available to capture this in extensions.
We will not be curating in LEGO immediately; we will do this AT THE END of the normal GO curation. It isn't practical for us to work in LEGO. Take cdc2 as an example: it has around 200 substrates and over 20 annotated processes so far (non-redundant), and will have many, many more. It has different substrates at different times. We won't be able to curate in LEGO because the information is fragmented and spread across nearly 700 publications.
The ONLY practical way we will be able to do LEGO curation is once we have processed all of the papers for a specific gene or process. It would be really helpful if we could capture at annotation time whether a particular activity was directly inhibiting or activating its substrate in a single annotation.
Ah I see. That's a shame.
There are limits to what you can feasibly squeeze into annotation extensions. Specifically, nesting is not allowed in annotation extensions, but is in LEGO. I can't see how you can say what you want to say without nesting. Shortcut relations can't get around this.
I can do it like this:
We need to be able to use extensions in this way, to group together the annotations for a specific target. Our users aren't going to be able to go to the 100 or so LEGO diagrams that will eventually be required to describe all of the processes cdc2 is involved in, so we need a non-redundant representation of each target on the gene pages.
(corrected image, to be directly inhibits protein phosphatase activity)
core annotation: CDK1 protein kinase activity
annotation extensions: has_input CLP1, directly_inhibits phosphatase activity, regulates_o_has_agent CLP1 (should probably be switched to regulates_o_enabled_by)
Remind me what this means again: regulates_o_has_agent
A regulates_o_has_agent B means that a process A is regulating another process that is carried out by (a gene product) B. In the past we have always used the chains for regulation terms, but I think it might still work here. It would still be better if we could nest annotations.
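Annotation extensions are usually exchanged as a single string (GAF column 16), with `relation(filler)` units joined by commas, all applying together, and pipes separating independent sets. A small parser, assuming that syntax and using labels in place of real identifiers; the string below is an illustrative rendering of the extensions above, not a real GAF line.

```python
import re

# One extension unit: relation(filler)
UNIT = re.compile(r"^\s*([A-Za-z_]+)\((.+)\)\s*$")


def parse_extensions(column: str) -> list:
    """Split on '|' into independent sets, then on ',' into (relation, filler) pairs.
    Assumes fillers contain no commas (true for IDs, not for arbitrary labels)."""
    sets = []
    for group in column.split("|"):
        units = []
        for unit in group.split(","):
            m = UNIT.match(unit)
            if not m:
                raise ValueError(f"malformed extension unit: {unit!r}")
            units.append((m.group(1), m.group(2)))
        sets.append(units)
    return sets


ae = "has_input(CLP1),directly_inhibits(phosphatase activity),regulates_o_has_agent(CLP1)"
print(parse_extensions(ae))
# [[('has_input', 'CLP1'), ('directly_inhibits', 'phosphatase activity'), ('regulates_o_has_agent', 'CLP1')]]
```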
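The `_o_` in the name reads as a property chain: a shortcut standing for two hops through an unnamed intermediate process. A sketch of that expansion, assuming the naming convention that the relation splits on `_o_` into exactly two component relations; the blank-node labels are arbitrary.

```python
import itertools

_fresh = itertools.count(1)


def expand_chain(subject, chain_relation, filler):
    """Expand 'subject r1_o_r2 filler' into two edges via a fresh anonymous node."""
    r1, r2 = chain_relation.split("_o_")
    x = f"_:x{next(_fresh)}"  # stands for the unnamed intermediate process
    return [(subject, r1, x), (x, r2, filler)]


print(expand_chain("A", "regulates_o_has_agent", "B"))
# [('A', 'regulates', '_:x1'), ('_:x1', 'has_agent', 'B')]
```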
David's solution works.
Without nesting, it's still, strictly speaking, a set of unlinked assertions:
CDK1 enables CDK1 protein kinase activity:
But this may be fine for PomBase use cases.
Interesting point on 'has agent'/'enabled by'. Need to think about this link some more.
Why are they unlinked? They are comma separated which means they are linked/dependent?
they are linked in that they all refer to the same instance of cdc2 kinase activity
however, formally, the fillers (clp1 and phosphatase activity) are not connected to each other. They are only connected via the shared cdc2 kinase activity.
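To see why the fillers are only linked through the hub, here is one naive way to turn an annotation plus its comma-separated extensions into instances. The node names (`a`, `g0`, `n1`, ...) and the translation scheme are assumptions for illustration, not how any particular tool does it.

```python
import itertools


def ae_to_individuals(core_type, gene_product, extensions):
    """Turn one annotation plus its extensions into typed nodes and edges.
    The hub 'a' is the annotated activity; each filler gets its own node and
    is connected only to the hub, never to the other fillers."""
    ids = itertools.count(1)
    types = {"a": core_type, "g0": gene_product}
    edges = [("a", "enabled_by", "g0")]
    for relation, filler in extensions:
        node = f"n{next(ids)}"
        types[node] = filler
        edges.append(("a", relation, node))
    return types, edges


types, edges = ae_to_individuals(
    "protein kinase activity",
    "CDK1",
    [("has_input", "CLP1"), ("directly_inhibits", "phosphatase activity")],
)
print(edges)
# [('a', 'enabled_by', 'g0'), ('a', 'has_input', 'n1'), ('a', 'directly_inhibits', 'n2')]
print(types["n1"], "/", types["n2"])
# CLP1 / phosphatase activity
# No edge links n1 (the CLP1 instance) to n2 (the phosphatase activity instance).
```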
It requires either human biological intuition or some kind of rules or inferences to say: you see the phosphatase activity that is being inhibited, and you see the clp1 that the kinase is targeting? We're talking about the same thing; the phosphatase activity belongs to clp1.
That may sound obvious or dumb, but if you consider cases with some other relations, it's not always safe to assume we're talking about the same thing.
So I think what we're converging on is that we need to implement some kind of rule (probably a heuristic as it's hard to encode this in OWL) along the lines of
a has-input g, g type G
a directly-inhibits a2, a2 type A2
-->
a2 enabled-by g
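The rule can be prototyped over instance-level edges. A minimal sketch, assuming a hub-and-filler representation with hypothetical node names, and ignoring the type bindings `G` and `A2`, which the rule names without constraining. Note that it fires for every (input, inhibited activity) pair on the same hub, which is exactly why it is only a heuristic.

```python
def apply_inhibition_rule(edges):
    """Heuristic:  a has_input g, a directly_inhibits a2  -->  a2 enabled_by g
    Returns only the newly inferred edges. Over-generates when a hub has
    several inputs, since it cannot tell which input owns the inhibited activity."""
    existing = set(edges)
    inferred = []
    for a, rel, g in edges:
        if rel != "has_input":
            continue
        for subj, rel2, a2 in edges:
            if subj == a and rel2 == "directly_inhibits":
                new = (a2, "enabled_by", g)
                if new not in existing and new not in inferred:
                    inferred.append(new)
    return inferred


edges = [
    ("a", "enabled_by", "g0"),         # a: CDK1 protein kinase activity; g0: CDK1
    ("a", "has_input", "n1"),          # n1: a CLP1 instance
    ("a", "directly_inhibits", "n2"),  # n2: a phosphatase activity instance
]
print(apply_inhibition_rule(edges))
# [('n2', 'enabled_by', 'n1')]
```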
Yup. They still aren't nested, just linked at the CDC2 kinase hub, but that's the best I can come up with given the limitations of the old format.
Just for fun I made this model in Noctua. We really need to be moving to this, so I may as well throw out something for us to edit. Sooner or later this is all going to come up in discussion. Spombe-cdc2-extensions
OK. So the main action item seems to be to add regulates relations to the set of permitted relations for annotation extension.
I can start out by adding the old regulates relations, but it seems to me that everything under regulates in this graph should be valid:
Is it OK to go ahead with changing for the basic regulation relations now? I could present this on an annotation call before adding the rest.
Hi David,
We just looked at your models. We don't know what the difference is between them... Are they all equivalent?
Val
I think we should discuss this on a call. I would like to get feedback on which of the three models people would have come up with, and whether they can think of another way. They are not all the same.
Is the middle statement like what you get with the AE strings?
Not quite. With AEs, without any special collapsing rules, you would get another configuration. It would look like the middle one, but without the enabled-by link in the lower right.
Shall we also discuss these on the call on Monday? I think it would be very useful to go over what information is and isn't in each one and how they map on to conventional annotations. I think that it would be reasonable to assume that a curator might come up with any of these; they are all correct. Can we tease out what is the same and what is different? How does this affect our ideas of consistency?
Could you send PomBase curators call details? We intended to have some of us participate in the LEGO calls soon. Antonia and I will join Monday for this one. Val
http://noctua.berkeleybop.org/editor/graph/gomodel:5716c41300000003
Hi Val, The details of the calls are on the wiki. http://wiki.geneontology.org/index.php/Annotation_Advocacy_and_Coordination#LEGO_calls I will try to put up an agenda by the end of the day, but I have to do the consistency exercise paper first!
-D
Are there instructions on what to do? I don't see them...
(Replying by email to ukemi, 22/04/2016 17:50.)
Cambridge University PomBase http://www.pombase.org/ Cambridge Systems Biology Centre http://www.sysbiol.cam.ac.uk/Investigators/val-wood
This is for the LEGO call. Where do the annotations go for the consistency exercise? I don't see any links for previous consistency exercises for this. What is the usual procedure? Val
Hi David, @dosumis We can now use "regulates" in the extensions in Protein2GO, which is great, but I was wondering if you are planning to also add the negative and positive versions too? Thanks, Rachael.
I thought they were already in. I certainly attempted to add them. Checking now.
The good news is that I think I've fixed the problem: the relations needed an extra tag used in the Jenkins build job. The bad news is that this and other builds on Jenkins are broken due to refactoring of the general GO release pipeline. If this gets fixed today, the fix should be visible in P2G tomorrow. It might take until next week, though.
Great, thanks. I'll look out for it.
We can probably close this based on https://github.com/geneontology/go-ontology/issues/12811 ?
If any process can regulate another, and this is very often context dependent, why can we not record this using regulates relations in annotation extensions?
As far as I can see, this would fit nicely with LEGO.
CC @rachhuntley @RLovering @ValWood @rebeccafoulger @ukemi
(Discussion arising from work on SynGO project)