geneontology / go-ontology

Source ontology files for the Gene Ontology
http://geneontology.org/page/download-ontology
Creative Commons Attribution 4.0 International

how will annotations to terms connected by has_part be represented in the GAF #12345

Closed ValWood closed 8 years ago

ValWood commented 8 years ago

This ticket https://github.com/geneontology/go-ontology/issues/12122#issuecomment-196823233 made me realise that if you annotate to GO:0016236 macroautophagy you are not transitively annotated to autophagic cell death.

That doesn't seem right to me. It's a good illustration of why I worry about the way we are increasingly using has_part.

paolaroncaglia commented 8 years ago

Hi Val,

Macroautophagy does not always occur as part of autophagic cell death. The latter leads to death; the former usually does not (or not necessarily). But David H was recently mentioning similar concerns, so maybe he'd like to comment @ukemi

ValWood commented 8 years ago

Of course it doesn't. This is not a good example. Non-issue :)

ukemi commented 8 years ago

Paola's correct. If we wanted the annotation to propagate, we would need to give the term a specific part_of parent. Noctua takes care of this because the part_of relation can be represented at the instance level.
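As a rough illustration of the propagation point above, here is a minimal Python sketch. It assumes a toy edge list and assumes that annotations propagate over is_a and part_of only; the edges are limited to the terms named in this thread, and the extra part_of edge is purely hypothetical (as noted above, it would not always hold biologically).

```python
# Toy ontology: edges are (subject, relation, object).
# This edge list is an illustrative assumption, not an extract of the GO files.
EDGES = [
    ("GO:0016236", "is_a", "GO:0006914"),      # macroautophagy is_a autophagy
    ("GO:0048102", "has_part", "GO:0016236"),  # autophagic cell death has_part macroautophagy
]

# Assumption: annotations propagate over these relations only.
PROPAGATING = {"is_a", "part_of"}


def propagated_terms(term, edges, relations=PROPAGATING):
    """Return every term that an annotation to `term` propagates to."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for subject, relation, obj in edges:
            if subject == current and relation in relations and obj not in seen:
                seen.add(obj)
                stack.append(obj)
    return seen


# has_part points from the whole to the part, so an annotation to the part
# (macroautophagy) does not reach the whole (autophagic cell death).
without_part_of = propagated_terms("GO:0016236", EDGES)

# Hypothetical: adding a part_of parent makes the annotation propagate.
hypothetical_edges = EDGES + [("GO:0016236", "part_of", "GO:0048102")]
with_part_of = propagated_terms("GO:0016236", hypothetical_edges)

print(sorted(without_part_of))
print(sorted(with_part_of))
```

With the toy edges alone, only the is_a parent is reached; autophagic cell death appears only once the hypothetical part_of edge is added.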

ValWood commented 8 years ago

Yes, but a curator would still need to remember to annotate to both GO:0016236 macroautophagy and GO:0048102 autophagic cell death for "macroautophagy involved in autophagic cell death", just as we do now...

And it isn't clear to me how this will be represented in the GAF to ensure that the annotation is "connected" when we represent it on the gene pages.

This is one of my outstanding questions too.

In fact I'll reopen and change the subject here.

ukemi commented 8 years ago

In the conventional GAF, we don't connect annotations. That's why Noctua is better! In a conventional GAF, this would just show up as two annotations unless we created the "involved in" term.

ukemi commented 8 years ago

Kimberly and Heiko and I are actively working on generating GAFs from Noctua models. One of the things we realize we need is to generate individual annotations across part_of relations that are not represented in the ontology. If the part_of is represented in the ontology, then we don't need to make two annotations. It wouldn't be wrong, but it would be redundant. In cases where the evidence differs, we would probably want to make the individual annotations. I do that when I annotate conventionally.
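The rule described above could be sketched roughly as follows. This is not the actual Noctua-to-GAF code: the `Annotation` class, the function names, the gene symbol `geneA`, the `TERM:` identifiers and the `ontology_part_of` mapping are all hypothetical, chosen only to show the decision.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    gene: str
    term: str
    evidence: str


def entailed_part_of(child, parent, ontology_part_of):
    """True if the ontology already links child to parent via part_of (transitively)."""
    stack, seen = [child], set()
    while stack:
        current = stack.pop()
        for nxt in ontology_part_of.get(current, ()):
            if nxt == parent:
                return True
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def annotations_for_edge(gene, child, parent, child_evidence, parent_evidence,
                         ontology_part_of):
    """Annotations to emit for one instance-level `child part_of parent` edge."""
    out = [Annotation(gene, child, child_evidence)]
    # The parent annotation is redundant only if the ontology already entails
    # the part_of link AND the evidence is the same for both.
    redundant = (entailed_part_of(child, parent, ontology_part_of)
                 and child_evidence == parent_evidence)
    if not redundant:
        out.append(Annotation(gene, parent, parent_evidence))
    return out


# part_of exists only in the model, not in the ontology: two annotations.
model_only = annotations_for_edge(
    "geneA", "GO:0016236", "GO:0048102", "IMP", "IMP", ontology_part_of={})

# part_of is in the (hypothetical) ontology and evidence matches: one annotation.
toy_ontology = {"TERM:child": ["TERM:parent"]}
in_ontology = annotations_for_edge(
    "geneA", "TERM:child", "TERM:parent", "IMP", "IMP", toy_ontology)

# Same ontology link but differing evidence: keep both annotations.
differing = annotations_for_edge(
    "geneA", "TERM:child", "TERM:parent", "IMP", "IDA", toy_ontology)

print(len(model_only), len(in_ontology), len(differing))
```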

cmungall commented 8 years ago

well, you could do it with an extension, but the LEGO model is most straightforward and allows any number of levels (e.g. autophagic cell death part-of macroautophagy part-of response to X...)
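To make the extension route concrete, here is a small sketch that parses an annotation-extension value of the `relation(ID)` form used in GAF 2.x column 16. The parser, its name and the example value are assumptions for illustration; whether `part_of(GO:0048102)` is the right extension for this case is not settled in this thread.

```python
import re

# One extension item looks like relation(TARGET_ID).
EXTENSION = re.compile(r"^(?P<relation>[A-Za-z_]+)\((?P<target>[^()]+)\)$")


def parse_extension(field):
    """Parse an extension field into groups of (relation, target) pairs.

    '|' separates alternative groups; ',' joins items within a group.
    """
    groups = []
    for alternative in filter(None, field.split("|")):
        group = []
        for item in alternative.split(","):
            match = EXTENSION.match(item.strip())
            if match is None:
                raise ValueError(f"unparseable extension: {item!r}")
            group.append((match["relation"], match["target"]))
        groups.append(group)
    return groups


# Hypothetical annotation: GO ID plus an extension tying it to a larger process.
annotation = {"go_id": "GO:0016236", "extension": "part_of(GO:0048102)"}
parsed = parse_extension(annotation["extension"])
print(parsed)  # [[('part_of', 'GO:0048102')]]
```

An extension of this kind links exactly two terms, which is why the LEGO model is the more natural fit once more than one level of nesting is needed.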

ValWood commented 8 years ago

Currently PomBase would connect functions to processes with extensions, and request pre-composed terms to represent specific instances of processes (except for "response to" which is largely not very useful as a process, and the modification terms like "phosphorylation" which are covered by the MF parentage).

So in this instance, I would have requested "macroautophagy involved in cell death" (except I wouldn't because it is not a process in pombe, and it seems from the comments that the existence of autophagic cell death is still being debated).

We will still need to see all the info from the Noctua model in the GAF to represent it on the gene pages. MOD gene pages are important: that's how PomBase's 15,000 monthly users routinely consume GO data, and many more at other MODs. We aren't going to have Noctua models for everything for a long time. We can't just switch until we know we won't have gaps in our annotation.

Even when we do have Noctua models, for genes which are involved in many processes (like cdc2 http://www.pombase.org/spombe/result/SPBC11B10.09#go-molecular_function, which has >150 target genes and regulates many processes), you will get a better overview from a well organized and non-redundant gene page than from looking at many, many Noctua models. It would be impossible (or cluttered) to capture all of this info in a single model.

I don't know of any examples for PomBase where we have not been able to annotate a single process with a single term. We are in the process of linking functions to processes where there are multiple functions and/or processes. Our users will usually be able to tell quite quickly from the annotations, but we will link them up. Hopefully, when we have done this, we can go from GAF -> Noctua automatically, but we need to be able to go from Noctua -> GAF.

ValWood commented 8 years ago

One of the things I'd love to see here: http://www.pombase.org/spombe/result/SPBC11B10.09#go-molecular_function is to get rid of all of the residue-specific terms and have only "protein kinase activity". We use them because they are there, but it's not particularly useful at this point, and removing them would improve this view a lot (there are better ways to capture this).

However, we can't get away from the fact that, no matter how good Noctua is or how we use it in the future, we still need the most comprehensive (and non-redundant) possible term-based textual representation of GO on MOD gene pages.

ukemi commented 8 years ago

Hi Val. I agree there are better ways to capture the views. I have ideas about how the complexity can be captured and represented as derivations of the LEGO models. I am still not comfortable with the idea that kinase activity is equivalent to phosphorylation. I know I have read about cases where proteins require the addition of phosphate groups on multiple residues in order to have a biological effect. In some of those cases, more than one kinase gene product is required. I would consider these multiple events to be grouped as a process. I think this would also occur sometimes as multiple steps in a single pathway. I am still undecided on whether we care about that grouping, but I think it exists.

ValWood commented 8 years ago

"I know I have read about cases where proteins require the addition of phosphate groups on multiple residues in order to have a biological effect."

This is true. In fact it's true for cdc2. It's still phosphorylation though ("introducing a phosphate group into a molecule"), whether it activates the target or not. Sometimes the phosphorylation is only activatory for a process once a threshold of phosphorylation is reached that is high enough to activate a positive feedback loop. But individual gene products in a population are activated or inhibited by phosphorylation events (on one or more residues).

I'm not complaining about the grouping term existing, only that we don't need to display it on gene pages if we have an MF annotation to protein kinase activity, because it's implicit from the MF annotation.

The process here to which these gene products need to be grouped isn't phosphorylation; it's whatever process the two hypothetical kinases are regulating (positive regulation of chromosome segregation or whatever).

ukemi commented 8 years ago

Yup. I agree.

cmungall commented 8 years ago

Might be time to split this ticket into separate tickets. I'm having a hard time following the different threads.

uninformative groupings

If 'protein serine/threonine kinase activity' is uninformative, we could put it in goantislim_grouping, stopping it being used in enrichment analyses, and also allowing us to roll up in the display (optionally unfolding to 'kinase activity' and 'has-substrate some serine/threonine'). Although if it's truly uninformative, why not go the whole hog and create a ticket for its obsoletion?
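A minimal sketch of what that subset-based handling might look like, assuming a simple mapping from term IDs to subset tags. The subset membership shown and the single-parent `PARENT` map are hypothetical, and the function names are made up for this example.

```python
ANTISLIM = "goantislim_grouping"

# Hypothetical subset tagging: nobody in this thread has actually tagged this term.
SUBSETS = {"GO:0004674": {ANTISLIM}}    # protein serine/threonine kinase activity
PARENT = {"GO:0004674": "GO:0004672"}   # rolls up to protein kinase activity


def for_enrichment(term_ids, subsets, excluded=ANTISLIM):
    """Drop terms tagged with the excluded subset before an enrichment analysis."""
    return [t for t in term_ids if excluded not in subsets.get(t, set())]


def rolled_up(term_ids, subsets, parent, excluded=ANTISLIM):
    """For display, replace tagged terms with their parent, keeping order, no duplicates."""
    shown = []
    for term in term_ids:
        tagged = excluded in subsets.get(term, set())
        display = parent.get(term, term) if tagged else term
        if display not in shown:
            shown.append(display)
    return shown


annotated = ["GO:0004674", "GO:0004672"]
print(for_enrichment(annotated, SUBSETS))
print(rolled_up(annotated, SUBSETS, PARENT))
```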

has-parts

it sounds like this is resolved

noctua->gaf

We should have this such that you never end up with less information than had you done a GAF annotation in the first place. @ukemi and @hdietze are working on this.

ValWood commented 8 years ago

Uninformative groupings. This isn't really a grouping term, or an anti-slim problem. It's more specific than the useful term (protein kinase activity). I'm compiling a list of terms and reasons and I'll submit it soon-ish.

ukemi commented 8 years ago

I think we need to be careful about how we define 'uninformative' groupings. For semantic richness, in some cases I still think that we need to have terms that we might not use for annotation. If we don't, I think it will have a large impact on the folks who do computational semantic analysis. I think that if we care about a differentia, then we should include all the classes required to make it different from other terms.

ValWood commented 8 years ago

The terms I'm referring to aren't really grouping terms in this context, they are mainly leaf nodes.

cmungall commented 8 years ago

Looks like this is assigned to me, still not sure what I'm meant to do :-)

ValWood commented 8 years ago

It sounds as though it is all in hand and I was worrying unnecessarily about has_part. You have it all under control ;)