geneontology / go-annotation

This repository hosts the tracker for issues pertaining to GO annotations.
BSD 3-Clause "New" or "Revised" License

What annotations have value? #2570

Closed RLovering closed 1 year ago

RLovering commented 5 years ago

This is a discussion that was initiated in an ontology ticket https://github.com/geneontology/go-ontology/issues/17643

The points raised relevant to an annotation discussion are pasted below

Started with @RLovering well yes but there are 2 downstream processes here:

ABCA2 regulates the expression of low-density lipoprotein receptor, which leads to a lower level of cholesterol import. This is part of the process of cholesterol homeostasis

ABCA2 regulates the distribution of the sphingolipid ceramide which inhibits acylCoA:cholesterol acyltransferase (ACAT) activity; as ACAT esterifies cholesterol, cholesterol esterification is reduced.
Regulation of cholesterol esterification is a different part of the process of cholesterol homeostasis than the regulation of the expression of LDLR.

Please note that I am aware that exemplar cases of GO annotation and how GO-CAM models are being exported to GPAD files do exist. In which case, I would be very grateful if someone would point me to them so that Barbara and I can improve our GO annotations.

Following my previous annotation practice I would annotate ABCA2 as negatively regulating cholesterol esterification. Note that the current definition of almost all regulation terms starts the same, for example: ANY PROCESS that modulates the frequency, rate or extent of cholesterol esterification. There are currently 15 human proteins associated with the GO term regulation of cholesterol esterification, so there is no suggestion of over association of this term with 'every' human protein.

I think the approaches that are being taken by Pascale, Kimberly and Val to review GO annotations are great, and at UCL we are doing our best to keep up with the revisions that are being requested. But more needs to be done to provide examples of annotating specific processes so that these can be discussed and agreed upon and then used as exemplar GO annotation approaches.

The recent wiki page that provides examples of the relations to use in annotation extensions and GO-CAM models is very helpful. But worked examples would also be very helpful. I could then go back to all the human proteins currently annotated to processes like cholesterol esterification and the regulation of this process and revise these annotations in line with the 'new' system. In fact this might be a great basis for a grant application, because reviewing all the annotations associated with human proteins is not a 5-minute job. But other than reviewing all annotations, I agree with Val that it is important to be trying to create annotations now that won't get deleted in the future. With exemplar systems established and promoted we would have more chance of creating new annotations that follow these rules.

RLovering commented 5 years ago

From @ValWood

I agree that the guidelines are currently unclear.

My take on regulation is that there needs to be some evidence for "control", i.e. activation (via signalling) or a rate-limiting step. Not just "affects" by being upstream. This is why to me this seems like an acts_upstream_of annotation, rather than regulation.

However, lots of people do use regulation when a mutation affects a process. My understanding is that acts_upstream_of was to provide a better way to annotate "upstream of" but not regulating.

I think a lot of the confusion here is that the GO definition of "regulation" is quite vague, as you point out here

"ANY PROCESS that modulates the frequency, rate or extent of cholesterol esterification."

To me, this definition is a little weak, because it doesn't really exclude any "acts upstream of". The word "modulate" was used to imply "controlling influence"; possibly wording indicating "control" more explicitly would be an improvement. I have mentioned this before.

I only commented on this ticket because I am interested in the boundaries between "regulation" and acts_upstream_of. At the moment, we seem to have thrown a lot of new annotation syntax into the mix without defining precisely what the differences are.

The paper's abstract says: "It is important to study the role of ABCA2 in regulating cholesterol homeostasis in neuronal-type cells because ABCA2 has been identified as a possible genetic risk factor for Alzheimer's disease. In this study, the effects of ABCA2 expression on cholesterol homeostasis were examined in mouse N2a neuroblastoma cells. ABCA2 reduced total, free- and esterified cholesterol levels as well as membrane cholesterol but did not perturb cholesterol distribution in organelle or lipid raft compartments."

So "cholesterol homeostasis" seemed appropriate.

Because these experiments are performed in cell lines overexpressing ABCA2 endogenously (which might change LOTS of levels of all sorts of substrates, but does not necessarily imply that these levels are physiologically controlled by this gene product)

The authors then say "These results are consistent with the results observed in CHO cells and show that ABCA2 expression results in reduced levels of esterification of lipoprotein-derived cholesterol by ACAT in the endoplasmic reticulum." But they don't say "regulates" because this would be an over-interpretation of the experiments.

I guess what I am saying is that you can't infer from this particular paper that this is bona fide regulation. Which is why "acts_upstream_of" or "provides_substrate_for" seems more appropriate. However, we are probably pickier about when we use regulation at PomBase, because so much of the work is genetics and we see what a large effect knockdown or overexpression can have without being regulatory.

RLovering commented 5 years ago

From @RLovering Hi Val I have created a GO-CAM model for this and also revised some of the annotations, by reading the papers through again. Role of ABCA2 in cholesterol esterification gomodel:5d29221b00000268

http://noctua.berkeleybop.org/editor/graph/gomodel:5d29221b00000268

The annotations that derive from the GO-CAM model do not include any regulation of cholesterol esterification annotations; I added these in the model as causally_upstream_of. In my annotation set I have added regulation of cholesterol esterification with the acts_upstream_of qualifier. I believe that the maintenance of ceramide at the outside of the plasma membrane negatively regulates sphingomyelin biosynthesis by restricting the access of the enzymes to the ceramide substrate. I can see that 'provides substrate for' (or 'prevents providing substrate for') would be appropriate, but I think that the impact of the activity of ABCA2 on sphingomyelin biosynthesis is close enough to state that it is regulating it. Especially now that I have discovered that BP terms connected to the model with causally_upstream_of or directly_provides_input_for relations do not get associated with the annotated entity when created in GO-CAM. In the model I have used directly_provides_input_for and part_of for 2 different 'regulation of a biosynthetic process' terms and only one of them is exported to the annotation file.

Best

Ruth

RLovering commented 5 years ago

from @ValWood (abbreviated) So I still don't understand if we should use regulation or not here. The 'acts upstream of' seems to cover it for me, so it would be good to get precise guidelines. @vanaukenk Could we do this one in an annotation call?

Cheers

Val

RLovering commented 5 years ago

from @vanaukenk Hi @BarbaraCzub @RLovering @ValWood I'm just back from vacation and catching up on tickets.

This is a really interesting discussion and very important for how we think about annotating regulation of processes and functions going forward. I think we do want to move towards a more restrictive application of regulation terms such that we only use those terms and relations when we understand the mechanism of the regulation. I also prefer to think of regulation as a proximal, or direct, effect on an MF, although this is definitely an area where we need to discuss more examples to be sure this will accurately capture the important biology.

Here are some more specific thoughts on the cholesterol esterification example, all in the general vein of trying to elucidate what is known about mechanism:

Wrt both conventional annotation and GO-CAM, the best starting point is to think about the MFs for each of the gene products wrt the BP you're trying to model. In this particular case, then, what is the MF of ABCA2 wrt cholesterol esterification? Is it a transporter? A floppase? Is it even known? The function of ACAT seems straightforward.

For each MF, where does it occur? Again, here ACAT seems straightforward, but what about ABCA2? Wrt its effect on cholesterol esterification, is it acting in the plasma membrane? the late endosome? the lysosome? Is it known?

If we can determine the MF and location of that MF for ABCA2, how many steps then are in between the ABCA2 MF and the MF of ACAT? Can we connect the dots directly, or are there other MFs in between? If there are other MFs, do we know what they are?

I haven't read through all of the reported experiments in the papers, but from reviewing the results and reading over the discussions, I'm inclined to agree with Val that, right now, the more appropriate annotation for ABCA2 would be 'acts upstream of' (and I would even suggest we include directionality and use 'acts upstream of, negative effect') to 'cholesterol esterification'. This would be in addition to annotations to 'cholesterol homeostasis' which it sounds like you've already made. The reason I would choose this qualifier over annotating to a regulation term is that I didn't see exactly where and how the MF of ABCA2 directly affects the MF of ACAT.
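One way to read the questions above is as a small decision rule. The sketch below only illustrates that reading and is not GOC policy: the function name, its parameters, and the convention of building a regulation term label by prefixing the process name are all assumptions made for the example.

```python
from typing import Optional, Tuple


def suggest_annotation(
    process: str,
    intervening_mfs: Optional[int],
    effect: Optional[str] = None,
) -> Tuple[str, str]:
    """Return a (qualifier, term label) pair for a gene product and a process.

    intervening_mfs: number of molecular functions between the gene product's
    MF and the MF that carries out the process; None means the path is unknown.
    effect: "positive", "negative", or None when the direction is unknown.
    """
    if effect not in (None, "positive", "negative"):
        raise ValueError("effect must be 'positive', 'negative' or None")

    if intervening_mfs == 0:
        # Mechanism known and direct: a regulation term is justified.
        prefix = f"{effect} regulation of " if effect else "regulation of "
        return ("involved_in", prefix + process)

    # Indirect or unknown mechanism: keep the process term itself and
    # express the relationship (and direction, if known) in the qualifier.
    qualifier = "acts_upstream_of"
    if effect:
        qualifier += f"_{effect}_effect"
    return (qualifier, process)


# ABCA2: the path to ACAT is not established; the observed effect is negative.
abca2 = suggest_annotation("cholesterol esterification", None, "negative")
print(abca2)
```

For the ABCA2 case this yields `acts_upstream_of_negative_effect` on 'cholesterol esterification'. If a direct effect on the ACAT function were shown (zero intervening MFs), the same call would return a 'negative regulation of cholesterol esterification' label instead.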

The issue of how to handle enrichment analyses with these new qualifiers is extremely important and, as @hattrill reminds us, even consistently applying these qualifiers is still very much a work-in-progress for GO. So, it's good to discuss these examples (and maybe this ticket should be moved to go-annotation instead), to crystallize our concepts of regulation vs acting upstream (and how far upstream we think is reasonable to go). And, in the meantime, we may also want to seriously consider keeping the 'acts upstream of' annotations in a separate GAF or GPAD so users can decide if and how they want to use those annotations.

RLovering commented 5 years ago

I realise that many of the points raised by Kimberly have been discussed in GO-CAM calls and probably also in Reactome2GO mapping discussions. However, I had not appreciated that the GOC is moving substantially away from the previous annotation practice and wondered how many other groups have been involved in these decisions. I have many concerns about this and the impact this is going to have on users. I do appreciate that groups analysing high-throughput (HTP) datasets prefer to get a few pathways (e.g. from Reactome) rather than lists of often 1000 'unrelated' enriched GO terms, and that when I show GO-CAM to researchers the majority agree that this is a useful view. However, I think that the problems users have with these long lists of GO terms are that:

  1. It is difficult to view the list of terms within the ontology and therefore to appreciate which GO terms can be ignored because they are too general to be informative, and to appreciate that many genes have roles in similar processes in divergent cells/tissues, i.e. enrichment of 'heart development' when you are looking at lung tissue reflects that the same genes are used in these two different tissues.
  2. GO annotation is not complete for human. I am currently writing a grant for more funding and the lack of descriptive annotations for many biological domains is very easy to demonstrate. This will also mean that HTP data investigating these under-annotated areas will return terms that appear to be unrelated to the area of biology under investigation. Despite this I am seeing more and more evidence that appropriate GO terms are enriched in the analyses I am undertaking.
  3. I am always concerned when authors describe the enrichment of a specific signaling pathway, often identified in Reactome, when they perhaps have not appreciated that the proteins associated with the pathway and the query list are also in other pathways but that these other pathways have not been curated.

To get back to the suggestions by Kimberly which I have listed as summary points below:

  1. Restrict the application of regulation terms to gps with a direct effect on an MF.
  2. Use the qualifier 'acts upstream of' (and 'acts upstream of, negative/positive effect') to biological process. Here 'acts upstream of' is being used rather than a regulation term.
  3. Consider keeping the 'acts upstream of' annotations in a separate GAF or GPAD so users can decide if and how they want to use those annotations.
  4. Another point to include here is the use of 'causally upstream of' in GO-CAM models. Current GOC practice is that none of these annotations will be exported to GPAD/GAF files.

RLovering commented 5 years ago

Response to: 1 Restrict the application of regulation terms to gps with a direct effect on an MF. I would be happy for the involved in qualifier to only be used when applying a regulation term to gps with a direct effect on an MF.

RLovering commented 5 years ago

Response to: 2. Use the qualifier 'acts upstream of' (and 'acts upstream of, negative/positive effect') to biological process. Here 'acts upstream of' is being used rather than a regulation term. I have considerable concerns about this proposal. So many that it is difficult for me to put all of these down, as I feel more concerns will arise as I have a chance to consider this. I hope others will also consider other impacts of this, including the positive impact. I do not think that the qualifier 'acts upstream of' should be used as a proxy for 'regulation'. This will lead to:

  1. very few gps associated with the regulation terms, making the regulation terms effectively unusable for HTP analysis.
  2. It is not clear what this will lead to. My assumption is that this will prevent the grouping of multiple gps that regulate a process, ie the gps that directly regulate a process will not be grouped with gps that regulate, but not directly, a process.
  3. The 'involved in' files will only contain annotations for processes a gps is directly involved in. ie insulin will be annotated only to: insulin signaling pathway.
  4. The 'acts upstream of' files will not be compatible with the 'involved in' files without considerable manipulation, because the annotations will be 'acts upstream of (negative effect)' BP (not regulation of BP). So the simple manipulation of removing all qualifiers will convert this annotation into just the BP annotation.
  5. I would rather that the acts upstream of negative and positive effect options are removed and instead the 'acts upstream of' qualifier could be associated with the regulation/pos reg/neg reg terms. Stripping out the qualifiers will then lead to regulation term annotations. This potentially will pollute the 'only direct regulation 'involved in'' annotations but this would be a more appropriate 'pollution' than including regulators in the BP term.
  6. Perhaps the problem is just human specific, because I don't think there is a human database (other than GO) that will capture that insulin regulates glucose transport or that the change in ceramide membrane distribution mediated by ABCA2 affects cholesterol esterification?
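
Points 4 and 5 can be compared with a small sketch. The tuples below are simplified (gene product, qualifier, term label) records rather than real GAF/GPAD rows, and `to_regulation_style` with its string-built term labels is a hypothetical helper. A real conversion would need to look up the matching regulation term in the ontology, where one exists.

```python
from typing import Tuple

Annotation = Tuple[str, str, str]  # (gene product, qualifier, term label)

# Assumed mapping from a directional qualifier suffix to a term-label prefix.
EFFECT_SUFFIXES = {
    "_negative_effect": "negative regulation of ",
    "_positive_effect": "positive regulation of ",
}


def strip_qualifier(annotation: Annotation) -> Tuple[str, str]:
    """Drop the qualifier, as a user who ignores qualifiers would."""
    gene_product, _qualifier, term = annotation
    return (gene_product, term)


def to_regulation_style(annotation: Annotation) -> Annotation:
    """Move the direction of effect from the qualifier into the term label."""
    gene_product, qualifier, term = annotation
    for suffix, prefix in EFFECT_SUFFIXES.items():
        if qualifier.startswith("acts_upstream_of") and qualifier.endswith(suffix):
            return (gene_product, qualifier[: -len(suffix)], prefix + term)
    return annotation  # no direction in the qualifier, leave unchanged


# Point 4: direction carried by the qualifier, plain BP term.
current = ("ABCA2", "acts_upstream_of_negative_effect", "cholesterol esterification")
# Point 5: plain 'acts upstream of' qualifier, direction carried by the term.
proposed = to_regulation_style(current)

print(strip_qualifier(current))
print(strip_qualifier(proposed))
```

Stripping the qualifier from the first record leaves ABCA2 attached to 'cholesterol esterification' itself, while stripping it from the second leaves ABCA2 attached to 'negative regulation of cholesterol esterification', which is the outcome argued for in point 5.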
RLovering commented 5 years ago

Response to 3. Consider keeping the 'acts upstream of' annotations in a separate GAF or GPAD so users can decide if and how they want to use those annotations. Based on my comments to point 2, I would rather there are 2 files: 1 with only the 'involved in' annotations and one with the 'involved in' and the 'acts upstream of' annotations. A file with only the acts upstream of annotations will be of no use to anyone unless the annotations are converted in some way (as described above) to the equivalent regulation (if this is appropriate) term.
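As a rough sketch of the two-file layout suggested here, the snippet below partitions a list of annotations by qualifier. The records are simplified (gene product, qualifier, term label) tuples, not full GAF or GPAD rows, and the function names and the two example rows are illustrative assumptions based on the ABCA2 case in this thread.

```python
import csv
import io
from typing import Iterable, List, Tuple

Annotation = Tuple[str, str, str]  # (gene product, qualifier, term label)


def split_annotations(
    rows: Iterable[Annotation],
) -> Tuple[List[Annotation], List[Annotation]]:
    """Return (involved_in only, involved_in plus acts_upstream_of*)."""
    rows = list(rows)
    direct_only = [r for r in rows if r[1] == "involved_in"]
    # Prefix match, so the positive/negative effect variants are included.
    with_upstream = [
        r for r in rows
        if r[1] == "involved_in" or r[1].startswith("acts_upstream_of")
    ]
    return direct_only, with_upstream


def to_tsv(rows: Iterable[Annotation]) -> str:
    """Serialise the simplified records as tab-separated text."""
    buffer = io.StringIO()
    csv.writer(buffer, delimiter="\t", lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Illustrative rows only, taken from the example discussed in this ticket.
rows = [
    ("ABCA2", "involved_in", "cholesterol homeostasis"),
    ("ABCA2", "acts_upstream_of_negative_effect", "cholesterol esterification"),
]

direct_only, with_upstream = split_annotations(rows)
print(to_tsv(direct_only))
print(to_tsv(with_upstream))
```

The second file is a superset of the first, so a user who wants the broader view does not have to merge a standalone 'acts upstream of' file back in.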

RLovering commented 5 years ago

Response to 4. No 'causally upstream of' annotations from GO-CAM models will be exported to GPAD/GAF files. Sad!