geneontology / go-annotation

This repository hosts the tracker for issues pertaining to GO annotations.
BSD 3-Clause "New" or "Revised" License

inferring MFs from annotations to complex portal IDs to individual complex participants #1662

Closed bmeldal closed 7 months ago

bmeldal commented 6 years ago

Follow up from GOC meeting in Cambridge 2017:

The working group agreed that the CC (but see #1639 with regards to the required qualifiers!) and BP annotations would be equally appropriate for the complex subunits as they are for the complex so we'll extract them in the new Complex Portal GDAP.

However, extrapolating to MF is tricky:

We decided that we could infer the activity of the active subunit if it has been annotated as 'enzyme' in the CP. However, could we also extrapolate something about the inactive complex members? In the past people used contributes_to, but it has been proposed that this relationship should be made obsolete or at least used with great care (#1650 - proposal guidelines). Can we say anything else about the MF of an inactive complex member?

Please post your use cases and proposals here.

bmeldal commented 6 years ago

Initial thoughts from breakout group:

  1. Remove contributes_to, replace with either enables_activity_in, part_of, localises_in or found_in,...
  2. In Noctua: GPs with function go into GPAD, those without known function (emerging function) will be filtered out.
  3. Known function: MF and AE with complex function
  4. For unknown/emerging function annotations use MF root term and AE with complex MF enabled_by CP ID
  5. Don’t annotate unknown/emerging function at all as it confuses users
  6. Use Complex Portal ID enables_activity_in subcellular location
  7. Consider combinatorial evidences

vanaukenk commented 6 years ago

GO-CAM models of different types of complexes and associated functions of their respective members are here:

http://noctua.berkeleybop.org/editor/graph/gomodel:59c8885900000281

These are proposed models for different types of use cases, but we will likely want to model more before coming to any decisions.

Note that the rules for GPAD annotation outputs have not been formulated yet, so what is currently in the annotation preview is not final.

bmeldal commented 6 years ago

related to https://github.com/geneontology/go-annotation/issues/1661

Collecting use cases here: https://docs.google.com/document/d/1ZtAcjIyIQ_ycbuMHyvLA-KIJQtGenh82lxS-MKC6a_A/edit#

bmeldal commented 6 years ago

After discussing this for a little while yesterday (30/11/17) we decided it would be good to get user input on what they need/want.

bmeldal commented 6 years ago

[image: 2017_12_01_complex_go_annotation_relationships]

I've tried to summarise the situation regarding the way we extract annotations from complexes and from individual GPs:

Along the black lines, the top relationships are those used currently, the ones added in red pen are the ones we are discussing.

Points:

Please review the picture and post your MF class related comments here.

Please post CC class related comments in #1639

bmeldal commented 6 years ago

https://github.com/geneontology/go-ontology/issues/14847 an example of where contributes_to was discussed as a possible relationship

bmeldal commented 6 years ago

Call from 13/2:

We need to have examples of complexes:

  1. 1 catalytic subunit + other subunits --> Val wouldn't annotate to non-catalytic subunits
  2. Complex that requires all members to have catalytic activity --> "contributes_to", but it was tricky in its usage. https://github.com/geneontology/go-annotation/issues/1650

bmeldal commented 6 years ago

@ValWood @RLovering @ukemi @sylvainpoux @vanaukenk You've all had comments on the subject, please add them if not already captured.

Collecting use cases here: https://docs.google.com/document/d/1ZtAcjIyIQ_ycbuMHyvLA-KIJQtGenh82lxS-MKC6a_A/edit# (that's the link Kimberly shared at the call yesterday)

Draft survey: https://docs.google.com/document/d/1P_VLM9g13kj9lu3CRAgotAI3cUmWkS1yWaVg95u2Vbk/edit?usp=sharing

Thanks.

bmeldal commented 6 years ago

Minutes from WG call on 22/2/18:

Present: @judyblake , Li (sorry, don't have GH ID), @hdrabkin , @sandraorchard , @ValWood , @NancyCampbell , @tberardini , @RLovering , @deustp01 , @vanaukenk , @nataled , @ggeorghiou , Pascale, @edwong57 , @bmeldal (I hope I haven't forgotten anyone! - in no order!)

Slides: 2018_02_22_inferring_GOannotations_from_complex_to_GPs.pptx [link updated on 27/2/18, had the wrong file!]

Complex Names:

Harold: Complex names can be very long and cumbersome which makes them hard to search for, can we find short forms?
Sandra: There are short labels in the DB but we don't display them.
Birgit: If a shorter form exists it's in the synonyms which are in the search index. Also can search with gene symbols (which make up the systematic name).

Use of contributes_to (see also https://github.com/geneontology/go-annotation/issues/1650, the ticket for the contributes_to guidelines proposal): we spent most of the call on this!

  1. How do users use qualifiers and annotation extensions? Val (about PomBase users):
    • Enrichment usually run over BP, maybe CC, so loss of MF qualifiers for enrichment tools not too dramatic.
    • Qualifiers are displayed on Gene Pages where people can see them in context and is useful.
    • Searches for GPs with X function: if regulatory subunits$ are annotated to an MF with contributes_to and the qualifier is stripped, the resulting list is strange. Users can filter them out but they need to know about it!

$ regulatory subunits: any GP (protein or otherwise) that has not been identified as carrying out the enzymatic activity of the complex but is consistently found as a complex member. They may or may not be essential (we don't distinguish essential subunits in the CP as most experiments don't go into that detail consistently).
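
To illustrate Val's point about stripped qualifiers, here is a minimal sketch of the filtering a user would have to do. The `Annotation` record and the gene product names are hypothetical and only for illustration; they are not a real GPAD parser or a PomBase API. GO:0003824 is "catalytic activity".

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    gene_product: str  # hypothetical identifier, for illustration only
    qualifier: str     # e.g. "enables" or "contributes_to"
    go_term: str


def direct_mf_only(annotations):
    """Drop annotations made with contributes_to, keeping direct ones."""
    return [a for a in annotations if a.qualifier != "contributes_to"]


annotations = [
    Annotation("GP:catalytic_subunit", "enables", "GO:0003824"),
    Annotation("GP:regulatory_subunit", "contributes_to", "GO:0003824"),
]

# Only possible while the qualifier is still present in the data.
filtered = direct_mf_only(annotations)
```

If the qualifier column has already been stripped by a tool, both rows look identical and this filter cannot be applied, which is the situation described above.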

  2. How GO annotators use the qualifier contributes_to:

Ruth: What about homodimers? Annotate directly.
Discussion highlighted the issue that we can never know if the function is carried out by the monomer or homodimer (or even homomultimer) if the protein self-assembles in solution.
AI: Birgit to add PDGF examples

Summary: Different groups use slightly different guidelines (and it may even vary within groups), either annotating all regulatory subunits of a complex with contributes_to or only in cases where the catalytic subunit has not been identified.
Solution: Draw up new annotation guidelines (https://github.com/geneontology/go-annotation/issues/1650) and revisit all annotations.
Birgit: to provide a list of GPs that have NOT enzyme as biological role in complexes in CP as a guide (list won't be comprehensive as DB has not got full coverage yet!).

  3. How can we automatically infer MF annotations to complex ACs from Complex Portal to GP?

Rationale for WHY I want to do this:

Summary from GOC meeting in Cambridge (Oct 2017): [image] (AE suggestion was Kimberly's)

Val: Annotations on Gene Pages link MFs to complexes using occurs_in
Birgit: Are the MF and CC annotations connected? If not we have a list of functions and a list of complexes but no link. Can we have an example (screenshot) please, @ValWood ?

Ruth (initial gut feeling): export for catalytic subunits but not regulatory subunits.
Pascale/Kimberly: make the distinction between catalytic and regulatory subunits
Ruth: is there a clear line between what is a catalytic subunit?

Birgit: Who would use these annotations??? What do they really need??? Judy: may know some power users that may be able to make use of these complex annotations.

NO SOLUTION YET!!! Options:

Going forward:

@judyblake (@hdrabkin /@deustp01 ) to pass on details of users to Birgit, Birgit to get some feedback before next call (8/3/18).

Birgit

(I've probably forgotten something or someone so please add your comments. I'll be updating the contributes_to guideline ticket later.)

bmeldal commented 6 years ago

Ok, forgot one thing:

"X binding":

So far we have discussed what to do with catalytic activity but we also have MF annotations to "binding". We don't use "protein/complex" binding, that sort of data is captured by IntAct and exported from there, but any other type of "binding", e.g.: [image]

Caveat: We don't know which subunit binds the target, unfortunately, we haven't captured that yet (but I just got an idea how I could do it so I could go back and add it in if we want it!). [Note to self: either by using the reference column with pipe or adding with/from as new field to our editor - which would be helpful anyway for creating our files.]

"Homework" for everyone:

Think about binding terms that the user might want/need. @RLovering can you think about this in the context of GREEKC, please?

bmeldal commented 6 years ago

Not discussed on the call:

but I'll run the issues by the users as well when I speak to them.

bmeldal commented 6 years ago

Example for homodimers: https://www.ebi.ac.uk/complexportal/complex/EBI-2881436

Platelet-derived growth factor AA complex
Homodimer of P04085 PDGF subunit A
Only exists as a dimer and functions as a ligand for PDGF receptors

I haven't looked for the experiments for the activity as the complex evidence came from a crystal, but from memory PDGF ligands are well described as obligate dimers.

PDGF ligands come in 5 flavours: AA, AB, BB, CC, DD.

And, the receptors (alpha-alpha, beta-beta or alpha-beta) don't dimerise until the ligand complex binds, forming an obligate heterotetrameric receptor-ligand complex!

So, the activities rely both on the dimeric ligand and the tetrameric receptor-ligand.

Food for thought how you would annotate that!

Added to google doc as well.

bmeldal commented 6 years ago

Google sheet for list of GPs as participants of catalytic complexes and their biol role annotations in the CP: https://docs.google.com/spreadsheets/d/1-9PdAJ8BvrjhPWLx5pB9N0rS6hDNgYifO4_3CmbplEo/edit?usp=sharing

bmeldal commented 6 years ago

Summary from call on 15/3/18:

Present: Lauren-Philip, Pascale, Peter, Edith, Kimberly, Harold, Val, Tanya, Ruth, Sandra, Birgit

  1. Lauren-Philip has a protein-protein interaction background and introduced his interest in complexes as drug targets.

    • We already link to ChEMBL www.ebi.ac.uk/chembl which might be a way of collecting drug data (ChEMBL incl links to Drugbank).
    • GO has a class "drug binding" but we don't use it in the CP.
  2. We looked through the Google sheet that contains all GPs from complexes annotated to "catalytic activity" and split by their biological roles: enzyme, enzyme regulator and unspecified role.

Points to consider:

Decision 1: Infer MF only to GPs annotated with biological role=enzyme!
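
A minimal sketch of what Decision 1 could look like in a script, assuming the complex's MF terms and each participant's biological role are already available as plain Python data. The function name, the subunit names and the data layout are hypothetical; only the three role labels come from the Google sheet discussed above.

```python
def infer_mf_for_enzymes(complex_mf_terms, participant_roles):
    """Return (gene_product, go_term) pairs for participants whose
    biological role is exactly "enzyme"; all other roles are skipped."""
    inferred = []
    for gene_product, role in participant_roles.items():
        if role == "enzyme":
            for term in complex_mf_terms:
                inferred.append((gene_product, term))
    return inferred


# Hypothetical complex with one participant per role.
roles = {
    "subunit_A": "enzyme",
    "subunit_B": "enzyme regulator",
    "subunit_C": "unspecified role",
}
result = infer_mf_for_enzymes(["GO:0003824"], roles)  # GO:0003824 = catalytic activity
```

Here only `subunit_A` receives the complex's MF; the regulator and the unspecified participant get nothing from this rule.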

Thought: can we capture the "regulator activity" by walking through the ontology?

E.g. CPX-1001(EBI-13638510) Calcineurin-Calmodulin complex, gamma-R1 variant has GO:0033192 calmodulin-dependent protein phosphatase activity. Its parent GO:0004723 calcium-dependent protein serine/threonine phosphatase activity has a FUNCTION regulates child GO:0008597 calcium-dependent protein serine/threonine phosphatase regulator activity which would be applicable for the complex's 2 regulatory subunits, Calmodulin and Calcineurin.

Decision 2: Try and infer MF to GPs annotated with biological role=enzyme regulator to the class of regulator activity
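
The "walking through the ontology" idea behind Decision 2 could be sketched as below. The two dictionaries are hand-written toy data holding only the three terms from the calcineurin example; they are an assumption for illustration, not a loaded copy of GO, and a real script would read these links from the ontology file.

```python
from collections import deque

# Toy data: child term -> is_a parents (from the example above).
IS_A_PARENTS = {"GO:0033192": ["GO:0004723"]}
# Toy data: activity term -> term for the matching regulator activity.
REGULATOR_OF = {"GO:0004723": "GO:0008597"}


def find_regulator_activity(term):
    """Walk up the is_a parents breadth-first and return the first
    regulator activity term found, or None if there is none."""
    queue = deque([term])
    seen = set()
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        if current in REGULATOR_OF:
            return REGULATOR_OF[current]
        queue.extend(IS_A_PARENTS.get(current, []))
    return None


regulator_term = find_regulator_activity("GO:0033192")
```

With the toy data, the complex's term GO:0033192 resolves to GO:0008597 via its parent, which is the term one would then try to assign to participants with biological role=enzyme regulator.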

  1. X binding

Decision 3: At the moment we can't infer these as we didn't annotate the X binding property to the specific complex member but only to the complex. Ruth: Could MODs annotate missing binding evidence from papers CP curators find as part of their curation? - add to GOC mtg agenda for Complex topic

Thank you all for your valuable input!!! I have something to work on now (well, Noe and Tony :) )

bmeldal commented 6 years ago

Update: Rather than sending papers with missing GP annotation evidence to MODs we could add those directly through P2GO --> AI: Complex curators to be trained in P2GO!

bmeldal commented 6 years ago

Update from NYU GOC mtg:

What do we do if there are 2 or more enzyme subunits and 2 or more function annotations on one complex? A script doesn't know which subunit has which function. --> Manually annotate in P2GO?
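
One way a script could at least detect this case and hand it over for manual annotation is sketched below. The function name and data layout are hypothetical, and the check implements only the condition stated above (2 or more enzyme subunits and 2 or more function annotations).

```python
def needs_manual_annotation(participant_roles, complex_mf_terms):
    """True when a script cannot tell which enzyme subunit carries
    which function: at least 2 enzymes and at least 2 MF terms."""
    enzymes = [gp for gp, role in participant_roles.items() if role == "enzyme"]
    return len(enzymes) >= 2 and len(complex_mf_terms) >= 2


# Hypothetical complex: two enzyme subunits, two MF annotations.
ambiguous = needs_manual_annotation(
    {"subunit_A": "enzyme", "subunit_B": "enzyme"},
    ["GO:0016301", "GO:0016787"],  # kinase activity, hydrolase activity
)
# One enzyme with two MF annotations: no ambiguity about the subunit.
unambiguous = needs_manual_annotation(
    {"subunit_A": "enzyme", "subunit_B": "enzyme regulator"},
    ["GO:0016301", "GO:0016787"],
)
```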

pgaudet commented 7 months ago

This translation is now done. GO MFs are NOT inferred from IntAct/Complex Portal annotations, only BPs.