geneontology / go-ontology

Source ontology files for the Gene Ontology
http://geneontology.org/page/download-ontology
Creative Commons Attribution 4.0 International

dimerization MF terms #11094

Open gocentral opened 10 years ago

gocentral commented 10 years ago

Wasn't there a plan to obsolete these MF terms? (They are basically terms describing subunit composition, defined as molecular functions.)

GO:0042803 protein homodimerization activity
GO:0046982 protein heterodimerization activity
GO:0046983 protein dimerization activity
GO:0051260 protein homooligomerization

I'm sure this was actioned at one of the consortium meetings...

Reported by: ValWood

Original Ticket: geneontology/ontology-requests/10909

gocentral commented 10 years ago

Will search the editors' collective memories on the next call.

Original comment by: tberardini

gocentral commented 10 years ago

Original comment by: tberardini

gocentral commented 10 years ago

Harold - do you remember?

Original comment by: tberardini

gocentral commented 10 years ago

We did NOT agree to implement this officially, as far as I can remember.

I still don't see what the problem is with homodimerization activity as a type of protein binding where the property of the protein is that it binds itself.

Original comment by: hdrabkin

gocentral commented 10 years ago

OK, we (PomBase) thought there was some action for these. We have them in our collection of terms not to use in annotation.

They do seem to be a problem though, because this is describing more than a binding activity (how does it differ from the parent 'protein self binding'?).

The def "Interacting selectively and non-covalently with an identical protein to form a homodimer" seems to be more about the subunit composition than the activity.

So the current guidance is to keep using these terms?

v

Original comment by: ValWood

gocentral commented 10 years ago

Maybe it was the process terms which were a problem?

http://wiki.geneontology.org/index.php/Annotation_Conf._Call_June_11,_2013

I think the discussion was that this practice resulted in the proteins which form oligomers also getting annotated to the process 'oligomerization', which is not really correct.

Original comment by: ValWood

gocentral commented 10 years ago

Val; protein self-binding doesn't imply a stop at dimer. Its def: "Interacting selectively and non-covalently with a domain within the same polypeptide."

Homodimer specifies a limit.

Also remember that these terms designate an activity by the protein all by itself. No chaperones, etc.

Original comment by: hdrabkin

gocentral commented 10 years ago

Hi

It is very useful to be able to state that a protein is a homo- or heterodimer. Many proteins only function when in the homo- or heterodimer state, e.g. nuclear receptors, RXR and LXRs. In addition, many receptors that homodimerise bind to homodimerised ligands. Furthermore, some genes encode a variety of isoforms, and therefore the homodimerisation or heterodimerisation state of these protein complexes is not straightforward.

For many protein complexes it is often just as important to be able to capture that a protein binds to itself (or a similar protein) as it is to capture that a protein binds an unrelated protein. If we remove these homo/heterodimerization activity terms then we are implying that some protein interactions are more important than others.

Ruth

Original comment by: RLovering

gocentral commented 10 years ago

OK, so what about the processes "protein homotrimerization" etc.? How do these differ from "protein complex assembly"?

Original comment by: ValWood

gocentral commented 10 years ago

Hi,

Why can't those be types of complexes? The fact that a protein is a multimer is neither a process nor a function. Can we create CC terms 'dimer', 'homodimer', etc., and use those instead? "protein homodimerization activity" is not very informative as a function.

Thanks,

Pascale

Original comment by: pgaudet

gocentral commented 10 years ago

Hi Pascale

Please read my comments that I had included in the SF item:

It is very useful to be able to state that a protein is a homo- or heterodimer. Many proteins only function when in the homo- or heterodimer state, e.g. nuclear receptors, RXR and LXRs. In addition, many receptors that homodimerise bind to homodimerised ligands. Furthermore, some genes encode a variety of isoforms, and therefore the homodimerisation or heterodimerisation state of these protein complexes is not straightforward.

For many protein complexes it is often just as important to be able to capture that a protein binds to itself (or a similar protein) as it is to capture that a protein binds an unrelated protein. If we remove these homo/heterodimerization activity terms then we are implying that some protein interactions are more important than others.

Ruth

Original comment by: RLovering

gocentral commented 10 years ago

Hi Ruth,

I meant to say, instead of using

Do you mean to say that doesn't capture what you need with respect to the multimeric status of the active form of the protein ?

Pascale

Original comment by: pgaudet

gocentral commented 10 years ago

Hi Pascale

Would you rather annotate to MF receptor binding, or to CC receptor complex?

Ruth

Original comment by: RLovering

gocentral commented 10 years ago

Hi Ruth,

I was talking about dimerization as a MF versus dimer as CC. I don't see why you couldn't have two annotations:

  1. ProteinA CC= homodimer
  2. ProteinA MF= x ligand binding.

And that's not very different from how you would do it with a MF:

  1. ProteinA MF= homodimerization activity
  2. ProteinA MF= x ligand binding.

Are we talking about the same thing?

Thanks,

Pascale

Original comment by: pgaudet

gocentral commented 10 years ago

Hi Pascale

Can we just agree to differ on this one? The way binding is progressing, I'm not sure that these questions will be relevant in 6 months, and I have a lot to do.

To conclude, I just don't get why saying a protein binds a ligand is more important than saying it binds itself.

So I would suggest: if you don't want to capture homo- and heterodimerisation, why capture ligand binding?

The idea of MF is to try to suggest a functional role a protein has in a biological process and also a functional role a protein has within a cellular component. Consequently, for some proteins that role is to bind another protein, which just happens to be a homo- or heterodimer interaction. The other option is that we would annotate

protein A protein binding Protein A-1 rather than protein A heterodimerization activity protein A-1

Sorry to not agree with you

Ruth

Original comment by: RLovering

gocentral commented 10 years ago

Homotrimerization is a specific type of complex assembly. Note that there could be other proteins, etc., involved in the process, so it is not an inherent property of the protein itself (i.e., an activity).

Original comment by: hdrabkin

gocentral commented 10 years ago

Yes, there could be other proteins involved, but that is not how it has ever been used as far as I can tell (especially by any of the numerous IEA mappings). Is there any value in recording that this is a different process? (How would you differentiate the process, apart from by subunit composition?) It is more valuable to have the specific complex which is being assembled. I am unconvinced that it has been used in this way.

If this is the case, then to be consistent here, all of the x subunit complex assembly terms should move under the corresponding protein homooligomerization term.

Original comment by: ValWood

gocentral commented 10 years ago

i.e., is it useful to know that a protein is involved in "protein heterodimerization" if you don't know of what?

Most of the proteins annotated to these terms are clearly proteins which are themselves heterodimers etc....which seems a pretty good reason to obsolete them.... (and recommend reannotation to the MF term, or to protein assembly of the specific complex)

Original comment by: ValWood

gocentral commented 10 years ago

Val: But only if they are homo-oligomers; not all protein complexes are homo anything.

Maybe we should ditch the process terms (so we only have complex assembly)?

Original comment by: hdrabkin

gocentral commented 10 years ago

That would make sense to me. I thought we had come to that conclusion before. Maybe one to raise at the next GO meeting?

Val

Original comment by: ValWood

gocentral commented 10 years ago

The original post, however, was addressing these guys:

GO:0042803 protein homodimerization activity
GO:0046982 protein heterodimerization activity
GO:0046983 protein dimerization activity
GO:0051260 protein homooligomerization

I would still opt to keep the homodimerization activity since it refers to the property of one protein.

Original comment by: hdrabkin

gocentral commented 10 years ago

It was a mixture... it also included GO:0051260 protein homooligomerization. I was querying all the function terms and the process terms which we have automated mappings to.

I think the case for getting rid of the processes is clearer and we should proceed with that if possible. I'm not convinced that the function terms are in scope as MF curation. However, I agree it is useful to collect subunit composition data, and so the justification for keeping these is stronger, if resources do not have an alternative mechanism to record this information.

v

Original comment by: ValWood

gocentral commented 10 years ago

Val, I would also be happy to have "homodimer" in CC, especially when I have to make PRO ids for various homodimers as complexes, it would help to be able to make an "is_a" to a GO id for that

Original comment by: hdrabkin

gocentral commented 10 years ago

I didn't know this one was going to be a can of worms. Will raise at the next meeting to see what the best solution is.

Val

Original comment by: ValWood

gocentral commented 10 years ago

Original comment by: ValWood

gocentral commented 10 years ago

Hi Val

if you could get a resolution on this issue at the GOC that would be great. I agree that having a BP term for protein complex assembly (and removing the BP oligomerization terms) may enable more consistent annotation to this term, with the proteins facilitating the assembly annotated to this term. Then having the MF terms perhaps expanded to include oligomerization activity etc with the aim that terms such as heterodimerization activity are of limited use without an ID in the with field. Not sure about the CC terms!

Best

Ruth

Original comment by: RLovering

gocentral commented 10 years ago

I'll put it on the agenda later today...I have a list ;)

Original comment by: ValWood

gocentral commented 10 years ago

Paola has already added with a link to http://sourceforge.net/p/geneontology/ontology-requests/11087/

Original comment by: ValWood

pgaudet commented 7 years ago

We need to make a decision on these terms. GO-CAM models suggest they are not useful.

RLovering commented 7 years ago

Hi my understanding of these terms was contradicted in the recent GOC meeting. But looking at the comments listed above I do feel that my interpretation of how to apply these annotations was correct.

The MF terms for dimerization have been applied to a protein subunit to indicate that that subunit binds to an identical or nonidentical subunit (e.g. GO:0042803 homodimerization; definition: "Interacting selectively and non-covalently with an identical protein to form a homodimer."). When the homodimer term is applied, the WITH field would include the same protein ID as the protein ID annotated. See the comments above by Harold/Val/Pascale. These MF terms should not be applied to scaffold proteins (for example) that facilitate the dimerization of subunits. Or maybe I misinterpreted the comments in the meeting that these terms were not applied to the proteins dimerising. I commented previously (June 2014) about how useful these terms are, and with the application of GO to describe druggable targets the homodimerization information is likely to be useful.

The BP terms, such as protein complex assembly, are by contrast applied to proteins that bring proteins together and therefore have a role in assembly of oligomers. These terms are often incorrectly applied to proteins that are the targets of the process (i.e. the proteins that make up the oligomer) rather than scaffold proteins (for example).

I still think this implies that it is more important to capture that a protein binds a different protein than it is to capture that it binds an identical or similar protein. As I have mentioned on many occasions, we now have a pipeline that is exporting GO PPI data to PSICQUIC, which enables the PPI data we capture to be included in network analysis. If you remove the MF dimer terms then I would hope that these interactions will be captured using the parent protein binding term rather than all of these annotations being deleted. If new CC terms are created then I guess the annotations could be revised automatically from homodimerization activity to homodimer (etc.) with the same evidence codes? But this then leaves the question about all the other MF binding annotations. The curator will need to decide if they are going to make an MF annotation to capture protein interaction data, or a CC annotation to capture this data. So there will be a reduction in the consistency of binding data annotation. I also appreciate that this system does not support the annotation of complexes made up of multiple identical proteins, but there again the term identical protein binding can be used for these (or not, if this term is removed too).

I hope that this is not the start of a move to remove all protein interaction data from GO.

Please make a clear statement about whether you are referring to BP or MF terms, and what replacement terms will be suggested, (if any) and what will happen to the existing annotations.

Thanks

Ruth

RLovering commented 7 years ago

Hi

I have had a chat with Sandra and we both agree that homodimer is useful information for drug development. However, there is a problem with the application of this term: because the trimer terms were not created, we have been encouraged to use it for trimers, on the assumption that 'it must have formed a homodimer before it formed a trimer'. This means that homodimer will have been applied to trimers, so a more accurate interpretation of these annotations would be 'identical protein binding'. So perhaps the homodimer terms could be moved up to identical protein binding. I still think heterodimer is useful, but at the same time these should probably be revised to more useful complex statements.

Ruth

pgaudet commented 7 years ago

Hi Ruth,

How about using GO:0005515 protein binding with the same protein, instead of protein homodimerization activity? (Same for dimerization activity: we could just specify the partner.)

Thanks, Pascale

RLovering commented 6 years ago

Hi

I am fine with MF: homodimerization being CC homodimer.

However, can we keep MF: GO:0042802 identical protein binding?

Ruth

thomaspd commented 3 years ago

Pascale's suggestion to just annotate to "protein binding" and put the same protein in the enabled_by slot and has_input slot makes sense, and would also work in GO-CAM. All pairwise protein-protein interactions would be treated the same way, then, whether they are between two different proteins, or two molecules of the same protein.

If there's interest from users in retrieving the set of all proteins that bind another protein of the same type, we can do that as a SPARQL query. I don't think the set of "all proteins that form homomultimers" is very useful for enrichment use cases, but if we change our minds later we can just add back the class and populate it with the SPARQL query.
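
As a rough illustration of the retrieval described above, here is a minimal Python sketch. It is not the SPARQL query itself and makes no claim about the real GO-CAM schema: the `Activity` record, the `self_binding_proteins` helper and the `EX:` identifiers are all hypothetical, and each activity is assumed to have been flattened to one enabled_by, one term and one has_input value.

```python
from typing import NamedTuple

PROTEIN_BINDING = "GO:0005515"  # protein binding


class Activity(NamedTuple):
    """Hypothetical, flattened view of one molecular function activity."""
    enabled_by: str  # identifier of the gene product carrying out the activity
    term: str        # GO molecular function term
    has_input: str   # identifier of the bound partner


def self_binding_proteins(activities):
    """Return sorted identifiers of proteins annotated as binding themselves."""
    return sorted({
        a.enabled_by
        for a in activities
        if a.term == PROTEIN_BINDING and a.enabled_by == a.has_input
    })


# Placeholder identifiers, for illustration only.
activities = [
    Activity("EX:P1", PROTEIN_BINDING, "EX:P1"),  # binds itself
    Activity("EX:P1", PROTEIN_BINDING, "EX:P2"),  # binds a different protein
    Activity("EX:P3", "GO:0003677", "EX:P3"),     # a different MF term, ignored
]
print(self_binding_proteins(activities))
```

The point of the sketch is only that the "binds another protein of the same type" set can be derived from the annotations on demand, so no dedicated class is needed to hold it.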

RLovering commented 3 years ago

Sounds like you have a plan. The only group that seems to have found a use for these annotations published in 2009 (https://pubmed.ncbi.nlm.nih.gov/19640831/); hopefully bioinformatics research has moved on from this level of analysis.

I guess I would still prefer to have the term identical protein binding rather than just protein binding as at least this makes a statement that can be propagated to orthologous proteins. An annotation to protein binding with the same protein in the has_input slot will not get propagated as the has_input ID would have to change for each different species.

Ruth

pgaudet commented 3 years ago

I don't see a problem keeping 'identical protein binding'. I guess we'd need a check on the 'with' to make sure people don't use 'protein binding'. The point being, if we don't use the term consistently, it's less valuable.
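
A check of this kind might look roughly like the sketch below. It is an assumption-laden illustration, not an existing GO rule: the dictionary keys (`object_id`, `go_id`, `with`), the helper names and the `EX:` identifiers are invented for the example, and the with/from value is assumed to separate identifiers with `|` or `,`.

```python
PROTEIN_BINDING = "GO:0005515"            # protein binding
IDENTICAL_PROTEIN_BINDING = "GO:0042802"  # identical protein binding


def split_with(with_field):
    """Split a with/from value on '|' and ',' into individual identifiers."""
    parts = with_field.replace(",", "|").split("|")
    return [part.strip() for part in parts if part.strip()]


def should_use_identical_protein_binding(annotation):
    """True if a 'protein binding' annotation names the annotated protein in 'with'."""
    return (
        annotation["go_id"] == PROTEIN_BINDING
        and annotation["object_id"] in split_with(annotation["with"])
    )


# Placeholder annotations, for illustration only.
annotations = [
    {"object_id": "EX:P1", "go_id": PROTEIN_BINDING, "with": "EX:P1"},
    {"object_id": "EX:P2", "go_id": PROTEIN_BINDING, "with": "EX:P3|EX:P4"},
    {"object_id": "EX:P5", "go_id": IDENTICAL_PROTEIN_BINDING, "with": "EX:P5"},
]
flagged = [a["object_id"] for a in annotations if should_use_identical_protein_binding(a)]
print(flagged)
```

Only the first annotation is flagged: it uses the parent term although the 'with' value shows the protein binding itself.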

ValWood commented 3 years ago

"I guess I would still prefer to have the term identical protein binding rather than just protein binding as at least this makes a statement that can be propagated to orthologous proteins."

But we don't propagate 'protein binding', do we?

bmeldal commented 3 years ago

IntAct only outputs "protein binding" irrespective of whether it's to another protein, an identical protein or itself. We make that distinction in IntAct but the GAF script ignores it.

bmeldal commented 3 years ago

PS: the binding partner is in the with/from field.

ValWood commented 3 years ago

I still think a query is the best way to receive this information. Especially since most annotation groups will not specify, a query would be comprehensive for existing annotation, but using the term would only give a partial dataset.

This is the sort of information (how to retrieve self binding proteins) that could be in the FAQ, which was once started, but abandoned? This could also be pointed to in answer to the increasingly frequent twitter storms. People don't come to GO for help, they pan GO on twitter. https://twitter.com/KathrynCrouch81/status/1358716940429254656

ValWood commented 3 years ago

Ah, the FAQ is still active and quite extensive: http://geneontology.org/docs/faq/ It would be useful if outreach could point the twitterati to GO answers in the FAQ as and when questions arise.

That would have the advantage of bringing lots of people to the info.

RLovering commented 3 years ago

Hi Val

for human there are nearly 20,000 annotations based on ISS, IBA or IEA evidence for child terms of protein binding, https://www.ebi.ac.uk/QuickGO/annotations?goUsage=descendants&goUsageRelationships=is_a,part_of,occurs_in&goId=GO:0005515&evidenceCode=ECO:0000250,ECO:0000247,ECO:0000266,ECO:0000318,ECO:0000319,ECO:0000501&evidenceCodeUsage=descendants&taxonId=9606&taxonUsage=descendants

Of these, over 500 proteins are associated with the homodimer term and 716 proteins are associated with identical protein binding.

For human proteins with manual evidence annotations (over 250,000) https://www.ebi.ac.uk/QuickGO/annotations?goUsage=descendants&goUsageRelationships=is_a,part_of,occurs_in&goId=GO:0005515&taxonId=9606&taxonUsage=descendants&evidenceCode=ECO:0000352&evidenceCodeUsage=descendants

The identical protein binding term is associated with 1,503 proteins.

I realise this has no value in enrichment analysis, I just wonder if this has some value for other research projects - eg for drug development it would be useful to know that the target dimerises or multimerises. Although I also appreciate that there are other ways of identifying proteins with this capacity. Just seems like a lot of data to dump.

Ruth

thomaspd commented 3 years ago

About documentation/FAQ, we have some text from our Jan 2018 NAR paper (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5210579/) that could be a place to start for text about direct annotations to protein binding:

"Protein binding annotations are only useful if they include the specific protein binding partner. With the addition of the IntAct database (10) as a GO annotation provider, the number of specific protein binding annotations has increased dramatically (Table 2, first column). Only high-confidence annotations are incorporated into GO from IntAct. Combined with annotations from hypothesis-driven, small-scale experiments that have been contributed to GO from multiple different annotation providers, IntAct annotations help make the GO knowledgebase a useful resource for high-confidence protein interaction network data. To create protein interaction networks, users need to utilize the 'with' field (column 8) of the GO Association Files (GAF), which contains the identifier of the interacting partner."

Ideally, we'll have the binding partner in the has_input extension in the near future.
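
To make the use of the 'with' field (column 8) concrete, here is a minimal Python sketch that builds an undirected interaction map from GAF lines. It assumes tab-separated lines with the database in column 1, the object ID in column 2 and the GO ID in column 5, and treats `|` and `,` in column 8 simply as identifier separators. The `interaction_network` and `gaf_row` helpers and all `EX` identifiers are hypothetical.

```python
from collections import defaultdict

PROTEIN_BINDING = "GO:0005515"  # protein binding


def interaction_network(gaf_lines):
    """Map each protein to the set of partners named in the with/from column."""
    network = defaultdict(set)
    for line in gaf_lines:
        if line.startswith("!") or not line.strip():
            continue  # skip header/comment lines and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 8 or cols[4] != PROTEIN_BINDING:
            continue  # keep only direct 'protein binding' annotations
        protein = f"{cols[0]}:{cols[1]}"
        partners = [p.strip() for p in cols[7].replace(",", "|").split("|") if p.strip()]
        for partner in partners:
            network[protein].add(partner)
            network[partner].add(protein)  # a self-interaction becomes a self-loop
    return network


def gaf_row(obj, go_id, with_field):
    """Build a placeholder line holding only the first nine GAF columns."""
    return "\t".join(["EX", obj, obj, "", go_id, "EX_REF:1", "IPI", with_field, "F"])


lines = [
    "!gaf-version: 2.2",
    gaf_row("P1", PROTEIN_BINDING, "EX:P2"),  # P1 binds P2
    gaf_row("P3", PROTEIN_BINDING, "EX:P3"),  # P3 binds itself
    gaf_row("P4", "GO:0042802", "EX:P4"),     # child term, skipped by this sketch
]
network = interaction_network(lines)
print(dict(network))
```

Whether child terms of protein binding should also be included is a design choice; this sketch keeps only direct annotations to GO:0005515, which is why the last placeholder row is skipped.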