geneontology / go-ontology

Source ontology files for the Gene Ontology
http://geneontology.org/page/download-ontology
Creative Commons Attribution 4.0 International

Merge 'protein complex' term into macromolecular complex, rename 'protein-containing complex' (was: MP: GO:0005945 6-phosphofructokinase complex -> high level protein complex/macromolecular complex term) #12782

Closed ValWood closed 6 years ago

ValWood commented 7 years ago

does not have the parent protein complex

krchristie commented 7 years ago

The term "transcription ternary complex", the one child term of "protein-DNA-RNA complex", is the one thing I thought of that would be a complex of protein, DNA, and RNA, but it doesn't have any annotations either. That doesn't seem very surprising since I've only heard the term "ternary complex" when I worked in a transcription lab, not in papers doing routine analysis of what transcription factors are important for their favorite gene. Perhaps it would be useful to add "transcription elongation complex" as a synonym for "transcription ternary complex".

Anyway, I can't think of any reason why it would be a problem to make "protein-DNA-RNA complex" a type of "protein-DNA complex".

ValWood commented 7 years ago

Hi Karen, how does it relate to the existing grouping term? http://www.ebi.ac.uk/QuickGO/GTerm?id=GO:0008023#term=ancchart

val

krchristie commented 7 years ago

The "ternary elongation complex" is the complex of the RNA polymerase, the DNA template, and the RNA transcript. A "transcription elongation factor complex" might bind to the "ternary elongation complex" to regulate the elongation properties of the RNA polymerase, but a "ternary elongation complex" is not a type of "transcription elongation factor complex".

ValWood commented 7 years ago

So which gene products would you annotate to a ternary complex other than the RNA polymerase subunits? Why not have it as a related synonym of RNA polymerase?

NancyCampbell commented 7 years ago

I agree with David Hill's suggestion

Rename macromolecular complex to be protein-containing complex

Merge protein complex with protein-containing complex

My interest is in the telomerase protein-RNA complex or Telomerase (holoenzyme) complex for which the hierarchy as it currently stands is_a GO:0030529 intracellular ribonucleoprotein complex which is_a GO:1990904 ribonucleoprotein complex which is_a GO:0032991 macromolecular complex.

Yes it would be nice if the hierarchy higher up showed that telomerase falls under 'protein-containing' complex.
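The is_a chain described above can be sketched as a small parent lookup. This is only an illustration: the dictionary is hand-written from the IDs quoted in this comment, the key `telomerase holoenzyme complex` is a label standing in for the term (its ID is not given here), and `ancestors` is a hypothetical helper, not part of any GO tooling.

```python
# Toy is_a graph built only from the terms named in the comment above.
IS_A = {
    "telomerase holoenzyme complex": ["GO:0030529"],  # label used as a stand-in key
    "GO:0030529": ["GO:1990904"],  # intracellular ribonucleoprotein complex
    "GO:1990904": ["GO:0032991"],  # ribonucleoprotein complex
    "GO:0032991": [],              # macromolecular complex
}


def ancestors(term, is_a):
    """Return every is_a ancestor of `term`, nearest first for a linear chain."""
    seen = []
    stack = list(is_a.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.append(parent)
            stack.extend(is_a.get(parent, []))
    return seen


chain = ancestors("telomerase holoenzyme complex", IS_A)
# 'protein complex' (GO:0043234) never appears in the chain, which is the
# gap this comment points out.
has_protein_complex_parent = "GO:0043234" in chain
```

With the proposed rename, the top of this chain (GO:0032991) would itself carry the 'protein-containing complex' label, so no extra parent would be needed.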

tberardini commented 7 years ago

wrt step 2, based on discussion during GO eds call this morning.

Why not move 'protein complex' (meaning protein only complex) to be a child of 'protein-containing complex' (was 'macromolecular complex') and make sure that children of 'protein complex' that should really be under protein-RNA or protein-DNA or protein-ligand complex are moved to those parents instead?

What are the cons of doing it this way? Too much work for too little gain?

bmeldal commented 7 years ago

Tanya,

The reason we are trying to come up with an overall term is because users filter for 'protein complex' expecting to find ALL complexes, incl those containing non-protein participants.

And, the whole class of 'protein complex' is not really consistent. Each time we find an example with a non-protein group, the whole branch gets moved directly under 'macromolecular complex' and users lose it when filtering by 'protein complex'.
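The filtering problem described here can be sketched with a toy hierarchy. Everything below is an assumption made for illustration: `complex_A` and `complex_B` are placeholder terms (one protein-only, one moved directly under 'macromolecular complex' after a non-protein member was found), and `is_under` / `filter_by` are hypothetical helpers, not AmiGO or QuickGO functions.

```python
PROTEIN_COMPLEX = "GO:0043234"
MACROMOLECULAR_COMPLEX = "GO:0032991"

# Placeholder hierarchy: child -> set of direct parents.
PARENTS = {
    PROTEIN_COMPLEX: {MACROMOLECULAR_COMPLEX},
    "complex_A": {PROTEIN_COMPLEX},         # protein-only complex
    "complex_B": {MACROMOLECULAR_COMPLEX},  # has a non-protein member
}


def is_under(term, root, parents):
    """True if `root` is `term` itself or one of its ancestors."""
    todo = [term]
    visited = set()
    while todo:
        current = todo.pop()
        if current == root:
            return True
        if current in visited:
            continue
        visited.add(current)
        todo.extend(parents.get(current, ()))
    return False


def filter_by(root, terms, parents):
    """Return the terms that fall under `root`, sorted for stable output."""
    return sorted(t for t in terms if is_under(t, root, parents))


complexes = ["complex_A", "complex_B"]
by_protein_complex = filter_by(PROTEIN_COMPLEX, complexes, PARENTS)
by_macromolecular = filter_by(MACROMOLECULAR_COMPLEX, complexes, PARENTS)
```

In this sketch, filtering by 'protein complex' returns only `complex_A`, while filtering by the top-level class returns both, which is the behaviour users were not expecting.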

NancyCampbell commented 7 years ago

ok. here is a "slightly" bonkers idea:

rename macromolecular complex to protein-containing complex (or simply protein complex)

rename current protein complex to protein only complex

does this not mean that all current complexes would still fall under the correct branch (without more work) and, all complexes (whether protein only, protein plus prosthetic groups, protein plus nucleic acid) will be pulled out if searching for protein(-containing) complexes???

(mind you probably what I suggested is more than 'slightly' bonkers and I am missing something huuuuge)

bmeldal commented 7 years ago

Nancy, that was pretty much discussed above. I thought we were going to re-name 'macromolecular complex' to 'protein-containing complex'?

However, 'protein-only complex' doesn't work for the current 'protein complex' class as it still contains many examples of not-only-protein protein complexes (and no one has the time to investigate every term; we fixed them if and when we found them...). Hence why we were asking to obsolete that class and merge it with 'macromolecular complex'.

ukemi commented 7 years ago

And there are many children of macromolecular complex that should be children of protein-only complex. So at the end of the day is it worth the work to go through all the direct children of macromolecular (protein-containing) complex and sort them with respect to whether they only contain proteins, and go through the current children of protein (protein-only) complex and try to pull out all the ones that have members that contain more than just protein?

If we think that it is important for our users to find complexes that contain only proteins then we should keep the protein complex class. Perhaps then we should just go through and move all the current children of macromolecular complex that we suspect contain only proteins to protein complex. This will also require the clean up of a lot of axioms that currently are not necessary and sufficient because the genus should be 'protein complex' rather than 'macromolecular complex'. If we then discover that a child of protein complex has members that contain more than just proteins, we create an SOP where we either move it to be a child of macromolecular complex and create new children of the protein-only and other class, or an SOP where we rename the existing complex to be 'protein-only-containing complex X' or a better name if it exists and we create a new 'protein-Somethingelse complex' as a child of the appropriate sub-type of macromolecular complex.

In either case, if we decide it is valuable to have the distinct protein-only class, we need to do a clean-up of all of the current direct children of macromolecular complex. I think the protein-only class would be required for true exhaustiveness in the ontology, just an aside.

cmungall commented 7 years ago

My 2c, keep the ontology simple. But at the same time ensure annotations are as complete as possible. We can simply have templated amigo queries like these http://amigo.geneontology.org/grebe to retrieve complexes with non-protein members vs no known non-protein members.

ukemi commented 7 years ago

This is a good idea in general, but since we don't annotate the non-protein parts, they aren't available for query. This will work for other cases where we do make distinctions in the annotations. See this ticket for a current discussion:

#12832

I think it might be a good strategy to not distinguish GO complexes by membership and bring them to the level of functional conservation. But I waver on this.

hdrabkin commented 7 years ago

The RNA of RNase P enzymes might be annotated (eg, Rpph1; which I will annotate today!).

tberardini commented 7 years ago

@bmeldal and @ukemi : thanks for the additional information. No more questions (or objections) from me about the potential merge. That seems like the most pragmatic course of action.

ukemi commented 6 years ago

@pgaudet Since you are our expert at merging terms, let's put this on at the top of our list for the ticket workshop. OK?

pgaudet commented 6 years ago

Hi,

Getting ready to start on this. The action is to merge:

'GO:0043234 protein complex'
'GO:1990904 ribonucleoprotein complex'
'GO:0032992 protein-carbohydrate complex'
'GO:0032993 protein-DNA complex'
'GO:0001114 protein-DNA-RNA complex'
'GO:0032994 protein-lipid complex'
'GO:1990684 protein-lipid-RNA complex'

into macromolecular complex. Correct?

@ukemi @bmeldal @vanaukenk

Thanks, Pascale

ukemi commented 6 years ago

I thought we were just going to merge 'protein complex' and all associated terms (regulates, assembly etc) into 'macromolecular complex' terms and rename 'macromolecular complex' to 'protein containing complex'.

pgaudet commented 6 years ago

OK, so you want to keep the other terms. Are all relevant complexes under the correct parent? (It seems we might have the same issue.) Either way is fine for me.

pgaudet commented 6 years ago

I am not sure we should rename 'protein-containing complex' - doesn't it depend on what you annotate ?

bmeldal commented 6 years ago

Hi all,

Above we agreed to rename macromolecular complex to protein-containing complex and to make macromolecular complex an exact synonym. That way users will hopefully find this term when searching for the obsoleted protein complex term (which should have a comment referring them to protein-containing complex).

We agreed to keep the protein-X complex classes and move terms in and out of these specific sub-classes if and when we have new knowledge of non-protein molecules now being members of a complex.

One of the main reasons to do this was because users were searching for protein complex and missing things like ribosomes and then complaining (@ValWood 's standard example :) )

Once this change is done we may want to send out a message to users (tweet?) that this change has occurred as it's quite significant.

Birgit

PS: I thought you guys are in Denver, have you reached insomnia stage???

bmeldal commented 6 years ago

PPS: Personally, I'd be happy to simply merge protein complex into macromolecular complex but there were arguments above for keeping the term "protein" in this top-level class name.

pgaudet commented 6 years ago

"macromolecular complex to protein-containing complex and making macromolecular complex exact synonym"

How about 'macromolecular complex' being a 'related' synonym? We cannot say it's exact.

(I'm in Geneva, but I am also worried about David!)

bmeldal commented 6 years ago

I hope you don't have to work til late at night to keep in touch with them in Denver ;-)

ValWood commented 6 years ago

I don't mind what it's called as long as there are synonyms.

I don't think we really need any of the terms to classify complexes by the type of molecule they contain. You could get this information another way if you really wanted it.... you would just query for rRNA or whatever, and then "Macromolecular complex". It's not a distinction I have been aware of users ever wanting to make though...but it would be information that would be pretty trivial to obtain bioinformatically using GO (and much more accurately than using the current annotation).

pgaudet commented 6 years ago

Hello,

Here's what I did:

MERGED

 id: GO:0032984
-name: macromolecular complex disassembly
+name: protein-containing complex disassembly
+alt_id: GO:0043241 protein complex disassembly

 id: GO:0034622
-name: cellular macromolecular complex assembly
+name: cellular protein-containing complex assembly
+alt_id: GO:0043623 cellular protein complex assembly

 id: GO:0065003
-name: macromolecular complex assembly
+name: protein-containing complex assembly
+alt_id: GO:0006461 protein complex assembly

 id: GO:0044877
-name: macromolecular complex binding
+name: protein-containing complex binding
+alt_id: GO:0032403 protein complex binding

 id: GO:0043933
-name: macromolecular complex subunit organization
+name: protein-containing complex subunit organization
+alt_id: GO:0071822 protein complex subunit organization

RENAMED

 id: GO:0034367
-name: macromolecular complex remodeling
+name: protein-containing complex remodeling

 id: GO:0043933
-name: macromolecular complex subunit organization
+name: protein-containing complex subunit organization

 id: GO:0044877
-name: macromolecular complex binding
+name: protein-containing complex binding

 id: GO:0065003
-name: macromolecular complex assembly
+name: protein-containing complex assembly

 id: GO:0097695
-name: establishment of macromolecular complex localization to telomere
+name: establishment of protein-containing complex localization to telomere

 id: GO:1904913
-name: regulation of establishment of macromolecular complex localization to telomere
+name: regulation of establishment of protein-containing complex localization to telomere

 id: GO:1904914
-name: negative regulation of establishment of macromolecular complex localization to telomere
+name: negative regulation of establishment of protein-containing complex localization to telomere

 id: GO:1904915
-name: positive regulation of establishment of macromolecular complex localization to telomere
+name: positive regulation of establishment of protein-containing complex localization to telomere
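For anyone holding annotations to the merged IDs, the effect of the `alt_id` lines above can be sketched as a simple lookup. This is a minimal sketch, not the GO pipeline: the mapping is copied by hand from the MERGED list in this comment, `resolve` is a hypothetical helper, and the gene names in the example are placeholders.

```python
# Merged (secondary) ID -> surviving primary ID, from the MERGED list above.
ALT_ID_TO_PRIMARY = {
    "GO:0043241": "GO:0032984",  # protein complex disassembly
    "GO:0043623": "GO:0034622",  # cellular protein complex assembly
    "GO:0006461": "GO:0065003",  # protein complex assembly
    "GO:0032403": "GO:0044877",  # protein complex binding
    "GO:0071822": "GO:0043933",  # protein complex subunit organization
}


def resolve(go_id, alt_ids=ALT_ID_TO_PRIMARY):
    """Map a merged ID to its primary ID; other IDs pass through unchanged."""
    return alt_ids.get(go_id, go_id)


# Placeholder annotations (gene names are hypothetical).
annotations = [
    ("geneX", "GO:0006461"),  # annotated to the old 'protein complex assembly'
    ("geneY", "GO:0065003"),  # already on the surviving term
    ("geneZ", "GO:0034367"),  # renamed only, so the ID is unchanged
]
updated = [(gene, resolve(term)) for gene, term in annotations]
```

Terms that were only renamed keep their IDs, so only the five merged IDs need remapping in this sketch.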


@bmeldal @ukemi @ValWood

  1. Please let me know if it's OK

  2. I would be happy to further merge 'cellular xxx' if needed ;)

Thanks, Pascale

bmeldal commented 6 years ago

Thanks, @pgaudet

 id: GO:0034622
-name: cellular macromolecular complex assembly
+name: cellular protein-containing complex assembly
+alt_id: GO:0043623 cellular protein complex assembly

 id: GO:0065003
-name: macromolecular complex assembly
+name: protein-containing complex assembly
+alt_id: GO:0006461 protein complex assembly

There are extracellular complexes so I guess the distinguishing terms were developed for that???

pgaudet commented 6 years ago

@bmeldal A naive question: Are there extracellular assembly factors ? I assumed that these get assembled intracellularly and exported.

I can see the value in 'intracellular protein complex' and 'extracellular protein complex', but not really in their assembly.

(anyhow if the rest seems OK I'll go ahead and merge - please let me know :)

Pascale

bmeldal commented 6 years ago

No idea, I've not come across it but never looked at it either.

Maybe @deustp01 knows as Reactome curate the assembly steps, CP doesn't.

ukemi commented 6 years ago

Hi @pgaudet ,

The above all seem correct. But I thought there were even more terms that had to do with protein complexes, like 'regulation of protein complex stability', 'regulation of protein complex disassembly', etc. That's why I've been putting off this ticket for so long. :)

pgaudet commented 6 years ago

Terms that remain containing 'protein complex' :

  1. DONE GO:0031503 protein complex localization -> rename "protein-containing complex localization"

  2. DONE GO:0034629 cellular protein complex localization -> rename "protein-containing complex localization"

  3. DONE protein complex scaffold activity -> rename "protein-containing complex scaffold activity"?

  4. DONE GO:0090126 protein complex assembly involved in synapse maturation -> rename "protein-containing complex assembly involved in synapse maturation"?

  5. protein complex oligomerization: 12 EXP by SGD (@srengel) and UniProt (@ggeorghiou) -> Can I perhaps merge into 'protein oligomerization'? WAIT FOR FEEDBACK

  6. 'protein complex involved in cell adhesion': 12 direct EXP: dictyBase (@pfey), MGI (@ukemi), IntAct (@bmeldal)
     'protein complex involved in cell-cell adhesion': 2 direct EXP: dictyBase (@pfey)
     'protein complex involved in cell-matrix adhesion': = direct annotations.

I think these need to go away.

  7. Slightly unrelated:
     'protein complex biogenesis': 0 EXP
     'chloroplast ribulose bisphosphate carboxylase complex biogenesis': 5 EXP TAIR
     'mitochondrial respiratory chain complex I biogenesis': 3 EXP
     'mitochondrial respiratory chain complex II biogenesis': 1 EXP
     'mitochondrial respiratory chain complex III biogenesis': 8 EXP
     'mitochondrial respiratory chain complex IV biogenesis': 12 EXP
     'proton-transporting ATP synthase complex biogenesis': 2 EXP

These should also be obsoleted.

@ukemi @ValWood @bmeldal what do you think ?

Thanks, Pascale

bmeldal commented 6 years ago

  1. protein complex scaffold activity -> rename "protein-containing complex scaffold activity"?

Yes, I think so.

I think these need to go away

No objections, but why? I thought biogenesis is a whole branch...

ValWood commented 6 years ago

'mitochondrial respiratory chain complex I biogenesis'

Yes, if it's assembly, these terms can be used: https://www.ebi.ac.uk/QuickGO/term/GO:0033108. If it's something else (transcription, etc.), the appropriate expression terms can be used.

pgaudet commented 6 years ago

@bmeldal 'Biogenesis' for a protein is translation, isn't it? Unless there is something special about the translation of the protein making up these complexes ???

Or were you talking about 'complex involved in process'? This is clearly a dangerous path....

Thanks, Pascale

pgaudet commented 6 years ago

I could merge x biogenesis into assembly for those, not sure that this is what the annotations were trying to capture. Looks rather like regulation of expression.

bmeldal commented 6 years ago

I haven't used the biogenesis terms so better ask those who have. If it's just translation then there shouldn't be any specific x protein biogenesis terms unless something special happens :)

ValWood commented 6 years ago

historically biogenesis terms were created for some processes when they knew that the production of something was affected but were not sure whether it was the transcription, translation, assembly etc.

The only strong case for keeping is "ribosome biogenesis" which researchers use to include rRNA processing, assembly and export from the nucleus because some of the steps don't appear to be separable (at least currently).

hdrabkin commented 6 years ago

Well, I suppose that 'biogenesis' of a protein could mean other things besides translation depending on what you were referring to (ie, posttranslational events)

pgaudet commented 6 years ago

But then we have 'protein modification.... '

hdrabkin commented 6 years ago

which, again, depending on what protein form you are referring to, would be included in biogenesis. It's a fairly broad grouping term.

ValWood commented 6 years ago

yes it's historic, we shouldn't need them. If you can't be sure which process , don't make the annotation....

ValWood commented 6 years ago

WooHoo. I agree with @bmeldal this is quite a big change, maybe a post on go friends and the consortium list just as a heads up?

ukemi commented 6 years ago

Thanks @pgaudet for taking this on. It was a very complicated merge/rename.

pgaudet commented 6 years ago

No problem!

I created 3 new tickets for the outstanding issues. Closing this one.

bmeldal commented 6 years ago

Thank you everyone!!! I feel like celebrating! I think I first discussed this topic with the then EBI editors 5 years ago :)

bmeldal commented 6 years ago

We just had an IntAct/CP release but I will tweet about these changes next week. Leaving our release tweets on top of the news feed for a few days.

ValWood commented 6 years ago

95 comments!

deustp01 commented 6 years ago

No objections from here. We annotate the assembly of a complex to capture distinct functions mediated by the complex at various stages of its assembly, or to capture interactions with other physical entities that affect distinct steps of the assembly process. We treat the assembly process as part of whatever process the complex itself mediates, not as a distinct process in its own right, so these changes in GO should not affect us.

bmeldal commented 6 years ago

As I can't see the changes until they go public: @pgaudet

  1. Did you move the "comment" from the old protein complex term to the renamed "protein-containing complex" term?
  2. Have you updated the synonyms?
     protein complex [narrow]
     protein-protein complex [narrow]

pgaudet commented 6 years ago

Yes and yes
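The role the synonyms play for users searching by the old names can be sketched as below. This is an illustration only: the record is hand-written, `match_scope` is a hypothetical helper, and the RELATED scope for 'macromolecular complex' is an assumption (the thread proposes both exact and related and does not record the final choice). The two NARROW synonyms are the ones listed in the question above.

```python
# Hand-written record for the renamed top-level term.
TERM = {
    "id": "GO:0032991",
    "name": "protein-containing complex",
    "synonyms": [
        ("macromolecular complex", "RELATED"),  # scope is an assumption
        ("protein complex", "NARROW"),
        ("protein-protein complex", "NARROW"),
    ],
}


def match_scope(term, query):
    """Return 'NAME', a synonym scope, or None if the query does not match."""
    wanted = query.strip().lower()
    if term["name"].lower() == wanted:
        return "NAME"
    for text, scope in term["synonyms"]:
        if text.lower() == wanted:
            return scope
    return None


old_name_hit = match_scope(TERM, "protein complex")
unrelated_hit = match_scope(TERM, "ribosome")
```

A search for the old name still lands on the renamed term through its narrow synonym, which is the behaviour the synonym update was meant to preserve.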