geneontology / go-ontology

Source ontology files for the Gene Ontology
http://geneontology.org/page/download-ontology
Creative Commons Attribution 4.0 International

Merge 'protein complex' term into macromolecular complex, rename 'protein-containing complex' (was: MP: GO:0005945 6-phosphofructokinase complex -> high level protein complex/macromolecular complex term) #12782

Closed. ValWood closed this issue 6 years ago.

ValWood commented 7 years ago

GO:0005945 6-phosphofructokinase complex does not have the parent protein complex.

ukemi commented 7 years ago

@dosumis Is this due to a pattern that isn't specific enough?

deustp01 commented 7 years ago

The hierarchy now is transferase complex is_a catalytic complex is_a macromolecular complex. It's hard to see how to get protein complex into that hierarchy without also asserting that all catalytic complexes have only protein subunits, which seems dangerously restrictive.

ukemi commented 7 years ago

Yes. But certainly many of the current children of macromolecular complex are protein complexes. It seems they were misclassified en masse at some point.

bmeldal commented 7 years ago

This issue comes up all the time! As soon as one child has a non-protein member, the whole branch gets moved to macromolecular complex and can't be found under protein complex. We widened the def for protein complexes, but only to include prosthetic groups.

See the latest edits here, this might explain some of the problems:

#12574 moved the whole branch of catalytic complex out of protein complex because some children of endoribonuclease complex are ribonucleoprotein complexes (start halfway down with Val's comment on 11/8/16).
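A minimal Python sketch of why a single RNP child drags a whole branch out of 'protein complex': whatever is asserted of a parent must hold for every child (the true path rule), so a parent can only be protein-only if all of its children are. The class names and component sets are hypothetical, and this is an illustration of the logic, not how GO tooling computes classification.

```python
# Hypothetical child classes of one grouping term, each mapped to the kinds of
# macromolecule known to be components of its instances (illustrative only).
children = {
    "complex A": {"protein"},
    "complex B": {"protein"},
    "complex C": {"protein", "RNA"},  # a ribonucleoprotein child
}


def is_protein_only(components: set[str]) -> bool:
    return components == {"protein"}


def parent_can_be_protein_only(child_components: dict[str, set[str]]) -> bool:
    """A 'protein-only' parent is only valid if every child is protein-only,
    because each child inherits everything asserted of the parent."""
    return all(is_protein_only(c) for c in child_components.values())


print(parent_can_be_protein_only(children))  # False: one RNP child blocks it
```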

#12620 made more generic changes.

I'm afraid we are going round in circles here, folks, and it needs sorting :( I've been banging my head against the wall over this for the past 3+ years...

@dosumis @paolaroncaglia @mcourtot

Birgit

ValWood commented 7 years ago

I still like my crazy suggestion in this ticket

I have a bigger issue......many people use the "protein complex" term, and would expect that to retrieve complexes like the ribosome and the spliceosome and telomerase (I suspect)

Is it possible to define a protein complex as a complex which has only proteins, or protein and RNA components?

so ribonucleoprotein complex would sit under protein complex (ribonucleoprotein complex is_a protein complex).

Would that be crazy? then everything can go under protein complex, unless we know that it has an RNA component, then it moves down...

..it might not be possible but to me it's similar to saying that a glycoprotein is_a protein....

dosumis commented 7 years ago

"I'm afraid, we are going round in circles here, folks, and it needs sorting :( I've been banging my head against the wall over this for the past 3+ years..."

Indeed.

As far as I'm concerned, we've already agreed that 'macromolecular complex' is the general term. We defined it as having at least one protein component, and it has the synonym "protein containing complex".

All complex classes defined entirely by activity are now under 'macromolecular complex': https://github.com/geneontology/go-ontology/issues/12620. This ensures proper classification where some complexes with an activity are protein-only complexes and some are RNPs.

I wasn't entirely sure this was the best solution so delayed committing and asked for feedback on it at the time (see ticket).

The obvious way to implement Val's solution would be to obsolete the current 'protein complex' term and rename macromolecular complex to protein complex. If we do this we would be unable to distinguish complexes consisting only of proteins (+ prosthetic groups & covalent modifications; see current protein complex def) from RNPs etc. Is everyone happy with that?

bmeldal commented 7 years ago

"The obvious way to implement Val's solution would be to obsolete the current 'protein complex' term and rename macromolecular complex to protein complex. If we do this we would be unable to distinguish complexes consisting only of proteins (+ prosthetic groups & covalent modifications; see current protein complex def) from RNPs etc. Is everyone happy with that?"

It's often borderline and down to personal interpretation if something is a prosthetic group or a 'full blown component'... And looking at how users interpret the terms, they expect to retrieve everything that's currently under macromolecular complex when they query on protein complex. But in one of the previous tickets there was hesitation about doing away with the 'protein-only protein complex' class.

Birgit

dosumis commented 7 years ago

"obsolete the current 'protein complex' term and rename macromolecular complex to protein complex"

Could be done as a merge. The def of macromolecular complex would win. Might cause complaints downstream, though, if it keeps its ID but gets the name 'protein complex'.

bmeldal commented 7 years ago

Complaints from users or scripts?

paolaroncaglia commented 7 years ago

Would it help, at least in part, to swap primary name and synonym for 'macromolecular complex'? I.e. name it 'protein-containing complex' (and keep 'macromolecular complex' as an exact synonym)

dosumis commented 7 years ago

"Complaints from users or scripts?"

From consuming databases (see recent complaints from FlyBase). This is just a matter of strategy though. I think the most important thing is answering this question:

"If we do this we would be unable to distinguish complexes consisting only of proteins (+ prosthetic groups & covalent modifications; see current protein complex def) from RNPs etc. Is everyone happy with that?"

deustp01 commented 7 years ago

Coming back to Birgit's last comment, "prosthetic group" can be defined so that it's not a borderline personal interpretation. It's a molecule that is not encoded directly or indirectly in the genome (i.e., not DNA, RNA, protein) that is associated with a protein and required for the enzymatic activity of the protein or the complex of which the protein is a part (Devlin Biochemistry, 4th edition, page 414). Stryer just says "non-protein", but that definition was clearly composed before the significance of ribozymes was understood, so I think we are allowed to ignore it.

Devlin then distinguishes cofactors and prosthetic groups by the strength of their association with the protein - loose / low-affinity for cofactors and tight / high-affinity / possibly covalent for prosthetic groups - but that subdistinction doesn't matter here.
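The Devlin-based distinction can be written as a small decision rule. This is an assumed encoding for illustration only: the function name, its parameters and the fallback label "other small-molecule partner" are hypothetical, and the `tightly_bound` flag stands in for the affinity judgement Devlin describes.

```python
# Molecules encoded directly or indirectly in the genome, per the definition above.
GENOME_ENCODED = {"DNA", "RNA", "protein"}


def classify_partner(molecule_type: str, required_for_activity: bool,
                     tightly_bound: bool) -> str:
    """Hypothetical rule: encoded partners are components; non-encoded partners
    required for activity are prosthetic groups (tight) or cofactors (loose)."""
    if molecule_type in GENOME_ENCODED:
        return "component"
    if not required_for_activity:
        return "other small-molecule partner"
    return "prosthetic group" if tightly_bound else "cofactor"


print(classify_partner("heme", required_for_activity=True, tightly_bound=True))
print(classify_partner("RNA", required_for_activity=True, tightly_bound=False))
```

As noted above, the cofactor/prosthetic-group split does not matter for the complex question; only the first test (encoded or not) does.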

Under the Reactome definition of complex, any association of two or more molecules, at least one of which is a protein, is a complex. (Does GO require two or more polypeptides? I think so.) But we all agree that a complex composed entirely of polypeptides can be distinguished from a complex composed of polypeptides and other stuff, be that stuff encoded proteins, peptides, RNAs, etc. or unencoded heme, biotin, etc.

Which still doesn't resolve the issue whether it's useful to distinguish purely protein complexes from protein + other stuff ones.

bmeldal commented 7 years ago

"If we do this we would be unable to distinguish complexes consisting only of proteins (+ prosthetic groups & covalent modifications; see current protein complex def) from RNPs etc. Is everyone happy with that?"

Tbh, we can't do this now either as many classes under macromolecular complex contain protein-only leaves but as they have sibling terms that do contain non-protein members the whole class has been re-classified. If we wanted to be true to any definition of protein-only complexes we'd have to sieve through all the leaves and add in the protein complex parent manually. That ain't gonna happen, is it?

deustp01 commented 7 years ago

If the distinction between protein-only and protein-mixed complexes were discarded ("Val's crazy suggestion" or something close to it) information would be lost but, I think, the problem in this thread would go away. So, who uses the information captured by this distinction? How would they be hurt by the loss?

bmeldal commented 7 years ago

That's the crucial point, Peter. At the moment, the mixed parentage is definitely causing issues for users. Should we send a message to GO-discuss and GO-friends and ask what would work better?

bmeldal commented 7 years ago

@ValWood I think we need to change the title so we can find the ticket again in the future as it has little to do with the actual 6-phosphofructokinase complex :(

ukemi commented 7 years ago

This request stemmed from the Noctua workshop. Maybe one practical solution for now is to have @kltm or @cmungall make macromolecular complexes valid as entities in the complex generator in Noctua. That way they can be chosen if they have protein components.

srengel commented 7 years ago

@ukemi 's suggestion would work for me :)

ValWood commented 7 years ago

In general, I think it is more harmful to have people retrieve "protein complex", and not get ribosome, spliceosome, telomerase, (the historical and current situation), than it is to retain the distinction between a protein complex and a macromolecular complex. I often need to tell people to go up to macromolecular complex, and I often forget myself and search on "protein complex" by mistake.

I vote to discard the distinction between protein-only complex and protein-mixed complex. It's a simplification that I'm sure would HELP users.

ukemi commented 7 years ago

What about changing the term name macromolecular complex to protein-containing complex? It fits the definition. Could we change the name of protein complex to make it more explicit that it only contains proteins?

ValWood commented 7 years ago

but do we really need the distinction? would users be hurt by not making this distinction? (I think not).

we could even still have protein-RNA complex and protein-DNA complex. So if a user really, really did want to exclude ribosomes, telomerase, spliceosomes, DNA polymerase, MCM complex which would be excluded from "protein complex" in the current scenario, they could take the "protein containing complex" annotations and subtract the "protein-DNA" and "protein-RNA" complex annotations....
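Val's workaround is plain set arithmetic over annotations. In this minimal sketch the gene IDs and the function name are hypothetical, and each set is assumed to already include annotations propagated up from descendant terms.

```python
# Hypothetical gene IDs; each set is assumed to already include annotations
# propagated up from descendant terms.
protein_containing = {"gene1", "gene2", "gene3", "gene4", "gene5", "gene6"}
protein_dna = {"gene4"}
protein_rna = {"gene1", "gene2", "gene3"}


def approximate_protein_only(containing: set[str], *non_protein: set[str]) -> set[str]:
    """Remove annotations to branches known to include a non-protein macromolecule."""
    excluded = set().union(*non_protein)
    return containing - excluded


print(sorted(approximate_protein_only(protein_containing, protein_dna, protein_rna)))
# ['gene5', 'gene6']
```

Anything not yet known to contain DNA or RNA stays in the result, so this gives an upper bound on protein-only complexes rather than an exact list.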

ValWood commented 7 years ago

Although the DNA-protein complexes ( at least GO:0043599 nuclear DNA replication factor C complex) appear to have is_a links to protein complex and protein-DNA complex. So if that is valid, this would be an alternative solution...but it seems a bit wrong?

bmeldal commented 7 years ago

Val, I think these inconsistencies stem from the issues with TPVs: If a pre-terminal node has children that are a mix of protein-only complexes and protein-X complexes, the pre-terminal node belongs to protein-X complex but the terminal nodes may have a mix of both ancestries. That's what happened with the endonucleases :(

cmungall commented 7 years ago

I thought this was already the case. Can you check? If it's not, file a ticket in the Noctua tracker.

On 9 Nov 2016, at 4:59, David Hill wrote:

This request stemmed from the Noctua workshop. Maybe one practical solution for now is to have @kltm or @cmungall make macromolecular complexes valid as entities in the complex generator in Noctua. That way they can be chosen if they have protein components.


kltm commented 7 years ago

@cmungall committed, but not deployed.

ukemi commented 7 years ago

Just to recap for Thursday's editors' call and the discussion above: what about changing the term name macromolecular complex to protein-containing complex and merging protein complex into it? It fits the definition, and I think it addresses the comment by @bmeldal. If that is the case, purely protein complexes would be annotated to the parent, which would keep specific children such as protein-DNA complex, protein-lipid complex, protein-DNA-RNA complex and protein-carbohydrate complex.

ukemi commented 7 years ago

Doesn't a cell fit the definition of a macromolecular complex?

ValWood commented 7 years ago

Or anything really....organelle etc. Probably need to tighten that def!

bmeldal commented 7 years ago

You are opening another can of worms! Yes, where's the distinction between a large complex (e.g. proteasome, TF-Pol machinery etc.), filaments/fibres (e.g. actin, microtubules), membranes, organelles and whole cells? At the moment, for curation's sake, we haven't curated filaments but their minimal repeating units, such as collagen trimers...

dosumis commented 7 years ago

For filaments/fibres (e.g. actin, microtubules) we have:

supramolecular complex: A cellular component that consists of an indeterminate number of proteins or macromolecular complexes, organized into a regular, higher-order structure such as a polymer, sheet, network or a fiber.


(May be a few things left to move under here)

macromolecular complex should have the clause "fixed/determinate number of...." to distinguish.

This still means that large complexes with determinate numbers of components are covered (e.g. ribosome subunits).
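The "fixed/determinate number" clause amounts to a test on stoichiometry. In this minimal sketch the `Structure` type and `classify` function are hypothetical, and whether a structure's composition is determinate is assumed to be known in advance, which is exactly the judgement call curators would have to make.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Structure:
    name: str
    # Number of subunits if the stoichiometry is fixed; None when the
    # structure is built from an indeterminate number of repeating units.
    subunit_count: Optional[int]


def classify(s: Structure) -> str:
    """Proposed split: determinate -> macromolecular, indeterminate -> supramolecular."""
    if s.subunit_count is None:
        return "supramolecular complex"
    return "macromolecular complex"


print(classify(Structure("collagen trimer", 3)))             # macromolecular complex
print(classify(Structure("banded collagen fibril", None)))   # supramolecular complex
```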

ukemi commented 7 years ago

Just as a reminder, I plan to implement this soon:

https://github.com/geneontology/go-ontology/issues/12640

deustp01 commented 7 years ago

To organize Birgit's worms, would it be appropriate to make has_part relationships supramolecular complex X has_part the appropriate determinate minimal repeating unit?

dosumis commented 7 years ago


Yes. See collagen terms.


bmeldal commented 7 years ago

@ValWood Is GO:0099080 supramolecular complex not being part of the GO:0032991 macromolecular complex class a problem for your users? It contains (as children several levels down) GO:0005874 microtubule and GO:0005884 actin filament.

I agreed with this as those repeating units cannot be defined as clearly distinct complexes. But I think we need a comment on all three top-level terms to advise users that they may need to broaden their search.

@dosumis I can't find the collagen terms having this has_part relationship. I checked via the Neighborhood tab in AmiGO 2.

ValWood commented 7 years ago

Not so much.

The potential for inconsistency with this extra term is a bit bothering.

Is a supercomplex a supracomplex? GO:0097249 mitochondrial respiratory chain supercomplex GO:0031617 NMS complex (KMN kinetochore network) GO:0035632 mitochondrial prohibitin complex

Also, GO:0044530 supraspliceosomal complex is not a supramolecular complex, and GO:0098643 banded collagen fibril is not a supramolecular complex.

How do you decide if something is supramolecular? Do we need the term?

bmeldal commented 7 years ago

It was added to deal with polymers, fibrils etc. when @dosumis was working on the restructuring of complexes and the attempt to make it more automatic (some of which proved too complex (!) to achieve...), as they don't really fit the definition of a 'simple' complex. But I see the problem it raises.

Let the editors discuss this afternoon :)

thomaspd commented 7 years ago

We discussed on ontology call, and based on Val's Nov 8 suggestion, we propose the short-term fix to be making protein complex an exact synonym for macromolecular complex in the GO, and define macromolecular complex as containing at least one protein and at least two gene products. We can revisit later if we decide it would be useful to distinguish between protein-only complexes and those that contain other macromolecules.
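One possible reading of the proposed definition, written as a predicate. The function name and the list-of-subunit-types input are hypothetical. The sketch assumes "gene products" means protein and RNA, and that identical subunits each count toward the two (so a homodimer qualifies, matching the Complex Portal convention Birgit describes in this thread).

```python
GENE_PRODUCT_TYPES = {"protein", "RNA"}


def meets_proposed_definition(subunits: list[str]) -> bool:
    """subunits lists one molecule type per subunit, e.g. ["protein", "RNA"].
    Assumed reading: at least one protein and at least two gene-product subunits."""
    gene_products = [s for s in subunits if s in GENE_PRODUCT_TYPES]
    return "protein" in subunits and len(gene_products) >= 2


print(meets_proposed_definition(["protein", "protein"]))  # True (homodimer)
print(meets_proposed_definition(["protein", "heme"]))     # False: one gene product
print(meets_proposed_definition(["protein", "RNA"]))      # True
```

Under this reading, small molecules such as heme never count toward the two, which is the point taken up separately later in the thread.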

dosumis commented 7 years ago

On Dec 8, 2016, at 9:33 AM, Val Wood notifications@github.com wrote:

Not so much.

The potential for inconsistency with this extra term is a bit bothering.

Is a supercomplex a supracomplex? GO:0097249 mitochondrial respiratory chain supercomplex GO:0031617 NMS complex (KMN kinetochore network) GO:0035632 mitochondrial prohibitin complex

also GO:0044530 supraspliceosomal complex is not a supramolecular complex GO:0098643 banded collagen fibril is not a supramolecular complex

How do you decide if something is supramolecular?

The definition hinges on having an indeterminate number of components.

Banded collagen fibril fits the bill and so should be classified under supramolecular complex (everything under complex of collagen trimers probably should). I’ll fix this.

Not sure the others do.

Do we need the term?

If not, aren't we then back to everything in CC arguably being a 'macromolecular complex'? I think it is also useful to separate out these higher-order structures from the kind of complexes that IntAct curate. There is precedent too in the ECM field, where the term supramolecular is used.

I suspect almost any granularity type classification in CC will be a bit fuzzy. But does that mean we should have none? I think consistency will be OK as long as we have some guidelines and examples in place.


bmeldal commented 7 years ago

"[...] we propose the short-term fix to be making protein complex an exact synonym for macromolecular complex [...]"

What happens to the current term 'protein complex'? Will it be merged with 'macromolecular complex'?

ukemi commented 7 years ago

I think that is the most practical thing to do, don't you? That way we will have a class that we know contains at least a protein, and then we will have classes that we know contain a protein and something else. Does it really help us a great deal to know if a complex is made up exclusively of proteins? I suspect the data and annotations for that would be very incomplete.

judyblake commented 7 years ago

I agree that it is useful to know if a complex is made up exclusively of proteins.


-- Judy

bmeldal commented 7 years ago

@judyblake The problem is that the line is blurry. Depending on where the user comes from they may include prosthetic groups or not in their definition. What about ATP or metal ions? We include them if they are functional co-factors (ATP in a complex that's not primarily an ATPase or metal ions in respiratory complexes) but not if they are effectively substrates (ATP in ATPases, metal ion in their transmembrane channels). What about a complex where the substrate is required for the complex formation (e.g. maltose transporter)?

deustp01 commented 7 years ago

Another can of worms. Paul Thomas's definition excludes small molecules as qualifying a physical entity for complex status: it must have two or more different macromolecules, at least one of which is a protein. My recollection is that GO has always tested entities for complex status based on their content of macromolecules only. But all of Birgit's points about the central role that small molecules play in forming and stabilizing complexes and in determining their molecular function are right, so it would be good biochemistry to dig into the small-molecule worm can and see if these could count as qualifying entities for defining complexes. However, practically, that looks like a separate issue - it would not be crazy to confine this thread to sorting out what combinations of macromolecules qualify an entity for the "complex" label (and what kind of complex label) and move this new issue - when can small molecules be used as qualifiers - to a separate thread.

bmeldal commented 7 years ago

worms worms worms - you know, I did my PhD in Nematology... ;-)

In the CP we define a complex as having at least 2 protein entities but they can be identical (homodimers). However, I can see us changing that when we dig deeper into RNA-complexes! But that's another can ;-)

Happy to move the small molecule can of worms to a new ticket :)

ValWood commented 7 years ago

Re: "I agree that it is useful to know if a complex is made up exclusively of proteins."

I have never come across a use case. We gain much more by making "protein-containing complex" the parent and losing "protein-only complex" than we do with the old arrangement.

Even in the file of pombe protein complexes, I include the DNA-protein complexes: the replisome, telomere cap complex, DNA recombinase, DNA replication preinitiation complex, nucleosomes, etc.

The RNA-protein complexes are in there too: ribosome, translation initiation complex, RISC loading complex, signal recognition particle, spliceosome, etc.

This file is used routinely by all of our Mass Spec people. Nobody has ever requested that I provide a file without these non-protein complexes.

However, I am frequently telling people that if they want ALL complexes with the existing structure, then they need to move up to "macromolecular complex".

We really don't have this information for many complexes, which may have an unknown RNA component....

I could live without "protein only complex"

judyblake commented 7 years ago

ok 'protein-containing' I yield....


-- Judy

ValWood commented 7 years ago

Mainly it would be really difficult to do...time sink...

ukemi commented 7 years ago

The plan:

Is a protein-DNA-RNA complex a type of protein-DNA complex? The definition suggests this is true, since there is no indication that the protein-DNA complex exclusively contains protein and DNA. If this is the case, we should be able to record equivalence axioms for these.

ValWood commented 7 years ago

protein-DNA-RNA complex a type of protein-DNA complex?

What was it created for? ...no annotations.

bmeldal commented 7 years ago

I second @ukemi 's plan.

I don't know if I've seen protein-DNA-RNA complex before. I won't miss it.