geneontology / go-ontology

Source ontology files for the Gene Ontology
http://geneontology.org/page/download-ontology
Creative Commons Attribution 4.0 International

exosome complexes, their relationships and functions #12574

Closed bmeldal closed 8 years ago

bmeldal commented 8 years ago

There are three eukaryotic exosomes, the cytoplasmic, nuclear and nucleolar exosomes.

GO currently has the following terms:

GO:0000178 exosome (RNase complex)
Def: Complex of 3'-5' exoribonucleases.

with 2 children:
GO:0000177 cytoplasmic exosome (RNase complex)
Def: Complex of 3'-5' exoribonucleases found in the cytoplasm.
GO:0000176 nuclear exosome (RNase complex)
Def: Complex of 3'-5' exoribonucleases found in the nucleus.

also:
GO:1902555 endoribonuclease complex
Def: A protein complex which is capable of endoribonuclease activity.
is_a GO:0032991 macromolecular complex
capable_of GO:0004521 endoribonuclease activity

But that's not the whole story! Read on: the changes are smaller than the ticket suggests; I just want to give you all the background info!

All three complexes share the same core nonamer (Exo-9), which is complemented by one or two of three enzymes that give the holocomplex its catalytic activities:
DIS3/RRP44 has endoribonuclease and processive hydrolytic exoribonuclease activity (GO:0016891 / EC:3.1.26.-, GO:0000175 / EC:3.1.13.-)
DIS3L has only processive hydrolytic exoribonuclease activity (EC:3.1.13.-) (this isoform is not present in yeast)
EXOSC10/RRP6 has only distributive hydrolytic exoribonuclease activity (EC:3.1.13.-)
[NB: the human Exo-9 also has weak constitutive phosphorolytic activity, but it might not be physiological. Phosphorolytic activity is, however, commonly found in the prokaryotic exosome.]

The composition of the catalytically active complexes is as follows:
a) 10-subunit cytoplasmic exosome: Core + Q8TF46 (DI3L1_HUMAN) = DIS3L = DIS3L1 (restricted to the cytoplasm, PMID:26726035; UniProt has references for nuclear/nucleolar location, but these may not be physiological)

[May contain a small fraction of Exo-11 (composition below) with either DIS3 (Q9Y2L1) or DIS3L, but DIS3 is primarily nuclear and is absent from the nucleolus, PMID:20531386]

b) 11-subunit nuclear exosome: Core +
Q9Y2L1 (RRP44_HUMAN) = DIS3 (primarily nuclear isoform, PMID:20531386)
Q01780 (EXOSX_HUMAN) = EXOSC10 = RRP6 (missing in a small fraction of nuclear exosomes, PMID:26726035)

c) 10-subunit nucleolar exosome: Core + Q01780 (EXOSX_HUMAN) = EXOSC10 = RRP6 (plus DIS3 (Q08162) in yeast, PMID:26726035)
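The composition rules above can be summarised as a small lookup: the Exo-9 core plus one or two catalytic subunits, with each complex's activities taken as the union of its subunits' activities. This is a minimal Python sketch using only the subunit names and activities listed in this ticket; the dictionary and function names and the "union of activities" rule are assumptions for illustration (the core's weak phosphorolytic activity is ignored), not Complex Portal or GO data structures.

```python
"""Toy summary of exosome composition and activities as listed in the ticket.

Assumption: a holocomplex's catalytic activities are the union of those of its
catalytic subunits; the Exo-9 core's weak phosphorolytic activity is ignored.
"""

CORE_SIZE = 9  # Exo-9 nonamer shared by all three complexes

# Catalytic subunits (human names; yeast orthologs are RRP44 for DIS3, RRP6 for EXOSC10)
SUBUNIT_ACTIVITIES: dict[str, set[str]] = {
    "DIS3": {"endoribonuclease", "processive 3'-5' exoribonuclease"},
    "DIS3L": {"processive 3'-5' exoribonuclease"},  # not present in yeast
    "EXOSC10": {"distributive 3'-5' exoribonuclease"},
}

# Catalytic subunits added to the core in each complex
COMPLEXES: dict[str, list[str]] = {
    "cytoplasmic": ["DIS3L"],
    "nuclear": ["DIS3", "EXOSC10"],
    "nucleolar": ["EXOSC10"],
    "nucleolar (yeast)": ["DIS3", "EXOSC10"],  # i.e. RRP44 + RRP6
}


def subunit_count(name: str) -> int:
    """Number of subunits: Exo-9 core plus the catalytic subunits."""
    return CORE_SIZE + len(COMPLEXES[name])


def activities(name: str) -> set[str]:
    """Union of the activities of the complex's catalytic subunits."""
    return set().union(*(SUBUNIT_ACTIVITIES[s] for s in COMPLEXES[name]))


for complex_name in COMPLEXES:
    print(f"{complex_name}: {subunit_count(complex_name)} subunits, "
          f"{sorted(activities(complex_name))}")
```

For the human complexes this reproduces the 10/11/10 subunit counts given above, and only the DIS3-containing complexes pick up endoribonuclease activity.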

I propose the following changes:

1) GO:0000178 exosome (RNase complex)
-Def: Complex of 3'-5' exoribonucleases.
+Def: A ribonuclease complex that has 3-prime to 5-prime exoribonuclease activity and possibly endoribonuclease activity, producing 5-prime-phosphomonoesters. Participates in a multitude of cellular RNA processing and degradation events preventing nuclear export and/or translation of aberrant RNAs. Restricted to processing linear and circular single-stranded RNAs (ssRNA) only. RNAs with complex secondary structures may have to be unwound or pre-processed by co-factors prior to entering the complex, esp if the 3-prime end is structured.
References: PMID:17174896, PMID:20531386, PMID:26726035
+is_a GO:1905354 (NEW in TG) exoribonuclease complex
+capable_of GO:0000175 3'-5'-exoribonuclease activity

2) GO:0000177 cytoplasmic exosome (RNase complex)
-Def: Complex of 3'-5' exoribonucleases found in the cytoplasm.
+Def: A ribonuclease complex that has 3-prime to 5-prime processive hydrolytic exoribonuclease activity producing 5-prime-phosphomonoesters. Participates in a multitude of cellular RNA processing and degradation events preventing nuclear export and/or translation of aberrant RNAs. Restricted to processing linear and circular single-stranded RNAs (ssRNA) only. RNAs with complex secondary structures may have to be unwound or pre-processed by co-factors prior to entering the complex, esp if the 3-prime end is structured.
References: PMID:17174896, PMID:20531386, PMID:26726035
is_a GO:0000178 exosome (RNase complex) [currently there]
+is_a GO:1905354 (NEW in TG) exoribonuclease complex (should get relationship via parent though)
+capable_of GO:0000175 3'-5'-exoribonuclease activity (should get relationship via parent though)

3) GO:0000176 nuclear exosome (RNase complex)
-Def: Complex of 3'-5' exoribonucleases found in the nucleus.
+Def: A ribonuclease complex that has 3-prime to 5-prime processive and distributive hydrolytic exoribonuclease activity and endoribonuclease activity, producing 5-prime-phosphomonoesters. Participates in a multitude of cellular RNA processing and degradation events preventing nuclear export and/or translation of aberrant RNAs. Restricted to processing linear and circular single-stranded RNAs (ssRNA) only. RNAs with complex secondary structures may have to be unwound or pre-processed by co-factors prior to entering the complex, esp if the 3-prime end is structured.
References: PMID:17174896, PMID:20531386, PMID:26726035
is_a GO:0000178 exosome (RNase complex) [currently there]
+is_a GO:1905354 (NEW in TG) exoribonuclease complex (should get relationship via parent though)
+is_a GO:1902555 endoribonuclease complex
+capable_of GO:0000175 3'-5'-exoribonuclease activity (should get relationship via parent though)
+capable_of GO:0016891 endoribonuclease activity, producing 5'-phosphomonoesters

4) GO:NEW nucleolar exosome (RNase complex)
Def: A ribonuclease complex that has 3-prime to 5-prime distributive hydrolytic exoribonuclease activity and in some taxa (e.g. yeast) endoribonuclease activity, producing 5-prime-phosphomonoesters. Participates in a multitude of cellular RNA processing and degradation events preventing nuclear export and/or translation of aberrant RNAs. Restricted to processing linear and circular single-stranded RNAs (ssRNA) only. RNAs with complex secondary structures may have to be unwound or pre-processed by co-factors prior to entering the complex, esp if the 3-prime end is structured.
References: PMID:17174896, PMID:20531386, PMID:26726035
+is_a GO:0000178 exosome (RNase complex)
+is_a GO:1905354 (NEW in TG) exoribonuclease complex (should get relationship via parent though)
+capable_of GO:0000175 3'-5'-exoribonuclease activity (should get relationship via parent though)

Relevant references (incl. reviews):
26726035 - Review: structure, function
23352926 - Review: structure, function (evolutionary aspects)
23910895 - Review: more emphasis on RNA threading
17174896 - crystal structure of human Exo-9 (2nn6), human & yeast function
20531386 - DIS3(L) isoforms: complex evidence by CoIP+MS, cellular locations by cell fractionation + colocalisation, substrate selectivity

Yeast crystals:
23376952 (Makino et al, 2013) - 4ifd - Exo-11 + RNA
25043052 (Wasmuth et al, 2014) - 4oo1 - Exo-10 (RRP6) + RNA
27345150 (Kowalinski et al, 2016) - 5jea - Exo-10 (RRP44) + SKI7 (co-factor) + RNA by xray
27174052 (Lui et al, 2016) - 5g06 (EM-3366) - Exo-10 (RRP44) + SKI7 (co-factor) by EM

Thanks, Birgit

bmeldal commented 8 years ago

The following relationships could also be added to all complexes:

GO:0003723 RNA binding
GO:0006396 RNA processing
GO:0006401 RNA catabolic process

bmeldal commented 8 years ago

Side issue: GO:1902555 endoribonuclease complex is_a GO:0032991 macromolecular complex

but GO:0000178 exosome (RNase complex) is_a GO:0043234 protein complex

I'd be happy with both being is_a GO:0043234 protein complex which will probably be necessary for one of the edits requested above.

mcourtot commented 8 years ago

Re side note: the definition of endoribonuclease complex reads "A protein complex which is capable of endoribonuclease activity.", so I don't think there is any controversy in asserting it under protein complex. I approved the TG request for the new exoribonuclease complex term.

mcourtot commented 8 years ago

I added the RNA binding, processing and catabolic process to the parent exosome (RNase complex) which would propagate to all complexes - let me know if this is not what you intended.

bmeldal commented 8 years ago

Sounds fine so far. Now to the nitty-gritty bit ;-)

bmeldal commented 8 years ago

is_a GO:1905354 (NEW in TG) exoribonuclease complex and capable_of GO:0000175 3'-5'-exoribonuclease activity can also simply be placed on the parent GO:0000178 exosome (RNase complex), as they are true for all children.

Then we just have the small, individual edits left plus the new CC term that I can't create myself at the moment.
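Placing is_a and capable_of on GO:0000178 works because anything asserted on a class holds for all of its is_a descendants. The sketch below is a toy Python model of that inheritance, not how GO tooling actually computes it (GO relies on OWL reasoning); the dictionaries and function names are hypothetical, and GO:0101019 is the nucleolar term created later in this thread.

```python
"""Toy sketch: relationships asserted on a parent class hold for its is_a children.

Simplified illustration only; not GO's actual reasoning pipeline.
"""

# is_a edges (child -> parents), using the IDs from this ticket
IS_A: dict[str, list[str]] = {
    "GO:0000177": ["GO:0000178"],  # cytoplasmic exosome -> exosome
    "GO:0000176": ["GO:0000178"],  # nuclear exosome -> exosome
    "GO:0101019": ["GO:0000178"],  # nucleolar exosome -> exosome
    "GO:0000178": ["GO:1905354"],  # exosome -> exoribonuclease complex
}

# Relationships asserted directly on a class
ASSERTED: dict[str, set[tuple[str, str]]] = {
    "GO:0000178": {("capable_of", "GO:0000175")},  # 3'-5'-exoribonuclease activity
    "GO:0000176": {("capable_of", "GO:0016891")},  # endoribonuclease activity
}


def ancestors(term: str) -> set[str]:
    """All classes reachable from `term` via is_a (excluding the term itself)."""
    seen: set[str] = set()
    stack = list(IS_A.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(IS_A.get(parent, []))
    return seen


def effective_relations(term: str) -> set[tuple[str, str]]:
    """Relations asserted on `term` plus those inherited from every is_a ancestor."""
    rels = set(ASSERTED.get(term, set()))
    for anc in ancestors(term):
        rels |= ASSERTED.get(anc, set())
    return rels


for child in ("GO:0000177", "GO:0000176", "GO:0101019"):
    print(child, sorted(effective_relations(child)))
```

All three children inherit capable_of GO:0000175 from the parent, while the endoribonuclease activity stays specific to the nuclear exosome because it is asserted only there.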

mcourtot commented 8 years ago

Yes, that is how I added it (based on your suggestion above). I was checking because you were explicit about those but not about the RNA terms :)

Changes made as described above, and created:
[Term]
+id: GO:0101019
+name: nucleolar exosome (RNase complex)
+namespace: cellular_component
+def: "A ribonuclease complex that has 3-prime to 5-prime distributive hydrolytic exoribonuclease activity and in some taxa (e.g. yeast) endoribonuclease activity, producing 5-prime-phosphomonoesters. Participates in a multitude of cellular RNA processing and degradation events preventing nuclear export and/or translation of aberrant RNAs. Restricted to processing linear and circular single-stranded RNAs (ssRNA) only. RNAs with complex secondary structures may have to be unwound or pre-processed by co-factors prior to entering the complex, esp if the 3-prime end is structured." [PMID:17174896, PMID:20531386, PMID:26726035]
+is_a: GO:0000178 ! exosome (RNase complex)
+created_by: bhm

Thank you so much for the very descriptive, very complete ticket :thumbsup: I think this has to be the best-written one I have ever seen! Thanks for taking the time to introduce the issue and describe the required changes/additions.

bmeldal commented 8 years ago

You are welcome! It was a copy/paste job from my Google doc that serves as a template for the complex annotations in CP. I always do that for the big ones, as I never curate them in one go!

bmeldal commented 8 years ago

You know, we completely forgot the part_of relationships to cytosol, nucleus and nucleolus, respectively! I'm just doing it now in the CP...

mcourtot commented 8 years ago

[Term]
id: GO:0000176
name: nuclear exosome (RNase complex)
intersection_of: GO:0000178 ! exosome (RNase complex)
intersection_of: part_of GO:0005634 ! nucleus
relationship: part_of GO:0031981 ! nuclear lumen

[Term]
id: GO:0000177
name: cytoplasmic exosome (RNase complex)
intersection_of: GO:0000178 ! exosome (RNase complex)
intersection_of: part_of GO:0005737 ! cytoplasm

I can add the nucleolus:
[Term]
id: GO:0101019
name: nucleolar exosome (RNase complex)
intersection_of: GO:0000178 ! exosome (RNase complex)
intersection_of: part_of GO:0005730 ! nucleolus
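The intersection_of lines are genus-differentia definitions: each class is "exosome (RNase complex) that is part_of X". A reasoner can then classify one defined term under another when its location is (transitively) part_of the other's location. The Python sketch below illustrates this with hypothetical dictionaries; the part_of edges from nucleolus to nuclear lumen to nucleus are assumptions for the example, so check them against the live ontology.

```python
"""Toy genus-differentia classification for the exosome stanzas above.

Simplification: only the genus and a single part_of differentia are compared;
real OWL reasoning is far more general.
"""

# Assumed part_of edges between locations (illustrative; verify in the ontology)
PART_OF: dict[str, list[str]] = {
    "GO:0005730": ["GO:0031981"],  # nucleolus part_of nuclear lumen
    "GO:0031981": ["GO:0005634"],  # nuclear lumen part_of nucleus
}

# term -> (genus, part_of location), taken from the intersection_of lines
DEFS: dict[str, tuple[str, str]] = {
    "GO:0000176": ("GO:0000178", "GO:0005634"),  # nuclear exosome
    "GO:0000177": ("GO:0000178", "GO:0005737"),  # cytoplasmic exosome
    "GO:0101019": ("GO:0000178", "GO:0005730"),  # nucleolar exosome
}


def part_of_closure(location: str) -> set[str]:
    """The location itself plus everything it is transitively part_of."""
    seen = {location}
    stack = [location]
    while stack:
        for parent in PART_OF.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def inferred_subclasses() -> list[tuple[str, str]]:
    """(sub, super) pairs where sub's definition implies super's definition."""
    pairs = []
    for sub, (sub_genus, sub_loc) in DEFS.items():
        for sup, (sup_genus, sup_loc) in DEFS.items():
            if sub != sup and sub_genus == sup_genus and sup_loc in part_of_closure(sub_loc):
                pairs.append((sub, sup))
    return pairs


print(inferred_subclasses())
```

Under these assumed part_of edges, the nucleolar exosome would be inferred as a subclass of the nuclear exosome, but not of the cytoplasmic exosome.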

bmeldal commented 8 years ago

Yes, please add nucleolus to GO:0101019 nucleolar exosome

mcourtot commented 8 years ago

All done :)

ValWood commented 8 years ago

Note: not all GO:1902555 endoribonuclease complexes are protein complexes; some are ribonucleoprotein complexes.

For example, ribonuclease P complex contains the ncRNA RNase P K-RNA.

Historically protein complexes have contained only proteins...

bmeldal commented 8 years ago

This is a long-standing issue. I did open a ticket some time ago about the sometimes arbitrary placement of complexes under 'protein complex' or 'macromolecular complex'. I can't find my ticket but this one from Janos highlights it as well: #11287

As we are adding more complexes that have non-protein members, we'll have to decide how to retain the large subclasses under macromolecular complex. @dosumis has some ideas, but I don't think he's had time to work further on the templates suggested before the GOC meeting in Geneva.

ValWood commented 8 years ago

Hi Birgit, AFAIK, it would go under protein complex, or ribonucleoprotein complex, if it is only one or the other.

If it could be either/or (like endoribonuclease complex), it has to move up to 'macromolecular'.

This might be what you are seeing, it probably looks arbitrary, but it might not be...

Val
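Val's rule can be read as: look at what the known members of a class contain; if all are protein-only the class goes under protein complex, if all contain RNA it goes under ribonucleoprotein complex, and if the members are mixed it has to move up to macromolecular complex. Below is a minimal Python sketch under that reading; the function name and the set-of-molecule-types encoding are hypothetical. It uses two members of endoribonuclease complex mentioned in this thread: the Ire1 complex (protein only) and ribonuclease P complex (contains RNase P RNA).

```python
"""Toy encoding of the parent-choice rule described above (hypothetical helper)."""


def genus_for(member_compositions: list[set[str]]) -> str:
    """Choose a parent class from the molecule types found in each known member.

    Assumes every member contains protein; only the presence of RNA varies.
    """
    if not member_compositions:
        raise ValueError("need at least one member composition")
    with_rna = ["RNA" in comp for comp in member_compositions]
    if all(with_rna):
        return "ribonucleoprotein complex"
    if not any(with_rna):
        return "protein complex"
    return "macromolecular complex"  # mixed membership: either/or


ire1_complex = {"protein"}
ribonuclease_p_complex = {"protein", "RNA"}

print(genus_for([ire1_complex]))
print(genus_for([ribonuclease_p_complex]))
print(genus_for([ire1_complex, ribonuclease_p_complex]))
```

With both members included, endoribonuclease complex ends up under macromolecular complex, which is where this ticket eventually places it.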

bmeldal commented 8 years ago

Forgot about this, it was too early when I replied: http://wiki.geneontology.org/index.php/Guidelines_on_%27protein_complex%27_terms
@paolaroncaglia and I re-defined the protein complex term to allow non-protein members of the complex:
Def: A stable macromolecular complex composed (only) of two or more polypeptide subunits along with any covalently attached molecules (such as lipid anchors or oligosaccharide) or non-protein prosthetic groups (such as nucleotides or metal ions). Prosthetic group in this context refers to a tightly bound cofactor. The component polypeptide subunits may be identical.
https://www.ebi.ac.uk/QuickGO/GTerm?id=GO:0043234
We could still have RNP complexes classified under both terms, protein complex and GO:1990904 ribonucleoprotein complex.

I had similar problems with protein complex vs GO:0032993 protein-DNA complex.

We were then going to classify anything under protein complex that isn't an assembly but I don't think this has happened yet.

A new term, GO:0099512 supramolecular fiber was created to class the non-defined larger structures, see above ticket from Janos. https://www.ebi.ac.uk/QuickGO/GTerm?id=GO:0099512

I don't think all assemblages have been moved to GO:0099512 supramolecular fiber yet, though; e.g. GO:0098644 complex of collagen trimers is still a protein complex. @dosumis, is this the same term you were suggesting as 'supramolecular assemblages'?

Apologies for making things more complex (pun intended)!

Birgit

mcourtot commented 8 years ago

Is there a consensus on what should be done here?

mcourtot commented 8 years ago

@bmeldal @ValWood is there anything remaining to be done on this ticket (and if yes what) or can it be closed?

ValWood commented 8 years ago

I don't know, the parentage depends on the answer to the question, and the current policy for protein complex/ ribonucleoprotein complex...

bmeldal commented 8 years ago

Policy according to Paola and my re-definition would be to put it all under 'protein complex' but it's really up to the editors to make the final decision as we'll have to deal with the legacy at the same time!

mcourtot commented 8 years ago

I talked to @paolaroncaglia and @dosumis just now, and they said they didn't think they should go under protein complexes, but rather under macromolecular complex (or ribonucleoprotein complex), as they didn't have only a few nucleotides but rather whole chains.

bmeldal commented 8 years ago

I think, then it should be is_a ribonucleoprotein complex.

paolaroncaglia commented 8 years ago

I haven’t read the full thread :-), but regarding the parentage of endoribonuclease complex, a bit of archaeology shows that the term was added via a TG template, so it was both defined and placed as a protein complex. Someone then changed the parentage to macromolecular complex recently, but I don't know who or why, and I can't find it on GH (http://www.ebi.ac.uk/QuickGO/GTerm?id=GO:1902555#term=history). It might have been @tberardini, based on the svn log:

r34074 | tberardi | 2016-06-22 01:24:20 +0100 (Wed, 22 Jun 2016) | 1 line adding in parents for TGFF generated protein complex terms

@tberardini , did you do that on purpose for ‘endoribonuclease complex’, possibly based on personal communications that I couldn’t track? This would be in agreement with Melanie’s latest comment.

tberardini commented 8 years ago

I think I probably added that parent after it was 'lost' upon TG commit.

mcourtot commented 8 years ago

@ValWood , @bmeldal : do we have consensus for moving them under ribonucleoprotein complex?

dosumis commented 8 years ago

The individual complexes should be under 'ribonucleoprotein complex' (as long as they have both protein and RNA components). Only if you want to assert that all complexes with endoribonuclease activity (known and yet to be discovered) contain RNA should you use 'ribonucleoprotein complex' as the genus of 'endoribonuclease complex'. Otherwise, use 'macromolecular complex'.

Hth, David

dosumis commented 8 years ago

Note - one of the subclasses - Ire1, is classified under 'protein complex'

ValWood commented 8 years ago

I have a bigger issue... many people use the "protein complex" term and would expect it to retrieve complexes like the ribosome, the spliceosome and telomerase (I suspect).

Is it possible to define a protein complex as a complex which has only proteins, or protein and RNA components?

so ribonucleoprotein complex would sit below protein complex (protein complex > ribonucleoprotein complex).

Would that be crazy? then everything can go under protein complex, unless we know that it has an RNA component, then it moves down...

This way people will retrieve all protein complexes with the protein complex term.

Similarly protein-DNA complex (telosome), which is currently not retrieved by a "protein complex" search. I doubt there are any biologists who would not describe the telosome as a 'protein complex' https://en.wikipedia.org/wiki/Shelterin

but you would not currently retrieve it with a protein complex search...
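The retrieval problem Val describes comes down to which classes fall below 'protein complex'. The toy Python sketch below compares the current arrangement (ribonucleoprotein and protein-DNA complexes as siblings of protein complex under macromolecular complex) with the proposed one (both moved below protein complex). The placements of ribosome, spliceosomal complex and shelterin complex are simplified assumptions for illustration, not the actual GO parentage, and the helper name is hypothetical.

```python
"""Toy comparison of 'protein complex' retrieval under two hierarchies."""

# child -> parents (simplified, illustrative placements)
CURRENT: dict[str, list[str]] = {
    "protein complex": ["macromolecular complex"],
    "ribonucleoprotein complex": ["macromolecular complex"],
    "protein-DNA complex": ["macromolecular complex"],
    "exosome (RNase complex)": ["protein complex"],
    "ribosome": ["ribonucleoprotein complex"],
    "spliceosomal complex": ["ribonucleoprotein complex"],
    "shelterin complex": ["protein-DNA complex"],
}

# Proposal: RNP and protein-DNA complexes become children of protein complex
PROPOSED: dict[str, list[str]] = {
    **CURRENT,
    "ribonucleoprotein complex": ["protein complex"],
    "protein-DNA complex": ["protein complex"],
}


def descendants(hierarchy: dict[str, list[str]], root: str) -> set[str]:
    """Every class that reaches `root` through one or more is_a edges."""
    found: set[str] = set()
    frontier = [root]
    while frontier:
        node = frontier.pop()
        for child, parents in hierarchy.items():
            if node in parents and child not in found:
                found.add(child)
                frontier.append(child)
    return found


print(sorted(descendants(CURRENT, "protein complex")))
print(sorted(descendants(PROPOSED, "protein complex")))
```

In the current toy hierarchy a "protein complex" query returns only the exosome; under the proposal it also returns the ribosome, spliceosome and shelterin complexes.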

mcourtot commented 8 years ago

Hi Val,

For the current ticket I will move things under macromolecular complex (I think the misclassification David mentions above is actually my mistake)

Would you mind creating a new ticket for protein complex in general? It would be easier to keep the discussion clean (of course, feel free to reference this one)

Thanks, Melanie

bmeldal commented 8 years ago

This is more complicated!

endonucleases

1) GO:1902555 endoribonuclease complex should have is_a GO:1905348 endonuclease complex. The related activity terms have the right relationship. BUT, see pt 2:

2) GO:1902555 endoribonuclease complex is_a GO:0032991 macromolecular complex but GO:1905348 endonuclease complex is_a GO:0043234 protein complex --> So, pt 1 is not possible as GO:0032991 macromolecular complex is NOT GO:0043234 protein complex! (It's the other way round)

3) Children: GO:1990332 Ire1 complex GO:0000214 tRNA-intron endonuclease complex GO:1903095 ribonuclease III complex

all have relationships to both parents: is_a GO:1905348 endonuclease complex AND is_a GO:1902555 endoribonuclease complex

BUT GO:0070578 RISC-loading complex AND GO:0030677 ribonuclease P complex have the relationships
is_a GO:1902555 endoribonuclease complex
AND
is_a GO:0032991 macromolecular complex
is_a GO:0030529 intracellular ribonucleoprotein complex
is_a GO:1990904 ribonucleoprotein complex
is_a GO:0032991 macromolecular complex

GO:0030677 ribonuclease P complex is the only one that has the RNA in the defs of its children (contains one RNA and one protein molecule)

So, pt 1 is not possible because of GO:0030677 ribonuclease P complex!

But what constitutes RNA being part_of the complex and where is it 'just' RNA binding??? GO:0070578 RISC-loading complex could just as well be a 'protein complex'. It's a grey zone and makes this complicated every.single.time :(

Birgit

bmeldal commented 8 years ago

@ValWood that was exactly what I was getting at in the past, and I think we have discussed it before. The new def for protein complex fulfils your needs, but the ontology has not yet followed suit.

Tag me when you have created the new ticket, or shall I do it?

mcourtot commented 8 years ago

In order, and slowly :) Points 1 and 2: if an endoribonuclease complex is an endonuclease complex, then the classification of endonuclease complex needs to be updated to is_a macromolecular complex, as we have agreed that endoribonuclease complexes are not protein complexes.

Does that make sense?
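The reasoning in points 1 and 2 can be checked by following is_a edges upward: because protein complex is_a macromolecular complex (and not the reverse), adding endoribonuclease complex is_a endonuclease complex while endonuclease complex sits under protein complex would make every endoribonuclease complex a protein complex. The toy Python sketch below uses labels only and just the edges discussed here; the helper names are hypothetical. It shows the inferred ancestors before and after moving endonuclease complex under macromolecular complex.

```python
"""Toy ancestor check for points 1 and 2 (labels only, illustrative)."""

# child -> parents as described in point 2
BASE: dict[str, list[str]] = {
    "protein complex": ["macromolecular complex"],
    "endonuclease complex": ["protein complex"],
    "endoribonuclease complex": ["macromolecular complex"],
}

# Point 1: add endoribonuclease complex is_a endonuclease complex
WITH_POINT_1: dict[str, list[str]] = {
    **BASE,
    "endoribonuclease complex": ["macromolecular complex", "endonuclease complex"],
}

# Proposed fix: reclassify endonuclease complex under macromolecular complex
FIXED: dict[str, list[str]] = {**WITH_POINT_1, "endonuclease complex": ["macromolecular complex"]}


def ancestors(hierarchy: dict[str, list[str]], term: str) -> set[str]:
    """All classes reachable from `term` via is_a edges."""
    seen: set[str] = set()
    stack = list(hierarchy.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(hierarchy.get(parent, []))
    return seen


for label, h in [("point 1 only", WITH_POINT_1), ("with fix", FIXED)]:
    print(label, sorted(ancestors(h, "endoribonuclease complex")))
```

With point 1 alone, 'protein complex' appears among the ancestors of endoribonuclease complex; after the fix, it keeps endonuclease complex as a parent without being inferred to be a protein complex.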

ValWood commented 8 years ago

OK I didn't know if the suggestion made sense (it does to me). I can open a ticket (probably tomorrow).

@mcourtot this, if implemented, would change "as we have agreed that endoribonuclease complexes are not protein complexes".

bmeldal commented 8 years ago

OK, for now we'll have to move this entire branch under macromolecular complex.

And then sort out the more general issue in the new ticket.

dosumis commented 8 years ago

OK, for now we'll have to move this entire branch under macromolecular complex. And then sort out the more general issue in the new ticket.

+1

mcourtot commented 8 years ago

Endoribonucleases are now under macromolecular complex.