Closed ValWood closed 3 years ago
Some of them are also prokaryotic - for example, I think the MutL and MutS complexes are also found in bacteria.
@AndreaAuchincloss
would you please have a look at this list to see if you spot any bacterial complexes ?
Thanks, Pascale
@keseler if you also want to comment that'd be much appreciated !
MutL and MutS proteins exist in bacteria and archaea, and are similar enough to their eukaryotic counterparts that they share InterPro (etc) domains. In E. coli the complex is different from that in eukaryotes: there is a MutHLS complex annotated in EcoCyc (there are other subunits; judging by Paul Modrich's Nobel lecture, UvrD and some exo- and endonucleases are also involved). MutH does not exist in eukaryotes. None of the 5 Mut terms Val proposes to group under "nuclear complex" would be appropriate for this bacterial complex anyway because they're too eukaryotic just by definition.
The day someone wants to annotate the bacterial mismatch repair complex in GO they could update GO:1990710 (MutS complex, it has E.coli in the comments), so grouping the above 5 MutL and MutS terms under nuclear complex doesn't pose a problem for me. You might want to alter the definitions so they are explicitly eukaryotic (although that should be evident from the ancestor chart).
I looked at the other Child terms of GO:0005634 (nucleus) and checked a few of them for bacterial annotation. None of them ring a bell as being bacterial; my knowledge is NOT encyclopedic, so I may have missed something.
In summary as far as I can tell this grouping is fine.
I suppose we'll want ER complex, mitochondria complex, chloroplast complex ?
This would be useful for curators. It's really easy to see cellular locations when drilling down because of all the complex terms.
Sorry this took me a while. I agree with Andrea, the complex terms in the list seem fine.
If you are going to create additional higher-level complex terms for the locations of the complexes, how about cytoplasmic complex, membrane complex, periplasmic complex, extracellular complex? These would be useful for prokaryotes.
We should think about this idea very carefully. I seem to recall that we used to try to classify complexes by location, and then when a complex moves around and can be in more than one place, we ended up with multiple terms for a complex depending on its localization. I'm just not sure that this info really belongs in the ontology.
I'm just not sure that this info really belongs in the ontology.
Isn't that the situation that now can be handled as an annotation in GO-CAM, so there's no longer a need for composed terms like nuclear_ribosomal subunit and also cytosolic_ribosomal subunit and ER-associated_ribosomal subunit?
I find the groupings quite useful for ontology browsing. Without some grouping, it is difficult to locate the non-complex terms buried in the mass of complexes. In addition, these complexes have the parent "nucleus", so by this logic we should move all complexes out from under any location, which would bump them all up to the root node. I'm not sure this was ever the plan?
I thought the plan was that if a complex had multiple locations, then it would not get a location specific term, but if the only location of function for a complex was a particular compartment it could be added.
IIRC the exact problem was that we have "duplicate" terms for complexes at different locations. For example, we have "nuclear proteasome" and "cytosolic proteasome". The discussion I remember was to merge these into a single proteasome term with no location specific version (which I now see is the issue that @deustp01 refers to).
Also, we already have location specific groupings, for example:
- GO:0106083 nuclear membrane protein complex
- GO:0072546 ER membrane protein complex
- GO:0098799 outer mitochondrial membrane protein complex
It seems reasonable that if a complex has a single 'resident' location, then an ancestor to that location fills in annotation gaps and helps annotation consistency? If these groupings are removed I will need to make sure all of the complexes also have their annotated location, because often it is implicit from the complex's position in the hierarchy.
In summary, if these complexes can't be housed under a "nuclear complex" grouping term, they shouldn't be under "nucleus" in the first place...
@krchristie do you remember when the discussion about merging nuclear and cytoplasmic versions of the same complex occurred? I know we have discussed a few times and agreed to implement (over the years). However, I cannot find any relevant GO ticket for this....
@vanaukenk maybe you remember this?
While I understand that it can be helpful when browsing to have things grouped under a given term, it became apparent to me when examining ChEBI roles that we have to think about these groupings from a couple different perspectives. So, if we place complexes that are only ever found in the nucleus under the term nucleus, but we do not place terms that have multiple locations under that term, then the list of complexes under the term nucleus is only a partial list. This is a problem for browsing if someone is thinking that the complex they are looking for is located in the nucleus without knowing that it is also found in another location as well.
More problematically, when someone does an enrichment analysis and looks at the term nucleus, they might expect they would get everything that is present in the nucleus, but they will not get the gene products annotated to complexes that are in the nucleus only some of the time and somewhere else at other times.
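The enrichment concern can be made concrete with a minimal sketch, assuming a toy three-term ontology and two hypothetical gene products (`geneA`, `geneB`). Only the idea that a nuclear-only complex sits under nucleus while a multi-location complex does not is taken from this thread; the term relationships and annotations below are illustrative assumptions, not real GO data.

```python
# Toy parent links (child -> parents). "proteasome complex" has no location
# parent, modelling a complex that can be found in more than one place.
# These links are assumptions for illustration, not an extract of GO.
PARENTS = {
    "nuclear proteasome complex": ["nucleus", "proteasome complex"],
    "proteasome complex": [],
    "nucleus": [],
}

# Hypothetical gene products and their direct annotations.
ANNOTATIONS = {
    "geneA": ["nuclear proteasome complex"],
    # geneB's complex may be nuclear, but no separate nucleus annotation was made.
    "geneB": ["proteasome complex"],
}


def ancestors(term):
    """Return every term reachable by following parent links upwards."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in PARENTS.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def genes_for(term):
    """Gene products annotated to the term directly or via a descendant."""
    hits = set()
    for gene, terms in ANNOTATIONS.items():
        for annotated in terms:
            if annotated == term or term in ancestors(annotated):
                hits.add(gene)
    return hits


print(sorted(genes_for("nucleus")))
print(sorted(genes_for("proteasome complex")))
```

In this toy setup a query on nucleus returns only `geneA`; `geneB` would appear only if a curator added an explicit nucleus annotation for it.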
I am wondering if it would be best if we did not assign complexes to any cellular structures within the ontology.
do you remember when the discussion about merging nuclear and cytoplasmic versions of the same complex occurred? I know we have discussed a few times and agreed to implement (over the years). However, I cannot find any relevant GO ticket for this....
@vanaukenk maybe you remember this?
I think this was quite a long time ago, probably when I was at SGD, so more than 8 years ago. I remember pretty clearly that the discussion on a related issue took place while I was at SGD: terms like these TFIIH core complex terms, which represent the TFIIH core when it is present in either of the two different complexes it can be part of. Note that these specific subportion terms account for only a fraction of the total annotations to a TFIIH core complex term, and the portion of NEF3 complex term is not used at all.
-- transcription factor TFIIH core complex (721 annotations, 27 exp)
--- core TFIIH complex portion of holo TFIIH complex (17 annotations, 3 exp)
--- core TFIIH complex portion of NEF3 complex (0 annotations)
@ValWood - when you want to tag me, the appropriate handle is @krchristie. You tagged a different K Christie above.
So, if we place complexes that are only ever found in the nucleus under the term nucleus, but we do not place terms that have multiple locations under that term, then the list of complexes under the term nucleus is only a partial list.
but this is true for every term in every ontology?
This is a problem for browsing if someone is thinking that the complex they are looking for is located in the nucleus without knowing that it is also found in another location as well.
I see your point, but if you were looking for a specific complex you would find it by searching. This could be mitigated by defining location specific grouping terms as "A complex for which the only location of action is the blah" (or similar).
More problematically, when someone does an enrichment analysis and looks at the term nucleus, they might expect they would get everything that is present in the nucleus, but they will not get the gene products annotated to complexes that are in the nucleus only some of the time and somewhere else at other times.
but here the onus is on the curator to annotate the correct location (which usually comes from different experiments anyway). You can't depend on a complex to provide the location, but it is a nice 'backstop' if the location annotation is omitted. If we removed the 'location specific complexes' I'm sure we would lose a lot of valuable annotation.
I am wondering if it would be best if we did not assign complexes to any cellular structures within the ontology.
Pascale raised this option yesterday and it seems to be something under discussion (i.e. putting complexes in their own ontology branch, which is essentially what you are suggesting).
I would support this change but there would be a lot of "gap filling annotation" required. The other main problem I envisage is whether all complexes (kinetochore, ribosome, spliceosome?) would be included
I remember the related discussion about TFIIH too, but I also specifically remember a discussion about multi-location versions of the same complex
Whilst looking for tickets with the label "GOC_meeting" I stumbled across the ticket about merging nuclear and cytoplasmic versions of the same complex!!! It was opened in Nov 2016
I will try to explain the problem for curators. You want to see if your specific nuclear location has already been described. However, you don't know the terminology that might be used, so you look at the descendants of nucleus. This is what you see. It would be really nice if you could just see the locations without needing to browse the complexes.
It did not seem so controversial here to add a complex grouping term because a) they are already under nucleus and under complex, and b) these location specific grouping terms exist elsewhere.
Term | Relationship to nucleus |
---|---|
GO:0002111 BRCA2-BRAF35 complex | part_of |
GO:0031601 nuclear proteasome core complex | part_of |
GO:0043073 germ cell nucleus | is_a |
GO:0031519 PcG protein complex | part_of |
GO:0070532 BRCA1-B complex | part_of |
GO:0034981 FHL3-CREB complex | part_of |
GO:0071664 catenin-TCF7L2 complex | part_of |
GO:0097572 right nucleus | is_a |
GO:0000790 nuclear chromatin | part_of |
GO:0031613 nuclear proteasome regulatory particle, lid subcomplex | part_of |
GO:0000794 condensed nuclear chromosome | part_of |
GO:0071204 histone pre-mRNA 3'end processing complex | part_of |
GO:0005958 DNA-dependent protein kinase-DNA ligase 4 complex | part_of |
GO:1990590 ATF1-ATF4 transcription factor complex | part_of |
GO:0000798 nuclear cohesin complex | part_of |
GO:0031380 nuclear RNA-directed RNA polymerase complex | part_of |
GO:0000214 tRNA-intron endonuclease complex | part_of |
GO:0033063 Rad51B-Rad51C-Rad51D-XRCC2 complex | part_of |
GO:0043599 nuclear DNA replication factor C complex | part_of |
GO:0031039 macronucleus | is_a |
GO:1990477 NURS complex | part_of |
GO:0000109 nucleotide-excision repair complex | part_of |
GO:0034692 E.F.G complex | part_of |
GO:0071144 heteromeric SMAD protein complex | part_of |
GO:0035145 exon-exon junction complex | part_of |
GO:0070313 RGS6-DNMT1-DMAP1 complex | part_of |
GO:0031533 mRNA cap methyltransferase complex | part_of |
GO:0031598 nuclear proteasome regulatory particle | part_of |
GO:0031510 SUMO activating enzyme complex | part_of |
GO:0098537 lobed nucleus | is_a |
GO:0034978 PDX1-PBX1b-MRG1 complex | part_of |
GO:0062128 MutSgamma complex | part_of |
GO:0033597 mitotic checkpoint complex | part_of |
GO:0005635 nuclear envelope | part_of |
GO:0000228 nuclear chromosome | part_of |
GO:0043564 Ku70:Ku80 complex | part_of |
GO:0110092 nucleus leading edge | part_of |
GO:0070516 CAK-ERCC2 complex | part_of |
GO:0070767 BRCA1-Rad51 complex | part_of |
GO:0070418 DNA-dependent protein kinase complex | part_of |
GO:0070354 GATA2-TAL1-TCF3-Lmo2 complex | part_of |
GO:0005677 chromatin silencing complex | part_of |
GO:0043076 megasporocyte nucleus | is_a |
GO:0070531 BRCA1-A complex | part_of |
GO:0034980 FHL2-CREB complex | part_of |
GO:1990513 CLOCK-BMAL transcription complex | part_of |
GO:0097571 left nucleus | is_a |
GO:0033620 Mei2 nuclear dot complex | part_of |
GO:0030870 Mre11 complex | part_of |
GO:0032116 SMC loading complex | part_of |
GO:0031618 nuclear pericentric heterochromatin | part_of |
GO:0031981 nuclear lumen | part_of |
GO:0048353 primary endosperm nucleus | is_a |
GO:0035059 RCAF complex | part_of |
GO:0000943 retrotransposon nucleocapsid | part_of |
GO:0033064 XRCC2-RAD51D complex | part_of |
GO:0030689 Noc complex | part_of |
GO:1990453 nucleosome disassembly/reassembly complex | part_of |
GO:0051457 maintenance of protein location in nucleus | occurs_in |
GO:1990378 upstream stimulatory factor complex | part_of |
GO:0005666 RNA polymerase III complex | part_of |
GO:0005681 spliceosomal complex | part_of |
GO:0000346 transcription export complex | part_of |
GO:0000418 RNA polymerase IV complex | part_of |
GO:0071027 nuclear RNA surveillance | occurs_in |
GO:0031595 nuclear proteasome complex | part_of |
GO:0043601 nuclear replisome | part_of |
GO:0000176 nuclear exosome (RNase complex) | part_of |
GO:0070876 SOSS complex | part_of |
GO:0046818 dense nuclear body | part_of |
GO:0005697 telomerase holoenzyme complex | part_of |
GO:0030532 small nuclear ribonucleoprotein complex | part_of |
GO:0070353 GATA1-TAL1-TCF3-Lmo2 complex | part_of |
GO:0033203 DNA helicase A complex | part_of |
GO:0089701 U2AF complex | part_of |
GO:0070847 core mediator complex | part_of |
GO:0070557 PCNA-p21 complex | part_of |
GO:0034064 Tor2-Mei2-Ste11 complex | part_of |
GO:0033260 nuclear DNA replication | occurs_in |
GO:0042405 nuclear inclusion body | part_of |
GO:1990512 Cry-Per complex | part_of |
GO:0032301 MutSalpha complex | part_of |
GO:0110093 nucleus lagging edge | part_of |
GO:0048189 Lid2 complex | part_of |
GO:0031040 micronucleus | is_a |
GO:0070274 RES complex | part_of |
GO:0032389 MutLalpha complex | part_of |
GO:0034753 nuclear aryl hydrocarbon receptor complex | part_of |
GO:0140510 mitotic nuclear bridge | part_of |
GO:0070421 DNA ligase III-XRCC1 complex | part_of |
GO:0070467 RC-1 DNA recombination complex | part_of |
GO:0031610 nuclear proteasome regulatory particle, base subcomplex | part_of |
GO:0033065 Rad51C-XRCC3 complex | part_of |
GO:0046536 dosage compensation complex | part_of |
GO:1990433 CSL-Notch-Mastermind transcription factor complex | part_of |
GO:0033167 ARC complex | part_of |
GO:0032039 integrator complex | part_of |
GO:0005640 nuclear outer membrane | part_of |
GO:1990354 activated SUMO-E1 ligase complex | part_of |
GO:0043224 nuclear SCF ubiquitin ligase complex | part_of |
GO:0000818 nuclear MIS12/MIND complex | part_of |
GO:0032807 DNA ligase IV complex | part_of |
GO:1990589 ATF4-CREB1 transcription factor complex | part_of |
GO:0000347 THO complex | part_of |
GO:0000780 condensed nuclear chromosome, centromeric region | part_of |
GO:0000419 RNA polymerase V complex | part_of |
GO:0055029 nuclear DNA-directed RNA polymerase complex | part_of |
GO:0000152 nuclear ubiquitin ligase complex | part_of |
GO:0070877 microprocessor complex | part_of |
GO:0000784 nuclear chromosome, telomeric region | part_of |
GO:0031607 nuclear proteasome core complex, beta-subunit complex | part_of |
GO:0048555 generative cell nucleus | is_a |
GO:0000788 nuclear nucleosome | part_of |
GO:0070390 transcription export complex 2 | part_of |
GO:1905754 ascospore-type prospore nucleus | is_a |
GO:0019908 nuclear cyclin-dependent protein kinase holoenzyme complex | part_of |
GO:1902375 nuclear tRNA 3'-trailer cleavage, endonucleolytic | occurs_in |
GO:0005880 nuclear microtubule | part_of |
GO:0070310 ATR-ATRIP complex | part_of |
GO:1902377 nuclear rDNA heterochromatin | part_of |
GO:0045120 pronucleus | is_a |
GO:0070533 BRCA1-C complex | part_of |
GO:0071033 nuclear retention of pre-mRNA at the site of transcription | occurs_in |
GO:0030895 apolipoprotein B mRNA editing enzyme complex | part_of |
GO:0062119 LinE complex | part_of |
GO:0070693 P-TEFb-cap methyltransferase complex | part_of |
GO:0032302 MutSbeta complex | part_of |
GO:0033062 Rhp55-Rhp57 complex | part_of |
GO:0033066 Rad51B-Rad51C complex | part_of |
GO:0043596 nuclear replication fork | part_of |
GO:0046808 assemblon | part_of |
GO:0070552 BRISC complex | part_of |
GO:0032806 carboxy-terminal domain protein kinase complex | part_of |
GO:0097165 nuclear stress granule | part_of |
GO:0008180 COP9 signalosome | part_of |
GO:0032390 MutLbeta complex | part_of |
GO:0000439 transcription factor TFIIH core complex | part_of |
GO:0000783 nuclear telomere cap complex | part_of |
GO:0031604 nuclear proteasome core complex, alpha-subunit complex | part_of |
GO:0032545 CURI complex | part_of |
GO:0031499 TRAMP complex | part_of |
GO:0048556 microsporocyte nucleus | is_a |
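
To illustrate what the proposed grouping would do for browsing, here is a minimal sketch that takes a handful of rows copied from the table above and separates the complexes from the other nuclear parts. The `is_complex` check is an assumption: it is a label-based heuristic standing in for a real "is a protein-containing complex" test against the ontology, and it would miss terms such as COP9 signalosome or nuclear replisome whose labels do not contain the word "complex".

```python
# A few rows copied from the table above: (GO id, label, relation to nucleus).
ROWS = [
    ("GO:0002111", "BRCA2-BRAF35 complex", "part_of"),
    ("GO:0043073", "germ cell nucleus", "is_a"),
    ("GO:0005635", "nuclear envelope", "part_of"),
    ("GO:0031981", "nuclear lumen", "part_of"),
    ("GO:0032301", "MutSalpha complex", "part_of"),
    ("GO:0005681", "spliceosomal complex", "part_of"),
    ("GO:0033260", "nuclear DNA replication", "occurs_in"),
    ("GO:0005640", "nuclear outer membrane", "part_of"),
]


def is_complex(label):
    # Assumption: label matching stands in for an ontology-based check.
    return "complex" in label.lower()


def split_children(rows):
    """Partition part_of children into complexes and other nuclear parts."""
    complexes, parts = [], []
    for go_id, label, relation in rows:
        if relation != "part_of":
            continue  # skip is_a subtypes and occurs_in processes
        target = complexes if is_complex(label) else parts
        target.append((go_id, label))
    return complexes, parts


complexes, parts = split_children(ROWS)

print("nuclear complex (proposed grouping term):")
for go_id, label in complexes:
    print(f"  {go_id} {label}")

print("remaining part_of children of nucleus:")
for go_id, label in parts:
    print(f"  {go_id} {label}")
```

With the complexes collected under one grouping term, the locations (nuclear envelope, nuclear lumen, nuclear outer membrane) are what remains directly visible under nucleus.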
if we place complexes that are only ever found in the nucleus under the term nucleus, but we do not place terms that have multiple locations under that term, then the list of complexes under the term nucleus is only a partial list. This is a problem for browsing if someone is thinking that the complex they are looking for is located in the nucleus without knowing that it is also found in another location as well.
This is what normally happens, for example for all DNA replication complexes. These cannot be placed under nucleus, so the onus is on the curator to make the appropriate nuclear annotation
I will try to explain the problem for curators. You want to see if your specific nuclear location has already been described. However, you don't know the terminology that might be used, so you look at the descendants of nucleus. This is what you see. It would be really nice if you could just see the locations without needing to browse the complexes.
@ValWood - I totally get this problem. However, I think you also understand the ontological reasons why trying to code locations of complexes in the ontology is not a solution because of the issue that happens for complexes like the DNA replication complexes you mentioned that can not be placed under the term nucleus because not all DNA replication complexes are present in a nucleus. Then, when we try to solve that issue by creating terms for things like nuclear DNA replication complex and cytoplasmic DNA replication complex, curators rarely use these terms because they get to the phrase that matches what they see in the paper, i.e. DNA replication complex, and never even notice the more granular terms that have locations coded into them.
Personally, I think that we have already seen that trying to make these location coded complex terms is not an effective solution to the problem of helping curators find a complex by location. The constraints of making the ontology always true really make it problematic to use as a browsing tool. I think we need to come up with a different solution to help curators find this kind of information. I think that the Protein2GO suggestions of co-annotations might be a good direction to think about. It would be cool if curators/users could easily browse/search for complex terms cross-referenced to existing annotations, e.g. complexes known to be found in the nucleus. We probably also need to change the paradigm so that instead of trying to create one term that does it all, we will use a set of terms. GO-CAMs allow us to do this.
I don't think we should have location specific complexes though? But if a complex is in the ontology under a location, it should be under a location that is always true. This is a slightly orthogonal discussion to this request, which is to group the existing terms (there is no change in any meaning by doing this).
If complexes remain in the location branch they should be in the correct place. An alternative is to move the complex terms outside of the CC aspect (i.e separate complexes and locations).
I don't think we should have location specific complexes though
Isn't this the point (as Karen said two comments up) - the issue should be handled by annotation complex (GO:cell_component) is part_of GO: cell_component, not by creation of new compound ontology terms?
And to get fussy, will those hypothetical always-in-the-nucleus terms behave properly for complexes that occur in taxa with open mitosis?
I am confused. We are trying to remove the compound ontology terms.
The new ticket is here: https://github.com/geneontology/go-ontology/issues/20000
replacing the ticket that was opened in 2016 https://github.com/geneontology/go-ontology/issues/12833
but the existence of "location specific complex terms" is a different issue from the suggestion in this ticket, which is to group the complexes that are already under nucleus and still belong here (if they don't belong here, the nuclear parentage should be removed).
An alternative suggestion is to move all of the complexes out of CC into their own GO aspect. This is unlikely to happen immediately.
Whatever happens in the long term it would be useful not to see a long list of complexes directly under nucleus, and the timescales for the other issues will likely be much longer.
Note that all of these complexes are already under nucleus, so grouping them only follows patterns used elsewhere.
Merging complexes that are not always nuclear (see https://github.com/geneontology/go-ontology/issues/20000) will result in their removal from under nucleus. This should not be surprising to curators. If a protein complex has multiple locations of action it cannot exist in the ontology as a descendant of one of those locations.
This issue is about something else...
Would it be possible to group all of the complex terms directly under "nucleus" under a grouping term "nuclear complex" (there is already a similar grouping term for "nuclear membrane complex")? I ask because there are so many complexes here that it is difficult to locate the nuclear parts hidden among them, and our users are not finding them...
This is only a few: