geneontology / go-ontology

Source ontology files for the Gene Ontology
http://geneontology.org/page/download-ontology
Creative Commons Attribution 4.0 International

transcriptional preinitiation complex #15870

Closed ValWood closed 1 year ago

ValWood commented 6 years ago

transcriptional preinitiation complex: "A protein-DNA complex composed of proteins binding promoter DNA to form the transcriptional preinitiation complex (PIC), the formation of which is a prerequisite for transcription." PMID:22751016

has no children and only 704 annotations in total (10 experimental SGD and CAFA)

SGD annotations are SUA7 (TFIIB), TFA1 (TFIIE), TFA2 (TFIIE), SSL2 (TFIIH)

There must be an issue with the definition/placement (should it have children? should it exist?)

pgaudet commented 6 years ago

~I wonder if it's the same as~

@krchristie

Thanks, Pascale

@colinlog : No, it's not the same

ukemi commented 6 years ago

#10556

bmeldal commented 6 years ago

Without looking into the details, it sounds like one of those situations where the core FUNCTION is conserved but, depending on taxon, tissue, cellular circumstances etc., the composition is variable. We have curated the yeast complex: https://www.ebi.ac.uk/complexportal/complex/CPX-2662 and annotated it to GO:0005665 DNA-directed RNA polymerase II, core complex https://www.ebi.ac.uk/QuickGO/term/GO:0005665

while a lot of the general TFs and core mediator (in several species) are annotated to: GO:0016591 DNA-directed RNA polymerase II, holoenzyme https://www.ebi.ac.uk/QuickGO/term/GO:0016591

We have no annotations to GO:0097550 transcriptional preinitiation complex https://www.ebi.ac.uk/QuickGO/term/GO:0097550

The yeast RNAPII entry refers to the PIC in its description:

During a transcription cycle, Pol II, general transcription factors and the mediator complex (CPX-3226) assemble as the preinitiation complex (PIC) at the promoter.

but we didn't curate the PIC itself because it is difficult to define as RNAPII, general TFs and core mediator come and go during transcription initiation.

ValWood commented 6 years ago

There are only 10 EXP annotations in total to GO:0097550 transcriptional preinitiation complex: 5 CAFA, 4 SGD, 1 UniProt. It seems sensible to get rid of this ....
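A tally like this can be reproduced by grouping annotation rows by contributing source. The following is a minimal sketch in Python, assuming rows of (gene, GO id, source). The four SGD genes are the ones named in this issue; the CAFA and UniProt identifiers are placeholders, not real annotated entries.

```python
from collections import Counter

# Illustrative rows only: (gene_or_id, go_id, assigned_by).
# The SGD genes are the four named in this issue; the CAFA and UniProt
# identifiers are placeholders standing in for the real annotated entries.
ROWS = (
    [(g, "GO:0097550", "SGD") for g in ("SUA7", "TFA1", "TFA2", "SSL2")]
    + [(f"cafa_placeholder_{i}", "GO:0097550", "CAFA") for i in range(1, 6)]
    + [("uniprot_placeholder_1", "GO:0097550", "UniProt")]
)


def count_by_source(rows, go_id):
    """Count annotations to go_id per contributing source."""
    return Counter(src for _, term, src in rows if term == go_id)


counts = count_by_source(ROWS, "GO:0097550")
print(dict(counts), "total:", sum(counts.values()))
```

In practice the rows would come from an annotation file rather than a hand-written list; the grouping step stays the same.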

pgaudet commented 6 years ago

OK

bmeldal commented 6 years ago

while a lot of the general TFs and core mediator (in several species) are annotated to: GO:0016591 DNA-directed RNA polymerase II, holoenzyme https://www.ebi.ac.uk/QuickGO/term/GO:0016591

Sorry, they are annotated to the specific CHILDREN of this term! e.g. TFIIIB complex, core mediator complex...

pgaudet commented 6 years ago

According to Wikipedia (https://en.wikipedia.org/wiki/Transcription_preinitiation_complex), the PIC is like the holoenzyme (GO:0016591)? The definition of the holoenzyme term is "A nuclear DNA-directed RNA polymerase complex containing an RNA polymerase II core enzyme as well as additional proteins and transcription factor complexes, that are capable of promoter recognition and transcription initiation from an RNA polymerase II promoter in vivo. These additional components may include general transcription factor complexes TFIIA, TFIID, TFIIE, TFIIF, or TFIIH, as well as Mediator, SWI/SNF, GCN5, or SRBs and confer the ability to recognize promoters."

bmeldal commented 6 years ago

That does sound like the holoenzyme and PIC are regarded as the same thing. In which case it's a grouping term for the specific TFs, Pols and mediator etc.

pgaudet commented 6 years ago

OK so I will merge rather than obsolete.

@ValWood @krchristie OK ?
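The practical difference between the two options is what happens to existing annotations. The following is a minimal Python sketch, assuming that a merge carries annotations over to the surviving term while an obsoletion leaves them to be reviewed by hand. The function names and the two example rows are hypothetical and only illustrate the bookkeeping.

```python
def merge_term(annotations, source_id, target_id):
    """Merge: every annotation to source_id is carried over to target_id."""
    return [
        (gene, target_id if term == source_id else term)
        for gene, term in annotations
    ]


def obsolete_term(annotations, obsolete_id):
    """Obsolete: split annotations into those kept and those needing review."""
    kept = [(g, t) for g, t in annotations if t != obsolete_id]
    to_review = [(g, t) for g, t in annotations if t == obsolete_id]
    return kept, to_review


# Two of the SGD annotations mentioned in this issue, used as example rows.
annotations = [("SUA7", "GO:0097550"), ("SSL2", "GO:0097550")]

merged = merge_term(annotations, "GO:0097550", "GO:0016591")
kept, to_review = obsolete_term(annotations, "GO:0097550")
print(merged)
print(kept, to_review)
```

A merge is only safe if every carried-over annotation is also correct for the target term, which is the point debated further down the thread.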

ValWood commented 6 years ago

That sounds sensible if Karen agrees.

pgaudet commented 6 years ago

Another question: the term has these 'SubClass Of' axioms:

'protein-containing complex' and ('capable of' some 'DNA-directed 5'-3' RNA polymerase activity')

however the definition really talks about 'transcription initiation', so I think these relations should go.

(Edited to add): Based on the definitions, the 'DNA-directed RNA polymerase II, core complex' should have the 'capable of' some 'DNA-directed 5'-3' RNA polymerase activity' ("... Although the core is competent to mediate ribonucleic acid synthesis, it requires additional factors to select the appropriate template.")

@ValWood @krchristie what do you think ?

pgaudet commented 6 years ago

A second question: 'DNA-directed RNA polymerase II, holoenzyme' is a subclass of 'nucleoplasm'; that also seems wrong, given that it's a complex with the DNA.

Should it be part of chromatin ?

Thanks, Pascale

krchristie commented 6 years ago

"DNA-directed RNA polymerase II, holoenzyme" is NOT necessarily bound to DNA. In fact it has been purified as a large protein complex not attached to DNA many, many different times. There have also been models that propose that the holoenzyme preassembles before binding to DNA, so, unless you have new evidence that disproves that idea, I think that it is not OK to specify that the RNAP II holoenzyme is bound to DNA.

Also worth being aware that there is a long-standing issue about the fact that the composition of "the" holoenzyme appears to be incredibly variable. Thus, it is not a single complex, but rather any of a number of complexes that contain RNAP II and some number of other factors that are required to bring it to the promoter.

Based on the definitions, the 'DNA-directed RNA polymerase II, core complex' should have the 'capable of' some 'DNA-directed 5'-3' RNA polymerase activity' ("... Although the core is competent to mediate ribonucleic acid synthesis, it requires additional factors to select the appropriate template.")

I'm inclined to agree that

ValWood commented 6 years ago

"DNA-directed RNA polymerase II, holoenzyme" is NOT necessarily bound to DNA.

The same is true for many complexes, but building the location of activity into GO seems fine to me? I don't have a problem with any nuclear transcription factor complexes being a child of "chromatin".

ukemi commented 6 years ago

I think this is very dangerous. The ontology is meant to represent universals. It seems 'part of' is not the right relationship for what you want. It is universally true that the complex is always part of the nucleoplasm.

ValWood commented 6 years ago

Is that what we always do though?

ValWood commented 6 years ago

Maybe we do. Nearly. There are no complexes under chromatin, except "nucleosome"

ukemi commented 6 years ago

Because nucleosomes are always part of chromatin.

ValWood commented 6 years ago

So there are no free nucleosomes? OK... Makes sense...

ukemi commented 6 years ago

Nucleosome: A complex comprised of DNA wound around a multisubunit core and associated proteins, which forms the primary packing unit of DNA into higher order structures.

krchristie commented 6 years ago

I was about to say the same thing as @ukemi

ValWood commented 6 years ago

OK!

pgaudet commented 6 years ago

'nuclear transcription factor complex' is 'part of' some 'nucleus'.

We should do the same with the others (GTFs), I think.

pgaudet commented 6 years ago

Well - DNA-directed pol II and IV complexes are 'part of the nucleoplasm', which excludes the chromosomes. Is this right ?

Isn't it the case that these proteins are active at the chromatin but can also be found in free form? Aren't we capturing the active form ?

(Safest seems to be 'nuclear' for all).

Pascale
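The "universals" argument can be made concrete: a 'part of' parent is only safe if it holds for every instance of the complex. The following is a minimal Python sketch, assuming just two states taken from this discussion (free in the nucleoplasm, or engaged on chromatin) and assuming the reading above that nucleoplasm excludes the chromosomes. The containment table is a deliberate simplification, not the real GO structure.

```python
# Where an instance of the complex may be found, per the discussion:
# free (purified without DNA) or engaged on chromatin.
# These two states are a simplification for illustration.
OBSERVED_LOCATIONS = {"nucleoplasm", "chromatin"}

# Simplified containment: each location maps to the compartments that
# include it. Assumes nucleoplasm and chromatin do not contain each other.
CONTAINED_IN = {
    "nucleoplasm": {"nucleoplasm", "nucleus"},
    "chromatin": {"chromatin", "nucleus"},
}


def holds_universally(candidate_parent):
    """True if every observed instance lies within candidate_parent."""
    return all(
        candidate_parent in CONTAINED_IN[loc] for loc in OBSERVED_LOCATIONS
    )


for parent in ("chromatin", "nucleoplasm", "nucleus"):
    print(parent, holds_universally(parent))
```

Under these assumptions only 'nucleus' covers both states, which matches the "safest seems to be 'nuclear' for all" suggestion.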

pgaudet commented 6 years ago

Current structure is: 'nuclear DNA-directed RNA polymerase complex'

If I move 'alpha DNA polymerase:primase complex' out (see #15977) I can add 'capable of' some 'DNA-directed 5'-3' RNA polymerase activity' to the parent 'nuclear DNA-directed RNA polymerase complex' (and since the core complex is part_of the holoenzyme, the core complex inherits the activity anyway).

Pascale
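Why the move matters can be sketched as follows: an axiom asserted on a parent class is inherited by every is_a child, so anything left under the parent picks up the 'capable of' relation as well. This is a minimal Python sketch with a hypothetical, simplified is_a hierarchy; the real GO structure may differ, and propagation along part_of is not modelled here.

```python
PARENT = "nuclear DNA-directed RNA polymerase complex"

# Hypothetical, simplified is_a hierarchy (child -> parent).
IS_A = {
    "DNA-directed RNA polymerase II, core complex": PARENT,
    "alpha DNA polymerase:primase complex": PARENT,
    PARENT: None,
}

# Axioms asserted directly on a class, as (relation, filler) pairs.
ASSERTED = {
    PARENT: {("capable of", "DNA-directed 5'-3' RNA polymerase activity")},
}


def inherited_axioms(term):
    """Collect axioms asserted on term and on all of its is_a ancestors."""
    axioms = set()
    while term is not None:
        axioms |= ASSERTED.get(term, set())
        term = IS_A.get(term)
    return axioms


for child in ("DNA-directed RNA polymerase II, core complex",
              "alpha DNA polymerase:primase complex"):
    print(child, sorted(inherited_axioms(child)))
```

Both children inherit the axiom in this sketch, which is why the primase complex has to be moved out (or the parent definition tightened) before the relation is added.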

bmeldal commented 6 years ago

Safest seems to be 'nuclear' for all.

I think so.

If I move 'alpha DNA polymerase:primase complex' I can add 'capable of' some 'DNA-directed 5'-3' RNA polymerase activity' to the parent 'nuclear DNA-directed RNA polymerase complex'

GO:0055029 nuclear DNA-directed RNA polymerase complex "A protein complex, located in the nucleus, that possesses DNA-directed RNA polymerase activity."

That holds true for 'alpha DNA polymerase:primase complex', so you can't move it unless you make the def of 'nuclear DNA-directed RNA polymerase complex' stricter and exclude synthesis of short RNA strands.

and since the core complex is part_of the holoenzyme, the core complex anyway inherits the activity.

Aren't we merging them? Or just holoenzyme and PIC? I don't like the core vs holo- distinction as it has a fluid boundary and is used mainly by experimentalists to distinguish between persistently-found proteins and those that come and go depending on the cellular circumstances. Often you don't find the core complex on its own, at least not as the functional unit.

pgaudet commented 6 years ago

These labels should be simplified from 'DNA-directed RNA polymerase (...)', to

'RNA polymerase I complex'
'RNA polymerase II, core complex'
'RNA polymerase II, holoenzyme'
'RNA polymerase III complex'
'RNA polymerase IV complex'
'RNA polymerase V complex'
(since the RNA-directed RNA polymerases are called 'RNA-directed RNA polymerases', and not I, II, III, IV, V, there is no risk of confusion)

(Done in #15980)

pgaudet commented 6 years ago

@bmeldal So you suggest I do a 3-way merge between PIC, core and holoenzyme ? (that works for me)

bmeldal commented 6 years ago

It's safer as it reflects biology better. I don't know what the others think???

(I've come across quite a few examples of that "behaviour" when curating poorly-defined epigenetic complexes. They seem to be poorly defined as they a) are difficult to purify and b) vary depending on cellular circumstances. And then there are the technical artifacts to consider - just because you don't see one of the proteins one day doesn't mean it's not there, it just didn't get pulled down under the conditions. Change the conditions and you get a slightly different complex... And then you get the complexes defined on what has been detected on a Western based on expected members! No effort to identify ALL potential members but drawing some strong conclusions about the complex composition. Nightmare!!!)

pgaudet commented 6 years ago

Reflecting the biology is great !!

bmeldal commented 6 years ago

Reflecting the biology is great !!

Isn't that our job?

pgaudet commented 6 years ago

OK, 3-way merge it is, and the label will be 'DNA-directed RNA polymerase II'. Let me know if that's not right.

pgaudet commented 6 years ago

In fact I think the holoenzyme should be obsoleted (as @ukemi also suggested in #10556). We cannot be sure of whether the proteins annotated to the holoenzyme are part of PolII.

There are

(9) PomBase @ValWood
(9) SGD @krchristie @suzialeksander
(9) UniProt @ggeorghiou @sylvainpoux
(1) TAIR @tberardini

Let me know if a merge would be OK.

Thanks, Pascale

ValWood commented 6 years ago

Can you summarize? I got a bit lost.

You aren't doing anything with mediator are you?

I agree that we don't need "holoenzyme"

pgaudet commented 6 years ago

I think mediator needs to be moved out from RNA pol II complex, see #15979

ValWood commented 6 years ago

This is what I would prefer

keep

DNA-directed RNA polymerase II, core complex https://www.pombase.org/term/GO:0005665

mediator https://www.pombase.org/term/GO:0016592 (I don't see this as part of any polymerase term)

get rid of DNA-directed RNA polymerase II, holoenzyme (GO:0016591) https://www.pombase.org/term/GO:0016591

bmeldal commented 6 years ago

"DNA-directed RNA polymerase II, core complex" --> "DNA-directed RNA polymerase II" (remove "core")

Mediator and integrator complexes are not a type of RNAPII, they interact with it: https://github.com/geneontology/go-ontology/issues/15979

ValWood commented 6 years ago

"DNA-directed RNA polymerase II, core complex" --> "DNA-directed RNA polymerase II" (remove "core")

is only used for the polymerase itself. There shouldn't be any mediator subunits annotated to this term.

I don't think you can merge DNA-directed RNA polymerase II, holoenzyme (GO:0016591) into any existing term.

Most of our annotations to this term would not fit the existing subcomplexes. I will fix them. I think everyone should check, it's been used somewhat loosely....

https://github.com/pombase/curation/issues/2067

bmeldal commented 6 years ago

I don't think you can merge DNA-directed RNA polymerase II, holoenzyme (GO:0016591) into any existing term.

Most of our annotations to this term would not fit the existing subcomplexes. I will fix them. I think everyone should check, it's been used somewhat loosely....

pombase/curation#2067

Ok, I guess it needs an annotation revision ticket, @pgaudet

pgaudet commented 6 years ago

Discussing with @ValWood: in fact, several annotations to 'transcriptional preinitiation complex' are incorrect, so we will propose to obsolete it.

krchristie commented 6 years ago

Core RNAP II versus holoenzyme does not have a fluid boundary. Core is very simply defined as the 12 subunit enzyme. It is definitely NOT equivalent to either the PIC or the holoenzyme. I think it would be a truly bad idea to merge core and holoenzyme. The core term has a very useful role of defining the composition of the RNAP II enzyme itself.

Holoenzyme has multiple different compositions, which all include RNAP II core, as well as a varying composition of other transcription complexes. I have proposed obsoleting it previously due to the inability to define it precisely. However, this has been strenuously objected to by multiple people because the phrase "holoenzyme" is used frequently in the literature. @RLovering may have an opinion on this.

I would think that incorrect annotations to a term would be justification for fixing the annotations, not necessarily for obsoleting the term.

bmeldal commented 6 years ago

Just because experimentalists call things core and holo doesn't mean it reflects biology. In epigenetic complexes they call the catalytic core components "core" as they are always there and necessary but they never act on their own in the cell (they may be functional on their own in vitro).

We should describe what's happening in the cell.

The mediator is different, the CDK subcomplex does come and go and gives the mediator a different function.

bmeldal commented 6 years ago

Core RNAP II versus holoenzyme does not have a fluid boundary. Core is very simply defined as the 12 subunit enzyme.

Not according to the GO def: " RNA polymerase II, one of three nuclear DNA-directed RNA polymerases found in all eukaryotes, is a multisubunit complex; typically it produces mRNAs, snoRNAs, and some of the snRNAs. Two large subunits comprise the most conserved portion including the catalytic site and share similarity with other eukaryotic and bacterial multisubunit RNA polymerases. The largest subunit of RNA polymerase II contains an essential carboxyl-terminal domain (CTD) composed of a variable number of heptapeptide repeats (YSPTSPS). The remainder of the complex is composed of smaller subunits (generally ten or more), some of which are also found in RNA polymerases I and III. Although the core is competent to mediate ribonucleic acid synthesis, it requires additional factors to select the appropriate template."

No list of core subunits (in bold) and a clear comment that it requires additional subunits for its full activity (in bold).

Therefore, "holoenzyme" subunits can be annotated to this term.

And PIC is very badly defined:

GO:0097550 transcriptional preinitiation complex "A protein-DNA complex composed of proteins binding promoter DNA to form the transcriptional preinitiation complex (PIC), the formation of which is a prerequisite for transcription."

As mentioned above, we didn't curate the PIC as it's impossible to define.

krchristie commented 6 years ago

That sentence you have highlighted in the definition above does NOT mean that holoenzyme components should be annotated to the term for core RNAP II.

It refers to the fact that RNAP II is competent for the FUNCTION of the enzyme activity, but that other things are required for the PROCESS of transcription.

It becomes very problematic to have all the things that have been annotated to holoenzyme start getting annotated to something that is 'capable of' RNA polymerase activity, because NONE of these other things are capable of this activity, nor do they contribute to the activity. Rather, they act to bring the enzyme to the correct place.

pgaudet commented 6 years ago

What I proposed in the end is to

As the exchange shows, PIC and holoenzyme are not clear complexes. I read the definition for core the same as @krchristie , but if you have suggestions to clarify the meaning @bmeldal , please send them (perhaps the sentence in bold is actually more confusing than helpful).

Thanks, Pascale

pgaudet commented 6 years ago

Emailed Colin, Astrid, Marcio, Ruth and Val.

bmeldal commented 6 years ago

"Although the core is competent to mediate ribonucleic acid synthesis, it requires additional factors to select the appropriate template."

It refers to the fact that RNAP II is competent for the FUNCTION of the enzyme activity, but that other things are required for the PROCESS of transcription.

As the exchange shows, PIC and holoenzyme are not clear complexes. I read the definition for core the same as @krchristie , but if you have suggestions to clarify the meaning @bmeldal , please send them (perhaps the sentence in bold is actually more confusing than helpful).

I read it as referring to the COMPOSITION as we are discussing the COMPONENT term. Where did I go wrong?

It becomes very problematic to have all the things that have been annotated to holoenzyme start getting annotated to something that is 'capable of' RNA polymerase activity, because NONE of these other things are capable of this activity, nor do they contribute to the activity. Rather, they act to bring the enzyme to the correct place.

I would never suggest to annotate the non-catalytic components to RNAP ACTIVITY but we are discussing the COMPONENT term here. Which complex COMPONENT term will you annotate the old holoenzyme and PIC components to now? If they are chaperones etc they should never have been annotated to a complex term anyway. But how do you annotate components that are truly part of the RNAPII complex but not part of the conserved core and not a chaperone? @ValWood?

ValWood commented 6 years ago

Which complex COMPONENT term will you annotate the old holoenzyme and PIC components to now? If they are chaperones etc they should never have been annotated to a complex term anyway. But how do you annotate components that are truly part of the RNAPII complex but not part of the conserved core and not a chaperone?

I don't currently have any examples which would not be annotated to one of the subcomplexes, with the exception of the CTD phosphatase fcp1. I would be happy for this not to have a holoenzyme annotation since it is connected via substrate "rpb1". This phosphatase may be more promiscuous (I don't know, but it relocalizes to the cytosol during hypoxia).

I'm a bit ambivalent about whether we keep the term or not. If people think we should have it as a grouping term, I'm happy for it to stay... but we need to be precise about what would be annotated to it.

ValWood commented 6 years ago

This is what I have https://www.pombase.org/term/GO:0016591

ValWood commented 6 years ago

It might be useful to know what would not be annotated without the "holoenzyme" term. Then we would know if we needed it.
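One way to answer that is a set difference: take everything annotated to the holoenzyme term and subtract everything already covered by the terms that would remain. The following is a minimal Python sketch with hypothetical annotation sets. fcp1 and rpb1 are named in this thread, but the sets themselves and the mediator placeholder are illustrative, not real curation data.

```python
# Hypothetical annotation sets keyed by GO term; illustrative only.
ANNOTATED = {
    "GO:0016591": {"fcp1", "rpb1", "med_placeholder"},  # holoenzyme
    "GO:0005665": {"rpb1"},                             # core complex
    "GO:0016592": {"med_placeholder"},                  # mediator
}


def orphaned_without(term, remaining_terms, annotated=ANNOTATED):
    """Genes annotated to term that no remaining term would still cover."""
    covered = set().union(*(annotated[t] for t in remaining_terms))
    return annotated[term] - covered


orphans = orphaned_without("GO:0016591", ["GO:0005665", "GO:0016592"])
print(sorted(orphans))
```

With these example sets only fcp1 is left uncovered, mirroring the single exception mentioned above; running the same comparison on real annotation sets would show whether the grouping term is actually needed.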