geneontology / go-ontology

Source ontology files for the Gene Ontology
http://geneontology.org/page/download-ontology
Creative Commons Attribution 4.0 International

NTR: [RNA polymerase large subunit CTD code] #23393

Closed. colinlog closed this issue 1 year ago.

colinlog commented 2 years ago

There may be a need for an overarching BP term, somewhere between GO:0016070 RNA metabolic process and GO:0006366 transcription by RNA polymerase II, that can house the 'RNA polymerase II cycle'. Molecularly, this cycle consists of post-translational modifications of the YSPTSPS heptad at every residue (phosphorylation of Y1, S2, T4, S5 and S7) and proline isomerisation (P3, P6). It is a process because, as with the histone code, there are proteins that are not polymerase subunits (many CDKs) that write this code, and other proteins that read the CTD modification code to (i) regulate RNA polymerase II (sub)processes such as transcription initiation, promoter clearance, elongation, termination and recycling, and (ii) help the maturation or processing of the nascent RNA. Furthermore, there are non-consensus heptad repeats that are important for snRNA and snoRNA regulation, R-loop resolution and transcription termination (a nice review is PMID:28248323).
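To make the residue numbering concrete, here is a minimal Python sketch of the consensus heptad and the modification classes named above. The names `CONSENSUS_HEPTAD`, `MODIFICATIONS` and `residue_labels` are hypothetical, chosen for illustration only, and are not part of any GO tooling.

```python
CONSENSUS_HEPTAD = "YSPTSPS"

# Modification classes per heptad position (1-based), as listed above.
# Hypothetical layout for illustration; O-GlcNAc at S5/S7 and the variant
# (R1, K7) repeats discussed later in the thread are left out.
MODIFICATIONS = {
    1: ["phosphorylation"],        # Y1
    2: ["phosphorylation"],        # S2
    3: ["proline isomerisation"],  # P3
    4: ["phosphorylation"],        # T4
    5: ["phosphorylation"],        # S5
    6: ["proline isomerisation"],  # P6
    7: ["phosphorylation"],        # S7
}


def residue_labels(heptad: str = CONSENSUS_HEPTAD) -> list[str]:
    """Return position labels such as 'Y1', 'S2', ... for a heptad."""
    return [f"{aa}{pos}" for pos, aa in enumerate(heptad, start=1)]


for pos, label in enumerate(residue_labels(), start=1):
    print(label, ", ".join(MODIFICATIONS[pos]))
```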

For human it is POLR2A; for budding yeast, RPB1.

Existing children: the MFs for the modifications? RNA polymerase transcription and all its children that are known to depend on the CTD code? Recruitment of RNA capping and RNA splicing factors is also a result of the CTD code!

A new child BP could be the variant/non-consensus repeat modification code?

The molecular functions for the enzymatic modification of every amino acid in the heptad repeat have been created and these should have part_of relations to this new BP.

One open question might be whether this should be a 'regulates' branch or really a BP.

ValWood commented 2 years ago

Do we need such a grouping term?

These phosphorylation events (kinases) can just be coupled to their substrates, and then part_of the processes that they regulate (initiation, elongation, termination, etc.).

This term seems like an extra level of classification?

I do have a question, though: should these be part_of or "regulation of" the respective processes? I have always been a bit unclear on this. I have used "regulation", but it depends on where we define the start of initiation etc. (and it seems odd to call it 'regulation', as we decided earlier that these are "general initiation factors").

colinlog commented 2 years ago

Regulation: Agreed, not regulatory as such. These are GTFs that are themselves regulated by dbTFs (GO:0003700 and descendants) and/or coTFs (GO:0003712 and descendants). However, note that CDK8/cyclinC CTD kinase is part of a coTF protein complex (the mediator), which is a complex that is activated by dbTFs, but that is not an issue, is it?

Part_of not good, has_part always works: the BP transcription initiation by RNA polymerase II GO:0006367 ALWAYS has_part the MF RNA polymerase II C-terminal domain S5 kinase activity GO:0140836. As discussed with Pascale, the reverse relation, namely CTD S5 kinase activity part_of BP initiation, is not always true. This is because the RNAP2 large subunit heptad repeat code, which controls the RNA polymerase II cycle from initiation to termination and re-initiation by the enzyme at another promoter, has combinatorial modification aspects (e.g. S7p is needed to make S5p efficiently).
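The asymmetry can be illustrated with a toy model of the all-some reading commonly used for OBO relations: "A has_part B" means every instance of A has some part that is a B, while "B part_of A" means every instance of B is part of some A. This Python sketch uses the two GO IDs from this comment with hypothetical instances (`init_1`, `s5k_a`, `s5k_b`) and a hypothetical helper `all_some`; it is a sketch of the logic, not how GO reasoners are implemented.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """A toy individual: its name, its class (a GO ID) and the names of its parts."""
    name: str
    cls: str
    parts: list[str] = field(default_factory=list)


INITIATION = "GO:0006367"  # transcription initiation by RNA polymerase II
S5_KINASE = "GO:0140836"   # RNA polymerase II C-terminal domain S5 kinase activity

# Hypothetical toy world: every initiation event contains an S5 kinase event,
# but one S5 kinase event ("s5k_b") is not part of any initiation.
WORLD = [
    Instance("init_1", INITIATION, parts=["s5k_a"]),
    Instance("s5k_a", S5_KINASE),
    Instance("s5k_b", S5_KINASE),
]


def all_some(world, subj_cls, relation, obj_cls):
    """Check 'every subj_cls instance <relation> some obj_cls instance'."""
    by_name = {i.name: i for i in world}
    subjects = [i for i in world if i.cls == subj_cls]
    if relation == "has_part":
        return all(any(by_name[p].cls == obj_cls for p in s.parts) for s in subjects)
    if relation == "part_of":
        return all(
            any(w.cls == obj_cls and s.name in w.parts for w in world)
            for s in subjects
        )
    raise ValueError(f"unsupported relation: {relation}")


print(all_some(WORLD, INITIATION, "has_part", S5_KINASE))  # True
print(all_some(WORLD, S5_KINASE, "part_of", INITIATION))   # False
```

The single S5 kinase event outside initiation is enough to make the part_of axiom false while the has_part axiom still holds.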

Extra level: the CTD code is different from the histone code or from activating cascades because it involves 52 repeats in human (26 in yeast). This code therefore distinguishes itself from all the other PTM-driven codes: the mechanisms underlying it are combinatorial not only at the level of one heptad repeat but also at the level of the 52 repeats. Hence, I would think that the CTD modification functions (Y1 kinase, S2 kinase, P3 isomerase, T4 kinase, S5 kinase, S5 O-GlcNAc, P6 isomerase, S7 kinase, S7 O-GlcNAc) should have is_a relations to a parent MF term indicating that they concern the CTD (and not just its phosphorylation, as there are also proline isomerisation and serine O-GlcNAc activities), for example 'RNA polymerase II large subunit carboxy-terminal domain (CTD) modification'. Alternatively, we could house these activities for each CTD residue under the existing BP GO:0006366 transcription by RNA polymerase II with the relation has_part. I would like to argue that we can do both: a has_part to the BP and an is_a to the proposed chapeau MF for CTD modification activities.
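A rough count shows why combinatoriality across repeats matters. This Python sketch is a crude size estimate, not a biological result: it assumes that every repeat is a consensus YSPTSPS heptad, that the listed modifications at a given residue are mutually exclusive, that each proline has two isomer states, and that positions and repeats vary independently. Real CTDs violate these assumptions (non-consensus repeats, S7p priming S5p). The names `STATES`, `states_per_heptad` and `total_states` are hypothetical.

```python
from math import prod

# Possible states per consensus heptad position, using only the activities
# listed above. Simplifying assumptions (see lead-in): mutually exclusive
# modifications per residue, cis/trans prolines, independent positions.
STATES = {
    "Y1": ["unmodified", "phospho"],
    "S2": ["unmodified", "phospho"],
    "P3": ["trans", "cis"],
    "T4": ["unmodified", "phospho"],
    "S5": ["unmodified", "phospho", "O-GlcNAc"],
    "P6": ["trans", "cis"],
    "S7": ["unmodified", "phospho", "O-GlcNAc"],
}


def states_per_heptad() -> int:
    """Number of distinct modification patterns for a single heptad."""
    return prod(len(options) for options in STATES.values())


def total_states(n_repeats: int) -> int:
    """Number of patterns across n independent consensus repeats."""
    return states_per_heptad() ** n_repeats


print(states_per_heptad())  # 288
for species, n in [("human", 52), ("budding yeast", 26)]:
    print(species, f"about 10^{len(str(total_states(n))) - 1} patterns")
```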

Not yet dealt with: there are variant repeats that bear an R residue at position 1 that can be methylated, and others that bear a K residue at position 7 that can be methylated, acetylated or ubiquitylated. These appear to be involved in the following roles: "snRNA and snoRNA regulation, R-loop resolution and transcription termination" [R1me]; "Supports nucleosome occupancy at promoters; negatively regulates gene expression" [K7me]; "Induction of growth-factor response genes, transcription elongation; maintains balance between Lys methylation and acetylation and affects mRNA expression levels" [K7ac]; and "RPB1 degradation" [K7ubi].

All this is very well summarized in PMID:28248323 [a table of repeats, a table of modifications, a figure showing where the CTD modifications are found along genes, and references to the enzymes/complexes known to perform the modifications, as well as to readers of the modifications on the CTD].

pgaudet commented 1 year ago

Proposal is to create a new term, 'RNA polymerase large subunit C-terminal domain modifying activity', to group all the activities. (We will do the same for histone modifiers.) This is an unusual way to group terms, but otherwise these terms are not easy to find.

Thanks, Pascale
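The grouping proposed here can be sketched as a small is_a graph in which one query under the new parent term retrieves all the CTD activities. In this Python sketch the child labels are shorthand for the activities listed in the thread, not official GO term names, and the helper `descendants` is hypothetical.

```python
from collections import deque

GROUP = "RNA polymerase large subunit C-terminal domain modifying activity"

# Hypothetical is_a edges (child -> list of parents). A term may have several
# parents; only the new grouping parent is shown here.
IS_A = {
    "CTD Y1 kinase activity": [GROUP],
    "CTD S2 kinase activity": [GROUP],
    "CTD P3 isomerase activity": [GROUP],
    "CTD T4 kinase activity": [GROUP],
    "CTD S5 kinase activity": [GROUP],
    "CTD S5 O-GlcNAc activity": [GROUP],
    "CTD P6 isomerase activity": [GROUP],
    "CTD S7 kinase activity": [GROUP],
    "CTD S7 O-GlcNAc activity": [GROUP],
}


def descendants(term: str, is_a: dict[str, list[str]]) -> set[str]:
    """Return every term reachable from `term` by following is_a edges downwards."""
    children: dict[str, list[str]] = {}
    for child, parents in is_a.items():
        for parent in parents:
            children.setdefault(parent, []).append(child)
    found: set[str] = set()
    queue = deque([term])
    while queue:
        for child in children.get(queue.popleft(), []):
            if child not in found:
                found.add(child)
                queue.append(child)
    return found


for activity in sorted(descendants(GROUP, IS_A)):
    print(activity)
```

Because edges are stored child-to-parents, each activity can keep its other is_a parents while also sitting under the grouping term, which is what makes the terms easy to find.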

pgaudet commented 1 year ago

Done in #24980