geneontology / go-annotation

This repository hosts the tracker for issues pertaining to GO annotations.
BSD 3-Clause "New" or "Revised" License

Annotation review: GO:0006351 transcription, DNA-templated #1720

Open pgaudet opened 6 years ago

pgaudet commented 6 years ago

(Edited to make clearer):

GO:0006351 transcription, DNA-templated should be a 'do not annotate' term; it should always be possible to specify whether we are talking about:

- GO:0001121 transcription from bacterial-type RNA polymerase promoter
- GO:0006390 transcription from mitochondrial promoter
- GO:0042793 transcription from plastid promoter
- GO:0006360 transcription from RNA polymerase I promoter
- GO:0006366 transcription from RNA polymerase II promoter
- GO:0006383 transcription from RNA polymerase III promoter
- GO:0001059 transcription from RNA polymerase IV promoter
- GO:0001060 transcription from RNA polymerase V promoter

AND whether we are talking about initiation or elongation, for each of those promoters.
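The 'do not annotate' rule proposed here could be enforced as a simple curation-time check. This is a minimal Python sketch, assuming a hypothetical `DO_NOT_ANNOTATE` mapping from the grouping term to the promoter-specific terms listed above; it is not an existing GO tool or API.

```python
# Hypothetical sketch: flag annotations to a 'do not annotate' grouping term
# and suggest the more specific promoter-type terms listed in this issue.
DO_NOT_ANNOTATE = {
    "GO:0006351": [  # transcription, DNA-templated
        "GO:0001121",  # transcription from bacterial-type RNA polymerase promoter
        "GO:0006390",  # transcription from mitochondrial promoter
        "GO:0042793",  # transcription from plastid promoter
        "GO:0006360",  # transcription from RNA polymerase I promoter
        "GO:0006366",  # transcription from RNA polymerase II promoter
        "GO:0006383",  # transcription from RNA polymerase III promoter
        "GO:0001059",  # transcription from RNA polymerase IV promoter
        "GO:0001060",  # transcription from RNA polymerase V promoter
    ],
}


def check_annotation(term_id: str) -> list[str]:
    """Return suggested replacement terms, or [] if term_id may be used directly."""
    return DO_NOT_ANNOTATE.get(term_id, [])


if __name__ == "__main__":
    suggestions = check_annotation("GO:0006351")
    print(f"GO:0006351 is a grouping term; choose one of {len(suggestions)} specific terms")
```

The second axis of the proposal (initiation versus elongation for each promoter type) would need a similar mapping and is left out of this sketch.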

@krchristie @ValWood @RLovering @thomaspd feedback would be much appreciated.

Thanks, Pascale

ValWood commented 6 years ago
  1. Some accessory proteins are not regulatory: TFIIA, TFIID, TFIIE, TFIIF, TFIIH, TFIIK, TFIIIA, TFIIIB, TFIIIC, and others.
pgaudet commented 6 years ago

Hi @ValWood, these mediate initiation, don't they?

ValWood commented 6 years ago

Ah, for elongation I'm not sure... but we have stuff annotated to transcription elongation that is not annotated to regulation and is not RNA pol (CCR4-NOT complex, Pfa1 complex, THO complex, TFIIS, elf1 and others). I don't know if we even know enough about their actual role to say whether they are regulating or not...

RLovering commented 6 years ago

Hi Pascale

In the TF meetings we have been discussing the need to decide where the process of transcription starts and ends.

Compare transcription to signaling: the term 'receptor mediated signaling pathway' covers every step from ligand binding to regulation of transcription. We could have said the pathway starts at the first receptor-activated intracellular entity, because in many ways you could view the action of the ligand and the receptor as 'regulating the intracellular pathway'.

The approach that Astrid and Marcello thought might work would be for the term 'transcription' to include all of the steps that the general/basic TFs are responsible for (even though initiation and elongation are in many ways a 'regulation' process). This would then mean that the proteins that regulate transcription are those that interact with the general/basic TFs, plus all the other TFs, co-activators, etc.

The reason behind this is that the alternative is that the only proteins annotated to transcription will be the RNA polymerases. And yet they are unable to transcribe without the action of the general/basic TFs.

This fits with the current ontology where we have:

GO:0006351 transcription, DNA-templated
  is a child: GO:0006352 DNA-templated transcription, initiation
    is a child: GO:0006367 transcription initiation from RNA polymerase II promoter
  is a child: GO:0006366 transcription from RNA polymerase II promoter
    is a child: GO:0006367 transcription initiation from RNA polymerase II promoter
  is a child: GO:0006355 regulation of transcription, DNA-templated
    GO:0006357 regulation of transcription from RNA polymerase II promoter
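Under GO's true path rule, an annotation to a term also implies annotation to its ancestors, which is why the placement of terms in this hierarchy matters so much for users. A minimal Python sketch of that propagation over the terms listed above, assuming a hand-built parent map that is an illustration, not a copy of the ontology file. The regulation branch is kept as a separate root here, on the assumption that the link from GO:0006355 to GO:0006351 is a `regulates` relation that annotation tools typically do not propagate across.

```python
# Hand-built parent map for the terms listed above (illustrative, not the full GO).
PARENTS = {
    "GO:0006351": [],                            # transcription, DNA-templated
    "GO:0006352": ["GO:0006351"],                # DNA-templated transcription, initiation
    "GO:0006366": ["GO:0006351"],                # transcription from RNA polymerase II promoter
    "GO:0006367": ["GO:0006352", "GO:0006366"],  # transcription initiation from RNA pol II promoter
    "GO:0006355": [],                            # regulation of transcription (separate branch here)
    "GO:0006357": ["GO:0006355"],                # regulation of transcription from RNA pol II promoter
}


def ancestors(term: str) -> set[str]:
    """Return every ancestor of term (excluding term itself) by walking parent links."""
    seen: set[str] = set()
    stack = list(PARENTS.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(PARENTS.get(parent, []))
    return seen


def propagate(direct_terms: set[str]) -> set[str]:
    """Expand direct annotations to include all implied ancestor terms."""
    implied = set(direct_terms)
    for term in direct_terms:
        implied |= ancestors(term)
    return implied


if __name__ == "__main__":
    print(sorted(propagate({"GO:0006367"})))
    # ['GO:0006351', 'GO:0006352', 'GO:0006366', 'GO:0006367']
```

An annotation to GO:0006367 therefore counts towards 'transcription' but not towards 'regulation of transcription' in this sketch.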

I have also been suggesting that under regulation of transcription parent terms we have something along the lines of:

  1. signaling pathway involved in regulation of transcription (because not all signaling pathways regulate transcription, and this would enable us to put the IDs of the genes regulated by a signaling pathway into the AE (annotation extension) field; with Noctua, the downstream processes could be exported to the AE field in this way too.)

  2. Then have terms such as regulation of transcription from RNA polymerase II promoter (and child terms) reserved for use only for entities that actually act in the nucleus.

So if you want the term 'transcription' to be reserved only for RNA polymerases, then I suggest you would need some sort of child term under regulation of transcription. The problem is that it is very hard to define a term reserved solely for the initiation/elongation/PIC/general/basic TFs; maybe something like:

  1. Regulation of RNA polymerase initiation and elongation (I realise this is not a great term, but I don't have time to think about a name for a term that I don't think we should try to create).

I think you need to think hard about what will improve the GO for GO users. Personally, I don't think grouping general/basic TFs with the dbTFs, co-activators, etc. is helpful, just as you and I don't think grouping signaling molecules with dbTFs, co-activators, etc. is helpful either. Once you have considered which groupings are useful, and which terms enable the downstream targets of the process to be associated with the annotated entities, this might help guide the ontology.

This seems like a very basic issue that needs to be agreed. It seems to me that there are 2 key options for deciding the start and end of the process of Transcription. This is such a big decision and one that needs to be agreed by everyone, not just the NTNU COST groups.

At present the PIC is not being annotated as regulating transcription. Therefore, if you now intend to suggest that the PIC be annotated to 'regulation of transcription', this should be presented clearly so that people can understand what impact it will have on their data and their users' analyses. That is, rather than just presenting a revised ontology, maybe provide a very simple diagram of transcription (e.g., one of Astrid's figures) with a ring around the proteins currently associated with the regulation terms and those associated only with transcription, and then a new figure showing which proteins will be associated with the regulation and transcription terms in the new ontology.

Best

Ruth

RLovering commented 6 years ago

Also shouldn't there be an option to select 'transcription' as a project or label?

RLovering commented 6 years ago

Also, if the 'process' transcription is only associated with RNA polymerases then you have reduced 'transcription' to a 'function'.

pgaudet commented 6 years ago

Yes, you can pick the label; let me decide which ones go in the project!

pgaudet commented 6 years ago

> Also, if the 'process' transcription is only associated with RNA polymerases then you have reduced 'transcription' to a 'function'.

But then how do you define 'transcription' versus 'regulation of transcription'?

It would help to have this clarified.

Thanks, Pascale

RLovering commented 6 years ago

Hi Pascale,

Based on the definitions of the existing child terms of GO:0006351 transcription, DNA-templated, the definition of GO:0006351 could be improved to state something like:

The cellular synthesis of RNA on a template of DNA. Transcription starts with the assembly of the RNA polymerase preinitiation complex (PIC) at the core promoter region of a DNA template, resulting in the subsequent synthesis of RNA from that promoter. The process includes the transition between the initiation and elongation phases of transcription, i.e., promoter clearance and release. The process ends with termination, when the formation of phosphodiester bonds ceases, the RNA-DNA hybrid dissociates, and RNA polymerase releases the DNA.

Personally, I often look at the child terms for a parent term if the definition for the parent term is rather vague as this often helps me work out exactly what the parent term includes.

For the parent terms of GO:0006351 transcription, DNA-templated, the definition could also be adapted with modifications.

Ruth

ValWood commented 6 years ago

I really preferred the old split between gene-specific transcription and general transcription before the previous overhaul. This was very useful for users, for analysis, and for modularization.

Using this, you can correctly annotate "regulation of gene-specific transcription" (via signalling), or the "regulation of core transcription" (or its specific parts) by conserved co-repressors, etc.

It was very easy to separate the two different sets using GO terms (now you need to do complicated subtraction and union queries).
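The 'subtraction and union queries' mentioned here amount to set operations on the genes annotated to each term. A minimal Python sketch, assuming hypothetical gene identifiers and a hypothetical term-to-genes mapping; real annotation sets would come from GAF files with ancestor propagation already applied.

```python
# Hypothetical gene sets per term (placeholders, not real annotation data).
ANNOTATED = {
    "transcription": {"gene_a", "gene_b", "gene_c", "gene_d"},
    "regulation of transcription": {"gene_c", "gene_d", "gene_e"},
}


def core_only(annotated: dict[str, set[str]]) -> set[str]:
    """Genes annotated to transcription but not to its regulation (set difference)."""
    return annotated["transcription"] - annotated["regulation of transcription"]


def any_transcription_role(annotated: dict[str, set[str]]) -> set[str]:
    """Genes with either kind of annotation (set union)."""
    return annotated["transcription"] | annotated["regulation of transcription"]


if __name__ == "__main__":
    print(sorted(core_only(ANNOTATED)))               # ['gene_a', 'gene_b']
    print(sorted(any_transcription_role(ANNOTATED)))  # ['gene_a', 'gene_b', 'gene_c', 'gene_d', 'gene_e']
```

With a dedicated 'general transcription' grouping, as in the older split described above, the first query would reduce to a single term lookup.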

I can't remember why we needed to get rid of it but @ukemi and @krchristie will know.

RLovering commented 6 years ago

OK, sorry, I had misinterpreted this ticket in light of discussions I had had previously. But I think the improved definition would be helpful.

pgaudet commented 6 years ago

@ValWood Here I am talking about Process, not function. What I am proposing is to have the 'general' machinery being involved in transcription, and the specific transcription factors being annotated to 'regulation of transcription' (or some child of that). Are we on the same page?

Pascale

krchristie commented 6 years ago

I don't agree with saying that the:

> 'general' machinery being involved in transcription, and the specific transcription factors being annotated to 'regulation of transcription' (or some child of that)

Early on in transcription research, there was some thought that the GTFs were used at ALL RNAP II promoters, but this seems to be an artifact of having done a lot of work on a very small subset of promoters. With additional research using a much wider selection of promoters, it began to be realized that there are many different assembly pathways to form a pre-initiation complex (PIC) that contains RNAP II. See the figure in this paper:

Sikorski TW, Buratowski S. The basal initiation machinery: beyond the general transcription factors. Curr Opin Cell Biol. 2009 Jun;21(3):344-51. doi: 10.1016/j.ceb.2009.03.006. Epub 2009 May 4. Review. PubMed PMID: 19411170 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2692371/

Thus, like the "specific" transcription factors, the "general" factors are regulating a set of genes, with the difference being in the scale of the set of genes regulated. Typically the "specific" factors regulate a gene or small group of genes, while the "general" factors regulate larger scale transcriptional programs.

When @ukemi and I worked through this question before, we contacted a number of RNAP III researchers to ask specifically whether TFIIIC is part_of transcription, a regulator of transcription, or both. We got basically a 50:50 split: some researchers thought that TFIIIC is only part_of transcription, while the other half thought it was both part of the transcription machinery and a regulator of transcription, because it is simultaneously true that transcription cannot happen without TFIIIC and that TFIIIC regulates when transcription occurs. We ended up treating both the specific and the general TFs as both involved in transcription and regulating it.

RLovering commented 6 years ago

Hi Karen

I agree with your comments; in fact, I have just annotated TAF1 as negatively regulating transcription. However, in the ontology, if we agree that the process of transcription involves the PIC (i.e., transcription starts with initiation and ends when RNA polymerase releases the DNA), then all components of the PIC are involved in transcription, and initiation is part of transcription rather than a regulation of it.

This would suggest that not all components of the PIC are necessarily involved in regulation of transcription. I was therefore proposing that the ontology structure you created be maintained, as shown in GO:0006352 DNA-templated transcription, initiation [https://www.ebi.ac.uk/QuickGO/term/GO:0006352]; GO:0006367 transcription initiation from RNA polymerase II promoter [https://www.ebi.ac.uk/QuickGO/term/GO:0006367]; GO:0051123 RNA polymerase II transcriptional preinitiation complex assembly [https://www.ebi.ac.uk/QuickGO/term/GO:0051123]

However, I think we need to consider whether terms such as GO:0001139 transcription factor activity, core RNA polymerase II recruiting should have 'regulation' parents. If initiation and the recruitment of RNA polymerase are part of the process of transcription as we have defined it, then any PIC subunits that interact with and recruit RNA polymerase will not be regulating that process. Therefore I think the regulation parent needs to be removed.

That is, GO:0001076 transcription factor activity, RNA polymerase II transcription factor binding (definition: "Interacting selectively and non-covalently with an RNA polymerase II transcription factor, which may be a single protein or a complex, in order to modulate transcription. A protein binding transcription factor may or may not also interact with the template nucleic acid (either DNA or RNA) as well.") should not be the parent of GO:0001083 transcription factor activity, RNA polymerase II basal transcription factor binding. In addition, the definition of GO:0001083 should not state that the only reason this binding occurs is to modulate transcription (definition: "Interacting selectively and non-covalently with a basal RNA polymerase II transcription factor, which may be a single protein or a complex, in order to modulate transcription. A protein binding transcription factor may or may not also interact with the template nucleic acid (either DNA or RNA) as well.").

If the action of the PIC is part of the process of transcription then a basal TF binding to another basal TF does not always lead to regulation of transcription.

Similarly, the recruitment of TFII-class components by other members of the PIC is part of transcription. Unfortunately I do not know enough about PIC assembly; however, it seems odd to have the terms GO:0001137 transcription factor activity, TFIIF-class transcription factor recruiting; GO:0001136 transcription factor activity, TFIIE-class transcription factor recruiting; and GO:0001138 transcription factor activity, TFIIH-class transcription factor recruiting as regulating transcription, when I would have thought that there are occasions when the recruitment is part of transcription and not necessarily regulating it.

If the regulation parent terms were removed, then individual PIC subunits could be annotated to regulation of transcription when this is shown to be the case, rather than all PIC subunits being annotated to regulation of transcription when that may not be the case.
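The consequence of removing a regulation parent can be sketched as a before/after propagation check: while a recruiting term has a regulation parent, every gene annotated to it is implicitly annotated to regulation of transcription; once the parent is removed, regulation must be stated explicitly. A minimal Python sketch, assuming a simplified structure: GO:0051123 and GO:0001138 are the terms named in this comment, but `REGULATION` is a hypothetical placeholder for whichever regulation term is the current parent, and the edges are illustrative rather than the ontology's actual links.

```python
# Illustrative parent maps before and after the proposed change.
# "REGULATION" is a placeholder for the regulation-of-transcription parent.
BEFORE = {
    "GO:0001138": ["GO:0051123", "REGULATION"],  # TFIIH-class TF recruiting
    "GO:0051123": [],                            # RNA pol II PIC assembly
    "REGULATION": [],
}
AFTER = {
    "GO:0001138": ["GO:0051123"],  # regulation parent removed
    "GO:0051123": [],
    "REGULATION": [],
}


def implied_terms(term: str, parents: dict[str, list[str]]) -> set[str]:
    """Return term plus all ancestors reachable through the given parent map."""
    result = {term}
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result


def annotated_to_regulation(direct: set[str], parents: dict[str, list[str]]) -> bool:
    """True if any direct annotation implies the regulation placeholder."""
    return any("REGULATION" in implied_terms(t, parents) for t in direct)


if __name__ == "__main__":
    pic_subunit = {"GO:0001138"}
    print(annotated_to_regulation(pic_subunit, BEFORE))  # True
    print(annotated_to_regulation(pic_subunit, AFTER))   # False
    # After the change, regulation is recorded only when added explicitly:
    print(annotated_to_regulation(pic_subunit | {"REGULATION"}, AFTER))  # True
```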

If necessary, we might want to create separate terms that distinguish specific transcription factors, which will always be regulating transcription through their recruitment of the PIC (and RNA polymerase), from the 'general transcription factors', whose action is 'part of' the transcription process as we are defining it. When there is evidence that these GTFs have a regulatory role, they can be given an additional annotation.

I have just annotated a paper that shows that TP53 binds and recruits TFIID so ideally there would be terms such as:

GO:0051123 RNA polymerase II transcriptional preinitiation complex assembly (without regulation parents as per existing ontology)

Part_of NEW child: transcription factor activity, involved in preinitiation complex assembly. (NO regulation parents)

is_a child GO:0001074 transcription factor activity, RNA polymerase II proximal promoter sequence-specific DNA binding involved in preinitiation complex assembly. (KEEP regulation parents)

And consider reducing the number of recruiting terms, many of which have not been used, to: GO:0051123 RNA polymerase II transcriptional preinitiation complex assembly (NO regulation parents as per existing ontology)

Part_of NEW child: transcription factor activity, TFII-class transcription factor recruiting. (NO regulation parents)

is_a NEW child: transcription factor activity, RNA polymerase II proximal promoter sequence-specific DNA binding TFII-class transcription factor recruiting. (WITH regulation parents)

Delete the more specific terms, which currently have few annotations, if annotation extension information could be included to make the annotations more specific?

KEEP this term: GO:0051123 RNA polymerase II transcriptional preinitiation complex assembly (without regulation parents as per existing ontology)

Part_of Consider deleting this term (no annotations): GO:0001136 transcription factor activity, TFIIE-class transcription factor recruiting (or, if keeping it, consider removing the regulation parent)

Part_of Consider deleting this term (no annotations): GO:0001137 transcription factor activity, TFIIF-class transcription factor recruiting (or, if keeping it, consider removing the regulation parent)

Part_of Consider deleting this term (1 expt annotation to MED11): GO:0001138 transcription factor activity, TFIIH-class transcription factor recruiting (or, if keeping it, consider removing the regulation parent)

From the definition of the mediator complex ("A protein complex that interacts with the carboxy-terminal domain of the largest subunit of RNA polymerase II and plays an active role in transducing the signal from a transcription factor to the transcriptional machinery. The mediator complex is required for activation of transcription of most protein-coding genes"), I think that the mediator complex is considered to be part of 'transcription' in the way that 'transcription' is defined.

It would be good to make a decision about this and then somehow include the mediator complex in the definition of one of the transcription child terms, stating either that the mediator complex is often involved in the transcription process, or that the mediator complex is often involved in regulating transcription.

Thanks

Ruth

RLovering commented 6 years ago

UCL annotations updated where possible

RLovering commented 5 years ago

Hi Pascale

1 HGNC annotation changed today

1 BHF annotation removed previously

1 ARUK annotation removed previously

so UCL is done for this ticket/spreadsheet.

Ruth

srengel commented 5 years ago

SGD done.