geneontology / go-ontology

Source ontology files for the Gene Ontology
http://geneontology.org/page/download-ontology
Creative Commons Attribution 4.0 International

Patterns for axiomatisation of transcription factor activities #16970

Open pgaudet opened 5 years ago

pgaudet commented 5 years ago

From @dosumis on November 8, 2016 16:49

From @dosumis on October 28, 2015 13:46

From @dosumis on August 28, 2015 9:56

We currently have patterns like this:

But these patterns are not safe. It is not necessarily the case that being part of a regulatory process entails being a regulator of the regulated process. This pattern probably arose from implementation of the general MF part_of BP pattern. In this case, it would be better to directly assert MF regulation of BP. But which relation to use?

Perhaps directly activates:

Current def: "p directly activates q if and only if p is immediately upstream of q and p is the realization of a function to increase the rate or activity of q."

But see notes from 2015-07-23 eds meeting on defining directly positively regulates.

CC @cmungall @ukemi

Copied from original issue: geneontology/go-ontology#12033

Copied from original issue: geneontology/designpatterns#2

Copied from original issue: geneontology/molecular_functionrefactoring#23

pgaudet commented 5 years ago

From @dosumis on November 8, 2016 16:49

From @cmungall on August 28, 2015 16:16

I think 'directly activates' is correct here, even if we have a more general direct regulation parent that is not restricted to activities

pgaudet commented 5 years ago

From @dosumis on November 8, 2016 16:49

From @ukemi on August 28, 2015 17:5

It fits the definition of directly activates.

pgaudet commented 5 years ago

From @thomaspd on February 20, 2017 19:27

The main step to take is a top-level restructuring. It removes the top-level distinction between protein binding and DNA binding, as the current protein binding class includes both DNA binding TFs and non-DNA binding "cofactors". The main distinction is between DNA binding (TF) and non-DNA binding (T co-F), followed by effect (activators/coactivators vs repressors/corepressors). I've kept any older classes whose removal would affect more than a few existing annotations, just to prevent any issues with annotations.

[screenshots: 2017-02-20 at 11:25:46 am and 11:26:07 am]

The top-level class should be transcription regulator activity: is_a binding that directly regulates transcription

transcription factor (GO:0003700): is_a (or has_part?) sequence-specific DNA binding AND directly regulates transcription

transcription cofactor: is_a (or has_part?) protein binding, AND NOT sequence-specific DNA binding, that directly regulates transcription

transcription activator is a transcription factor that directly positively regulates transcription

transcription coactivator is a transcription cofactor…
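As a rough illustration of the two-step split described above (DNA binding vs non-DNA binding first, then effect), here is a minimal Python sketch. The function name, its parameters, and the exact label strings are assumptions made for illustration; they are not GO term labels confirmed in this thread beyond the activator/coactivator and repressor/corepressor pairs mentioned above.

```python
from typing import Optional


def classify_regulator(binds_dna_sequence_specifically: bool,
                       effect: Optional[str] = None) -> str:
    """Return an illustrative class label for a direct transcription regulator.

    effect is "positive", "negative", or None when the direction is unknown.
    The labels are hypothetical shorthand, not authoritative GO labels.
    """
    if effect not in (None, "positive", "negative"):
        raise ValueError(f"unknown effect: {effect!r}")
    if binds_dna_sequence_specifically:
        # First split: sequence-specific DNA binding -> transcription factor.
        labels = {None: "transcription factor",
                  "positive": "transcription activator",
                  "negative": "transcription repressor"}
    else:
        # No sequence-specific DNA binding -> transcription cofactor.
        labels = {None: "transcription cofactor",
                  "positive": "transcription coactivator",
                  "negative": "transcription corepressor"}
    return labels[effect]
```

When the direction of the effect is unknown, the sketch falls back to the more general class, which is the behaviour the hierarchy is meant to allow.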

pgaudet commented 5 years ago

From @dosumis on February 28, 2017 16:48

The ontology currently uses this pattern:

molecular_function that ('has part' some 'nucleic acid binding') and ('part of' some 'regulation of transcription, DNA-templated')

But in Noctua/LEGO curators are using directly_activates rather than part of:

image

http://noctua.berkeleybop.org/editor/graph/gomodel:583f430000000041

directly_positively_regulates may be justified here (based on direct interaction between the TF and other parts of transcriptional apparatus).

We need to decide on one of these two patterns (or reconcile the two with reasoning if that is possible).
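To make the two candidate patterns easier to compare side by side, here is a small Python sketch that renders each as a Manchester-syntax-style class expression string. The helper names are hypothetical, and the relation label passed to the second helper is an assumption; the thread has not settled on which relation to use.

```python
def part_of_pattern(binding: str, regulation_process: str) -> str:
    """Render the pattern currently used in the ontology (MF part_of BP)."""
    return (f"molecular_function that ('has part' some '{binding}') "
            f"and ('part of' some '{regulation_process}')")


def direct_regulation_pattern(binding: str, process: str,
                              relation: str = "directly regulates") -> str:
    """Render the alternative pattern with a direct causal relation.

    The default relation label is an assumption made for illustration.
    """
    return (f"molecular_function that ('has part' some '{binding}') "
            f"and ('{relation}' some '{process}')")


current = part_of_pattern("nucleic acid binding",
                          "regulation of transcription, DNA-templated")
alternative = direct_regulation_pattern("nucleic acid binding",
                                        "transcription, DNA-templated",
                                        relation="directly positively regulates")
print(current)
print(alternative)
```

The first expression points at the regulation process, while the second points at transcription itself through a causal relation.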

pgaudet commented 5 years ago

From @dosumis on May 11, 2017 17:9

How to formalise this:

image

?

Differentia:

1. Regulatory effect on transcription - record via link to BP. Two possible patterns:

(a) part_of some 'regulation of transcription, DNA-templated' # (use +ve/-ve R terms for transcriptional activator/repressor terms)

OR

(b) directly_regulates some 'transcription, DNA-templated' # (use directly_(positively/negatively)_regulates edges for transcriptional activator/repressor terms)

One of the aims of MF design patterns for compound functions such as this one is to maximise useful causal inference chains in LEGO. In this respect, pattern (b) is better: it doesn't obscure regulation of the whole process via a part_of link. However, ideally we'd still get inferred annotation to the relevant regulation of transcription BP term. Solutions to this:

This is a general issue - covered in #49
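One way the inferred annotation could work under pattern (b) is sketched below in Python: walk up a relation hierarchy and, if the edge is some kind of 'regulates', look up the matching regulation BP term. The sub-property links and the lookup table are assumptions made for illustration, not the ontology's actual axioms, and sign-specific terms (positive/negative regulation) are left out for brevity.

```python
# Assumed (illustrative) sub-property links: child -> parent.
SUBPROPERTY_OF = {
    "directly_positively_regulates": "directly_regulates",
    "directly_negatively_regulates": "directly_regulates",
    "directly_regulates": "regulates",
}

# Hypothetical lookup from a process to its 'regulation of' BP term.
REGULATION_TERM = {
    "transcription, DNA-templated": "regulation of transcription, DNA-templated",
}


def superproperties(relation):
    """Return the relation followed by all of its ancestors."""
    chain = [relation]
    while chain[-1] in SUBPROPERTY_OF:
        chain.append(SUBPROPERTY_OF[chain[-1]])
    return chain


def inferred_bp_annotation(relation, process):
    """Infer a regulation BP term from a pattern (b) edge, or None."""
    if "regulates" in superproperties(relation):
        return REGULATION_TERM.get(process)
    return None
```

With these toy tables, a directly_positively_regulates edge to 'transcription, DNA-templated' yields the general regulation term, while an unrelated relation such as part_of yields nothing.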

2. RNA polymerase type:

formalise via transcription type.

image

3. DNA target bound: has_necessary_component* {transcription regulation region DNA binding}

* see https://github.com/geneontology/molecular_function_refactoring/issues/25#issuecomment-300870100

Some cleanup of target terms needed:

image

image

These can be defined using logical defs that use SO terms as differentia. The promoter_element hierarchy might cover what's needed.

See also https://github.com/geneontology/go-ontology/issues/13002

4. direct regulation e.g. by binding ligand or metal ion binding

* see https://github.com/geneontology/molecular_function_refactoring/issues/25#issuecomment-300870100

pgaudet commented 5 years ago

From @dosumis on May 24, 2017 14:50

Draft pattern here:

https://github.com/geneontology/molecular_function_refactoring/blob/master/patterns/transcription_factor_DNA_binding.yaml

Still need to sort out naming.

Some notes:

pgaudet commented 5 years ago

From @dosumis on May 24, 2017 14:51

CC @thomaspd

pgaudet commented 5 years ago

From @dosumis on May 26, 2017 16:57

Paul:

If we have a general 'necessary part of':

If transcription of gene X requires the activity of TFs A, B and C, we could say each activity is a 'necessary part of' transcription of gene X.

By analogy to necessary_component_of, this would be a subproperty of a causal relation, so it would not mean loss of the causal chain.

pgaudet commented 5 years ago

From @dosumis on July 14, 2017 11:20

Notes on name and definition changes.

The original refactoring/implementation of detailed TF terms in the GO relied on assumptions about curation that may no longer apply in the new era of GO-CAM curation. Classic GO curation is very granular, with small amounts of evidence - often single experiments (?) - being used for annotation. It is rare for a single experiment to show that a transcription factor acts to regulate transcription via DNA binding and via protein binding. To cope with this, two branches were added to the GO - one covering activities that (directly) regulate transcription via protein binding and another covering activities that directly regulate transcription via DNA binding. A broad interpretation of the phrase 'transcription factor' - covering cofactors and DNA binding transcription factors - allowed both branches to include this phrase in their labels. As DNA binding transcription factors also bind proteins as part of their regulation of transcription*, annotation of these activities relied on co-annotation with appropriate terms from each branch (although a small number of terms appear under both branches).

With GO-CAM modeling, we can much more easily combine different pieces of evidence to build a model of gene product activity, so these considerations no longer apply. In GO-CAM, and as part of ongoing work on refactoring molecular function, we aim to model the compound nature of molecular functions as far as possible. We therefore need new design patterns for (DNA binding) transcription factor activity that allow us to capture its compound nature (DNA binding and protein binding components and their relationship to regulation of transcription).

The original refactoring deliberately omitted a general term for transcription regulator activity. There were two reasons for this:

  1. A very restricted view of what terms count as molecular functions: that they require a clear specification of mechanism**. This seems unwarranted, as we have large numbers of MF terms to which this does not apply including molecular_function itself and all the regulator activity terms. This approach prevents us from using one of the great strengths of ontologies - that when we don't know details we can annotate to a more general class. This approach clashes with the preferred definition of molecular function used in the molecular function refactoring that the current work is part of: A process that can be carried out by a single gene product or complex.
  2. A concern that any 'transcription regulator activity' class would be redundant with the biological process: regulation of transcription. This concern seems unwarranted. The biological process term encompasses processes that are far upstream of transcription factor activity and processes that encompass multiple molecular functions. Signal transduction pathways that regulate transcription are an example of both.

This refactoring:

  1. Uses a tighter definition of transcription factor limiting it to activities that regulate transcription by binding DNA.
  2. Adapts an existing term to make a general transcription regulator class
  3. Reflects the compound nature of (DNA binding) TF activity

* I would be very interested to hear of any known exceptions to this.

** This is my understanding.

New proposed labels & textual definitions

(Proposed name changes are also discussed in #5 and in Paul's comment upthread). Template-based textual definitions for TFs using the names of component activities are proving hard to specify, so a freer approach is taken here.


transcription factor activity, protein binding: "Interacting selectively and non-covalently with any protein or protein complex (a complex of two or more proteins that may include other nonprotein molecules), in order to modulate transcription. A protein binding transcription factor may or may not also interact with the template nucleic acid (either DNA or RNA) as well."

-->

transcription regulator activity: "Direct regulation of DNA-templated transcription via selective, non-covalent interaction with elements of the transcription initiation complex or associated proteins. Associated proteins include any protein capable of interacting, directly or indirectly with the transcription initiation complex." comment: This term is a general class that encompasses (DNA binding) transcription factors as well as cofactors.

Questions:


transcription cofactor activity: "Interacting selectively and non-covalently with a regulatory transcription factor and also with the basal transcription machinery in order to modulate transcription. Cofactors generally do not bind the template nucleic acid, but rather mediate protein-protein interactions between regulatory transcription factors and the basal transcription machinery."

-->

transcription cofactor activity: "Interacting selectively and non-covalently with a regulatory transcription factor and also with the basal transcription machinery in order to modulate transcription. Cofactor activity does not involve nucleic acid binding, but rather mediates protein-protein interactions between regulatory transcription factors and the basal transcription machinery." is_a: transcription regulator activity


transcription factor activity, sequence-specific DNA binding: "Interacting selectively and non-covalently with a specific DNA sequence in order to modulate transcription. The transcription factor may or may not also interact selectively with a protein or macromolecular complex."

-->

transcription factor activity: "Direct regulation of DNA-templated transcription via sequence-specific DNA binding and selective, non-covalent interaction with elements of the transcription initiation complex or associated proteins. Associated proteins include any protein capable of interacting, directly or indirectly with the transcription initiation complex." is_a: direct transcription regulator activity

Notes:

Questions:


RNA polymerase II transcription factor activity, sequence-specific DNA binding: "Interacting selectively and non-covalently with a specific DNA sequence in order to modulate transcription by RNA polymerase II. The transcription factor may or may not also interact selectively with a protein or macromolecular complex."

-->

RNA polymerase II transcription factor activity: "Direct regulation of transcription from an RNA polymerase II promoter via sequence-specific DNA binding and selective, non-covalent interaction with elements of the transcription initiation complex or associated proteins. Associated proteins include any protein capable of interacting, directly or indirectly with the transcription initiation complex." is_a: transcription factor activity


transcription factor activity, sequence-specific DNA binding transcription factor recruiting: "The function of binding to a specific DNA sequence and recruiting another transcription factor to the DNA in order to modulate transcription. The recruited factor may bind DNA directly, or may be colocalized via protein-protein interactions."

-->

transcription factor activity, transcription regulator recruiting: "Direct regulation of DNA-templated transcription via sequence-specific DNA binding and recruitment of a transcription regulator (transcription factor or cofactor) via direct, non-covalent interaction with the regulator. Recruitment here means that the activity in question is required to bring the transcription regulator to the transcription initiation complex or associated proteins." is_a: transcription factor activity

Questions:


transcription factor activity, sequence-specific DNA binding transcription factor recruiting: "The function of binding to a specific DNA sequence and recruiting another transcription factor to the DNA in order to modulate transcription. The recruited factor may bind DNA directly, or may be colocalized via protein-protein interactions."

-->

transcription factor activity, transcription factor recruiting: "Direct regulation of DNA-templated transcription via sequence-specific DNA binding and binding of another transcription factor, leading to its recruitment to and binding of a DNA regulatory region."
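A quick consistency check on the proposed hierarchy can be sketched in Python as below. The dictionary copies the is_a statements from the proposed definitions above (None where no parent is stated); the helper name is hypothetical. It only reports parents that are not themselves among the proposed labels, which may simply reflect naming that is still to be sorted out.

```python
# Proposed label -> stated is_a parent (None where the thread gives none).
PROPOSED_IS_A = {
    "transcription regulator activity": None,
    "transcription cofactor activity": "transcription regulator activity",
    "transcription factor activity": "direct transcription regulator activity",
    "RNA polymerase II transcription factor activity":
        "transcription factor activity",
    "transcription factor activity, transcription regulator recruiting":
        "transcription factor activity",
    "transcription factor activity, transcription factor recruiting": None,
}


def undefined_parents(is_a):
    """Return sorted parent labels that are not themselves proposed terms."""
    return sorted({parent for parent in is_a.values()
                   if parent is not None and parent not in is_a})


print(undefined_parents(PROPOSED_IS_A))
```

Run on the labels as written, the check flags 'direct transcription regulator activity', which ties in with the open question of whether 'direct' belongs in the name of the general class.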


CC @astridla, @rlovering, @thomaspd - Comments please.

pgaudet commented 5 years ago

From @dosumis on July 14, 2017 14:54

Notes on problematic terms

RNA polymerase II transcription factor activity, sequence-specific transcription regulatory region DNA binding: "Interacting selectively and non-covalently with a specific sequence of DNA that is part of a regulatory region that controls transcription of that section of the DNA by RNA polymerase II and recruiting another transcription factor to the DNA in order to modulate transcription by RNAP II."

  1. Needs name to more clearly distinguish from: RNA polymerase II transcription factor activity, sequence-specific DNA binding: "Interacting selectively and non-covalently with a specific DNA sequence in order to modulate transcription by RNA polymerase II. The transcription factor may or may not also interact selectively with a protein or macromolecular complex."

  2. Why does it live here: image but not under 'RNA polymerase II transcription factor activity, sequence-specific DNA binding'

pgaudet commented 5 years ago

From @astridla on July 14, 2017 16:2

[Quoted text edited down by DOS]

The original refactoring/implementation of detailed TF terms in the GO relied on assumptions about curation that may no longer apply in the new era of GO-CAM curation. Classic GO curation is very granular, with small amounts of evidence - often single experiments (?) - being used for annotation. It is rare for a single experiment to show that a transcription factor acts to regulate transcription via DNA binding and via protein binding. To cope with this, two branches were added to the GO - one covering activities that (directly) regulate transcription via protein binding and another covering activities that directly regulate transcription via DNA binding. A broad interpretation of the phrase 'transcription factor' - covering cofactors and DNA binding transcription factors - allowed both branches to include this phrase in their labels. As DNA binding transcription factors also bind proteins as part of their regulation of transcription*, annotation of these activities relied on co-annotation with appropriate terms from each branch (although a small number of terms appear under both branches).

  • I would be very interested to hear of any known exceptions to this.

[Astrid Lægreid] : I don’t know of any exception to this

...

transcription factor activity, protein binding: "Interacting selectively and non-covalently with any protein or protein complex (a complex of two or more proteins that may include other nonprotein molecules), in order to modulate transcription. A protein binding transcription factor may or may not also interact with the template nucleic acid (either DNA or RNA) as well."

-->

transcription regulator activity: "Direct regulation of DNA-templated transcription via selective, non-covalent interaction with elements of the transcription initiation complex or associated proteins. Associated proteins include any protein capable of interacting, directly or indirectly with the transcription initiation complex." comment: This term is a general class that encompasses (DNA binding) transcription factors as well as cofactors.

Questions:

  • Should we add 'direct' to the name to more clearly distinguish from upstream regulators?

[Astrid Lægreid]: If we want to allow any signalling molecule upstream of transcription to be annotated with “transcription regulator activity”, then, yes, “direct” would be helpful. It may however not be easy to define what is “direct”, because, in most cases, we do not know whether the transcription regulator interacts directly with one of the components of the “polymerase initiation complex”, or whether one or more proteins are “between” the transcription regulator and one of the proteins in the initiation complex. (See: http://dx.doi.org/10.1016/j.sbi.2017.03.013, http://dx.doi.org/10.1016/j.bbagrm.2016.10.010, http://dx.doi.org/10.1002/anie.201608066)

Maybe there is a way to say that all proteins interacting directly or indirectly with the “initiator complex” are “direct transcription regulators”? (as opposed to regulators acting more upstream)

  • Anticipated objection: we don’t actually have TI complex in GO. Perhaps 'basal transcription machinery' would be a better term?

[Astrid Lægreid]: Even though I am not an expert in the complex molecular events involved in transcription initiation, elongation and termination, my ‘high level’ understanding is something in this direction:

  1. Transcription factor binding specifically to regulatory regions in the gene enables formation of the “RNA polymerase II initiation complex” (this formation proceeds through a number of cascade-like ‘protein recruiting events’, where a crucial stage is the existence of a complex in which RNA polymerase II becomes phosphorylated, activating its enzymatic capabilities within a stable “basal transcription complex”).

  2. the stable “basal transcription complex” initiates transcription, thereby starting the “transcription elongation” phase, which is to some extent also biologically regulated (I don’t know the mechanisms); I’m not sure, but think that maybe the general view is that the “basal transcription complex” is a kind of ‘core DNA-templated RNA transcription’ machinery that catalyses the RNA polymerization throughout initiation-elongation-termination

  3. when the “basal transcription complex” reaches “termination signal(s)” (certain gene sequences), termination can occur; again, I think that this process is regulated (not sure whether there are several possible termination signals, also not sure which/how many protein factors are involved).

To my knowledge, regulation of step 1, initiation, is regarded to be most decisive for the ‘time’ and ‘quantity’ aspects of transcription. I think that what is regulated is when and how frequently a new ‘transcription cycle’ is started. These, I think, are the assumptions underlying our picture of the transcription factors (which in reality are a kind of “transcription initiation enabling factors”) as the most interesting/decisive regulatory factors of transcription.


transcription cofactor activity: "Interacting selectively and non-covalently with a regulatory transcription factor and also with the basal transcription machinery in order to modulate transcription. Cofactors generally do not bind the template nucleic acid, but rather mediate protein-protein interactions between regulatory transcription factors and the basal transcription machinery."

-->

transcription cofactor activity: "Interacting selectively and non-covalently with a regulatory transcription factor and also with the basal transcription machinery in order to modulate transcription. Cofactor activity does not involve nucleic acid binding, but rather mediates protein-protein interactions between regulatory transcription factors and the basal transcription machinery." is_a: transcription regulator activity

[Astrid Lægreid]: Again, maybe we indeed need to introduce the concept “RNA polymerase initiation complex” (or “machinery”), since I think it is very likely that the transcription factors mainly interact with proteins that only help form the “basal transcription complex/machinery” (I feel a bit uncomfortable with the term “machinery”, but maybe it is well established in GO), but in many/most cases don’t interact with the “basal transcription complex” itself once it is starting on the ‘initiation-elongation-termination’ “road”.

pgaudet commented 5 years ago

From @astridla on July 14, 2017 16:23

RNA polymerase II transcription factor activity, sequence-specific transcription regulatory region DNA binding: "Interacting selectively and non-covalently with a specific sequence of DNA that is part of a regulatory region that controls transcription of that section of the DNA by RNA polymerase II and recruiting another transcription factor to the DNA in order to modulate transcription by RNAP II."

  1. Needs name to more clearly distinguish from: RNA polymerase II transcription factor activity, sequence-specific DNA binding: "Interacting selectively and non-covalently with a specific DNA sequence in order to modulate transcription by RNA polymerase II. The transcription factor may or may not also interact selectively with a protein or macromolecular complex."

  2. Why does it live here:

image

but not under 'RNA polymerase II transcription factor activity, sequence-specific DNA binding'

In my understanding, these two terms pertain to the same functionality.

Since I cannot think of any transcription factor that does not enable its function via interacting with other proteins/complexes, I think that there should be no “may or may not” in the descriptive part “The transcription factor may or may not also interact selectively with a protein or macromolecular complex."

pgaudet commented 5 years ago

From @dosumis on July 17, 2017 15:15

Notes on discussion with Astrid:

Tie this branch down to: "direct regulation of transcription initiation"

  1. This = regulation of transcription, because 'regulation of transcription initiation' is_a 'regulation of transcription' (or should be). It would also be better to rename the general classes to '(direct) transcription initiation regulator activity' - if this is not too wordy.

  2. Directness: "... via selective, non-covalent interaction with elements of the transcription initiation complex or associated proteins. Associated proteins include any protein capable of interacting, directly or indirectly with the transcription initiation complex."

  3. What is recruiting? Use Astrid's figure (TBA).

pgaudet commented 5 years ago

The transcription regulator activity branch does not have logical definitions. I will try to add these.

ValWood commented 5 years ago

Re: recruitment, have been wondering how to do this. We have tonnes of these annotated to "localization".

This is as far as I got. https://github.com/pombase/curation/issues/2117 Does it fit with your thinking?

pgaudet commented 5 years ago

To discuss with Colin et al?