geneontology / go-annotation

This repository hosts the tracker for issues pertaining to GO annotations.

Working group to finalize transcription decision tree #1463

Closed vanaukenk closed 3 years ago

ukemi commented 7 years ago
rachhuntley commented 7 years ago

I am working on a new version of the flow chart.

krchristie commented 7 years ago

I'm interested in participating in this group.

mlacencio commented 7 years ago

I am also interested in participating in this group!

rachhuntley commented 7 years ago

Hello,

Sorry for the delay. Here is an updated version of the decision tree, taking into account comments that were made at the GOC meeting (relayed to me by Ruth). There are two slides, one with the tree and the other with the annotations. I tried to put everything on one diagram, but it was very full and messy!

I am looking forward to your comments and any further suggestions.

Thanks. Rachael.

Expression_Transcription_Decision_Tree_ForGOC.pptx

pgaudet commented 7 years ago

I'm also interested in participating.

I like the new layout of the guidelines, much simpler :)

A few points:

Thanks, Pascale


rachhuntley commented 7 years ago

Hello, Sorry for the delay. I've edited the decision tree (attached), with help from Ruth and Barbara, to take into account Pascale's comments and a couple of other omissions we noticed.

Pascale, your last comment:

It's actually no evidence of direct binding + a protein location relative to the DNA leads to a 'contributes to' (annotation #5)

Please do send further comments (negative AND positive), so we know if we're getting close!

Thanks, Rachael. Expression_Transcription_Decision_Tree_ForGOCv2.pptx

pgaudet commented 7 years ago

Hi Rachael,

Didn't there use to be a list describing 'Annotation 1' to 'Annotation 6'?

Thanks, Pascale


rachhuntley commented 7 years ago

It's on slide 2 of the powerpoint file. Has it downloaded correctly for you?

pgaudet commented 7 years ago

OK, got it now, thanks!

krchristie commented 7 years ago

Hi,

I don't understand what this comment in the lower right refers to since I don't see a "#" in any of the flow chart boxes.

♯The evidence can be experimental (e.g. an EMSA done with a purified protein), or it can be
an author statement (e.g. when a protein is known to bind a specific DNA region), or it can be 
inferred (e.g. from an in silico prediction to bind a specific DNA region). 

More importantly, I do not agree that you can use non-experimental evidence of the DNA binding and still use an experimental evidence code for the annotations this flow chart is meant to provide guidance for. Am I misunderstanding what this is meant to say? If so, please clarify.

thanks,

-Karen

rachhuntley commented 7 years ago

Hi Karen,

Are you looking at the most recent version? It's linked in my comment above, called Expression_Transcription_Decision_Tree_ForGOCv2.pptx.

I recall the comment you cite from a much earlier version of the decision tree, but in the most recent version we don't address evidence codes at all, as I think it would get too complicated for a flow chart.

Rachael.

krchristie commented 7 years ago

Hi Rachael,

I clicked on the link from your comment in the email and the old version opened up, but I have the new one and will take a look.

thanks,

Karen

vanaukenk commented 7 years ago

Hi,

@rachhuntley @RLovering @BarbaraCzub

Would one of you be able to give an update on the progress of the Expression/Transcription decision tree on Tuesday's (March 14th) annotation call?

Thanks, --Kimberly

BarbaraCzub commented 7 years ago

Hi Kimberly,

I'm afraid next week will not be possible for us. Would the following call be a good alternative?

Thanks, Barbara

@vanaukenk @rachhuntley @RLovering

rachhuntley commented 7 years ago

Hi Kimberly,

The update is that we have revised the decision tree, which is attached above, and we would like feedback on it from this working group. When the working group are happy with it, then we will announce it at a future annotation call and get it added to the website.

As Barbara said, we are all unavailable on Tuesday, so if anyone has any questions or comments the best place to put them is on this thread.

Thanks, Rachael.

vanaukenk commented 7 years ago

Okay, thanks @BarbaraCzub and @rachhuntley

rachhuntley commented 7 years ago

Hello,

Could everyone who has expressed an interest in this working group take a look at the latest version of the decision tree (linked from my post above on 27th February, entitled Expression_Transcription_Decision_Tree_ForGOCv2.pptx) and send me any comments or suggestions? We would like to wrap this up and make some official guidelines.

Thanks, Rachael.

mlacencio commented 7 years ago

Hi @rachhuntley !

I would like to take a look at this new version of the decision tree, but I will be able to do it only on April 21 (I will be working with Martin Krallinger at the CNIO, Madrid, until April 19). Would it be possible to wrap this discussion up after April 21 (suggestion: April 28)?

Best,

Marcio

rachhuntley commented 7 years ago

Hi @mlacencio Marcio, Yes, that should be fine. We are just starting the Easter break here now, so will be off for a few days anyway.

Best wishes, Rachael.

mlacencio commented 7 years ago

Hi all,

Please find below some questions and suggestions on the decision tree!

  1. QUESTIONS

1.1. Should this decision tree also concern other types of RNA polymerases?

1.2. How should curators interpret the green boxes? For example, let's suppose a curator finds a reporter assay plus EMSA with purified protein and regulatory region. In this case, should the curator use any annotation, a suitable combination of them or all of them? I believe that you mean that the curator must apply all annotations for each condition, e.g. for the above-mentioned condition, curator must annotate the protein with terms 1, 2, 3 and 6. Is that so?

1.3. If I understood well, are you suggesting the utilization of the qualifier "contributes_to" along with DNA binding terms to annotate ChIP?

  2. SUGGESTIONS (IF ONLY RNA POL II IS CONSIDERED)

2.1. Replace the following phrase of title, "regulating transcription from an RNA polymerase II promoter", with "regulating transcription from RNA polymerase II"

Recall that we are dealing with any type of regulatory region (promoter and enhancer), although I am aware that we have terms in which we can find the phrase "RNA polymerase II promoter" (e.g., GO:0006357)

2.2. Replace "enhancer assay" with "reporter assay"

2.3. Replace "EMSA with nuclear extract and regulatory region AND competition experiments" with "EMSA with nuclear extract containing overexpressed recombinant DbTF and regulatory region or EMSA supershift with nuclear extract and regulatory region"

EMSA with nuclear extract AND competition experiments are not so reliable because it is not possible to ascertain the identity of DbTF even with competition between wild-type and mutated regulatory region. This experimental setup can become more reliable if the regulatory region is known to be bound ONLY by the DbTF of interest and, in fact, this is a very unlikely event.

On the other hand, the DbTF of interest can be identified in EMSA with nuclear extract and overexpression of DbTF or EMSA supershift regardless of competition experiments. But, of course, competition experiments can increase the confidence of these types of EMSA.

2.4. Annotation 1 should be GO:0000981 and children instead of only GO:0000981. In this fashion, the curator is free to use any children terms that are more suitable to his/her case. Moreover, it should be possible to have GO:0000981 and children with "contributes_to".

2.5. Annotation 2 should be GO:0006357 and children instead of only GO:0006357. In this fashion, curator is free to use any children terms that are more suitable to his/her case.

2.6. Annotations 5 and 6 should be GO:0043565 and children instead of GO:0003677. GO:0003677 may be too broad a term, as we are dealing specifically with at least sequence-specific DNA binding.

Waiting for reply!

rachhuntley commented 7 years ago

Hi Marcio, Thanks very much for your useful comments. I've discussed these with Ruth @RLovering and our responses are inline. I've also updated the tree and table with some of your suggestions (attached below). Let me know if you (dis)agree with any of this.

Rachael.

Expression_Transcription_Decision_Tree_ForGOCv3.pptx

QUESTIONS 1.1. Should this decision tree also concern other types of RNA polymerases?

No, we agreed at the beginning that this would make the tree too complex. More decision trees can be made for the other RNA polymerases.

1.2. How should curators interpret the green boxes? For example, let's suppose a curator finds a reporter assay plus EMSA with purified protein and regulatory region. In this case, should the curator use any annotation, a suitable combination of them or all of them? I believe that you mean that the curator must apply all annotations for each condition, e.g. for the above-mentioned condition, curator must annotate the protein with terms 1, 2, 3 and 6. Is that so?

I have added pluses to the green boxes, which should hopefully make it clear that all of them should be annotated.

1.3. If I understood well, are you suggesting the utilization of the qualifier "contributes_to" along with DNA binding terms to annotate ChIP?

Yes, that is correct, for annotation 5 - and is stated in the annotation table.

SUGGESTIONS (IF ONLY RNA POL II IS CONSIDERED)

2.1. Replace the following phrase of title, "regulating transcription from an RNA polymerase II promoter", with "regulating transcription from RNA polymerase II"

Recall that we are dealing with any type of regulatory region (promoter and enhancer), although I am aware that we have terms in which we can find the phrase "RNA polymerase II promoter" (e.g., GO:0006357)

Agree that this title is not accurate. I spoke to Ruth and we felt your suggestion didn't quite sound right, so we suggest changing the phrase to "regulating transcription by RNA polymerase II". What do you think?

2.2. Replace "enhancer assay" with "reporter assay"

Consulting with Ruth, we originally changed this from reporter assay because this is too general. Reporter assays are used in many experiments, not necessarily for showing regulation of transcription (miRNA target gene silencing validation comes to mind). In any case, this is just an example of an assay that may be used. I have now added asterisks to the sentence describing where to find further information on other types of assays, including the TF guidelines paper.

2.3. Replace "EMSA with nuclear extract and regulatory region AND competition experiments" with "EMSA with nuclear extract containing overexpressed recombinant DbTF and regulatory region or EMSA supershift with nuclear extract and regulatory region"

EMSA with nuclear extract AND competition experiments are not so reliable because it is not possible to ascertain the identity of DbTF even with competition between wild-type and mutated regulatory region. This experimental setup can become more reliable if the regulatory region is known to be bound ONLY by the DbTF of interest and, in fact, this is a very unlikely event.

On the other hand, the DbTF of interest can be identified in EMSA with nuclear extract and overexpression of DbTF or EMSA supershift regardless of competition experiments. But, of course, competition experiments can increase the confidence of these types of EMSA.

After some thought and discussions with Ruth, we decided that this is quite a complex thing to get across in a space-limited image. We decided to remove this statement, because this is only an example of demonstrating sequence-specific DNA binding, and added the asterisk to the further information on other types of assays demonstrating DNA binding. Is this reasonable?

2.4. Annotation 1 should be GO:0000981 and children instead of only GO:0000981. In this fashion, the curator is free to use any children terms that are more suitable to his/her case. Moreover, it should be possible to have GO:0000981 and children with "contributes_to".

Agree, and added to the annotation table

2.5. Annotation 2 should be GO:0006357 and children instead of only GO:0006357. In this fashion, curator is free to use any children terms that are more suitable to his/her case.

Agree, and added to the annotation table

2.6. Annotations 5 and 6 should be GO:0043565 and children instead of GO:0003677. Maybe GO:0003677 is a too broad term as we are dealing specifically with at least sequence-specific DNA binding.

Agree about annotation 6, and added to the annotation table. However, we're not sure about annotation 5 - could a non-sequence specific DNA binding protein (i.e. a general DNA binding protein) be identified in an EMSA or ChIP assay? If so, then this can't be sequence-specific.
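For readers who want to check an annotation against an "X or child term" rule like the ones agreed above, here is a minimal sketch in Python. It assumes a hand-written toy subset of is_a links rather than the real ontology file, and the names `TOY_PARENTS` and `is_term_or_child` are hypothetical, not part of any GO tool.

```python
# Toy is_a links (child -> parents). This is a hand-written subset for
# illustration only; a real check would load the links from the ontology.
TOY_PARENTS = {
    "GO:0043565": ["GO:0003677"],  # sequence-specific DNA binding is_a DNA binding
    "GO:0045944": ["GO:0006357"],  # positive regulation is_a regulation (RNA pol II)
    "GO:0000122": ["GO:0006357"],  # negative regulation is_a regulation (RNA pol II)
}


def is_term_or_child(term, root, parents=TOY_PARENTS):
    """Return True if `term` equals `root` or reaches it by following is_a links."""
    stack = [term]
    seen = set()
    while stack:
        current = stack.pop()
        if current == root:
            return True
        if current in seen:
            continue  # already explored this term
        seen.add(current)
        stack.extend(parents.get(current, []))
    return False


# A child of GO:0006357 satisfies "GO:0006357 or child term".
print(is_term_or_child("GO:0045944", "GO:0006357"))  # True
# The broader DNA binding term does not satisfy "GO:0043565 or child term".
print(is_term_or_child("GO:0003677", "GO:0043565"))  # False
```

The second call illustrates the point raised for annotation 5: an annotation made to the general term GO:0003677 would not pass a rule that requires GO:0043565 or one of its children.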

rachhuntley commented 7 years ago

@krchristie @pgaudet could you please take a look at the comments above from Marcio and myself and the new version of the decision tree (v3 linked above) and let me know whether you have any further comments/suggestions? Ruth would like to present our agreed version at the GOC meeting in June.

Thanks, Rachael.

mlacencio commented 7 years ago

Hi @rachhuntley and @RLovering

Thank you for all comments. My turn now! Please see my comments inline!

QUESTIONS 1.1. Should this decision tree also concern other types of RNA polymerases?

No, we agreed at the beginning that this would make the tree too complex. More decision trees can be made for the other RNA polymerases.

OK. I agree.

1.2. How should curators interpret the green boxes? For example, let's suppose a curator finds a reporter assay plus EMSA with purified protein and regulatory region. In this case, should the curator use any annotation, a suitable combination of them or all of them? I believe that you mean that the curator must apply all annotations for each condition, e.g. for the above-mentioned condition, curator must annotate the protein with terms 1, 2, 3 and 6. Is that so?

I have added pluses to the green boxes, which should hopefully make it clear that all of them should be annotated.

OK. I think now it is clearer than before

1.3. If I understood well, are you suggesting the utilization of the qualifier "contributes_to" along with DNA binding terms to annotate ChIP?

Yes, that is correct, for annotation 5 - and is stated in the annotation table.

I suppose, however, that we can apply this rule only after some type of approval by GOC staff in June. Am I correct?

SUGGESTIONS (IF ONLY RNA POL II IS CONSIDERED)

2.1. Replace the following phrase of title, "regulating transcription from an RNA polymerase II promoter", with "regulating transcription from RNA polymerase II"

Recall that we are dealing with any type of regulatory region (promoter and enhancer), although I am aware that we have terms in which we can find the phrase "RNA polymerase II promoter" (e.g., GO:0006357)

Agree that this title is not accurate, I spoke to Ruth and we felt your suggestion didn't quite sound right, so we have suggested changing the phrase to "regulating transcription by RNA polymerase II", what do you think?

Good suggestion! Much better now!

2.2. Replace "enhancer assay" with "reporter assay"

Consulting with Ruth, we originally changed this from reporter assay because this is too general. Reporter assays are used in many experiments, not necessarily for showing regulation of transcription (miRNA target gene silencing validation comes to mind). In any case, this is just an example of an assay that may be used. I have now added asterisks to the sentence describing where to find further information on other types of assays, including the TF guidelines paper.

In fact a reporter assay is too general. However, by far the reporter assay has been the most popular method used to show regulation of transcription. But although I think that an enhancer assay is too specific as an example, this is just an example and you provide links for further information on other assays

2.3. Replace "EMSA with nuclear extract and regulatory region AND competition experiments" with "EMSA with nuclear extract containing overexpressed recombinant DbTF and regulatory region or EMSA supershift with nuclear extract and regulatory region"

EMSA with nuclear extract AND competition experiments are not so reliable because it is not possible to ascertain the identity of DbTF even with competition between wild-type and mutated regulatory region. This experimental setup can become more reliable if the regulatory region is known to be bound ONLY by the DbTF of interest and, in fact, this is a very unlikely event.

On the other hand, the DbTF of interest can be identified in EMSA with nuclear extract and overexpression of DbTF or EMSA supershift regardless of competition experiments. But, of course, competition experiments can increase the confidence of these types of EMSA.

After some thought and discussions with Ruth, we decided that this is quite a complex thing to get over in a space limited image. We decided to remove this statement, because this is only an example of demonstrating sequence-specific DNA binding, and added the asterisk to the further information on other types of assays demonstrating DNA binding. Is this reasonable?

Yes, this is reasonable! Agree!

2.4. Annotation 1 should be GO:0000981 and children instead of only GO:0000981. In this fashion, the curator is free to use any children terms that are more suitable to his/her case. Moreover, it should be possible to have GO:0000981 and children with "contributes_to".

Agree, and added to the annotation table

What about the qualifier "contributes_to" for GO:0000981?

2.5. Annotation 2 should be GO:0006357 and children instead of only GO:0006357. In this fashion, curator is free to use any children terms that are more suitable to his/her case.

Agree, and added to the annotation table

Ok

2.6. Annotations 5 and 6 should be GO:0043565 and children instead of GO:0003677. Maybe GO:0003677 is a too broad term as we are dealing specifically with at least sequence-specific DNA binding.

Agree about annotation 6, and added to the annotation table. However, we're not sure about annotation 5 - could a non-sequence specific DNA binding protein (i.e. a general DNA binding protein) be identified in an EMSA or ChIP assay? If so, then this can't be sequence-specific.

Theoretically speaking, non-specific DNA binding proteins (in other words, histones and proteins involved in replication and repair) can be identified by EMSA or ChIP. But in practice we have observed that these methods have not been used for identifying non-specific DNA binding proteins (although I am aware that ChIP has been used to identify histones). So I believe that it is safe to consider annotation 5 as GO:0043565 and qualifier "contributes_to"

rachhuntley commented 7 years ago

Hello all, After some discussions between Marcio and Ruth and then Ruth and myself, we have made some updates to the decision tree. The major change is the removal of the third branch concerning DNA binding assays. Ruth and I felt that this part of the decision tree was not needed in order to answer the question that the tree was aiming to answer, i.e. should the curator be using regulation of gene expression or regulation of transcription; this was the brief for this task and this is addressed in the first two sections of the tree.

We also felt that the third branch would benefit from being expanded further, as a separate project, to provide guidance on capturing sequence specific DNA binding data, possibly with the inclusion of more information about the various experimental methods associated with the generation of this data, e.g. ChIP, EMSA.

There are also some additions to the annotation table regarding how to annotate a gene product that is part of a complex that regulates transcription.

I have attached the latest version here, please do comment on this so this can be discussed with the whole Consortium at the June meeting in Oregon.

Thanks, Rachael. Expression_Transcription_Decision_Tree_ForGOCv4.pptx

mlacencio commented 7 years ago

Hi @rachhuntley and @RLovering!

Sorry for my delay. I have just found out that the GOC meeting starts on June 1! I hope that some last-minute observations can be discussed.

  1. If we only find assays demonstrating mRNA synthesis (e.g., nuclear run-on and enhancer/promoter assay) and NO evidence for DNA binding activity, what is your suggestion? Here we usually annotate with GO:0006357 or child term. In your decision tree, this would be the "Annotation 2"

  2. Regarding ChIP or other "weak" evidence for DNA binding activity: wouldn't it be possible to consider ChIP as a valid experiment to assign "Annotation 1 + Annotation 2 + Annotation 3 + Annotation 6" based on prior knowledge of the binding activity of the TF? For example, if a TF is already known to have some DNA binding activity, could we annotate it with "Annotation 1 + Annotation 2 + Annotation 3 + Annotation 6"?

I think that I do not have anything more to comment.

Best regards,

Marcio

rachhuntley commented 7 years ago

Hi Marcio, Thanks for your comments; my responses are below. Ruth @RLovering will be presenting this at the GOC meeting, so I've copied her in; she may have other responses.

  1. If we only find assays demonstrating mRNA synthesis (e.g., nuclear run-on and enhancer/promoter assay) and NO evidence for DNA binding activity, what is your suggestion? Here we usually annotate with GO:0006357 or child term. In your decision tree, this would be the "Annotation 2"

Yes, annotation 2 is correct, and if you follow the left-hand side of the tree down through the "no" options, you get to annotation 2.

  2. Regarding ChIP or other "weak" evidence for DNA binding activity: wouldn't it be possible to consider ChIP as a valid experiment to assign "Annotation 1 + Annotation 2 + Annotation 3 + Annotation 6" based on prior knowledge of the binding activity of the TF? For example, if a TF is already known to have some DNA binding activity, could we annotate it with "Annotation 1 + Annotation 2 + Annotation 3 + Annotation 6"?

I don't see how ChIP, which is a source of evidence for DNA binding, together with a prior knowledge of DNA binding, would lead to an annotation describing a regulation of transcription? The main aim of this decision tree is to cover those cases where you don't know if the protein you're annotating is a TF or non-TF, i.e. should you be using regulation of transcription or regulation of gene expression. If you have prior knowledge of evidence that the protein is a TF, then you probably wouldn't be using this tree. I do think that development of the third branch of the tree that we removed would be useful and maybe your thoughts on ChIP and prior knowledge/author intent could be expanded on there?

Are you going to the GOC meeting?

Best wishes, Rachael.

mlacencio commented 7 years ago

Hi Rachael,

Thank you for the quick reply!

Please see below some comments to your responses.

Response to comment 1: You're right. Sorry! I had forgotten the left-hand side of the tree.

Response to comment 2: You're right. If this tree is intended to be used to annotate candidate TFs, then maybe ChIP cannot lead to an annotation describing a regulation of transcription. Maybe we can discuss ChIP-related issues during the development of the third branch.

Anyway, according to the inference rules in our curation guidelines (please see attached image), the combination of GO:0043565 AND any other BP term (GO:0006357, GO:0045944 or GO:0000122) results in GO:0000981, GO:0001227 or GO:0001228.

Let's consider that both of our rules are valid and let's suppose we have a TF1. If we find ChIP as evidence for DNA binding activity and promoter assay as evidence for transcription activity, then TF1 would be annotated with the following:

Annotation 1: GO:0000981 with qualifier "contributes_to" (GO:0043565 AND GO:0006357)
Annotation 2: Promoter assay -> GO:0006357
Annotation 3: GO:0000790
Annotation 5: ChIP -> GO:0043565 with qualifier "contributes_to"

I am not going to the GOC meeting now.

Best regards,

Marcio

inference_rules.pdf
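The inference rule described in this comment can be sketched roughly as follows. This is only an illustration: the thread does not spell out which process term pairs with which function term, so the pairing in `BP_TO_TF_ACTIVITY` is an assumption based on the activator/repressor meaning of the terms and should be checked against inference_rules.pdf. The function name `infer_tf_activity` is hypothetical, and qualifiers such as "contributes_to" are ignored here.

```python
SEQUENCE_SPECIFIC_DNA_BINDING = "GO:0043565"

# Assumed pairing of process term -> inferred function term; the comment
# lists both sets of IDs but does not state the pairing explicitly.
BP_TO_TF_ACTIVITY = {
    "GO:0006357": "GO:0000981",  # regulation -> TF activity
    "GO:0045944": "GO:0001228",  # positive regulation -> activator activity
    "GO:0000122": "GO:0001227",  # negative regulation -> repressor activity
}


def infer_tf_activity(annotated_terms):
    """Return the TF activity terms implied by a set of existing annotations.

    Nothing is inferred unless sequence-specific DNA binding is present.
    """
    terms = set(annotated_terms)
    if SEQUENCE_SPECIFIC_DNA_BINDING not in terms:
        return set()
    return {tf for bp, tf in BP_TO_TF_ACTIVITY.items() if bp in terms}


# DNA binding plus regulation of transcription implies TF activity.
print(infer_tf_activity(["GO:0043565", "GO:0006357"]))  # {'GO:0000981'}
# A process annotation alone implies nothing.
print(infer_tf_activity(["GO:0006357"]))  # set()
```

Note that this covers only the term combination. Whether ChIP is strong enough evidence to supply the GO:0043565 input is the open question in this thread, and the sketch does not address it.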

krchristie commented 7 years ago

Hi,

Sorry so late to comment on the newer version. With the push to finish writing up the cilia ontology and annotation papers, I've been really swamped the last several months, and this just completely dropped off my radar.

I still think this comment is inappropriate:

Therefore, signaling molecules should not be annotated to regulation of transcription, but to
 regulation of gene expression.

I have no problem with saying that this decision tree is not intended for annotation of signaling molecules, but it does not logically follow that signaling molecules should therefore only be annotated to 'regulation of gene expression'. What a signaling molecule should be annotated to would depend on what is shown, which could be regulation of transcription. This decision tree just isn't the right tool to guide the choice of annotation, including that it shouldn't be suggesting a very general term without any consideration of what is actually known.

Could the Note please be changed to something like

This decision tree is intended ONLY for DNA binding transcription factors. It is NOT 
appropriate for annotation of gene products that are known/found not to bind DNA, or are 
constitutively bound to DNA (e.g. histones), or are not localized near a DNA regulatory region 
such as signaling molecules. Do NOT use this decision tree for the annotation of signaling 
molecules.

rachhuntley commented 7 years ago

Hi Karen,

Thanks for your comment. I'm happy to change the comment, but I disagree with the first sentence "This decision tree is intended ONLY for DNA binding transcription factors." The tree is intended to decide whether to annotate to a transcription term vs. a gene expression term in an attempt to prevent the over-usage of regulation of transcription terms when only a change in level of mRNA is shown. This is why we have one branch leading to regulation of gene expression, which you would not annotate to if you are intending to annotate a DNA binding transcription factor.

How about this comment instead: Use of this decision tree is NOT appropriate for annotation of gene products that are known/found not to bind DNA, or are constitutively bound to DNA (e.g. histones), or are not localized near a DNA regulatory region such as signaling molecules. Do NOT use this decision tree for the annotation of signaling molecules.

I won't be at the GOC meeting; Ruth @RLovering is presenting this, so hopefully she will have time to see this before her presentation.

krchristie commented 7 years ago

Hi Rachael,

I really like the comment you made about the purpose of the tree. What do you think about stating it explicitly in the note, something like this:

This decision tree is intended to guide when the evidence is strong enough to support annotation to a transcription factor term, or when annotation to gene expression should be used instead. Use of this decision tree is NOT appropriate for annotation of gene products that are known/found not to bind DNA, or are constitutively bound to DNA (e.g. histones), or are not localized near a DNA regulatory region such as signaling molecules. Do NOT use this decision tree for the annotation of signaling molecules.

krchristie commented 7 years ago

Again, apologies that I didn't get to this earlier. I have been feeling really swamped lately ...

I have two issues.

  1. I don't know that I agree with Annotation Type 5 (as per version 4). It doesn't seem to me that this kind of evidence means that you know that the gene product contributes to DNA binding. It might be localized to DNA entirely through interaction with something else that actually binds DNA.

This seems to be the type of evidence about which the Tripathi et al. paper said this:

We have chosen not to rely on assays measuring in vivo TF–DNA interaction (e.g. the Chromatin 
ImmunoPrecipitation assay) because it is not possible to ascertain in these assays that the TF in 
question actually binds directly to DNA, or whether some other component in the in vivo system 
mediates the TF–DNA association.

Regarding the other two annotations types, 2 & 3, that are present in the same boxes with type 5, type 3 is obviously fine since that is what the ChIP or EMSA shows. I think I also don't mind using this evidence to make the distinction between "regulation of gene expression" and "regulation of transcription from RNAP II promoter" since this does seem to be reasonable evidence that the gene product is exerting its effect by being present somewhere in the gene's regulatory region and thus most likely regulating transcription, but I just don't see how this can be used to make any kind of "DNA binding" annotation.

  2. The Tripathi paper did a really nice job laying out guidelines that helped people select the most specific term appropriate for the annotation of transcription factors. However, this guide is ignoring all of the more specific terms and telling people to use only the very most general terms. I am concerned that people will just use this without taking the time to refer to the text of the transcription guidelines paper and thus we'll lose the opportunity to capture more specificity when it is available if people only use this guide. So, I'd really like to see this decision tree also include part of the guidance from the Tripathi paper to select the appropriate transcription factor term.

To that end, I've attached a revised version with my suggestions: Expression_Transcription_Decision_Tree_ForGOCv4-krc.pptx

The summary of changes is:

RLovering commented 7 years ago

Hi Karen

Thanks for revising these slides. I think this will be very useful. However, I think the GOC presentation has to get an agreement on the decision tree.

Then the guidelines that go on the GOC site can have a lot more information. So the key question is: do you agree with the decision tree? Are you happy with the proposed annotations that can be created based on the experimental data?

Ruth

krchristie commented 7 years ago

I do not support the current decision tree (v4), for two reasons:

  1. I think annotation type 5 (numbering from your versions) suggests making "DNA binding" annotations from types of experiments that were specifically discussed in the Tripathi paper as not strong enough to distinguish DNA binding from being localized to chromatin by some other means.

  2. I do not agree with the way that the decision tree suggests, with specific IDs, only the most general of the GO terms for RNAP II transcription, because I think that people will look at that and think "this is all I need, I don't need to look at anything else". It's one thing to say that we can have all the detailed information on the web page, but if we are handing out a "transcription decision tree" that indicates to use the most general terms relevant to RNAP II TFs, people will just use it without looking further. I have no problem with leaving all of the evidence stuff out of this guide and referring people to the GOC website and the Tripathi paper, but I really have a problem with the way that this guide suggests only the top-level terms that are relevant to the TF annotations.

For this reason, I do not support at all the way that the table on the second page lists specific GO IDs. I do see that Annotation types 1 & 2 have parenthetical expressions saying "(or child term)", but I just don't think this is very obvious. My initial goal was just to clarify the second page of your guide, but when I tried to do it, it seemed to fit well to refer to the table from the Tripathi paper. Incorporating it into the decision tree gives people a unified place to see all of this info together in a systematic, logical framework that supports the goal of preventing annotations to "regulation of transcription" when only "regulation of gene expression" is supported, and also helps guide selection of appropriate TF terms when "regulation of transcription" or something more specific is supported.

thanks,

Karen

mlacencio commented 7 years ago

Dear Karen,

I've continued the annotation of DbTFs by Tripathi and colleagues here at NTNU. Thus I am also directly interested in the improvement of the curation guidelines, including this decision tree. For now I would like to comment on Annotation 5. Later on I will comment on other issues raised by you.

Annotation 5: Our thoughts about ChIP have changed slightly since the publication of the Tripathi paper. We do believe that ChIP could be accepted as limited evidence of DNA binding activity for a given candidate DbTF, but ONLY IF this DbTF is known to contain at least one DNA binding domain (DBD). For this purpose, we use the TFClass database as a reference (tfclass.bioinf.med.uni-goettingen.de/).

To reinforce that this is limited evidence, we support the mandatory use of the qualifier "contributes_to" along with the term GO:0043565. Although it is not possible to demonstrate that the immunoprecipitated protein is directly bound to DNA in a ChIP assay, it is plausible to think that a DBD-containing protein present in the immunoprecipitated complex will at least contribute to the binding.

In fact, even for EMSAs with in vitro translated proteins, we have used the term GO:0000981 or its children with the qualifier "contributes_to" when the only available DNA binding evidence is a heterodimer complex, comprising our DbTF of interest, bound to DNA. Is this scenario so different from the immunoprecipitated complexes in ChIP?

In summary, I think that previous knowledge here (e.g., presence in TFClass database) should be taken into account, at least for human, mouse and rat. Currently, for these species, it is unlikely that a candidate TF is not present in the TFClass database.

Cheers!

Marcio

mlacencio commented 7 years ago

Dear Karen,

Continuing the previous comment:

More about annotation 5 and ChIP

  1. I would like to emphasize that we are considering for annotation only ChIP assays in which the DNA products are analyzed using qPCR with specific primers. Please find below some papers and associated figures from which we collected ChIP-based evidence for annotating the DbTFs of interest (note that we have not submitted these annotations to GO): HOXC6, 22896703, Fig 3D; TOX3, 23447579, Fig 5B; PURA, 17641060, Fig 5.

  2. Just to complement the previous comment about previous knowledge: even if the proteins are from species other than mouse, human and rat, I believe that the presence of a DNA binding domain (DBD) known or suggested to be involved in site-specific DNA binding could reinforce the fact that a DBD-containing protein in a ChIP assay contributes to DNA binding activity. Maybe, in this case, besides the appropriate GO term with evidence "IDA" and qualifier "contributes_to", we could also use the appropriate GO term with evidence "ISS". But I think that this is something to be discussed in the future.

Annotation Table

I really appreciate your annotation table, i.e. Table 2 in your decision tree version. But, of course, as I am defending ChIP here, the only modification that I'd propose is the reconsideration of Annotation 5.

Cheers,

Marcio

krchristie commented 7 years ago

Hi Marcio,

Annotation 5 and ChIP

We discussed this at the GOC meeting in Corvallis (which I attended remotely), and I am concerned about taking ChIP evidence that really isn't strong enough to support the statement that the gene product is involved in DNA binding and calling it IDA evidence. I don't think that using the "contributes to" qualifier changes this situation. What the "contributes to" qualifier means is that you know the specific function happens, but that it happens in the context of a complex of which the gene product is part. ChIP evidence isn't strong enough to distinguish whether the gene product is localized to that region of the chromatin by binding DNA directly or by binding some other protein that localizes to the chromatin. Applying the qualifier to such evidence is not its intended use, because you still don't know that the appropriate function is actually a form of "DNA binding".

At Corvallis, we discussed the fact that if one of the main reasons you want to say that the gene product is a transcription factor is that you know that it has a DNA binding domain, then you should label this as what it is: some sort of sequence similarity evidence. Not all gene products with putative DNA binding domains really bind DNA. I realize this is the minority of cases, but if we are careless with our evidence, then it will be impossible to distinguish the ones that don't really bind DNA from the ones that do. So, I'd be OK with reconsidering some form of annotation 5, provided that we are clear about what the evidence really is and about appropriate use, or not, of the "contributes to" qualifier.

Annotation Table

I'm glad you like the incorporation of the annotation table based on Sushil's paper here. I thought he did a really nice job with this and that the combination of the decision tree with an updated version of his table makes a compact yet thorough annotation guide.