geneontology / go-annotation

This repository hosts the tracker for issues pertaining to GO annotations.
BSD 3-Clause "New" or "Revised" License

MODs: Review annotations to DNA-binding transcription factor activity (GO:0003700) should NOT be also annotated to catalytic activity (GO:0003824) #2207

Closed pgaudet closed 3 years ago

pgaudet commented 5 years ago

https://docs.google.com/spreadsheets/d/12oW_6KAhCfoF5y80QrzCvEMg9h98XX4-rmdh_k5P9Q4/edit#gid=0

pgaudet commented 5 years ago

Hello,

This spreadsheet has proteins annotated to both DNA-binding transcription factor activity (GO:0003700) (or a child) and to catalytic activity (GO:0003824) (or a child). Most proteins should not have both functions - if there is an enzymatic activity, it is likely that the protein is a co-factor.

Would everyone please review their annotations to check whether the annotations are OK ? In this case I cannot tell whether the transcription factor or the catalytic activity is incorrect - although I tried to indicate.

Note that in some cases the annotations may be coming from PAINT - please put a note and ignore if that's the case.

| Species | Number of annotations |
| --- | --- |
| Homo sapiens - Complex | 3 |
| Dictyostelium discoideum | 4 |
| Drosophila melanogaster | 14 |
| Mus musculus | 26 |
| Schizosaccharomyces pombe | 13 |
| Rattus norvegicus | 15 |
| Saccharomyces cerevisiae S288C | 16 |
| Arabidopsis thaliana | 57 |
| Caenorhabditis elegans | 7 |
| Danio rerio | 17 |
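For anyone who wants to re-run this kind of check locally, here is a minimal sketch of the intersection query. It assumes annotations are available as `(gene_product, go_id, evidence_code)` tuples; the two term sets are hand-picked placeholders rather than a real ontology closure, and the helper names `find_conflicts` and `paint_only` are hypothetical. It also assumes PAINT annotations can be recognised by the IBA evidence code.

```python
from collections import defaultdict

# Assumption: hand-picked term sets for illustration only. A real check would
# compute the full set of descendants of each term from the ontology.
DBTF_TERMS = {"GO:0003700", "GO:0001228"}
CATALYTIC_TERMS = {"GO:0003824", "GO:0003678"}


def find_conflicts(annotations):
    """Map each gene product annotated to both branches to its evidence codes."""
    dbtf = defaultdict(set)
    catalytic = defaultdict(set)
    for gene_product, go_id, evidence in annotations:
        if go_id in DBTF_TERMS:
            dbtf[gene_product].add(evidence)
        if go_id in CATALYTIC_TERMS:
            catalytic[gene_product].add(evidence)
    # Only gene products present in both branches are reported.
    return {gp: dbtf[gp] | catalytic[gp] for gp in dbtf.keys() & catalytic.keys()}


def paint_only(evidence_codes):
    """True when every conflicting annotation carries the IBA evidence code."""
    return evidence_codes == {"IBA"}
```

Gene products for which `paint_only` returns true are the ones that can be noted as PAINT and ignored; the rest need a manual look.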

Thanks,

Pascale


pfey03 commented 5 years ago

I cannot edit the Google sheet. I sent a request, but wanted to get this done. I inspected all of them: Q55ER4 - rbbB has IBA only, and the Taf subunits have mostly IBA and some historic ISS from SGD. I left them unchanged and will update the Google sheet when I have access.

ValWood commented 5 years ago

I flagged the pombe ones as "PAINT". I presume they are, we never had any annotation in the intersection.

bmeldal commented 5 years ago

These complexes have DNA-binding and catalytic subunits, so both annotations are legit.

ValWood commented 5 years ago

Hi @bmeldal, which gene products are these? When I make the rule I will make exceptions for these. Cheers. Val
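A rule with exceptions of the kind mentioned here could look roughly like the sketch below. It is only an illustration: `violates_rule` is a hypothetical helper, `ancestor_terms` is assumed to be the precomputed set of GO terms (including ancestors) that an entity is annotated to, and the exception list is something curators would have to supply.

```python
DBTF = "GO:0003700"       # DNA-binding transcription factor activity
CATALYTIC = "GO:0003824"  # catalytic activity


def violates_rule(entity_id, ancestor_terms, exceptions=frozenset()):
    """True if the entity falls under both terms and is not an accepted exception."""
    if entity_id in exceptions:
        return False
    return DBTF in ancestor_terms and CATALYTIC in ancestor_terms
```

Keeping the exceptions as an explicit set means every accepted case stays visible and can be reviewed later.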

bmeldal commented 5 years ago

The GPs are specific, either DNA-binding or enzymatic, but I annotate to the complex and the complex has both activities.

pgaudet commented 5 years ago

Sorry everyone, the spreadsheet is now editable.

Thanks, Pascale

ValWood commented 5 years ago

Hi @bmeldal which Complex(s) ? Is the DNA binding to a specific motif ? It would be good to have examples for the working group.

hattrill commented 5 years ago

We have a mixed bag.

Some are from PAINT/InterPro2GO (probably because some of the DNA-binding domains are found in both TFs and chromatin-modifying complexes). For example, the A-T Rich Interaction Domain (ARID) is found in transcription and chromatin regulators. A subset of ARID family proteins binds DNA specifically at AT-rich sites, while the others bind DNA non-specifically.

Some are contributes_to annotations from factors associated with modifying complexes and TF complexes. I can't remember where we got to with the "contributes_to" policy.

pgaudet commented 5 years ago

Thanks Helen. We tried to make it clear that specific DNA binding does not mean the function should be DNA-binding transcription factor activity. For the ARID proteins perhaps co-factor is more appropriate.

Pascale

bmeldal commented 5 years ago

TFIIF - CPX-79
GTF2F1: DNA-binding
GTF2F2: helicase activity (extracted from UniProt)

TFIID - CPX-915/930 (variants of each other): Def: General transcription factor complex that acts as the primary core promoter recognition factor in the initiation of RNA polymerase II (Pol II)-dependent transcription. The TBP subunit of TFIID recognizes and binds to the TATA box (if present), while TAF1 and TAF2 interact with the Initiator element (Inr), and TAF1 and the TAF6-TAF9 module recognizes the downstream core promoter element (DPE). Other core promoter elements, such as the motif ten element (MTE), may also be involved. Binding of the general transcription factor complex TFIIA (CPX-519) enhances binding of TFIID to the core promoter and nucleates pre-initiation complex (PIC) assembly. Following recruitment of TFIIA to TFIID, TFIIB, TFIIF (CPX-79), Pol II, TFIIE and TFIIH are successively assembled at the core promoter, allowing the PIC to initiate Pol II transcription. While TFIID is essential for transcription and its post-mitotic reinitiation, the loss of one or more subunits does not harm ongoing transcription during any given cell cycle. TFIID promoter binding appears to be regulated by histone modifications: TAF1 bromodomains (1361-1617 aa, IPR001487) bind the modified histone tails of acetylated H4K16, H4K5/K12 and H4K8/K16. TAF1 also appears to exhibit histone acetyltransferase activity towards histones H3 and H4. TAF3 and the PHD domains of other TFIID subunits bind modified histone tails carrying trimethylated H3K4 in combination with acetylated H3K9 and H3K14. TAF1 phosphorylates TP53 (P04637) on Thr-55, leading to TP53 degradation and G1 cell cycle progression. Spermatocytes contain variants of TAF-containing complexes, including the TAF4B variant (CPX-930).

bmeldal commented 5 years ago

Re Helen/ARID proteins: I annotated the BAF complexes and wouldn't add TF activity to the chromatin remodellers :)

pgaudet commented 5 years ago

@bmeldal, GO:0003700 is not appropriate for TFIIF - CPX-79. You should annotate to GO:0016251 RNA polymerase II general transcription initiation factor activity and GO:0043565 sequence-specific DNA binding.

OK ?

Thanks, Pascale

hattrill commented 5 years ago

@bmeldal I think that you misunderstand - we are not annotating TF activity to chrom remods - just saying that the propagation from PAINT/InterPro2GO may come from situations where a particular DNA-binding domain is sometimes found in TFs and sometimes in chrom remods.

pgaudet commented 5 years ago

We definitely need to review PAINT annotations. I have done jmj, I'll check ARID.

Thanks, Pascale

bmeldal commented 5 years ago

> @bmeldal think that you misunderstand - we are not annotating TF activity to chrom remods - saying that the propagation from PAINT/InterPro2GO may come from situations where a particular DNA binding domain is some times found TFs and sometimes in chrom remods.

I was just confirming that I wouldn't make the annotations and agreeing with Helen that it must come from the InterPro or PAINT assertions.

hattrill commented 5 years ago

Sorry @bmeldal misunderstood intentions!

@pgaudet - not sure if these have been reviewed via PAINT - didn't see them in our set, but MADF-BESS and SANT-MYB domain proteins are also spread between TFs and chrom modifiers.

bmeldal commented 5 years ago

TFIIF currently annotated to:

  1. GO:0000977 RNA polymerase II regulatory region sequence-specific DNA binding (child of GO:0043565 sequence-specific DNA binding) --> keep

  2. GO:0001228 DNA-binding transcription activator activity, RNA polymerase II-specific (child of GO:0003700 DNA-binding transcription factor activity) --> will delete

  3. GO:0016251 RNA polymerase II general transcription initiation factor activity --> keep

  4. GO:0003711 transcription elongation regulator activity --> keep

  5. GO:0003678 DNA helicase activity --> keep

How does that sound?

ValWood commented 5 years ago

Does TFIIF bind to DNA in a sequence-specific manner though? Doesn't the sequence specificity come from "GO:0003700 DNA-binding transcription factor activity"? Doesn't it bind to DNA, but non-specifically?

pgaudet commented 5 years ago

@bmeldal what you propose seems OK to me.

and also seems to address @ValWood 's comment - doesn't it ?

bmeldal commented 5 years ago

Removed GO:0001228 (also from mouse TFIIF). TF activity captured by GO:0016251 and DNA binding captured by GO:0000977.

tberardini commented 5 years ago

@lreiser and @ebakker2: I split the rows among the three of us.

srengel commented 5 years ago

2 SGD rows are legit. the others are from PAINT so i ignored.

pgaudet commented 5 years ago

@srengel Can you check again whether SGD:S000000971 | RPH1 really is a 'nucleic acid binding transcription factor activity' (note that this term has been merged into GO:0003700 DNA binding transcription factor activity).

I think 'GO:0003712 transcription coregulator activity' would be more appropriate.

Thanks for confirming -

Pascale

pfey03 commented 5 years ago

I have now marked our annotations as unchanged. We only have some still-valid ISS; all have a lot of IBA / PAINT, and in one case it's only the PAINT that created the enzymatic activity. For the other three, the TAF subunits, I left the ISS and will wait until the IBA are gone or the ISS WITH are gone; then I'll get notified.

sabrinatoro commented 5 years ago

I looked at the annotations. None of them have EXP evidence.

RLovering commented 5 years ago

Hi Sabrina

Please could you mark on the spreadsheet which proteins are marked as dbTFs and have catalytic activity from IEAs, so that we can contact InterPro or the MOD supplying the annotation, if necessary, to get these removed. For human, very few proteins with catalytic activity are also dbTFs; they are mostly co-regulators. See https://docs.google.com/spreadsheets/d/17mISqgr4JtlHO2ggpibcQ5pbOkes1d8v1Hh81nudpVg/edit#gid=0

Thanks

Ruth

sabrinatoro commented 5 years ago

@RLovering : yes! working on it! :-)

sabrinatoro commented 5 years ago

I added a sheet in the Google doc (at the beginning of this ticket) to refer to the InterPro domains and the UniProtKB-KW on which the ZFIN IEA annotations were based. In most cases (from what I can tell), the 'conflict' comes from the difference in annotations coming from InterPro domains (most often dbTFs) and from UniProtKB-KW (most often activity annotations).

I would be happy to talk with any of you about this, and figure out how to move forward. Thanks!
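To see at a glance which source feeds which side of the conflict, a small tally like the one below may help. This is a sketch under assumptions: each row is taken to be `(gene_product, branch, with_from)`, where `branch` is a label such as "dbTF" or "catalytic" that the curator has already assigned, and `with_from` is assumed to look like `InterPro:<id>` or `UniProtKB-KW:<id>`. The function name `tally_sources` is hypothetical.

```python
from collections import Counter


def tally_sources(rows):
    """Count how often each source prefix contributes to each branch."""
    counts = Counter()
    for _gene_product, branch, with_from in rows:
        # Keep only the database prefix, e.g. "InterPro" or "UniProtKB-KW".
        prefix = with_from.split(":", 1)[0]
        counts[(prefix, branch)] += 1
    return counts
```

If the pattern described above holds, most counts should fall under `("InterPro", "dbTF")` and `("UniProtKB-KW", "catalytic")`.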

ValWood commented 5 years ago

Hi @sabrinatoro,

The normal mechanism is to report the InterPro, or UniProtKB-KW violations on this tracker. It is usually easy to establish which one is the "violator" by looking at your in-house GO annotation/description, so it is a useful QC exercise for a MOD.

The InterPro mappings will get fixed immediately, and disappear from your next release. The UniProtKB-KW are usually a bit more difficult if a clean-up is required, because they are applied individually to entries manually, so the tracking down can take longer.

With the nice new GitHub feature, when you type the InterPro ID or the KW you will see if the issue is already reported. I think I have previously addressed the ones which also apply to yeast, so the remaining ones should be specific to multicellular species.

cheers,

Val

ValWood commented 5 years ago

Hi @sabrinatoro

First one easy, https://github.com/geneontology/go-annotation/issues/2223 Zinc finger, C2HC-type domains should not have "DNA-binding transcription factor activity" because this is not always true.

In these cases, to be really thorough, you would check if you lose any valid annotations to "DNA-binding transcription factor activity" by the removal of this mapping.

Looking through your list, and based on gene names, I suspect this will resolve quite a few but the others will need a little bit of digging.

v

srengel commented 5 years ago

SGD done. the remaining yeast annotations are from PAINT.

gthayman commented 5 years ago

RGD done.

pgaudet commented 5 years ago

@ValWood Have the pombe annotations been reviewed ?

ValWood commented 5 years ago

Yes, see above. They are all from PAINT?

krchristie commented 5 years ago

MGI done. Remaining mouse annotations are either fine or are from PAINT, ISO's from other species, or Keywords not assigned by MGI.

dsiegele commented 5 years ago

Hi Pascale,

I don't see any E. coli annotations in the table at the beginning, so this may not be relevant. However, in E. coli there is at least one example of a protein that has both DNA-binding transcription factor activity and catalytic activity. See putA at EcoCyc.

Debby

krchristie commented 5 years ago

Following on from the comment by @dsiegele, there are a number of mouse genes on this list where there are papers that explicitly talk about a transcription factor with catalytic activity. These are mentioned in the notes I added to the spreadsheet.

pgaudet commented 5 years ago

@RLovering Can you re-run the query/pivot table ? Maybe this can be closed ?

pgaudet commented 4 years ago

@bmeldal There are 3 General transcription factors annotated to DNA binding transcription factor activity:

CPX-79 rap30-rap74_human
CPX-915 tfiid_human
CPX-930 tfiid-taf4bvariant_human

Those should not be annotated to GO:0003700 and children, but rather to GO:0016251 RNA polymerase II general transcription initiation factor activity. Can you please have a look ?

bmeldal commented 4 years ago

done (also for mouse)

pgaudet commented 3 years ago

Most are done. Several of the remaining ones came from PAINT; these will be corrected as updated annotations come through.