Closed pgaudet closed 3 years ago
Hello,
This spreadsheet has proteins annotated to both DNA-binding transcription factor activity (GO:0003700) (or a child) and to catalytic activity (GO:0003824) (or a child). Most proteins should not have both functions - if there is an enzymatic activity, it is likely that the protein is a co-factor.
Would everyone please review their annotations to check whether they are OK? In this case I cannot tell whether the transcription factor or the catalytic activity is incorrect, although I tried to indicate which.
Note that in some cases the annotations may be coming from PAINT - please put a note and ignore if that's the case.
Species | Number of annotations |
---|---|
Homo sapiens - Complex | 3 |
Dictyostelium discoideum | 4 |
Drosophila melanogaster | 14 |
Mus musculus | 26 |
Schizosaccharomyces pombe | 13 |
Rattus norvegicus | 15 |
Saccharomyces cerevisiae S288C | 16 |
Arabidopsis thaliana | 57 |
Caenorhabditis elegans | 7 |
Danio rerio | 17 |
Thanks,
Pascale
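For anyone who wants to re-run this check on their own annotation file, here is a minimal sketch, assuming GAF 2.x input. The function names and the `exceptions` argument are hypothetical, not part of any existing GO tooling. The caller must supply the term sets (GO:0003700 plus its descendants, GO:0003824 plus its descendants) from the ontology; the sketch does not compute the closure itself.

```python
from collections import defaultdict


def parse_gaf(lines):
    """Yield (gene_product, qualifiers, go_id, evidence_code) from GAF 2.x lines."""
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and header/comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 7:
            continue  # not a complete annotation row
        yield f"{cols[0]}:{cols[1]}", cols[3].split("|"), cols[4], cols[6]


def find_dual_annotations(lines, dbtf_terms, catalytic_terms, exceptions=frozenset()):
    """Return gene products annotated to both term sets, with evidence codes per side.

    dbtf_terms / catalytic_terms: sets of GO IDs (root term plus descendants),
    supplied by the caller. exceptions: gene product IDs to skip (hypothetical hook
    for gene products known to legitimately have both activities).
    """
    found = defaultdict(lambda: {"dbtf": set(), "catalytic": set()})
    for gp, qualifiers, go_id, code in parse_gaf(lines):
        if "NOT" in qualifiers or gp in exceptions:
            continue
        if go_id in dbtf_terms:
            found[gp]["dbtf"].add(code)
        if go_id in catalytic_terms:
            found[gp]["catalytic"].add(code)
    return {gp: ev for gp, ev in found.items() if ev["dbtf"] and ev["catalytic"]}


def paint_only(evidence):
    """True when at least one side of the conflict rests on IBA (PAINT) evidence alone."""
    return evidence["dbtf"] == {"IBA"} or evidence["catalytic"] == {"IBA"}
```

Rows for which `paint_only` returns true are the ones that can be noted as PAINT and ignored, as requested above.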
I cannot edit the Google sheet; I have sent an access request but wanted to get this done. I inspected all of them: Q55ER4 (rbbB) has IBA only, and the Taf subunits have mostly IBA and some historic ISS from SGD. I left them unchanged and will update the Google sheet when I have access.
I flagged the pombe ones as "PAINT". I presume they are; we never had any annotation in the intersection.
These complexes have DNA-binding and catalytic subunits so both annotations are legit.
Hi @bmeldal, which gene products are these? When I make the rule I will make exceptions for these. Cheers. Val
The GPs are specific, either DNA-binding or enzymatic, but I annotate to the complex and the complex has both activities.
Sorry everyone, the spreadsheet is now editable.
Thanks, Pascale
Hi @bmeldal, which complex(es)? Is the DNA binding to a specific motif? It would be good to have examples for the working group.
We have a mixed bag.
Some are from PAINT/InterPro2GO (probably because some of the DNA-binding domains are found in both TFs and chromatin-modifying complexes). For example, the A-T Rich Interaction Domain (ARID) is found in transcription and chromatin regulators. A subset of ARID family proteins binds DNA specifically at AT-rich sites, while the others bind DNA non-specifically.
Some are contributes_to annotations from factors associated with modifying complexes and TF complexes. I can't remember where we got to with the "contributes_to" policy.
Thanks Helen. We tried to make it clear that specific DNA binding does not mean the function should be DNA-binding transcription factor activity. For the ARID proteins perhaps co-factor is more appropriate.
Pascale
TFIIF - CPX-79
GTF2F1: DNA-binding
GTF2F2: helicase activity (extracted from UniProt)
TFIID - CPX-915/930 (variants of each other): Def: General transcription factor complex that acts as the primary core promoter recognition factor in the initiation of RNA polymerase II (Pol II)-dependent transcription. The TBP subunit of TFIID recognizes and binds to the TATA box (if present), while TAF1 and TAF2 interact with the Initiator element (Inr), and TAF1 and the TAF6-TAF9 module recognize the downstream core promoter element (DPE). Other core promoter elements, such as the motif ten element (MTE), may also be involved. Binding of the general transcription factor complex TFIIA (CPX-519) enhances binding of TFIID to the core promoter and nucleates pre-initiation complex (PIC) assembly. Following recruitment of TFIIA to TFIID, TFIIB, TFIIF (CPX-79), Pol II, TFIIE and TFIIH are successively assembled at the core promoter, allowing the PIC to initiate Pol II transcription. While TFIID is essential for transcription and its post-mitotic reinitiation, the loss of one or more subunits does not harm ongoing transcription during any given cell cycle. TFIID promoter binding appears to be regulated by histone modifications: TAF1 bromodomains (1361-1617 aa, IPR001487) bind the modified histone tails of acetylated H4K16, H4K5/K12 and H4K8/K16. TAF1 also appears to exhibit histone acetyltransferase activity towards histones H3 and H4. TAF3 and the PHD domains of other TFIID subunits bind modified histone tails carrying trimethylated H3K4 in combination with acetylated H3K9 and H3K14. TAF1 phosphorylates TP53 (P04637) on Thr-55, leading to TP53 degradation and G1 cell cycle progression. Spermatocytes contain variants of TAF-containing complexes, including the TAF4B variant (CPX-930).
Re Helen/ARID proteins: I annotated the BAF complexes and wouldn't add TF activity to the chromatin remodellers :)
@bmeldal, GO:0003700 is not appropriate for TFIIF - CPX-79. You should annotate to GO:0016251 RNA polymerase II general transcription initiation factor activity and GO:0043565 sequence-specific DNA binding.
OK?
Thanks, Pascale
@bmeldal I think you misunderstand: we are not annotating TF activity to chromatin remodellers. We are saying that the propagation from PAINT/InterPro2GO may come from situations where a particular DNA-binding domain is sometimes found in TFs and sometimes in chromatin remodellers.
We definitely need to review PAINT annotations. I have done jmj, I'll check ARID.
Thanks, Pascale
> @bmeldal I think you misunderstand: we are not annotating TF activity to chromatin remodellers. We are saying that the propagation from PAINT/InterPro2GO may come from situations where a particular DNA-binding domain is sometimes found in TFs and sometimes in chromatin remodellers.

I was just confirming that I wouldn't make the annotations and agreeing with Helen that it must come from the InterPro or PAINT assertions.
Sorry @bmeldal, I misunderstood your intentions!
@pgaudet - not sure if these have been reviewed via PAINT - didn't see them in our set, but MADF-BESS and SANT-MYB domain proteins are also spread between TFs and chrom modifiers.
TFIIF currently annotated to:
GO:0000977 RNA polymerase II regulatory region sequence-specific DNA binding (child of GO:0043565 sequence-specific DNA binding) --> keep
GO:0001228 DNA-binding transcription activator activity, RNA polymerase II-specific (child of GO:0003700 DNA-binding transcription factor activity) --> will delete
GO:0016251 RNA polymerase II general transcription initiation factor activity --> keep
GO:0003711 transcription elongation regulator activity --> keep
GO:0003678 DNA helicase activity --> keep
How does that sound?
Does TFIIF bind to DNA in a sequence-specific manner though? Doesn't the sequence specificity come from "GO:0003700 DNA-binding transcription factor activity"? Doesn't it bind to DNA, but non-specifically?
@bmeldal what you propose seems OK to me.
It also seems to address @ValWood's comment - doesn't it?
Removed GO:0001228 (also from mouse TFIIF). TF activity captured by GO:0016251 and DNA binding captured by GO:0000977.
@lreiser and @ebakker2: I split the rows among the three of us.
2 SGD rows are legit. The others are from PAINT so I ignored them.
@srengel Can you check again whether SGD:S000000971 | RPH1 really has 'nucleic acid binding transcription factor activity' (note that this term has been merged into GO:0003700 DNA-binding transcription factor activity)?
I think 'GO:0003712 transcription coregulator activity' would be more appropriate.
Thanks for confirming -
Pascale
I have now marked our annotations as unchanged. We only have some still-valid ISS annotations; all have a lot of IBA/PAINT, and in one case it is only PAINT that created the enzymatic activity. For the other three, the TAF subunits, I left the ISS and will wait until the IBA are gone or the ISS WITH entries are gone, at which point I'll get notified.
I looked at the annotations. None of them have EXP evidence.
Hi Sabrina
Please could you mark on the spreadsheet which proteins are marked as dbTFs and have catalytic activity from IEAs, so that we can contact InterPro or the MOD supplying the annotation, if necessary, to get these removed. For human, very few proteins with catalytic activity are also dbTFs; they are mostly co-regulators. See https://docs.google.com/spreadsheets/d/17mISqgr4JtlHO2ggpibcQ5pbOkes1d8v1Hh81nudpVg/edit#gid=0
Thanks
Ruth
@RLovering : yes! working on it! :-)
I added a sheet in the Google doc (at the beginning of this ticket) to refer to the InterPro domains and the UniProtKB-KW on which the ZFIN IEA annotations were based. In most cases (from what I can tell), the 'conflict' comes from the difference between annotations coming from InterPro domains (most often dbTFs) and from UniProtKB-KW (most often activity annotations).
I would be happy to talk with any of you about this, and figure out how to move forward. Thanks!
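As an illustration of the comparison described above, here is a minimal sketch that lists, for each gene product with IEA annotations on both sides, which source supplied each side. It assumes GAF 2.x input where the With/From column (column 8) carries entries such as `InterPro:IPR...` or `UniProtKB-KW:KW-...`. The function name is hypothetical, and the term sets (root plus descendants) must be supplied by the caller.

```python
import re
from collections import defaultdict


def iea_sources(lines, dbtf_terms, catalytic_terms):
    """Map gene product -> source prefixes behind its IEA dbTF and catalytic annotations."""
    sources = defaultdict(lambda: {"dbtf": set(), "catalytic": set()})
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and header/comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 8 or cols[6] != "IEA" or "NOT" in cols[3].split("|"):
            continue
        gp, go_id = f"{cols[0]}:{cols[1]}", cols[4]
        # Keep only the database prefix of each With/From entry, e.g. "InterPro".
        # IEA rows with an empty With/From column contribute no source.
        prefixes = {e.split(":", 1)[0] for e in re.split(r"[|,]", cols[7]) if e}
        if go_id in dbtf_terms:
            sources[gp]["dbtf"] |= prefixes
        if go_id in catalytic_terms:
            sources[gp]["catalytic"] |= prefixes
    return {gp: s for gp, s in sources.items() if s["dbtf"] and s["catalytic"]}
```

A result such as `{"dbtf": {"InterPro"}, "catalytic": {"UniProtKB-KW"}}` for a gene product would match the pattern described above.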
Hi @sabrinatoro,
The normal mechanism is to report the InterPro or UniProtKB-KW violations on this tracker. It is usually easy to establish which one is the "violator" by looking at your in-house GO annotation/description, so it is a useful QC exercise for a MOD.
The InterPro mappings will get fixed immediately, and disappear from your next release. The UniProtKB-KW are usually a bit more difficult if a clean-up is required, because they are applied individually to entries manually, so the tracking down can take longer.
With the nice new GitHub feature, when you type the InterPro ID or the KW you will see if the issue is already reported. I think I have previously addressed the ones which also apply to yeast, so the remaining ones should be specific to multicellular species.
cheers,
Val
Hi @sabrinatoro
The first one is easy: https://github.com/geneontology/go-annotation/issues/2223 Zinc finger, C2HC-type domains should not be mapped to "DNA-binding transcription factor activity" because this is not always true.
In these cases, to be really thorough, you would check if you lose any valid annotations to "DNA-binding transcription factor activity" by the removal of this mapping.
Looking through your list, and based on gene names, I suspect this will resolve quite a few but the others will need a little bit of digging.
v
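The check suggested above (whether removing a mapping would drop any valid dbTF annotations) could be sketched as follows. It assumes GAF 2.x input with InterPro IDs in the With/From column. The function name is hypothetical, and the InterPro ID passed in is whatever entry the mapping under review belongs to. It returns the gene products whose only dbTF support is an IEA row citing that one entry; those are the ones a curator would need to look at by hand.

```python
import re


def sole_support_from_mapping(lines, interpro_id, dbtf_terms):
    """Return gene products whose only dbTF support is an IEA row citing interpro_id alone."""
    target = f"InterPro:{interpro_id}"
    from_mapping, other_support = set(), set()
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and header/comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 8 or cols[4] not in dbtf_terms or "NOT" in cols[3].split("|"):
            continue
        gp = f"{cols[0]}:{cols[1]}"
        with_from = {e for e in re.split(r"[|,]", cols[7]) if e}
        if cols[6] == "IEA" and with_from == {target}:
            from_mapping.add(gp)
        else:
            # Any other evidence, or an IEA row that also cites another entry,
            # counts as independent support that would survive the removal.
            other_support.add(gp)
    return from_mapping - other_support
```

Gene products in the returned set are not necessarily wrong or right; the sketch only narrows down which ones would lose the term entirely.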
SGD done. the remaining yeast annotations are from PAINT.
RGD done.
@ValWood Have the pombe annotations been reviewed ?
Yes, see above. They are all from PAINT?
MGI done. Remaining mouse annotations are either fine or are from PAINT, ISOs from other species, or Keywords not assigned by MGI.
Hi Pascale,
I don't see any E. coli annotations in the table at the beginning, so this may not be relevant. However, in E. coli there is at least one example of a protein that has both DNA-binding transcription factor activity and catalytic activity. See putA at EcoCyc.
Debby
Following on from the comment by @dsiegele, there are a number of mouse genes on this list where there are papers that explicitly talk about a transcription factor with catalytic activity. These are mentioned in the notes I added to the spreadsheet.
@RLovering Can you re-run the query/pivot table? Maybe this can be closed?
@bmeldal There are 3 general transcription factors annotated to DNA-binding transcription factor activity:
CPX-79 rap30-rap74_human
CPX-915 tfiid_human
CPX-930 tfiid-taf4bvariant_human
Those should not be annotated to GO:0003700 and children, but rather to GO:0016251 RNA polymerase II general transcription initiation factor activity. Can you please have a look?
done (also for mouse)
Most are done; several remaining ones came from PAINT, and these will be corrected as updated annotations come through.
https://docs.google.com/spreadsheets/d/12oW_6KAhCfoF5y80QrzCvEMg9h98XX4-rmdh_k5P9Q4/edit#gid=0