geneontology / go-annotation

This repository hosts the tracker for issues pertaining to GO annotations.
BSD 3-Clause "New" or "Revised" License

Q03989 | ARID5A and Q14865 ARID5B not dbTFs #2997

Closed RLovering closed 4 years ago

RLovering commented 4 years ago

Hi Marcio, Colin has confirmed that, while this protein does bind DNA, it is an AT-rich interactive domain-containing protein and the DNA binding is not specific enough to provide a genomic location/address. I don't think the paper used for the annotation confirms DNA binding, and the protein is not listed on the Yin HT-SELEX list.

Q03989 | ARID5A | enables | GO:0001227 | DNA-binding transcription repressor activity, RNA polymerase II-specific | ECO:0000314(IDA) | ECO:0000314 | (IDA) |   | PMID:15941852

Best

Ruth

mlacencio commented 4 years ago

Hi Ruth!

In fact, this paper does not show any evidence of DNA-binding activity for ARID5A! There are two issues here. First, I added the wrong paper as the reference supporting this annotation; the right one was supposed to be PMID:15640446. Second, that paper in turn shows sequence-specific DNA-binding activity for ARID5B (Figure 3), not ARID5A. As a consequence of these two mistakes, I have deleted the disputed annotation.

However, Lambert et al. 2018 classify this protein as a DbTF because of experimental evidence (protein binding microarray) for sequence-specific DNA-binding activity for murine ARID5A orthologs (PMID:19443739 and PMID:25215497). This is recorded in JASPAR (http://jaspar.genereg.net/matrix/MA0602.1/). So, theoretically, it would be possible to create a new and more appropriate annotation for human ARID5A. But, for this, we have to discuss Colin's observation about the lack of specificity of ARID proteins.

In this regard, the previously mentioned paper PMID:15640446 seems helpful, as the authors have undertaken a survey of DNA-binding properties across the entire ARID family. According to them, among the seven ARID subfamilies, only the ARID3 and ARID5 subfamilies bind in a sequence-specific manner. But, please, I would appreciate it if you could validate this information.

Colin's observation is very relevant and raises the following question: how should we consider a sequence specific enough to be a genomic location/address? Is there any formal measurement for this? Should we ourselves propose something concerning this?

Best regards!

Marcio

RLovering commented 4 years ago

Hi Marcio, I just wanted to check with you. Basically, we are working to the strategy that some coTFs do bind DNA, but this binding is not specific. All of the ARID proteins are considered coTFs, and this applies to the annotation to ARID5B as well. Therefore, please remove this annotation:

NTNU | Q14865 | ARID5B |   | GO:0001227 | DNA-binding transcription repressor activity, RNA polymerase II-specific | ECO:0000314(IDA) | ECO:0000314 | (IDA) |   | PMID:8649988

And do not add dbTF to ARID5B based on your comments above. Also, you might be interested to read Colin's comments today following my query re KDM5A: https://github.com/geneontology/go-annotation/issues/3008

Best

Ruth

RLovering commented 4 years ago

Hi Marcio

I agree with your comment:

"how should we consider a sequence specific enough to be a genomic location/address? Is there any formal measurement for this? Should we ourselves propose something concerning this?"

I think the general idea is that the sequence bound needs to be reasonably unique within the genome. For ARID5A and ARID5B, Arttu explained in Rende that these are not specific enough to provide an address.

Therefore, please remove this annotation:

NTNU | Q14865 | ARID5B | | GO:0001227 | DNA-binding transcription repressor activity, RNA polymerase II-specific | ECO:0000314(IDA) | ECO:0000314 | (IDA) | | PMID:8649988

Thanks

Ruth

mlacencio commented 4 years ago

Hi @RLovering!

Thank you for letting me know about Arttu's thoughts on the roles of ARID5A and ARID5B.

Astrid, Martin and I agree that a DbTF should provide a unique genomic address. But we think we should try to agree on a more objective definition of specificity and make this clear in the curation guidelines.

The discussion about the definition of a unique genomic address should involve Arttu and colleagues as well as the TFClass people (recall that Edgar Wingender et al. also classify ARID5A and ARID5B as DbTFs). Although Arttu has explained that ARID5A and ARID5B can't provide a specific genomic address, ARID5A and ARID5B are still on his own list.

Thanks,

Marcio

pgaudet commented 4 years ago

@mlacencio

Thanks for the comments. We can look for literature that provides evidence for ARID5A binding specific genomic addresses. However, the paper you cite for the annotation we are disputing, PMID:8649988, shows binding to an AT-rich region, ATATCG. While they show this region is present in the modulator located upstream of the human cytomegalovirus major immediate early gene enhancer, they don't show that this sequence is unique enough in the genome to provide a genomic address, rather than generally making the chromatin more accessible for transcription.
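As a rough illustration of the uniqueness argument (a back-of-the-envelope sketch, not something taken from the cited paper), one can estimate how often a motif of a given length is expected to occur by chance. The assumptions here are mine: a haploid human genome of about 3.1 Gb, independent and equally frequent bases, and both strands counted. The function names are hypothetical. Real genomes are not uniform, so this only gives an order of magnitude.

```python
# Assumed approximate haploid human genome size in base pairs (illustrative value).
HUMAN_GENOME_BP = 3_100_000_000


def expected_occurrences(motif_len, genome_size, both_strands=True):
    """Expected chance matches of one fixed motif under a uniform-base model."""
    # Number of start positions at which a motif of this length can be placed.
    positions = max(genome_size - motif_len + 1, 0)
    # Each position matches with probability (1/4) ** motif_len.
    per_strand = positions / 4 ** motif_len
    # Doubling for the reverse strand is only appropriate for
    # non-palindromic motifs (ATATCG vs. CGATAT qualifies).
    return 2 * per_strand if both_strands else per_strand


def min_unique_length(genome_size, both_strands=True):
    """Shortest motif length whose expected chance occurrences drop to <= 1."""
    k = 1
    while expected_occurrences(k, genome_size, both_strands) > 1:
        k += 1
    return k


if __name__ == "__main__":
    # A 6-mer such as ATATCG is expected far more than a million times.
    print(expected_occurrences(len("ATATCG"), HUMAN_GENOME_BP))
    # Length at which a fully specified site becomes roughly unique by chance.
    print(min_unique_length(HUMAN_GENOME_BP))
```

Under these assumptions a fully specified 6-mer is expected well over a million times in the genome, which is the intuition behind saying that such a site cannot serve as an address on its own.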

So the annotation to BP is justified here, but not to the MFs

Note that this paper strongly points to a coregulator function: PMID:21532585

Thanks, Pascale

RLovering commented 4 years ago

Hi Marcio

Thanks for removing the ARID5A annotations. Would you please remove the ARID5B annotations too:

NTNU | Q14865 | ARID5B |   | GO:0000977 | RNA polymerase II transcription regulatory region sequence-specific DNA binding | ECO:0000314(IDA) | ECO:0000314 | (IDA) |   | PMID:8649988

NTNU | Q14865 | ARID5B |   | GO:0001227 | DNA-binding transcription repressor activity, RNA polymerase II-specific | ECO:0000314(IDA) | ECO:0000314 | (IDA) |   | PMID:8649988

You could replace these with GO:0003680 | AT DNA binding and GO:0003714 | transcription corepressor activity, respectively.

Best

Ruth

mlacencio commented 4 years ago

Dear @RLovering and @pgaudet

I have removed the annotations!

Best,

Marcio