geneontology / go-annotation

This repository hosts the tracker for issues pertaining to GO annotations.

annotations to GO:0032993 by CollecTF #1589

Closed keseler closed 1 year ago

keseler commented 7 years ago

Hi all,

We just imported the most recent GO mappings from the UniProt file into EcoCyc, and I noticed some new mappings for transcription factors that come from CollecTF. Here is an example for CsgD, http://www.ebi.ac.uk/QuickGO/GProtein?ac=P52106:

UniProtKB | P52106 | csgD |   | GO:0032993 | protein-DNA complex | C | EXP | PMID:21421764 |   | 83333 | 20170328 | CollecTF
UniProtKB | P52106 | csgD |   | GO:0032993 | protein-DNA complex | C | IDA | PMID:21421764 |   | 83333 | 20170328 | CollecTF
UniProtKB | P52106 | csgD |   | GO:0032993 | protein-DNA complex | C | IPI | PMID:21421764 | RefSeq:NC_007779.1 | 83333 | 20170328 | CollecTF

The documentation for GO:0032993 says that it should not be used for transcription factors at all.

Also, it looks like the annotations from CollecTF come in sets of three, one for each of three different evidence codes, and one of them has a "with" value that is a RefSeq ID. That seems odd, even for other GO terms like this:

UniProtKB | P52106 | csgD |   | GO:0001216 | bacterial-type RNA polymerase transcriptional activator activity, sequence-specific DNA binding | F | EXP | PMID:21421764 |   | 83333 | 20170328 | CollecTF
UniProtKB | P52106 | csgD |   | GO:0001216 | bacterial-type RNA polymerase transcriptional activator activity, sequence-specific DNA binding | F | IDA | PMID:21421764 |   | 83333 | 20170328 | CollecTF
UniProtKB | P52106 | csgD |   | GO:0001216 | bacterial-type RNA polymerase transcriptional activator activity, sequence-specific DNA binding | F | IPI | PMID:21421764 | RefSeq:NC_007779.1 | 83333 | 20170328 | CollecTF
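For anyone who wants to check how often this pattern occurs in an import, here is a minimal sketch that parses rows like the ones above and groups the evidence codes by protein, GO term, and reference. The column names in `COLUMNS` are my own labels, assumed from the order of the fields in the quoted rows; they are not an official schema.

```python
from collections import defaultdict

# Column labels are assumptions based on the field order in the quoted rows.
COLUMNS = [
    "db", "accession", "symbol", "qualifier", "go_id", "go_name",
    "aspect", "evidence", "reference", "with_from", "taxon", "date",
    "assigned_by",
]

GO_NAME = ("bacterial-type RNA polymerase transcriptional activator activity, "
           "sequence-specific DNA binding")

# The three GO:0001216 rows quoted above, one record per line.
ROWS = [
    f"UniProtKB | P52106 | csgD |   | GO:0001216 | {GO_NAME} | F | EXP | PMID:21421764 |   | 83333 | 20170328 | CollecTF",
    f"UniProtKB | P52106 | csgD |   | GO:0001216 | {GO_NAME} | F | IDA | PMID:21421764 |   | 83333 | 20170328 | CollecTF",
    f"UniProtKB | P52106 | csgD |   | GO:0001216 | {GO_NAME} | F | IPI | PMID:21421764 | RefSeq:NC_007779.1 | 83333 | 20170328 | CollecTF",
]


def parse_row(line):
    """Split one pipe-delimited row into a dict keyed by COLUMNS."""
    fields = [field.strip() for field in line.split("|")]
    return dict(zip(COLUMNS, fields))


def group_evidence(rows):
    """Map (accession, GO ID, reference) to a list of (evidence, with/from) pairs."""
    grouped = defaultdict(list)
    for row in rows:
        key = (row["accession"], row["go_id"], row["reference"])
        grouped[key].append((row["evidence"], row["with_from"]))
    return dict(grouped)


grouped = group_evidence(parse_row(line) for line in ROWS)
for key, evidence in grouped.items():
    print(key, evidence)
```

Any key that collects more than one evidence code is a candidate for the "set of three" pattern.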

Can someone clarify, either for me or for CollecTF folks (or both), what the appropriate annotations should be?
Thanks, Ingrid

ivanerill commented 7 years ago

Hi Ingrid,

A couple of comments regarding your post.

You are correct that GO:0032993 should not be used for transcription factors (although the wording in the documentation is a bit ambiguous; it is not clear whether TF complexes, or just TFs, are the entity to be avoided). We'll update those annotations to use GO:0005667 in those cases where the TF is an oligomer (not sure if a DNA-association component can be annotated for monomeric TFs; might a new term be warranted?).

CollecTF annotations do not come in sets of three. Depending on the PMID source, a single GO term might have several annotations, based on different experimental codes. CollecTF annotates using ECO evidence codes, not GO evidence codes, and due to its nature (a user-oriented database, where the user determines which experimental evidence to use when browsing/querying) it does not editorialize the annotation to determine the most relevant type of experimental evidence.

In the example you cite (PMID:21421764 and GO:0001216), the paper substantiates the evidence for GO:0001216 using ECO:0005631 (DNAse footprinting evidence used in manual assertion), ECO:0006007 (chromatin immunoprecipitation-chip evidence used in manual assertion) and ECO:0005667 (site-directed mutagenesis evidence used in manual assertion), which translate into IPI, IDA and EXP GO evidence codes.
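As a rough illustration, the translation for this one example can be written as a lookup table. This is only a sketch of the three pairs mentioned in this comment, not the actual ECO-to-GO mapping file, and it assumes the GO codes are listed in the same order as the ECO terms.

```python
# ECO ID -> (ECO label, GO evidence code) for the example in this comment.
# Assumption: "IPI, IDA and EXP" are listed in the same order as the ECO terms.
ECO_TO_GO = {
    "ECO:0005631": ("DNAse footprinting evidence used in manual assertion", "IPI"),
    "ECO:0006007": ("chromatin immunoprecipitation-chip evidence used in manual assertion", "IDA"),
    "ECO:0005667": ("site-directed mutagenesis evidence used in manual assertion", "EXP"),
}


def go_evidence_codes(eco_ids):
    """Translate a list of ECO IDs into their GO evidence codes."""
    return [ECO_TO_GO[eco_id][1] for eco_id in eco_ids]


codes = go_evidence_codes(["ECO:0005631", "ECO:0006007", "ECO:0005667"])
print(codes)
```

Three distinct ECO terms from one paper therefore yield three GO annotations for the same term, one per GO evidence code.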

We are working with EBI GOA to avoid having two ECO terms from the same paper substantiating the same GO term map back to the same GO evidence code (which would technically be a redundant annotation), but I believe that the use of ECO terms warrants keeping the three annotations mentioned above in GO. I understand the logic for considering IPI a more relevant experimental evidence, but that logic is open to interpretation. In the context of this example, some people might consider that a site-directed mutagenesis assay provides more information than a footprint, and others that the in vivo ChIP-chip is the most relevant annotation for molecular function.
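The redundancy case described above could be detected along these lines. This is a hedged sketch, not the procedure used by CollecTF or GOA: `find_redundant` is a hypothetical helper, the ECO-to-GO pairs assume the order given in this thread, and `ECO:EXAMPLE` is a made-up placeholder ID added only to trigger a collision.

```python
from collections import defaultdict


def find_redundant(annotations, eco_to_go):
    """Return {(go_id, reference, go_code): [eco_ids]} where more than one
    distinct ECO term collapses onto the same GO evidence code."""
    buckets = defaultdict(set)
    for go_id, reference, eco_id in annotations:
        buckets[(go_id, reference, eco_to_go[eco_id])].add(eco_id)
    return {key: sorted(ids) for key, ids in buckets.items() if len(ids) > 1}


# Pairs assumed from this thread, plus one placeholder that is NOT a real ECO term.
eco_to_go = {
    "ECO:0005631": "IPI",
    "ECO:0006007": "IDA",
    "ECO:0005667": "EXP",
    "ECO:EXAMPLE": "IDA",  # hypothetical second term mapping to IDA
}

annotations = [
    ("GO:0001216", "PMID:21421764", "ECO:0005631"),
    ("GO:0001216", "PMID:21421764", "ECO:0006007"),
    ("GO:0001216", "PMID:21421764", "ECO:0005667"),
    ("GO:0001216", "PMID:21421764", "ECO:EXAMPLE"),
]

redundant = find_redundant(annotations, eco_to_go)
print(redundant)
```

With only the three real terms from the example, nothing is flagged, since each maps to a different GO evidence code.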

Regarding the use of RefSeq genome accessions in IPI, this is something that was suggested for CollecTF annotations of binding in which IPI codes were used. IPI requires a WITH, and in this case the protein is shown to interact with part of the genome (not necessarily a coding region). I am not sure what else could or should be done in this regard.

Thanks,

Ivan

keseler commented 7 years ago

Hi Ivan,

Thank you for the explanations. Because the GO evidence codes come from an ECO mapping, the "set of three" thing makes much more sense now. It wouldn't have caught my attention in the first place if one of the evidence codes hadn't been the generic EXP; I guess it surprises me a little that ECO:0005667 would map down to just EXP and not something IMP-ish.

Using the W3110 RefSeq genome for the IPI "with" seems a little sub-optimal to me, but I don't have a better solution to offer, either. Are you annotating with respect to the specific E. coli strain that was used in a particular experiment? If not, maybe the MG1655 genome could be used instead; it's the one that is (or rather, will be again soon) kept updated at GenBank.

Cheers, Ingrid

ivanerill commented 7 years ago

Hi Ingrid,

Yes, I am not sure why ECO:0005667 maps up to EXP and not IMP, either.

Regarding W3110, CollecTF uses whatever reference strain the authors explicitly mention (or the one to which the reported strain can be traced back). It would be nice if everybody agreed on using MG1655... ;-)

Cheers,

Ivan

cmungall commented 7 years ago

You can ask questions about ECO here: https://github.com/evidenceontology/evidenceontology/issues

ivanerill commented 7 years ago

Thanks Chris!


pgaudet commented 7 years ago

Hello,

Is there an action point here?

Thanks, Pascale

ValWood commented 1 year ago

Closing. @keseler Please open a new ticket and link to this one if there are still GO annotations to fix.