keseler closed this issue.
Hi Ingrid,
A couple of comments regarding your post.
You are correct that GO:0032993 should not be used for transcription factors (although the wording in the documentation is a bit ambiguous: it is not clear whether TF complexes, or just TFs, are the entity to be avoided). We'll update those annotations to use GO:0005667 in cases where the TF is an oligomer (I am not sure whether a DNA-association component can be annotated for monomeric TFs; might a new term be warranted?).
CollecTF annotations do not come in sets of three. Depending on the PMID source, a single GO term might have several annotations, based on different experimental codes. CollecTF annotates using ECO evidence codes, not GO evidence codes, and due to its nature (a user-oriented database, where the user determines which experimental evidence to use when browsing/querying) it does not editorialize the annotation to determine the most relevant type of experimental evidence.
In the example you cite (PMID:21421764 and GO:0001216), the paper supports GO:0001216 with ECO:0005631 (DNase footprinting evidence used in manual assertion), ECO:0006007 (chromatin immunoprecipitation-chip evidence used in manual assertion) and ECO:0005667 (site-directed mutagenesis evidence used in manual assertion), which map to the IPI, IDA and EXP GO evidence codes, respectively.
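As a minimal illustration, the lookup described above can be sketched in Python. The dictionary contains only the three ECO-to-GO correspondences stated in this thread; it is a stand-in, not the official ECO-to-GO mapping file, which should be consulted for any other ECO term. The helper name `go_evidence_codes` is hypothetical.

```python
# Only the three ECO -> GO evidence code correspondences stated in this thread.
# This is NOT the full official ECO-to-GO mapping.
ECO_TO_GO = {
    "ECO:0005631": "IPI",  # DNase footprinting evidence used in manual assertion
    "ECO:0006007": "IDA",  # ChIP-chip evidence used in manual assertion
    "ECO:0005667": "EXP",  # site-directed mutagenesis evidence used in manual assertion
}


def go_evidence_codes(eco_ids):
    """Translate ECO IDs to GO evidence codes; raises KeyError for unmapped IDs."""
    return [ECO_TO_GO[eco_id] for eco_id in eco_ids]


# The three ECO terms supporting GO:0001216 for PMID:21421764
print(go_evidence_codes(["ECO:0005631", "ECO:0006007", "ECO:0005667"]))
# -> ['IPI', 'IDA', 'EXP']
```

Raising on an unmapped ID, rather than silently dropping it, keeps an incomplete mapping from hiding annotations.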
We are working with EBI GOA to avoid cases where two ECO terms from the same paper, supporting the same GO term, map back to the same GO evidence code (which would technically be a redundant annotation). However, I believe that the use of ECO terms warrants keeping the three annotations mentioned above in GO. I understand the logic of treating IPI as the more relevant experimental evidence, but that logic is open to interpretation. In this example, some people might consider that the site-directed mutagenesis assay provides more information than the footprint, and others that the in vivo ChIP-chip is the most relevant evidence for the molecular function.
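A rough sketch of the redundancy check described above, assuming annotations are simple (gene, GO term, reference, ECO ID) tuples: annotations are grouped by gene, GO term, reference and the GO evidence code each ECO term maps to, and any group supported by more than one ECO term is reported. The function name and tuple layout are hypothetical, and the `ECO:hypothetical-*` identifiers are placeholders, not real ECO terms.

```python
from collections import defaultdict


def find_redundant(annotations, eco_to_go):
    """Group annotations by (gene, GO term, reference, GO evidence code).

    Returns only the groups in which two or more distinct ECO terms collapse
    onto the same GO evidence code, i.e. the potentially redundant cases.
    """
    groups = defaultdict(set)
    for gene, go_term, reference, eco_id in annotations:
        go_code = eco_to_go[eco_id]  # KeyError if the ECO term is unmapped
        groups[(gene, go_term, reference, go_code)].add(eco_id)
    return {key: sorted(ecos) for key, ecos in groups.items() if len(ecos) > 1}


# Mapping for the three ECO terms discussed in this thread only.
ECO_TO_GO = {"ECO:0005631": "IPI", "ECO:0006007": "IDA", "ECO:0005667": "EXP"}

csgd = [
    ("P52106", "GO:0001216", "PMID:21421764", "ECO:0005631"),
    ("P52106", "GO:0001216", "PMID:21421764", "ECO:0006007"),
    ("P52106", "GO:0001216", "PMID:21421764", "ECO:0005667"),
]
print(find_redundant(csgd, ECO_TO_GO))  # -> {} (three distinct GO codes)

# Hypothetical: two placeholder ECO terms that both map to IDA.
hypothetical_map = {"ECO:hypothetical-1": "IDA", "ECO:hypothetical-2": "IDA"}
hypothetical = [
    ("P52106", "GO:0001216", "PMID:21421764", "ECO:hypothetical-1"),
    ("P52106", "GO:0001216", "PMID:21421764", "ECO:hypothetical-2"),
]
print(find_redundant(hypothetical, hypothetical_map))
```

Under these assumptions, the CsgD case yields no redundant groups, which is consistent with keeping all three annotations.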
Regarding the use of RefSeq genome accessions in IPI annotations: this was suggested to us for CollecTF binding annotations that use the IPI code. IPI requires a WITH value, and in this case the protein is shown to interact with part of the genome (not necessarily a coding region). I am not sure what else could or should be done in this regard.
Thanks,
Ivan
Hi Ivan,
Thank you for the explanations. Because the GO evidence codes come from an ECO mapping, the "set of three" thing makes much more sense now. It wouldn't have caught my attention in the first place if one of the evidence codes hadn't been the generic EXP; I guess it surprises me a little that ECO:0005667 would map down to just EXP and not something IMP-ish.
Using the W3110 RefSeq genome for the IPI "with" seems a little sub-optimal to me, but I don't have a better solution to offer, either. Are you annotating with respect to the specific E. coli strain that was used in a particular experiment? If not, maybe the MG1655 genome could be used instead; it's the one that is (or rather, will be again soon) kept updated at GenBank.
Cheers, Ingrid
Hi Ingrid,
Yes, I am not sure why ECO:0005667 maps up to EXP and not IMP, either.
Regarding W3110, CollecTF uses whatever reference strain the authors explicitly mention (or the one that the reported strain can be traced back to). It would be nice if everybody agreed on using MG1655... ;-)
Cheers,
Ivan
You can ask questions about ECO here: https://github.com/evidenceontology/evidenceontology/issues
Thanks Chris!
On Tue, Jul 18, 2017 at 12:31 AM, Chris Mungall notifications@github.com wrote:
You can ask questions about ECO here: https://github.com/evidenceontology/evidenceontology/issues
Hello,
Is there an action item here?
Thanks, Pascale
Closing. @keseler Please open a new ticket and link to this one if there are still GO annotations to fix.
Hi all,
We just imported the most recent GO mappings from the UniProt file into EcoCyc, and I noticed some new mappings for transcription factors that come from CollecTF. Here is an example for CsgD, http://www.ebi.ac.uk/QuickGO/GProtein?ac=P52106:
UniProtKB | P52106 | csgD | | GO:0032993 | protein-DNA complex | C | EXP | PMID:21421764 | | 83333 | 20170328 | CollecTF
UniProtKB | P52106 | csgD | | GO:0032993 | protein-DNA complex | C | IDA | PMID:21421764 | | 83333 | 20170328 | CollecTF
UniProtKB | P52106 | csgD | | GO:0032993 | protein-DNA complex | C | IPI | PMID:21421764 | RefSeq:NC_007779.1 | 83333 | 20170328 | CollecTF
The documentation for GO:0032993 says that it should not be used for transcription factors at all.
Also, it looks like the annotations from CollecTF come in sets of three, with three different evidence codes, one of which has a RefSeq ID in the "with" field. That seems odd, even for other GO terms such as this one:
UniProtKB | P52106 | csgD | | GO:0001216 | bacterial-type RNA polymerase transcriptional activator activity, sequence-specific DNA binding | F | EXP | PMID:21421764 | | 83333 | 20170328 | CollecTF
UniProtKB | P52106 | csgD | | GO:0001216 | bacterial-type RNA polymerase transcriptional activator activity, sequence-specific DNA binding | F | IDA | PMID:21421764 | | 83333 | 20170328 | CollecTF
UniProtKB | P52106 | csgD | | GO:0001216 | bacterial-type RNA polymerase transcriptional activator activity, sequence-specific DNA binding | F | IPI | PMID:21421764 | RefSeq:NC_007779.1 | 83333 | 20170328 | CollecTF
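For anyone wanting to reproduce the pattern programmatically, here is a minimal sketch that parses the pipe-delimited rows quoted above, groups the evidence codes by protein, GO term and reference (which exposes the "set of three"), and flags IPI rows with an empty with/from field. It assumes the 13-column order shown in these rows, which is the layout of this display rather than the GAF file format; all function and column names are hypothetical.

```python
from collections import defaultdict

# Column order as displayed in the rows quoted in this issue. This is an
# assumption about the export format; it is NOT the GAF 2.x column order.
COLUMNS = [
    "db", "db_object_id", "symbol", "qualifier", "go_id", "go_name",
    "aspect", "evidence", "reference", "with_from", "taxon", "date",
    "assigned_by",
]


def parse_row(line):
    """Split one pipe-delimited row into a dict keyed by COLUMNS."""
    fields = [field.strip() for field in line.split("|")]
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    return dict(zip(COLUMNS, fields))


def group_evidence(rows):
    """Collect evidence codes per (protein, GO term, reference)."""
    groups = defaultdict(list)
    for row in rows:
        key = (row["db_object_id"], row["go_id"], row["reference"])
        groups[key].append(row["evidence"])
    return dict(groups)


def ipi_without_with(rows):
    """Return IPI rows whose with/from field is empty."""
    return [row for row in rows if row["evidence"] == "IPI" and not row["with_from"]]


RAW = """\
UniProtKB | P52106 | csgD | | GO:0032993 | protein-DNA complex | C | EXP | PMID:21421764 | | 83333 | 20170328 | CollecTF
UniProtKB | P52106 | csgD | | GO:0032993 | protein-DNA complex | C | IDA | PMID:21421764 | | 83333 | 20170328 | CollecTF
UniProtKB | P52106 | csgD | | GO:0032993 | protein-DNA complex | C | IPI | PMID:21421764 | RefSeq:NC_007779.1 | 83333 | 20170328 | CollecTF"""

rows = [parse_row(line) for line in RAW.splitlines()]
print(group_evidence(rows))
# -> {('P52106', 'GO:0032993', 'PMID:21421764'): ['EXP', 'IDA', 'IPI']}
print(ipi_without_with(rows))  # -> []
```

The same functions apply unchanged to the GO:0001216 rows, since the GO term name contains commas but no pipe characters.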
Can someone clarify, either for me or for CollecTF folks (or both), what the appropriate annotations should be?
Thanks, Ingrid