geneontology / go-annotation

This repository hosts the tracker for issues pertaining to GO annotations.

GO_REFS - proposed merges #354

Closed by gocentral 9 years ago

gocentral commented 17 years ago

emailed to annotation list back in Sept:

Greetings annotators,

One of my action items from recent meetings was to go through the GO References collection (go/doc/GO.references) to see which, if any, entries could be merged. I've come up with three potential merges; please look them over and comment.

Thanks!

  1. These interpro2go references all seem to be the same:

go_ref_id: GO_REF:0000002
title: Gene Ontology Annotation at EBI annotations using InterPro matches
authors: Swiss-Prot/TrEMBL curators
year: 2003
abstract: Transitive assignment using InterPro matches. This method is used for any protein that has one or more InterPro matches. The InterPro domain or family is assigned to the corresponding GO term using the interpro2go file.
comment: Formerly GOA:interpro.

go_ref_id: GO_REF:0000007
title: Gene Ontology Annotation by the MGI Curatorial Staff
authors: Mouse Genome Informatics Scientific Curators
year: 2001
external_accession: MGI:2152098
external_accession: J:72247
abstract: For annotations documented via this citation, GO terms were assigned to MGI genes through InterPro protein domain assignments. Interpro protein domains are assigned to MGI genes as part of an ongoing curatorial collaboration between the SwissProt database and MGI (see J:53168). GO terms are associated with MGI genes using a translation table of InterPro protein domains to GO terms generated by Nicola Mulder at EBI.

go_ref_id: GO_REF:0000014
title: Gene Ontology Annotation Through Association of InterPro Records with GO Terms
authors: ZFIN staff
year: 2002
external_accession: ZFIN:ZDB-PUB-020724-1
abstract: ZFIN's curation of data includes the assignment of Gene Ontology (GO) controlled vocabulary terms. Genes are annotated with terms from three ontologies (Molecular Function, Cellular Component, and Biological Process). These annotations have been assigned based on an automated association of InterPro records with GO terms using a translation table provided by GO. The same set of GO vocabulary terms are used by other species genomic databases (www.geneontology.org). This allows direct comparisons of genes among different species. To obtain more detailed information about this process or to send comments, please contact ZFIN.

go_ref_id: GO_REF:0000017
title: dictyBase 'Inferred from Electronic Annotation (InterPro2GO method)'
authors: DictyBase curators
year: 2005
external_accession: DDB:10157
abstract: Gene Ontology (GO) annotations with the evidence code 'Inferred from Electronic Annotation' (IEA) are assigned automatically to gene products in dictyBase. Protein sequences in dictyBase are scanned for conserved functional domains that exist in InterPro. Domains are mapped to GO annotations using the InterPro2GO mapping file generated by Nicola Mulder at EBI. These GO annotations are then assigned to the respective gene products.

Note: GO_REF:0000016 is also interpro2go (FlyBase), but has some additional points not in the other four, namely that FB filters interpro2go mappings to the 'unknown' GO terms, and that they don't include annotations to a parent if FB has a non-IEA annotation to a child.

  2. Two ec2go references:

go_ref_id: GO_REF:0000003
title: Gene Ontology Annotation at EBI annotations
authors: Swiss-Prot/TrEMBL curators
year: 2003
abstract: Transitive assignment using enzyme codes. This method is used for any protein record in Swiss-Prot or TrEMBL that has an Enzyme Commission number in its description line. This EC number is then assigned to the corresponding GO term using the EC cross-references in the GO molecular function ontology.
comment: Formerly GOA:spec.

go_ref_id: GO_REF:0000005
title: Gene Ontology Annotation by the MGI Curatorial Staff
authors: Mouse Genome Informatics Scientific Curators
year: 2001
external_accession: MGI:2152096
external_accession: J:72245
citation: Genomics 74:121-128
abstract: Enzyme Commission numbers that had been assigned to genes in MGI were annotated to GO terms based on the inclusion of EC numbers within GO terms from the molecular function ontology. Details of this strategy can be found in Hill et al., Genomics (2001) 74:121-128.

  3. Three spkw2go refs:

go_ref_id: GO_REF:0000004
title: Gene Ontology Annotation at EBI annotations from keyword mapping.
authors: Swiss-Prot/TrEMBL curators
year: 2003
abstract: Transitive assignment using Swiss-Prot keywords. This method is used for any protein record in Swiss-Prot or TrEMBL that has one or more Swiss-Prot keywords assigned. Each keyword is mapped to the corresponding GO term using the spkw2go file.
comment: Formerly GOA:spkw.

go_ref_id: GO_REF:0000009
title: Gene Ontology Annotation by electronic association of SwissProt Keywords with GO terms
authors: Mouse Genome Informatics Scientific Curators
year: 2000
external_accession: MGI:1354194
external_accession: J:60000
abstract: The Mouse Genome Informatics (MGI) curation of data includes annotating genes to three ontologies (Function, Cellular Component, and Process) using the Gene Ontology (GO) controlled vocabulary shared with other species genomic databases (www.geneontology.org). Gene annotations in MGI citing this reference were assigned based on an electronic association of keywords from the SwissProt database with GO terms. The translation of SwissProt keywords to GO terms was carefully curated by MGI curators utilizing both SP and GO definitions to confirm that the associations were correct. The assignment of GO terms to individual genes was achieved electronically through database links. If a user discovers an annotation error or inconsistency, or requires more detailed information about this process, please contact MGI at mgi-help@informatics.jax.org.

go_ref_id: GO_REF:0000013
title: Gene Ontology Annotation Through Association of Swiss-Prot Keywords with GO Terms
authors: ZFIN staff
year: 2002
external_accession: ZFIN:ZDB-PUB-020723-1
abstract: ZFIN's curation of data includes the assignment of Gene Ontology (GO) controlled vocabulary terms. Genes are annotated with terms from three ontologies (Molecular Function, Cellular Component, and Biological Process). These annotations have been assigned based on an automated association of keywords from the Swiss-Prot database with GO terms using a translation table provided by GO. The same set of GO vocabulary terms are used by other species genomic databases (www.geneontology.org). This allows direct comparisons of genes among different species. To obtain more detailed information about this process or to send comments, please contact ZFIN.

Midori

Reported by: mah11

Original Ticket: "geneontology/annotation-issues/354":https://sourceforge.net/p/geneontology/annotation-issues/354

gocentral commented 17 years ago

Logged In: YES user_id=436423 Originator: YES

follow-ups on interpro2go from email exchange:

- We [MGI] also filter out any pointers to the three GO unknowns using ip2go. [hjd]
-- If everyone does, then we could include the FB reference in the merge. [mah]
--- That would be OK so long as you include the line about FB ignoring annotations to a parent if we have a non-IEA annotation to a child. [sart]
-- [ZFIN] too filter out any translations to obsolete terms, the 3 'Unknown' terms, and the root terms before we apply the translation files. [dgh]

Original comment by: mah11
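
The two kinds of filtering mentioned in this exchange (dropping mappings to the 'unknown', obsolete, or root terms, and dropping an electronic annotation to a parent when a non-IEA annotation to a child exists) could be sketched roughly as below. This is only an illustration, not any group's actual pipeline: the function names, the InterPro IDs, and the tiny ontology fragment are hypothetical, and the 'unknown' term IDs are listed from memory and should be checked against the ontology.

```python
# Illustrative sketch only; not the MGI, FB, or ZFIN implementation.

# Assumed IDs of the three 'unknown' terms and the three root terms.
UNKNOWN_TERMS = {"GO:0000004", "GO:0005554", "GO:0008372"}
ROOT_TERMS = {"GO:0008150", "GO:0003674", "GO:0005575"}


def filter_mappings(mapping, excluded):
    """Drop excluded GO terms from an InterPro-to-GO mapping before use."""
    return {
        ipr: [go for go in terms if go not in excluded]
        for ipr, terms in mapping.items()
    }


def ancestors(term, parents):
    """Return all ancestors of a term, given a child -> set(parents) dict."""
    seen = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def drop_redundant_iea(iea_terms, manual_terms, parents):
    """Drop IEA terms that are ancestors of a manually annotated term."""
    covered = set()
    for term in manual_terms:
        covered |= ancestors(term, parents)
    return [term for term in iea_terms if term not in covered]


# Toy ontology fragment: DNA binding -> nucleic acid binding -> binding -> root.
parents = {
    "GO:0003677": {"GO:0003676"},
    "GO:0003676": {"GO:0005488"},
    "GO:0005488": {"GO:0003674"},
}

# Hypothetical mapping table (the InterPro IDs are placeholders).
mapping = {
    "IPR_EXAMPLE_1": ["GO:0003676", "GO:0005554"],
    "IPR_EXAMPLE_2": ["GO:0003674", "GO:0003677"],
}

filtered = filter_mappings(mapping, UNKNOWN_TERMS | ROOT_TERMS)
kept = drop_redundant_iea(["GO:0003676", "GO:0003677"], ["GO:0003677"], parents)
```

Under these assumptions, `filtered` retains only the two binding terms, and `kept` drops the IEA annotation to the parent term because a manual annotation to its child exists. As Doug notes later in the thread, ZFIN does not apply the second filter.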

gocentral commented 17 years ago

Logged In: YES user_id=436423 Originator: YES

an exchange on spkw2go:

1. I think the keyword one needs to be merged for sure, but you need feedback from the others. I suppose we should now really be saying "UniProtKB keyword to GO mapping"? These keywords are assigned to both UniProtKB/Swiss-Prot and UniProtKB/TrEMBL entries. Would the following be more informative?

go_ref_id: GO_REF:000000?
title: Electronic gene ontology annotations created using a manual mapping of UniProtKB keywords to GO terms
authors: UniProtKB, GOA, MGI, ZFIN
year: 2006
The first round of large scale GO annotations uses existing curated information within the UniProtKB flatfile. For many years keywords have been manually curated onto UniProtKB/Swiss-Prot entries and electronically assigned to UniProtKB/TrEMBL entries using a manually curated set of rules (RuleBase). These keywords have been manually mapped to GO terms with careful checking of both UniProtKB keyword (http://ca.expasy.org/cgi-bin/keywlist.pl) and GO definitions. The UniProtKB Keyword to GO mapping file (also known as spkw2go) is updated monthly by GOA and available for download on both GO and EBI ftp sites. The assignment of GO terms to individual gene products was achieved electronically through database links. If a user discovers an annotation error or inconsistency, or requires more detailed information about this process, please contact the GO Consortium at gohelp@genome.stanford.edu or submit a comment to the SourceForge 'annotation issues' tracker https://sourceforge.net/projects/geneontology/.

Would that work for everyone?

Evelyn

2. It is not a "first round" for us. We don't use them because they are "first round". hjd

3. Hi Harold,

OK, let's take 'first round' out then :-))))

I was trying to explain the history behind their curation, but perhaps that's too UniProtKB-specific. I have a further update to propose:

go_ref_id: GO_REF:000000?
title: Electronic gene ontology annotations created using a manual mapping of UniProtKB keywords to GO terms
authors: UniProtKB, GOA, MGI, ZFIN
year: 2006
Large scale GO annotations can be performed using existing curated information within the UniProtKB flatfile. For many years keywords have been manually curated onto UniProtKB/Swiss-Prot entries and electronically assigned to UniProtKB/TrEMBL entries using both a manually curated set of rules (RuleBase, Apweiler et al, 2001, Brief. Bioinform. 2:9-18.) and an automatic annotation system (Spearmint, Kretschmann et al, 2001, Bioinformatics 17:920-926.) These keywords have been manually mapped to GO terms with careful checking of both UniProtKB keyword (http://ca.expasy.org/cgi-bin/keywlist.pl) and GO definitions. The UniProtKB Keyword to GO mapping file (also known as spkw2go) is updated monthly by GOA and available for download on both GO and EBI ftp sites. The assignment of GO terms to individual gene products was achieved electronically through database links. If a user discovers an annotation error or inconsistency, or requires more detailed information about this process, please contact the GO Consortium at gohelp@genome.stanford.edu or submit a comment to the SourceForge 'annotation issues' tracker https://sourceforge.net/projects/geneontology/.

Original comment by: mah11

gocentral commented 17 years ago

Logged In: YES user_id=546388 Originator: NO

Just want to understand the bottom line: since we supply an association file with our J#s in it, will this "hurt" anything, or will they be translated automatically?

Otherwise, this will require various and sundry software fixes on our end. I assume that we can do as we do for the actual real refs: supply the PMID as well as the internal ref ID, separated by pipes?

hjd

Original comment by: hdrabkin

gocentral commented 17 years ago

Logged In: YES user_id=436423 Originator: YES

Harold - using J#s will be fine -- if we merge anything, the resulting entry will keep the J# as an external accession. Any software that uses the GO_REFs should accept any ID in the id, alt_id, or external_accession fields.

That said, it wouldn't hurt to do the J#|GO_REFid thing; it just wouldn't be required to keep things from breaking.

m

Original comment by: mah11
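
The point about software accepting any ID in the id, alt_id, or external_accession fields could be sketched as follows. This is a rough illustration, not the consortium's tooling: it assumes the records are available as blank-line-separated stanzas of `field: value` lines, and the function names are hypothetical. The sample stanza uses the ec2go IDs from this ticket, with abbreviated titles.

```python
import re

# Sample stanzas; the layout (one field per line, blank line between
# records) is an assumption made for this sketch.
SAMPLE = """\
go_ref_id: GO_REF:0000003
alt_id: GO_REF:0000005
external_accession: MGI:2152096
external_accession: J:72245
title: Gene Ontology annotation based on Enzyme Commission mapping

go_ref_id: GO_REF:0000004
alt_id: GO_REF:0000009
external_accession: J:60000
title: Gene Ontology annotation based on Swiss-Prot keyword mapping.
"""


def parse_go_refs(text):
    """Parse stanzas into dicts mapping each field name to a list of values."""
    records = []
    for stanza in re.split(r"\n\s*\n", text.strip()):
        record = {}
        for line in stanza.splitlines():
            if ":" not in line:
                continue
            # Split on the first colon only; IDs themselves contain colons.
            field, _, value = line.partition(":")
            record.setdefault(field.strip(), []).append(value.strip())
        if record:
            records.append(record)
    return records


def build_alias_index(records):
    """Map every primary, alternate, and external ID to the primary GO_REF."""
    index = {}
    for record in records:
        primary = record["go_ref_id"][0]
        for field in ("go_ref_id", "alt_id", "external_accession"):
            for identifier in record.get(field, []):
                index[identifier] = primary
    return index


def resolve_reference(column, index):
    """Resolve a pipe-separated reference column to primary GO_REF IDs."""
    resolved = []
    for identifier in column.split("|"):
        primary = index.get(identifier.strip())
        if primary and primary not in resolved:
            resolved.append(primary)
    return resolved


index = build_alias_index(parse_go_refs(SAMPLE))
by_j_number = resolve_reference("J:72245", index)
by_pipe_pair = resolve_reference("J:72245|GO_REF:0000005", index)
```

With an index like this, a bare J# and a J#|GO_REF pair both resolve to the same merged entry, which is why the piped form is optional rather than required.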

gocentral commented 17 years ago

Logged In: YES user_id=436423 Originator: YES

Comment on interpro2go from Doug:

Please Note: You asked to confirm the following: "(b) that annotations are filtered if there is a non-IEA annotation to a child?"

ZFIN does NOT do this for any of the translation files

-Doug

Original comment by: mah11

gocentral commented 17 years ago

Logged In: YES user_id=436423 Originator: YES

drafts sent to annotation mailing list:

go_ref_id: GO_REF:0000002
alt_id: GO_REF:0000007
alt_id: GO_REF:0000014
alt_id: GO_REF:0000016
alt_id: GO_REF:0000017
title: Gene Ontology annotation through association of InterPro records with GO terms.
authors: DDB, FB, MGI, UniProt, ZFIN curators
year: 2001
external_accession: MGI:2152098
external_accession: J:72247
external_accession: ZFIN:ZDB-PUB-020724-1
external_accession: FBrf0174215
external_accession: DDB:10157
abstract: Transitive assignment of GO terms based on InterPro classification. For any database entry (representing a protein or protein-coding gene) that has been annotated with one or more InterPro domains, the corresponding GO terms are obtained from a translation table of InterPro entries to GO terms (interpro2go) generated manually by the InterPro team at EBI.
comment: Formerly GOA:interpro. Note that GO annotations based on InterPro-to-GO transitive assignment may undergo subsequent filtering, e.g. to remove annotations redundant with manual curation; consult documentation from the annotation providers for further information.

go_ref_id: GO_REF:0000003
alt_id: GO_REF:0000005
title: Gene Ontology annotation based on Enzyme Commission mapping
authors: UniProt curators; MGI curators
year: 2001
external_accession: MGI:2152096
external_accession: J:72245
citation: Genomics 74:121-128
abstract: Transitive assignment using Enzyme Commission identifiers. This method is used for any database entry, such as a protein record in Swiss-Prot or TrEMBL, that has had an Enzyme Commission number assigned. The corresponding GO term is determined using the EC cross-references in the GO molecular function ontology. Also see Hill et al., Genomics (2001) 74:121-128.
comment: Formerly GOA:spec.

go_ref_id: GO_REF:0000004
alt_id: GO_REF:0000009
alt_id: GO_REF:0000013
title: Gene Ontology annotation based on Swiss-Prot keyword mapping.
authors: Swiss-Prot/TrEMBL curators
year: 2000
external_accession: MGI:1354194
external_accession: J:60000
external_accession: ZFIN:ZDB-PUB-020723-1
abstract: Transitive assignment using Swiss-Prot keywords. This method is used for any database record that has one or more Swiss-Prot keywords assigned. Each keyword is mapped to the corresponding GO term in the spkw2go file, which was originally constructed manually by MGI curators and is now maintained by the GOA team at EBI.
comment: Formerly GOA:spkw.

Original comment by: mah11
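
The "transitive assignment" that all three merged drafts describe amounts to a table lookup: each database entry carries some source identifiers (InterPro entries, EC numbers, or keywords), and a translation table maps each identifier to GO terms. A minimal sketch for the interpro2go case follows. The entry names, the InterPro IDs, the in-memory table, and the function name are all hypothetical, and tagging the results with the IEA evidence code is an assumption based on the dictyBase abstract earlier in this ticket.

```python
def transitive_assign(entry_domains, interpro2go, go_ref="GO_REF:0000002"):
    """Assign GO terms to entries through their InterPro matches.

    entry_domains: dict mapping an entry ID to a list of InterPro IDs.
    interpro2go:   dict mapping an InterPro ID to a list of GO term IDs.
    Returns (entry, GO term, evidence code, reference) tuples, with
    duplicate terms per entry reported once.
    """
    annotations = []
    for entry, domains in entry_domains.items():
        seen = set()
        for ipr in domains:
            # InterPro entries absent from the table yield no annotation.
            for go_id in interpro2go.get(ipr, []):
                if go_id not in seen:
                    seen.add(go_id)
                    annotations.append((entry, go_id, "IEA", go_ref))
    return annotations


# Hypothetical inputs; the IDs are placeholders, not real mappings.
interpro2go = {
    "IPR_EXAMPLE_1": ["GO:0003677"],
    "IPR_EXAMPLE_2": ["GO:0003677", "GO:0005488"],
}
entry_domains = {
    "geneA": ["IPR_EXAMPLE_1", "IPR_EXAMPLE_2"],
    "geneB": ["IPR_UNMAPPED"],
}

annotations = transitive_assign(entry_domains, interpro2go)
```

Here `geneA` receives each GO term once even though two of its domains map to the same term, and `geneB` receives nothing because its only domain is not in the table. Any provider-specific filtering mentioned in the comment field would run before or after this step.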

gocentral commented 17 years ago

Logged In: YES user_id=436423 Originator: YES

merges now done in go/doc/GO.references

thanks to all who commented here or on the mailing list!

m

Original comment by: mah11
