geneontology / go-annotation

This repository hosts the tracker for issues pertaining to GO annotations.
BSD 3-Clause "New" or "Revised" License

col3 / gene product symbol minor bug in goa_human_isoform #1420

Closed cmungall closed 8 months ago

cmungall commented 8 years ago

In a GAF, every row for a given gene product should carry the same symbol (column 3). For the histone H3 protein (P84243), the symbol varies between H3F3A and H3F3B:

$ gzip -dc goa_human_isoform.gaf.gz | grep P84243
UniProtKB       P84243  H3F3A           GO:0000786      GO_REF:0000038  IEA     UniProtKB-KW:KW-0544    C       Histone H3      B4DEB1_HUMAN|H3F3A      protein taxon:9606      20160702        UniProt         UniProtKB:B4DEB1
UniProtKB       P84243  H3F3A           GO:0003677      GO_REF:0000038  IEA     UniProtKB-KW:KW-0238    F       Histone H3      B4DEB1_HUMAN|H3F3A      protein taxon:9606      20160702        UniProt         UniProtKB:B4DEB1
UniProtKB       P84243  H3F3A           GO:0005634      GO_REF:0000040  IEA     UniProtKB-SubCell:SL-0191       C       Histone H3      B4DEB1_HUMAN|H3F3A      protein taxon:9606      20160702        UniProt         UniProtKB:B4DEB1
UniProtKB       P84243  H3F3A           GO:0046982      GO_REF:0000002  IEA     InterPro:IPR009072      F       Histone H3      B4DEB1_HUMAN|H3F3A      protein taxon:9606      20160702        InterPro                UniProtKB:B4DEB1
UniProtKB       P84243  H3F3B           GO:0000786      GO_REF:0000002  IEA     InterPro:IPR000164      C       Histone H3.3    K7EP01_HUMAN|H3F3B      protein taxon:9606      20160702        InterPro                UniProtKB:K7EP01
UniProtKB       P84243  H3F3B           GO:0000786      GO_REF:0000002  IEA     InterPro:IPR000164      C       Histone H3.3    K7ES00_HUMAN|H3F3B      protein taxon:9606      20160702        InterPro                UniProtKB:K7ES00
UniProtKB       P84243  H3F3B           GO:0000786      GO_REF:0000038  IEA     UniProtKB-KW:KW-0544    C       Histone H3      K7EK07_HUMAN|H3F3B      protein taxon:9606      20160702        UniProt         UniProtKB:K7EK07
UniProtKB       P84243  H3F3B           GO:0000786      GO_REF:0000038  IEA     UniProtKB-KW:KW-0544    C       Histone H3      K7EMV3_HUMAN|H3F3B      protein taxon:9606      20160702        UniProt         UniProtKB:K7EMV3
UniProtKB       P84243  H3F3B           GO:0003677      GO_REF:0000038  IEA     UniProtKB-KW:KW-0238    F       Histone H3      K7EK07_HUMAN|H3F3B      protein taxon:9606      20160702        UniProt         UniProtKB:K7EK07
UniProtKB       P84243  H3F3B           GO:0003677      GO_REF:0000038  IEA     UniProtKB-KW:KW-0238    F       Histone H3      K7EMV3_HUMAN|H3F3B      protein taxon:9606      20160702        UniProt         UniProtKB:K7EMV3
UniProtKB       P84243  H3F3B           GO:0003677      GO_REF:0000038  IEA     UniProtKB-KW:KW-0238    F       Histone H3.3    K7EP01_HUMAN|H3F3B      protein taxon:9606      20160702        UniProt         UniProtKB:K7EP01
UniProtKB       P84243  H3F3B           GO:0003677      GO_REF:0000038  IEA     UniProtKB-KW:KW-0238    F       Histone H3.3    K7ES00_HUMAN|H3F3B      protein taxon:9606      20160702        UniProt         UniProtKB:K7ES00
UniProtKB       P84243  H3F3B           GO:0005634      GO_REF:0000040  IEA     UniProtKB-SubCell:SL-0191       C       Histone H3      K7EK07_HUMAN|H3F3B      protein taxon:9606      20160702        UniProt         UniProtKB:K7EK07
UniProtKB       P84243  H3F3B           GO:0005634      GO_REF:0000040  IEA     UniProtKB-SubCell:SL-0191       C       Histone H3      K7EMV3_HUMAN|H3F3B      protein taxon:9606      20160702        UniProt         UniProtKB:K7EMV3
UniProtKB       P84243  H3F3B           GO:0005634      GO_REF:0000040  IEA     UniProtKB-SubCell:SL-0191       C       Histone H3.3    K7EP01_HUMAN|H3F3B      protein taxon:9606      20160702        UniProt         UniProtKB:K7EP01
UniProtKB       P84243  H3F3B           GO:0005634      GO_REF:0000040  IEA     UniProtKB-SubCell:SL-0191       C       Histone H3.3    K7ES00_HUMAN|H3F3B      protein taxon:9606      20160702        UniProt         UniProtKB:K7ES00
UniProtKB       P84243  H3F3B           GO:0046982      GO_REF:0000002  IEA     InterPro:IPR009072      F       Histone H3      K7EK07_HUMAN|H3F3B      protein taxon:9606      20160702        InterPro                UniProtKB:K7EK07
UniProtKB       P84243  H3F3B           GO:0046982      GO_REF:0000002  IEA     InterPro:IPR009072      F       Histone H3      K7EMV3_HUMAN|H3F3B      protein taxon:9606      20160702        InterPro                UniProtKB:K7EMV3
UniProtKB       P84243  H3F3B           GO:0046982      GO_REF:0000002  IEA     InterPro:IPR009072      F       Histone H3.3    K7EP01_HUMAN|H3F3B      protein taxon:9606      20160702        InterPro                UniProtKB:K7EP01
UniProtKB       P84243  H3F3B           GO:0046982      GO_REF:0000002  IEA     InterPro:IPR009072      F       Histone H3.3    K7ES00_HUMAN|H3F3B      protein taxon:9606      20160702        InterPro                UniProtKB:K7ES00
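A check for this kind of inconsistency could be sketched roughly as follows. It assumes tab-separated GAF rows with the database in column 1, the object ID in column 2 and the symbol in column 3, and header lines starting with `!`. The function name `find_inconsistent_symbols` and the `Q00000`/`EXAMPLE` row are made up for illustration; this is not part of any GO tooling.

```python
from collections import defaultdict


def find_inconsistent_symbols(lines):
    """Return {(db, object_id): sorted symbols} for IDs that have more than one symbol."""
    symbols = defaultdict(set)
    for line in lines:
        # Skip blank lines and GAF header/comment lines.
        if not line.strip() or line.startswith("!"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 3:
            continue  # not a usable annotation row
        symbols[(cols[0], cols[1])].add(cols[2])
    return {key: sorted(vals) for key, vals in symbols.items() if len(vals) > 1}


# Truncated sample rows; the Q00000 row is a hypothetical control.
sample = [
    "!gaf-version: 2.1",
    "UniProtKB\tP84243\tH3F3A\t\tGO:0000786",
    "UniProtKB\tP84243\tH3F3B\t\tGO:0000786",
    "UniProtKB\tQ00000\tEXAMPLE\t\tGO:0005634",
]
result = find_inconsistent_symbols(sample)
print(result)
```

To run it against the real file, the same function could be fed the lines of `gzip.open("goa_human_isoform.gaf.gz", "rt")` instead of the sample list.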

I assume this is because the protein has two genes associated with it (the amino acid sequence is identical), so a join between GPI and GPAD would produce these duplicates?
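A toy illustration of that hypothesis, not the actual GOA pipeline: if the GPI side held one row per gene for the same protein ID, joining annotations to it on (database, ID) would emit each annotation once per symbol. The reduced three-field rows and the function name `join_gpad_with_gpi` are assumptions made for this sketch.

```python
def join_gpad_with_gpi(gpad_rows, gpi_rows):
    """Naive join on (db, object_id); emits one output row per matching GPI entry."""
    joined = []
    for db, object_id, go_id in gpad_rows:
        for gpi_db, gpi_id, symbol in gpi_rows:
            if (db, object_id) == (gpi_db, gpi_id):
                joined.append((db, object_id, symbol, go_id))
    return joined


# Assumed GPI content: the same protein listed once per associated gene.
gpi_rows = [
    ("UniProtKB", "P84243", "H3F3A"),
    ("UniProtKB", "P84243", "H3F3B"),
]
# A single annotation on the GPAD side.
gpad_rows = [("UniProtKB", "P84243", "GO:0000786")]

joined = join_gpad_with_gpi(gpad_rows, gpi_rows)
for row in joined:
    print("\t".join(row))
```

One annotation goes in and two rows come out, differing only in the symbol, which is the shape of the duplication seen above.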

suzialeksander commented 8 months ago

no longer a problem