geneontology / go-releases

Tasks and notes for monthly GO releases

MGI GAF includes PRO isoforms in col2 #29

Open cmungall opened 1 year ago

cmungall commented 1 year ago

PR should only be used in col2 in the GAF

curl -L -s "http://current.geneontology.org/annotations/mgi.gaf.gz" | gzip -dc | grep ^PR | wc
    2139   51464  494819

Examples:

PR  Q8R5M8-4    mCADM1/iso:4    located_in  GO:0043196  PMID:21482734   IDA     C   cell adhesion molecule 1 isoform 4 (mouse)  mCADM1/iso:4|Cadm1d (mouse)|cell adhesion molecule 1 isoform d (mouse)  protein taxon:10090 20141125    MGI part_of(CL:0000540)
PR  Q8R5M8-2    mCADM1/iso:2    located_in  GO:0070852  PMID:21482734   IDA     C   cell adhesion molecule 1 isoform 2 (mouse)  mCADM1/iso:2|Cadm1c (mouse)|cell adhesion molecule 1 isoform c (mouse)  protein taxon:10090 20141125    MGI part_of(CL:0000540)
PR  Q8R5M8-3    mCADM1/iso:3    located_in  GO:0070852  PMID:21482734   IDA     C   cell adhesion molecule 1 isoform 3 (mouse)  mCADM1/iso:3|Cadm1b (mouse)|cell adhesion molecule 1 isoform b (mouse)  protein taxon:10090 20141125    MGI part_of(CL:0000540)
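
The count above can also be reproduced without a shell pipeline. The following is a minimal Python sketch, not part of the original report: it tallies annotation rows by the value of column 1 (DB), assuming a tab-separated GAF with `!` comment lines. The function names `count_by_db` and `count_gaf_file` are hypothetical, and the file is assumed to have been downloaded locally first. Unlike `grep ^PR`, it matches column 1 exactly rather than any line beginning with "PR".

```python
import gzip
from collections import Counter
from typing import Iterable


def count_by_db(lines: Iterable[str]) -> Counter:
    """Count GAF annotation rows by the value of column 1 (DB)."""
    counts: Counter = Counter()
    for line in lines:
        # Skip blank lines and header/comment lines, which start with "!".
        if not line.strip() or line.startswith("!"):
            continue
        # Column 1 is everything before the first tab.
        db = line.split("\t", 1)[0]
        counts[db] += 1
    return counts


def count_gaf_file(path: str) -> Counter:
    """Convenience wrapper for a gzip-compressed GAF saved at `path`."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return count_by_db(handle)
```

For example, `count_gaf_file("mgi.gaf.gz")["PR"]` would give the number of rows with PR in column 1, assuming the file was saved under that name.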

ukemi commented 1 year ago

Shouldn't the GAF only have MGI identifiers in column 2? PR identifiers (or any others) represent proteoforms (NAs) that should go in column 17? They should represent valid associations from the GPI file.

pgaudet commented 1 year ago

Shouldn't the GAF only have MGI identifiers in column 2? PR identifiers (or any others) represent proteoforms (NAs) that should go in column 17?

I think so!!

@cmungall are you referring to the GPAD format?

ukemi commented 1 year ago

No, GAF.
http://geneontology.org/docs/gene-product-association-data-gpad-format/
http://geneontology.org/docs/go-annotation-file-gaf-format-2.2/
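
As a rough illustration of the check discussed in this thread, the sketch below lists rows whose column 1 is not the expected DB and reports what they carry in column 2 and column 17. It assumes the 17-column, tab-separated GAF 2.2 layout, with column 17 holding the gene product form ID. The names `UnexpectedRow` and `find_unexpected_db`, and the default of `"MGI"`, are hypothetical choices for illustration and are not taken from any GO tooling.

```python
from typing import Iterable, Iterator, NamedTuple


class UnexpectedRow(NamedTuple):
    line_number: int
    db: str          # column 1
    object_id: str   # column 2
    form_id: str     # column 17, empty if absent


def find_unexpected_db(
    lines: Iterable[str], expected_db: str = "MGI"
) -> Iterator[UnexpectedRow]:
    """Yield rows whose column 1 differs from `expected_db`."""
    for number, line in enumerate(lines, start=1):
        # Skip blank lines and "!" header/comment lines.
        if not line.strip() or line.startswith("!"):
            continue
        cols = line.rstrip("\n").split("\t")
        if cols[0] == expected_db:
            continue
        object_id = cols[1] if len(cols) > 1 else ""
        # Column 17 is index 16; short rows are treated as having no form ID.
        form_id = cols[16] if len(cols) > 16 else ""
        yield UnexpectedRow(number, cols[0], object_id, form_id)
```

Run over the MGI GAF, this would surface the PR rows quoted earlier, which makes it easy to see whether the isoform identifier sits in column 2 or column 17.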