Open cmungall opened 1 year ago
Is this a request for TAIR to change their GAF?
@pgaudet @tberardini What would be the best forum to talk about this? How to proceed here is still an open question.
Is there a big problem in leaving things as they are?
2722 UniProtKB - a constant work in progress to synchronize UniProt and TAIR mappings; coordination has been time-consuming
39138 TAIR - these are likely genetic loci, uncloned, so they cannot be assigned an AGI_LocusCode
225444 AGI_LocusCode - everything else
bbop@wok:/home/skyhook/release/annotations$ zcat tair.gaf.gz | grep -v '^!' | cut -f 1,7 | sort | uniq -c
19388 AGI_LocusCode HDA
262 AGI_LocusCode HEP
185 AGI_LocusCode IC
21868 AGI_LocusCode IDA
61052 AGI_LocusCode IEA
4622 AGI_LocusCode IEP
4046 AGI_LocusCode IGI
17108 AGI_LocusCode IMP
24483 AGI_LocusCode IPI
37753 AGI_LocusCode ISM
8174 AGI_LocusCode ISS
633 AGI_LocusCode NAS
18351 AGI_LocusCode ND
866 AGI_LocusCode RCA
6653 AGI_LocusCode TAS
32884 TAIR IBA
24 TAIR IDA
10 TAIR IEP
61 TAIR IGI
526 TAIR IMP
3 TAIR IPI
11 TAIR ISS
78 TAIR NAS
5499 TAIR ND
42 TAIR TAS
2722 UniProtKB IBA
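For anyone who wants to reproduce the tally above without the shell pipeline, here is a rough Python sketch of the same count over GAF columns 1 (DB prefix) and 7 (evidence code). The function names are made up for illustration, and unlike the `cut` pipeline it skips blank lines and rows with fewer than 7 columns.

```python
import gzip
from collections import Counter


def count_prefix_evidence(lines):
    """Count (DB prefix, evidence code) pairs, i.e. GAF columns 1 and 7."""
    counts = Counter()
    for line in lines:
        # Header/comment lines in a GAF start with '!'.
        if line.startswith("!") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 7:
            # Malformed or truncated row: skipped here (an assumption).
            continue
        counts[(cols[0], cols[6])] += 1
    return counts


def count_from_gzip(path):
    """Run the same count over a gzipped GAF file such as tair.gaf.gz."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return count_prefix_evidence(handle)
```

Calling `count_from_gzip("tair.gaf.gz")` and then iterating over `sorted(counts.items())` should give a listing comparable to the one above.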
Most of the TAIR-prefixed entries come from IBAs, which certainly do not correspond to unmapped loci.
@dustine32 will look into the mappings from UniProt back to TAIR IDs in the PAINT pipeline.
bbop@wok:/home/skyhook/release/annotations$ zcat tair.gaf.gz | grep -v '^!' | cut -f 1 | sort | uniq -c
225444 AGI_LocusCode
39138 TAIR
2722 UniProtKB
Continued from:
659
We have both prefixes registered (note that no other registry acknowledges AGI_LocusCode)
Currently the GAF/GPI file from TAIR has duplicative entries:
These are the same gene:
https://www.arabidopsis.org/servlets/TairObject?accession=Locus:2065220
The policy for GO is to have a single representative entry for each gene, and the GAF should always refer to that entry. Optionally, a specific isoform can be indicated in column 17 (c17), and this will be the primary ID in the GPAD.
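One way to check a file against this policy might look like the sketch below. It assumes that a mapping from each non-representative CURIE to its representative gene ID is available; the `canonical_id` dictionary and the IDs in the usage example are hypothetical placeholders, not real TAIR data.

```python
from collections import defaultdict


def find_split_genes(rows, canonical_id):
    """Return genes that are annotated under more than one DB prefix.

    rows: iterable of (db, object_id) pairs, i.e. GAF columns 1 and 2.
    canonical_id: hypothetical mapping from "DB:ID" to the representative
        gene ID; IDs missing from the mapping are treated as canonical.
    """
    prefixes_by_gene = defaultdict(set)
    for db, object_id in rows:
        curie = f"{db}:{object_id}"
        gene = canonical_id.get(curie, curie)
        prefixes_by_gene[gene].add(db)
    # Keep only genes that show up under two or more prefixes.
    return {g: p for g, p in prefixes_by_gene.items() if len(p) > 1}
```

For example, with placeholder IDs:

```python
rows = [("AGI_LocusCode", "AT0G00000"), ("TAIR", "locus:0000000")]
mapping = {"TAIR:locus:0000000": "AGI_LocusCode:AT0G00000"}
find_split_genes(rows, mapping)
```

Any gene returned here has entries under more than one prefix and would need to be collapsed onto its representative ID.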
This is the current distribution:
2722 UniProtKB (all IBA, indicating we were not able to map to a TAIR gene/locus)
39138 TAIR
225444 AGI_LocusCode