cmungall closed this 5 years ago
Some more stats:
| Evidence code | Current release | Next release |
|---|---|---|
| EXP | 683 | 250 |
| HDA | 48 | 41 |
| IBA | 59858 | 58535 |
| IC | 215 | 179 |
| IDA | 30594 | 24341 |
| IEA | 127682 | 56498 |
| IEP | 10507 | 10159 |
| IGI | 313 | 243 |
| IKR | 5 | 5 |
| IMP | 9417 | 8102 |
| IPI | 7521 | 3886 |
| ISO | 168257 | 132890 |
| ISS | 23847 | 3817 |
| NAS | 640 | 563 |
| ND | 6604 | 6589 |
| TAS | 3464 | 2571 |

| Assigned by | Current release | Next release |
|---|---|---|
| AgBase | 435 | 186 |
| Alzheimers_University_of_Toronto | 93 | 22 |
| ARUK-UCL | 640 | 511 |
| BHF-UCL | 2011 | 1029 |
| CACAO | 70 | 39 |
| CAFA | 439 | 300 |
| DFLAT | 4 | 1 |
| dictyBase | 1 | 1 |
| Ensembl | 76418 | 13929 |
| FlyBase | 7 | 5 |
| GOC | 6188 | 5145 |
| GO_Central | 59919 | 58550 |
| HGNC | 542 | 175 |
| IntAct | 1508 | 137 |
| InterPro | 14424 | 11603 |
| MGI | 816 | 534 |
| NTNU_SB | 88 | 68 |
| ParkinsonsUK-UCL | 823 | 360 |
| PINC | 10 | 10 |
| Reactome | 898 | 252 |
| RGD | 227163 | 185959 |
| SynGO | 2646 | 1563 |
| SynGO-UCL | 56 | 39 |
| UniProt | 54288 | 28095 |
| WB | 14 | 9 |
| YuBioLab | 154 | 147 |
Pascale
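The stats above are annotation counts per evidence code and per assigned_by source in the RGD GAF. A minimal sketch of how such counts could be reproduced for two GAF files, assuming the GAF 2.x layout (tab-separated, `!` header lines, Evidence Code in column 7, Assigned By in column 15). The function names are hypothetical, and this is not necessarily how the numbers above were produced.

```python
from collections import Counter
from typing import Iterable

# 0-based column indexes in a GAF 2.x annotation line (assumed layout).
EVIDENCE_COL = 6      # column 7: Evidence Code
ASSIGNED_BY_COL = 14  # column 15: Assigned By


def gaf_rows(lines: Iterable[str]):
    """Yield tab-split fields of each annotation line, skipping '!' headers and blanks."""
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue
        yield line.rstrip("\r\n").split("\t")


def tally(lines: Iterable[str]) -> tuple[Counter, Counter]:
    """Count annotation lines per evidence code and per assigned_by source."""
    by_evidence, by_source = Counter(), Counter()
    for fields in gaf_rows(lines):
        by_evidence[fields[EVIDENCE_COL]] += 1
        by_source[fields[ASSIGNED_BY_COL]] += 1
    return by_evidence, by_source


def compare(old: Counter, new: Counter) -> list[tuple[str, int, int]]:
    """Return (key, old count, new count) rows, sorted case-insensitively."""
    keys = sorted(old.keys() | new.keys(), key=str.lower)
    return [(k, old[k], new[k]) for k in keys]
```

Usage would be along the lines of `old_ev, old_src = tally(open("current.gaf"))` and the same for the next-release file, then `compare(old_ev, new_ev)`; the file names are placeholders.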
1775 PMIDs have been dropped from the latest RGD GAF.
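A count of dropped PMIDs like this is a set difference over the reference column. A minimal sketch, assuming GAF 2.x, where column 6 holds one or more `|`-separated references such as `PMID:9925767`; `pmids` and `dropped_pmids` are hypothetical helper names.

```python
from typing import Iterable

REFERENCE_COL = 5  # column 6: DB:Reference(s), pipe-separated (assumed GAF 2.x)


def pmids(lines: Iterable[str]) -> set[str]:
    """Collect every PMID cited in the reference column of a GAF."""
    found: set[str] = set()
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue
        refs = line.rstrip("\r\n").split("\t")[REFERENCE_COL]
        found.update(r for r in refs.split("|") if r.startswith("PMID:"))
    return found


def dropped_pmids(old_lines: Iterable[str], new_lines: Iterable[str]) -> set[str]:
    """PMIDs cited somewhere in the old GAF but nowhere in the new one."""
    return pmids(old_lines) - pmids(new_lines)
```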
Spot-checking a specific protein (the one with the largest changes), RGD:70487: we went from 550 annotations to 260. Some redundant IEA/ISO annotations were removed, which is nice, but many annotations from external sources (such as BHF) are also no longer in the new dataset. For example, RGD:70487 GO:0010629 BHF-UCL is not in the new file, but it's still in Protein2GO.
Pascale
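A per-protein spot check like the one for RGD:70487 can be done by diffing annotation keys between the two files. A minimal sketch, assuming GAF 2.x columns and a hypothetical key of (gene product, GO ID, evidence code, assigned_by); because it uses sets, lines that differ only in reference or other columns collapse into one key, so key counts will not match raw line counts such as 550 and 260.

```python
from typing import Iterable, NamedTuple


class AnnotKey(NamedTuple):
    obj: str          # columns 1 and 2 joined, e.g. "RGD:70487"
    go_id: str        # column 5
    evidence: str     # column 7
    assigned_by: str  # column 15


def keys_for(lines: Iterable[str], obj: str) -> set[AnnotKey]:
    """Annotation keys for one gene product in a GAF (assumed GAF 2.x layout)."""
    out: set[AnnotKey] = set()
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue
        f = line.rstrip("\r\n").split("\t")
        key = AnnotKey(f"{f[0]}:{f[1]}", f[4], f[6], f[14])
        if key.obj == obj:
            out.add(key)
    return out


def lost_annotations(old_lines: Iterable[str], new_lines: Iterable[str], obj: str) -> list[AnnotKey]:
    """Keys present for `obj` in the old GAF but missing from the new one, sorted."""
    return sorted(keys_for(old_lines, obj) - keys_for(new_lines, obj))
```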
@slaulederkind @gthayman
@jrsjrs @tutajm Thanks for the quick response.
At RGD, we have implemented an additional QC check that prevents submitting a GAF file whose size differs substantially from previously submitted GAF files.
My apologies to everyone for the problem.
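A size gate of this kind could be as simple as comparing the annotation-line count of a new GAF with the last submitted one. A minimal sketch, assuming a line-count metric, comparison against only the most recent file, and a hypothetical 10% tolerance; RGD's actual metric and threshold are not stated in this thread.

```python
from typing import Iterable


def gaf_line_count(lines: Iterable[str]) -> int:
    """Number of annotation lines (non-blank, not starting with '!') in a GAF."""
    return sum(1 for line in lines if line.strip() and not line.startswith("!"))


def size_change_ok(previous: int, current: int, max_change: float = 0.10) -> bool:
    """True if `current` is within +/- max_change (a fraction) of `previous`.

    max_change=0.10 is a hypothetical default, not RGD's actual threshold.
    """
    if previous == 0:
        return current == 0
    return abs(current - previous) / previous <= max_change
```

With the line counts reported below, `size_change_ok(449680, 308694)` returns `False` (a drop of roughly 31%), so a gate like this would have blocked that file.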
The GAF dropped from 449680 to 308694 lines.
I don't see any overall pattern in the drop.
In some cases it looks like valid QC, e.g. many annotations to 'protein binding' gone.
I see some PMIDs have dropped out altogether, e.g.
http://amigo.geneontology.org/amigo/reference/PMID:9925767
The paper is from 1999. I have not read it, but I see no reason to suspect it is invalid, so why were its annotations purged?
If additional QC was done on some of these and a conscious decision was made to deem a PMID not useful for GO curation, it would be awesome to record that somewhere (this is Frederic Bastian's proposal).
Or is this perhaps extreme redundancy trimming?
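One way to test the redundancy-trimming hypothesis is to ask, for every lost annotation line, whether its (gene product, GO term) pair still has any annotation in the new file. A minimal sketch, assuming GAF 2.x columns; it matches exact GO IDs only and ignores the ontology hierarchy, so an annotation dropped in favour of a more specific term is counted as `pair_lost`. The function names are hypothetical.

```python
from collections import Counter
from typing import Iterable


def gene_term_pairs(lines: Iterable[str]) -> Counter:
    """Count annotation lines per (gene product, GO ID) pair."""
    pairs: Counter = Counter()
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue
        f = line.rstrip("\r\n").split("\t")
        pairs[(f"{f[0]}:{f[1]}", f[4])] += 1
    return pairs


def classify_drop(old_lines: Iterable[str], new_lines: Iterable[str]) -> dict[str, int]:
    """Split lost annotation lines into two buckets.

    still_covered: the pair lost lines but at least one line for it survives
                   (consistent with redundancy trimming).
    pair_lost:     no line for the pair survives (the gene-term association is gone).
    """
    old, new = gene_term_pairs(old_lines), gene_term_pairs(new_lines)
    result = {"still_covered": 0, "pair_lost": 0}
    for pair, n_old in old.items():
        lost = n_old - new[pair]
        if lost <= 0:
            continue
        result["still_covered" if new[pair] > 0 else "pair_lost"] += lost
    return result
```

A mostly `still_covered` result would point toward redundancy trimming; a large `pair_lost` share (as with RGD:70487 GO:0010629) would suggest real loss of information.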