geneontology / go-releases

Tasks and notes for monthly GO releases

QC - new GOA-GOC data exchange pipeline #94

Open pgaudet opened 3 weeks ago

pgaudet commented 3 weeks ago

The data coming from the GOA pipeline is on AmiGO staging. This ticket looks at the differences across the two datasets.

GOA ftp: https://ftp.ebi.ac.uk/pub/contrib/goa/panther_proteomes/
GOA error reports for external groups: https://ftp.ebi.ac.uk/pub/contrib/goa/reports/
Stats (GOC release): https://docs.google.com/spreadsheets/d/1asamlC32E8HDGCqUVaE1O3hp4jp-Y6Z7nt6PrK_6jGw/edit?gid=0#gid=0

Need to check all sources

Known differences between GOA and GOC pipeline:

pgaudet commented 2 weeks ago

Decrease in ND annotations, since GOA filters out ND annotations when other annotations are present.
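To make the filtering rule concrete, here is a minimal sketch of how such a filter could behave. It is not the GOA implementation. It assumes "other annotations" means non-ND annotations for the same gene product in the same GO aspect, and the function name, record fields, and identifiers are all hypothetical.

```python
def drop_redundant_nd(annotations):
    """Drop ND annotations that are accompanied by a non-ND annotation.

    Assumption (not confirmed by GOA): an ND annotation is redundant when
    the same gene product has another annotation in the same aspect.
    Each annotation is a dict with hypothetical keys:
    'gene', 'aspect' and 'evidence'.
    """
    # Collect every (gene, aspect) pair that has at least one non-ND annotation.
    has_non_nd = {
        (a["gene"], a["aspect"])
        for a in annotations
        if a["evidence"] != "ND"
    }
    # Keep non-ND annotations, and ND annotations that stand alone.
    return [
        a for a in annotations
        if a["evidence"] != "ND" or (a["gene"], a["aspect"]) not in has_non_nd
    ]


# Hypothetical records for illustration only.
sample = [
    {"gene": "GENE_A", "aspect": "F", "evidence": "ND"},
    {"gene": "GENE_A", "aspect": "F", "evidence": "IDA"},
    {"gene": "GENE_A", "aspect": "P", "evidence": "ND"},
    {"gene": "GENE_B", "aspect": "C", "evidence": "ND"},
]
filtered = drop_redundant_nd(sample)
print(len(sample), "->", len(filtered))
```

Under this assumption only the ND annotation for GENE_A in aspect F is dropped, which is the kind of decrease in ND counts described above.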

pgaudet commented 2 weeks ago

The pombe taxon differs between PomBase and UniProt.

pgaudet commented 2 weeks ago

SGD:
AmiGO staging: 54,426 annotations assigned by SGD
QuickGO: 54,733 annotations assigned by SGD

S. cerevisiae:
AmiGO staging: 117,992 annotations
QuickGO: 146,440 annotations (RNA, complexes and SP Reviewed proteins)
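For reference, the gaps between the two sources can be computed from the counts quoted above. This is only a small sketch: the dictionary labels and the `count_gap` helper are illustrative names, and the numbers are copied from this comment.

```python
# Annotation counts quoted in this comment (AmiGO staging vs QuickGO).
counts = {
    "Assigned by SGD": {"amigo_staging": 54_426, "quickgo": 54_733},
    "S. cerevisiae (all)": {"amigo_staging": 117_992, "quickgo": 146_440},
}


def count_gap(entry):
    """Return how many more annotations QuickGO has than AmiGO staging."""
    return entry["quickgo"] - entry["amigo_staging"]


for label, entry in counts.items():
    gap = count_gap(entry)
    share = gap / entry["quickgo"]
    print(f"{label}: QuickGO has {gap:,} more annotations ({share:.1%} of QuickGO)")
```

The differences are 307 annotations for those assigned by SGD and 28,448 for S. cerevisiae overall.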

pgaudet commented 2 weeks ago

Human data: