glygener / glygen-issues

Repository for public GlyGen tickets

Query about GO annotations from UniProt #1670

Open katewarner opened 2 months ago

katewarner commented 2 months ago

What are the selection criteria for the GO annotations provided by UniProt? Are they supposed to provide all GO annotations from a UniProt entry? I've noticed that the GO annotations in a UniProt entry don't always match what they provide to us.

For example, for P0CG48 these are the GO annotations in the UniProt entry: https://rest.uniprot.org/uniprotkb/P0CG48.txt

DR   GO; GO:0005829; C:cytosol; TAS:Reactome.
DR   GO; GO:0030666; C:endocytic vesicle membrane; TAS:Reactome.
DR   GO; GO:0005789; C:endoplasmic reticulum membrane; TAS:Reactome.
DR   GO; GO:0010008; C:endosome membrane; TAS:Reactome.
DR   GO; GO:0070062; C:extracellular exosome; HDA:UniProtKB.
DR   GO; GO:0005615; C:extracellular space; HDA:UniProtKB.
DR   GO; GO:0005741; C:mitochondrial outer membrane; TAS:Reactome.
DR   GO; GO:0005654; C:nucleoplasm; TAS:Reactome.
DR   GO; GO:0005634; C:nucleus; HDA:UniProtKB.
DR   GO; GO:0005886; C:plasma membrane; TAS:Reactome.
DR   GO; GO:0031982; C:vesicle; HDA:UniProtKB.
DR   GO; GO:0031386; F:protein tag activity; IBA:GO_Central.
DR   GO; GO:0003723; F:RNA binding; HDA:UniProtKB.
DR   GO; GO:0031625; F:ubiquitin protein ligase binding; IBA:GO_Central.
DR   GO; GO:0019941; P:modification-dependent protein catabolic process; IBA:GO_Central.
DR   GO; GO:0016567; P:protein ubiquitination; IBA:GO_Central.
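To make this comparison repeatable, the `DR   GO;` lines can be pulled out of the flat-file entry and compared as sets of GO IDs. The following is only a minimal sketch: the function names `parse_go_lines` and `compare_go_ids` are hypothetical, the regular expression assumes the line layout shown above, and the `DISPLAYED_IDS` set is copied by hand from the terms listed in this issue rather than read from the NT file.

```python
import re

# Assumed layout of a GO cross-reference line in the UniProt flat file:
# DR   GO; GO:0031386; F:protein tag activity; IBA:GO_Central.
GO_LINE = re.compile(
    r"^DR\s+GO;\s+(GO:\d{7});\s+([CFP]):(.+);\s+([A-Z]+):(.+)\.$"
)


def parse_go_lines(text):
    """Return one dict per 'DR   GO;' line found in a flat-file entry."""
    records = []
    for line in text.splitlines():
        match = GO_LINE.match(line.strip())
        if match:
            go_id, aspect, term, evidence, source = match.groups()
            records.append(
                {
                    "id": go_id,
                    "aspect": aspect,      # C, F or P
                    "term": term,
                    "evidence": evidence,  # e.g. TAS, HDA, IBA
                    "source": source,      # e.g. Reactome, UniProtKB
                }
            )
    return records


def compare_go_ids(entry_ids, displayed_ids):
    """Split two collections of GO IDs into shared and one-sided groups."""
    entry_ids, displayed_ids = set(entry_ids), set(displayed_ids)
    return {
        "in_both": sorted(entry_ids & displayed_ids),
        "only_in_entry": sorted(entry_ids - displayed_ids),
        "only_displayed": sorted(displayed_ids - entry_ids),
    }


# Function and process lines copied from the P0CG48 entry above.
SAMPLE = """\
DR   GO; GO:0031386; F:protein tag activity; IBA:GO_Central.
DR   GO; GO:0003723; F:RNA binding; HDA:UniProtKB.
DR   GO; GO:0031625; F:ubiquitin protein ligase binding; IBA:GO_Central.
DR   GO; GO:0019941; P:modification-dependent protein catabolic process; IBA:GO_Central.
DR   GO; GO:0016567; P:protein ubiquitination; IBA:GO_Central.
"""

# GO IDs shown in GlyGen for P0CG48, copied by hand from this issue.
DISPLAYED_IDS = {
    "GO:0046872", "GO:0002020", "GO:0031386", "GO:0003723",
    "GO:0031625", "GO:0019941", "GO:0016567",
}

ENTRY_RECORDS = parse_go_lines(SAMPLE)
RESULT = compare_go_ids((r["id"] for r in ENTRY_RECORDS), DISPLAYED_IDS)

if __name__ == "__main__":
    for group, ids in RESULT.items():
        print(group, ids)
```

Running the same comparison against the IDs extracted from the NT file would separate terms that are absent from UniProt altogether from terms that are absent only from the entry.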

Below are some of the ontology terms for P0CG48 in GlyGen, with information on whether each is in the UniProt entry and/or provided in the NT file. As you can see, some of the terms are missing from UniProt altogether, some only from the entry, and some only from the NT file. So my question is: does UniProt only provide certain terms to us, and/or do we have selection criteria for the terms they provide to us?

| GO term in GlyGen | Term in UniProt entry? | Provided in UniProt NT file? |
| --- | --- | --- |
| metal ion binding (GO:0046872) | Not in UniProt entry | Not in NT file |
| protease binding (GO:0002020) | Not in UniProt entry | Is in NT file |
| protein tag activity (GO:0031386) | Is in UniProt entry | Is in NT file |
| RNA binding (GO:0003723) | Is in UniProt entry | Is in NT file |
| ubiquitin protein ligase binding (GO:0031625) | Is in UniProt entry | Is in NT file |
| modification-dependent protein catabolic process (GO:0019941) | Is in UniProt entry | Is in NT file |
| protein ubiquitination (GO:0016567) | Is in UniProt entry | Is in NT file |

rykahsay commented 2 months ago

@jeet-vora ... I expect you dictated these selection criteria, and I am hoping they are in our documentation.

jeet-vora commented 2 months ago

@katewarner Let me review the issue after I am done with my current work.

There is no documentation as this was done in the early years of GlyGen. I will have to check my notes.

However, since we are not showing all the GO terms, my recommendation was to show the GO terms that have evidence, in ascending order; the rest can be seen by clicking on the link.