geneontology / pathways2GO

Code for converting between BioPAX pathways and Gene Ontology Causal Activity Models (GO-CAM)

Reactome: GPI-FOLR1 errors #147

Closed nataled closed 2 years ago

nataled commented 2 years ago

This issue involves the following: R-HSA-5689808 R-HSA-5689457 R-HSA-5689806 R-HSA-6808777 R-HSA-6808779

These are all indicated as being GPI-anchored. GPI anchors are (to my knowledge) always located at the C terminus of the anchored protein. However, the indicated position of modification is internal to the sequence range given. Checking UniProtKB for FOLR1 (P15328), I have confirmed that this discrepancy is not due to a change of sequence or use of alternative isoform. UniProt indicates a GPI anchor at position 234, so likely the Reactome-indicated position of 161 should be changed to 234. Complicating matters is the fact that UniProt does have a modification indicated for position 161 (N-linked (GlcNAc...) asparagine), but I don't believe that was the intended modification to highlight in this set.
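The check described above can be sketched roughly as follows. This is only an illustration: the `Feature` class and `check_gpi_position` function are hypothetical names, not part of pathways2GO, Reactome, or UniProt tooling, and the two feature positions are the ones quoted in this issue rather than values fetched from UniProt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """One UniProt-style sequence feature (hypothetical minimal shape)."""
    kind: str       # e.g. "GPI-anchor" or "N-linked glycosylation"
    position: int   # 1-based residue coordinate


# Positions quoted in this issue for FOLR1 (P15328); hard-coded, not fetched.
FOLR1_FEATURES = [
    Feature("GPI-anchor", 234),
    Feature("N-linked glycosylation", 161),
]


def check_gpi_position(annotated: int, features: list[Feature]) -> str:
    """Compare an annotated GPI coordinate with the reference features."""
    gpi_sites = {f.position for f in features if f.kind == "GPI-anchor"}
    if annotated in gpi_sites:
        return "ok"
    expected = ", ".join(str(p) for p in sorted(gpi_sites)) or "none"
    # Report any other feature that sits at the annotated coordinate.
    others = [f.kind for f in features if f.position == annotated]
    if others:
        return (f"mismatch: position {annotated} is {others[0]}; "
                f"GPI anchor expected at {expected}")
    return f"mismatch: GPI anchor expected at {expected}"


print(check_gpi_position(161, FOLR1_FEATURES))
print(check_gpi_position(234, FOLR1_FEATURES))
```

Run against the coordinate currently in Reactome (161), this reports a mismatch and names the N-linked glycosylation found at that position; run against 234, it reports "ok".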

Side note: the GPI modification given is somewhat redundant, but this will be described in more detail in a different issue.

deustp01 commented 2 years ago

Curators annotated groupModifiedResidues when ordinary modifiedResidues are sufficient to capture GPI modification of a carboxy-terminal residue of a protein. The coordinates of that modified residue were corrected as needed to conform to the UniProt records for the proteins, so the specific issues listed in this ticket are FIXED and the ticket can be CLOSED.

Other possible modifications of these proteins: we don't annotate glycosylations that are not required to distinguish functional forms of a protein - formally inconsistent but a lot of work for limited resources and little gain in information.

That leads to a tangent for the future: is this inconsistency damaging? Would it help to annotate glycosylated sites more comprehensively but with no attempt to specify the chemical details of each, i.e. with a generic glyco-residue modification that could be annotated semi-automatically wherever UniProt says there's such a modification? That, however, would create two kinds of glyco-modifications, ones manually constructed with full structures and ones with shorthand notations, itself an inconsistency.
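To make the semi-automatic idea concrete, here is a minimal sketch. It assumes the glycosylated positions from UniProt arrive as a plain list and the manually curated sites as a dictionary; `merge_glyco_annotations`, the "full"/"generic" labels, and the example positions are all hypothetical and do not describe any existing Reactome workflow.

```python
def merge_glyco_annotations(uniprot_sites, curated):
    """Combine curated glyco sites with generic ones taken from UniProt.

    uniprot_sites: iterable of 1-based positions UniProt marks as glycosylated.
    curated: dict mapping position -> manually constructed structure label.
    Returns a dict mapping position -> (kind, structure), where kind is
    "full" for curated sites and "generic" for sites known only from UniProt.
    """
    merged = {pos: ("full", structure) for pos, structure in curated.items()}
    for pos in uniprot_sites:
        # Curated entries take precedence; only fill in the missing sites.
        merged.setdefault(pos, ("generic", None))
    return dict(sorted(merged.items()))


# Illustrative positions only, not taken from any real protein record.
example = merge_glyco_annotations(
    uniprot_sites=[10, 42],
    curated={42: "hypothetical full structure"},
)
print(example)
```

The result keeps both kinds of entry side by side, which is exactly the two-tier situation raised above: site 42 carries a full structure, while site 10 gets only the shorthand.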

nataled commented 2 years ago

To answer your question about the future: No, I don't think this inconsistency is damaging. There will be (and, actually, already are) very many cases where UniProtKB has a modification that isn't reflected in PRO. That's because UniProt only lists the possible modifications without concern for co-occurrence.

As for the last statement regarding full vs shorthand structures, that isn't really an inconsistency in my view. It just reflects the state of knowledge about the modification.

deustp01 commented 2 years ago

> It just reflects the state of knowledge about the modification.

Sometimes, but (full disclosure) sometimes the effort to account for all the branches and monosaccharides of a known structure doesn't seem worthwhile.