Closed nataled closed 2 years ago
Curators annotated groupModifiedResidues where ordinary modifiedResidues are sufficient to capture GPI modification of a protein's carboxy-terminal residue. The coordinates of that modified residue were corrected as needed to conform to the UniProt records for the proteins. The specific issues listed in this ticket are therefore FIXED and the ticket can be CLOSED.
Other possible modifications of these proteins: we don't annotate glycosylations that are not required to distinguish functional forms of a protein. That is formally inconsistent, but annotating them all would be a lot of work for limited resources and would add little information.
That leads to a tangent for the future: is this inconsistency damaging? Would it be worthwhile to annotate glycosylated sites more comprehensively but with no attempt to specify the chemical details of each, i.e. with a generic glyco-residue modification that could be annotated semi-automatically wherever UniProt says there's such a modification? That, however, would create two kinds of glyco-modifications, ones manually constructed with full structures and ones with shorthand notations, which is itself an inconsistency.
To answer your question about the future: No, I don't think this inconsistency is damaging. There will be (and, actually, already are) very many cases where UniProtKB has a modification that isn't reflected in PRO. That's because UniProt only lists the possible modifications without concern for co-occurrence.
As for the last statement regarding full vs shorthand structures, that isn't really an inconsistency in my view. It just reflects the state of knowledge about the modification.
Re "It just reflects the state of knowledge about the modification": sometimes, but (full disclosure) sometimes the effort to account for all the branches and monosaccharides of a known structure doesn't seem worthwhile.
This issue involves the following: R-HSA-5689808 R-HSA-5689457 R-HSA-5689806 R-HSA-6808777 R-HSA-6808779
These are all indicated as being GPI-anchored. GPI anchors are (to my knowledge) always located at the C terminus of the anchored protein. However, the indicated position of modification is internal to the sequence range given. Checking UniProtKB for FOLR1 (P15328), I have confirmed that this discrepancy is not due to a change of sequence or use of an alternative isoform. UniProt indicates a GPI anchor at position 234, so the Reactome-indicated position of 161 should likely be changed to 234. Complicating matters, UniProt does have a modification indicated for position 161 (N-linked (GlcNAc...) asparagine), but I don't believe that was the intended modification to highlight in this set.
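The check described above (a GPI anchor should sit at the last residue of the annotated range, not inside it) could be sketched roughly as follows. This is only an illustration, not how Reactome or UniProt actually store or validate annotations: the `GpiAnnotation` record and both function names are hypothetical, and the 25-234 range is an assumed value used for the example. Only the positions 161 and 234 come from this issue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpiAnnotation:
    """One GPI-anchor annotation on a sequence range (hypothetical record type)."""
    protein: str
    range_start: int
    range_end: int
    gpi_position: int


def is_c_terminal(annotation: GpiAnnotation) -> bool:
    """True when the GPI position is the last residue of the annotated range."""
    return annotation.gpi_position == annotation.range_end


def internal_gpi_annotations(annotations):
    """Return the annotations whose GPI position is not at the end of the range."""
    return [a for a in annotations if not is_c_terminal(a)]


# The 25-234 range is an assumption for illustration only.
# Positions 161 (as annotated) and 234 (per UniProt) are taken from this issue.
as_annotated = GpiAnnotation("FOLR1 (P15328)", 25, 234, 161)
as_proposed = GpiAnnotation("FOLR1 (P15328)", 25, 234, 234)

flagged = internal_gpi_annotations([as_annotated, as_proposed])
for a in flagged:
    print(f"{a.protein}: GPI at {a.gpi_position}, but range ends at {a.range_end}")
```

A check like this only flags candidates for review. It cannot tell whether the flagged position is a mislabeled GPI site or a different modification, such as the N-linked glycosylation at 161, that was entered under the wrong name.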
Side note: the GPI modification given is somewhat redundant, but this will be described in more detail in a different issue.