geneontology / pathways2GO

Code for converting between BioPAX pathways and Gene Ontology Causal Activity Models (GO-CAM)

Reactome: amino acid and PTM mismatches #312

Open nataled opened 9 months ago


I'm not sure how I missed these, but in the following cases either the amino acid specified in Reactome differs from the one given by UniProtKB, or the PSI-MOD identifier is incorrect for the amino acid.

1) Reactome identifiers where Q9UM73 is the reference sequence. All of these incorrectly give position 1358 as a tyrosine (it's actually valine). Looking at the sequence through its whole history, there was never a tyrosine at that position. There is, however, a tyrosine at position 1359. Possibly a typo? The list of impacted identifiers is:

R-HSA-9699906
R-HSA-9715292
R-HSA-9699821
R-HSA-9700486
R-HSA-9715294
R-HSA-9699910
R-HSA-9701868
R-HSA-9700476
R-HSA-9699826
R-HSA-9715293
R-HSA-9701883
R-HSA-201535
R-HSA-9723885
R-HSA-9700480
R-HSA-9701860
R-HSA-9715287
R-HSA-9714063
R-HSA-9723299
R-HSA-9715286
R-HSA-9715295
R-HSA-9699882
R-HSA-9715290
R-HSA-9715289
R-HSA-9700462
R-HSA-9723884
R-HSA-9699921
R-HSA-9700464
R-HSA-9699800
R-HSA-9700489
R-HSA-9700473
R-HSA-9700477
R-HSA-9700465
R-HSA-9700484
R-HSA-9700482
R-HSA-9700466
R-HSA-9715285

2) Reactome identifiers where P28482 is the reference sequence. In all cases a phosphorylated threonine is given at position 185, but the PSI-MOD identifier is MOD:00048 (phosphotyrosine); the correct PSI-MOD for a phosphothreonine is MOD:00047. As in the first case, at no time in the history of the sequence was that position a tyrosine (which would have made the PSI-MOD identifier correct). I think the PSI-MOD identifier was simply mistyped:

R-HSA-9652791
R-HSA-9652175
R-HSA-9652802
R-HSA-9652799
R-HSA-9652797
R-HSA-9652785
R-HSA-9652806
R-HSA-9652808
R-HSA-9652794

3) R-HSA-179868. This uses P00533 as its reference and indicates a phosphorylated tyrosine at position 1101. The actual amino acid there is proline (and always has been). I don't see a tyrosine nearby, but I note that the displayName indicates a different set of positions. Not sure what to make of it.

4) R-HSA-8878063. This uses Q01196 as its reference, indicating a phosphorylated serine at position 273. That position is actually a threonine.

5) R-HSA-1295590. This uses O43597 as its reference, indicating a phosphoserine at position 111. That position is actually an isoleucine, but it is flanked on either side by serines. Perhaps a typo?
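
For anyone who wants to re-run this kind of comparison, the two checks used above (the residue at a stated position, and the PSI-MOD identifier versus the residue) can be sketched roughly as follows. This is only an illustration, not how pathways2GO or Reactome validates entries. The function names, the `window` parameter, and the toy sequence are hypothetical; the sequence is made up and is not a UniProtKB entry. The PSI-MOD table covers only the three phospho terms relevant to this issue.

```python
# Phospho PSI-MOD terms and the one-letter residue each one modifies.
# Only the three terms relevant to this issue are listed.
PHOSPHO_MOD_TO_RESIDUE = {
    "MOD:00046": "S",  # O-phospho-L-serine
    "MOD:00047": "T",  # O-phospho-L-threonine
    "MOD:00048": "Y",  # O4'-phospho-L-tyrosine
}

# Made-up sequence for illustration only (not a real protein).
TOY_SEQUENCE = "MKVYSISAP"


def check_position(sequence, position, expected, window=1):
    """Compare the residue at a 1-based position with the expected one.

    Also lists positions within `window` residues that do carry the
    expected amino acid, since an off-by-one typo is a likely cause.
    """
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    actual = sequence[position - 1]
    start = max(1, position - window)
    end = min(len(sequence), position + window)
    nearby = [
        p for p in range(start, end + 1)
        if p != position and sequence[p - 1] == expected
    ]
    return {"actual": actual, "matches": actual == expected, "nearby": nearby}


def check_psimod(residue, psimod_id):
    """Return (is_consistent, suggested_id) for a residue and a phospho PSI-MOD ID.

    suggested_id is the phospho term for the residue, or None if the
    residue is not S, T, or Y.
    """
    if psimod_id not in PHOSPHO_MOD_TO_RESIDUE:
        raise KeyError(f"{psimod_id} is not in the phospho table")
    suggested = None
    for mod_id, mod_residue in PHOSPHO_MOD_TO_RESIDUE.items():
        if mod_residue == residue:
            suggested = mod_id
    return PHOSPHO_MOD_TO_RESIDUE[psimod_id] == residue, suggested
```

With the toy sequence, `check_position(TOY_SEQUENCE, 3, "Y")` reports a valine with a tyrosine one position later (the pattern in case 1), and `check_position(TOY_SEQUENCE, 6, "S")` reports an isoleucine flanked by serines (the pattern in case 5). `check_psimod("T", "MOD:00048")` flags the mismatch from case 2 and suggests MOD:00047. Real use would need the UniProtKB sequence for each accession, which this sketch does not fetch.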