geneontology / pathways2GO

Code for converting between BioPAX pathways and Gene Ontology Causal Activity Models (GO-CAM)

Reactome - Discrepancy between internal database and web presentation? #279

Open nataled opened 9 months ago

nataled commented 9 months ago

I found something odd. The following error messages were generated from my processing of the 'Liam' data. Each message flags a case where the sequence length stored in Reactome differs from the one in UniProtKB. What's odd is that these discrepancies are not visible in the web presentation (at least for the ones I spot-checked). An example:

https://www.reactome.org/content/detail/R-HSA-419522 (shows chain 1-1451)
https://reactome.org/content/schema/instance/browser/419522 (shows endCoordinate 1471)

EWAS  R-HSA-9707859   GO:0005654      O15172  1       223                     PGRMC2 [nucleoplasm] (O15172 length = 72)
EWAS  R-HSA-1236845   GO:0030670      Q03518  1       808                     TAP1 [phagocytic vesicle membrane] (Q03518 length = 748)
EWAS  R-HSA-400122    GO:0005789      O60427  1       501                     FADS1 [endoplasmic reticulum membrane] (O60427 length = 444)
EWAS  R-HSA-5216184   GO:0005759      Q5K4L6  1       730                     SLC27A3 [mitochondrial matrix] (Q5K4L6 length = 683)
EWAS  R-HSA-391965    GO:0005789      Q93084  1       1043                    ATP2A3 [endoplasmic reticulum membrane] (Q93084 length = 999)
EWAS  R-HSA-198188    GO:0005886      P01850  1       177                     TRBC1 [plasma membrane] (P01850 length = 176)
EWAS  R-HSA-5690190   GO:0005654      Q9UK80  1       1055                    USP21 [nucleoplasm] (Q9UK80 length = 565)
EWAS  R-HSA-2682364   GO:0005829      Q9HBY8  1       427                     SGK2 [cytosol] (Q9HBY8 length = 367)
EWAS  R-HSA-9612177   GO:0005654      Q15742  1       585                     NAB2 [nucleoplasm] (Q15742 length = 525)
EWAS  R-HSA-419393    GO:0005886      Q9HBW0  1       351                     LPAR2 [plasma membrane] (Q9HBW0 length = 348)
EWAS  R-HSA-5216046   GO:0005886      Q0D2K0  1       466                     NIPAL4 [plasma membrane] (Q0D2K0 length = 404)
EWAS  R-HSA-3239030   GO:0005654      Q8TEK3  1       1739                    DOT1L [nucleoplasm] (Q8TEK3 length = 1537)
EWAS  R-HSA-195060    GO:0005829      O94827  1       1062                    PLEKHG5 [cytosol] (O94827 length = 1006)
EWAS  R-HSA-3244651   GO:0005789      Q9NSU2  1       369                     TREX1 [endoplasmic reticulum membrane] (Q9NSU2 length = 314)
EWAS  R-HSA-947474    GO:0005789      Q03518  1       808                     TAP1 [endoplasmic reticulum membrane] (Q03518 length = 748)
EWAS  R-HSA-418301    GO:0005886      P20020  1       1258                    ATP2B1 [plasma membrane] (P20020 length = 1220)
EWAS  R-HSA-9751650   GO:0005789      Q8NH41  1       348                     OR4K15 [endoplasmic reticulum membrane] (Q8NH41 length = 324)
EWAS  R-HSA-442723    GO:0005886      Q13972  1       1275                    RASGRF1 [plasma membrane] (Q13972 length = 1273)
EWAS  R-HSA-8866674   GO:0005829      P60896  1       70                      SEM1 [cytosol] (P60896 length = )
EWAS  R-HSA-174204    GO:0005654      Q9UJX3  1       599                     ANAPC7 [nucleoplasm] (Q9UJX3 length = 565)
EWAS  R-HSA-5625607   GO:0005829      P54252  1       364                     ATXN3 [cytosol] (P54252 length = 361)
EWAS  R-HSA-140222    GO:0005741      Q9BXH1  1       193                     BBC3 [mitochondrial outer membrane] (Q9BXH1 length = )
EWAS  R-HSA-5667017   GO:0005829      Q8TCX5  1       695                     RHPN1 [cytosol] (Q8TCX5 length = 670)
EWAS  R-HSA-1955379   GO:0005758      Q16635  1       292                     TAZ [mitochondrial intermembrane space] (Q16635 length = 262)
EWAS  R-HSA-3215094   GO:0005654      P11309  1       404                     PIM1 [nucleoplasm] (P11309 length = 313)
EWAS  R-HSA-3777103   GO:0005829      O95278  1       331                     EPM2A [cytosol] (O95278 length = )
EWAS  R-HSA-70451     GO:0005829      P60174  1       286                     TPI1 [cytosol] (P60174 length = 249)
EWAS  R-HSA-6809866   GO:0005829      Q8IUG1  1       177                     KRTAP1-3 [cytosol] (Q8IUG1 length = 167)
EWAS  R-HSA-391290    GO:0005829      Q5XUX1  1       488                     FBXW9 [cytosol] (Q5XUX1 length = 458)
EWAS  R-HSA-2023876   GO:0005789      Q96BZ4  1       762                     PLD4(1-762) [endoplasmic reticulum membrane] (Q96BZ4 length = 506)
EWAS  R-HSA-2454158   GO:0005654      Q96N38  1       555                     ZNF714 [nucleoplasm] (Q96N38 length = 554)
EWAS  R-HSA-63500     GO:0005654      P0DPB5  1       133                     POLR1D [nucleoplasm] (P0DPB5 length = )
EWAS  R-HSA-8849372   GO:0005829      Q86Y91  1       864                     KIF18B [cytosol] (Q86Y91 length = 852)
EWAS  R-HSA-400325    GO:0005654      P35398  1       556                     RORA [nucleoplasm] (P35398 length = 523)
EWAS  R-HSA-912371    GO:0005635      O94901  1       812                     SUN1 [nuclear envelope] (O94901 length = 785)
EWAS  R-HSA-1299435   GO:0005743      Q16635  1       292                     TAZ [mitochondrial inner membrane] (Q16635 length = 262)
EWAS  R-HSA-5228649   GO:0005654      Q9HCS4  1       598                     TCF7L1 [nucleoplasm] (Q9HCS4 length = 588)
EWAS  R-HSA-6806195   GO:0005576      Q8N1F8  1       1099                    STK11IP [extracellular region] (Q8N1F8 length = 1088)
EWAS  R-HSA-52425     GO:0005886      P29973  1       690                     CNGA1 [plasma membrane] (P29973 length = 686)
EWAS  R-HSA-400502    GO:0005886      Q5NUL3  1       377                     FFAR4 [plasma membrane] (Q5NUL3 length = 361)
EWAS  R-HSA-1472879   GO:0005829      Q96AX9  1       1013                    MIB2 [cytosol] (Q96AX9 length = 955)
EWAS  R-HSA-2454148   GO:0005654      A8MWA4  1       302                     ZNF705E [nucleoplasm] (A8MWA4 length = 300)
EWAS  R-HSA-446173    GO:0005654      O95718  1       508                     ESRRB [nucleoplasm] (O95718 length = 433)
EWAS  R-HSA-5212665   GO:0005829      Q9UK80  1       1055                    USP21 [cytosol] (Q9UK80 length = 565)
EWAS  R-HSA-416413    GO:0005886      Q9NPC1  1       389                     LTB4R2 [plasma membrane] (Q9NPC1 length = 358)
EWAS  R-HSA-140220    GO:0005829      Q9BXH1  1       193                     BBC3 [cytosol] (Q9BXH1 length = )
EWAS  R-HSA-9033109   GO:0005829      Q6QHF9  1       649                     PAOX [cytosol] (Q6QHF9 length = 511)
EWAS  R-HSA-4127440   GO:0005829      Q3LFD5  1       358                     USP41 [cytosol] (Q3LFD5 length = )
EWAS  R-HSA-9751614   GO:0005789      Q8NGV7  1       314                     OR5H2 [endoplasmic reticulum membrane] (Q8NGV7 length = 309)
EWAS  R-HSA-141345    GO:0005782      Q6QHF9  1       649                     PAOX [peroxisomal matrix] (Q6QHF9 length = 511)
EWAS  R-HSA-5683239   GO:0005886      Q9NZS2  1       232                     KLRF1 [plasma membrane] (Q9NZS2 length = 231)
EWAS  R-HSA-8874123   GO:0000139      O43889  1       395                     CREB3 [Golgi membrane] (O43889 length = 371)
EWAS  R-HSA-8866665   GO:0005654      P60896  1       70                      SEM1 [nucleoplasm] (P60896 length = )
EWAS  R-HSA-5625371   GO:0097542      Q13099  1       832                     IFT88 [ciliary tip] (Q13099 length = 824)
EWAS  R-HSA-421234    GO:0030054      Q9Y5I7  1       305                     CLDN16 [cell junction] (Q9Y5I7 length = 235)
EWAS  R-HSA-9818451   GO:0005654      Q17RH7  1       258                     TPRXL [nucleoplasm] (Q17RH7 length = 139)
EWAS  R-HSA-6783285   GO:0005654      P54252  1       364                     ATXN3 [nucleoplasm] (P54252 length = 361)
EWAS  R-HSA-426060    GO:0005886      Q13574  1       1117                    DGKZ [plasma membrane] (Q13574 length = 928)
EWAS  R-HSA-977490    GO:0005886      Q9UGI6  1       736                     KCNN3 [plasma membrane] (Q9UGI6 length = 731)
EWAS  R-HSA-427902    GO:0031095      Q93084  1       1043                    ATP2A3 [platelet dense tubular network membrane] (Q93084 length = 999)
EWAS  R-HSA-5419294   GO:0005654      Q9GZS1  1       481                     POLR1E [nucleoplasm] (Q9GZS1 length = 419)
EWAS  R-HSA-947542    GO:0005829      O96033  1       88                      MOCS2 [cytosol] (O96033 length = )
EWAS  R-HSA-9645669   GO:0005829      Q8N726  1       132                     p14ARF [cytosol] (Q8N726 length = )
EWAS  R-HSA-1964466   GO:0005829      P0DPB5  1       133                     POLR1D [cytosol] (P0DPB5 length = )
EWAS  R-HSA-5358384   GO:0005829      Q9BZG8  1       443                     DPH1 [cytosol] (Q9BZG8 length = 438)
EWAS  R-HSA-198175    GO:0005886      P01848  1       142                     TRAC [plasma membrane] (P01848 length = 140)
EWAS  R-HSA-5690794   GO:0005829      Q15843  1       88                      NEDD8(1-88) [cytosol] (Q15843 length = 81)
EWAS  R-HSA-174242    GO:0005829      Q9UJX3  1       599                     ANAPC7 [cytosol] (Q9UJX3 length = 565)
EWAS  R-HSA-8875419   GO:0005829      Q9UJ41  1       708                     RABGEF1 [cytosol] (Q9UJ41 length = 491)
EWAS  R-HSA-2029032   GO:0031901      Q9UJ41  1       708                     RABGEF1 [early endosome membrane] (Q9UJ41 length = 491)
EWAS  R-HSA-164387    GO:0005886      P63092  1       394                     GNAS2 [plasma membrane] (P63092 length = )
EWAS  R-HSA-186631    GO:0005829      P10997  1       828                     IAPP(1-828) [cytosol] (P10997 length = 89)
EWAS  R-HSA-975008    GO:0005654      A8MXY4  1       1036                    ZNF99 [nucleoplasm] (A8MXY4 length = 864)
EWAS  R-HSA-3322944   GO:0005654      O75486  1       399                     SUPT3H [nucleoplasm] (O75486 length = 317)
EWAS  R-HSA-419772    GO:0097381      P03999  1       348                     OPN1SW [photoreceptor disc membrane] (P03999 length = 345)
EWAS  R-HSA-8855200   GO:0005829      Q99871  1       368                     HAUS7 [cytosol] (Q99871 length = 358)
EWAS  R-HSA-9707674   GO:0005635      O15172  1       223                     PGRMC2 [nuclear envelope] (O15172 length = 72)
EWAS  R-HSA-390809    GO:0005886      P21917  1       467                     DRD4 [plasma membrane] (P21917 length = 419)
EWAS  R-HSA-59932     GO:0005743      P56556  1       154                     NDUFA6 [mitochondrial inner membrane] (P56556 length = 128)
EWAS  R-HSA-8854063   GO:0005829      Q96ME1  1       805                     FBXL18 [cytosol] (Q96ME1 length = 718)
EWAS  R-HSA-52777     GO:0005654      P16220  1       341                     CREB1 [nucleoplasm] (P16220 length = 327)
EWAS  R-HSA-6809637   GO:0005829      P19013  1       534                     KRT4 [cytosol] (P19013 length = 520)
EWAS  R-HSA-9645695   GO:0005759      Q8N726  1       132                     p14ARF [mitochondrial matrix] (Q8N726 length = )
EWAS  R-HSA-49925     GO:0005829      P23109  1       780                     AMPD1 [cytosol] (P23109 length = 747)
EWAS  R-HSA-976950    GO:0005576      P01160  1       153                     NPPA(1-153) [extracellular region] (P01160 length = 151)
EWAS  R-HSA-442469    GO:0005654      Q9Y618  1       2525                    NCOR2 [nucleoplasm] (Q9Y618 length = 2514)
EWAS  R-HSA-162563    GO:0005829      Q06124  1       597                     PTPN11 [cytosol] (Q06124 length = 593)
EWAS  R-HSA-380260    GO:0005829      Q99996  1       3911                    AKAP9 [cytosol] (Q99996 length = 3907)
EWAS  R-HSA-376257    GO:0005829      Q9NXR1  1       346                     NDE1 [cytosol] (Q9NXR1 length = 335)
EWAS  R-HSA-430060    GO:0005829      O43602  1       441                     DCX [cytosol] (O43602 length = 365)
EWAS  R-HSA-5610385   GO:0005929      Q13099  1       832                     IFT88 [cilium] (Q13099 length = 824)
EWAS  R-HSA-6810279   GO:0005829      P60409  1       375                     KRTAP10-7 [cytosol] (P60409 length = 370)
EWAS  R-HSA-6809608   GO:0005829      O76011  1       436                     KRT34 [cytosol] (O76011 length = 394)
EWAS  R-HSA-8847510   GO:0000139      Q13948  1       678                     CUX1 [Golgi membrane] (Q13948 length = )
EWAS  R-HSA-5667147   GO:0005886      Q9UNG2  1       199                     TNFSF18 [plasma membrane] (Q9UNG2 length = 177)
EWAS  R-HSA-870499    GO:0005829      Q93008  1       2570                    USP9X [cytosol] (Q93008 length = 2554)
EWAS  R-HSA-8937745   GO:0005654      Q06124  1       597                     PTPN11 [nucleoplasm] (Q06124 length = 593)
EWAS  R-HSA-376229    GO:0005829      P49454  1       3207                    CENPF [cytosol] (P49454 length = 3114)
EWAS  R-HSA-2671913   GO:0005886      Q96FT7  1       647                     ASIC4 [plasma membrane] (Q96FT7 length = 539)
EWAS  R-HSA-6785941   GO:0005829      Q9BUX1  1       264                     CHAC1 [cytosol] (Q9BUX1 length = 222)
EWAS  R-HSA-8863953   GO:0033116      Q03518  1       808                     TAP1 [endoplasmic reticulum-Golgi intermediate compartment membrane] (Q03518 length = 748)
EWAS  R-HSA-6799230   GO:0035578      Q8N1F8  1       1099                    STK11IP [azurophil granule lumen] (Q8N1F8 length = 1088)
EWAS  R-HSA-8874165   GO:0005789      O43889  1       395                     CREB3 [endoplasmic reticulum membrane] (O43889 length = 371)
EWAS  R-HSA-9751682   GO:0005789      A6NMZ5  1       311                     OR4C45 [endoplasmic reticulum membrane] (A6NMZ5 length = 306)
EWAS  R-HSA-429782    GO:0000139      Q86VZ5  1       419                     SGMS1 [Golgi membrane] (Q86VZ5 length = 413)
EWAS  R-HSA-419966    GO:0005886      Q9Y5I7  1       305                     CLDN16 [plasma membrane] (Q9Y5I7 length = 235)
EWAS  R-HSA-2872286   GO:0030667      O60645  1       756                     EXOC3 [secretory granule membrane] (O60645 length = 745)
EWAS  R-HSA-1629813   GO:0005654      Q8N726  1       132                     p14ARF [nucleoplasm] (Q8N726 length = )
EWAS  R-HSA-2872485   GO:0005886      O00476  1       498                     SLC17A3(1-498) [plasma membrane] (O00476 length = 420)
EWAS  R-HSA-174889    GO:0005654      Q96AP0  1       544                     ACD [nucleoplasm] (Q96AP0 length = 458)
EWAS  R-HSA-8943140   GO:0005789      Q9MY60  1       181                     HLA-B B-60 [endoplasmic reticulum membrane] (Q9MY60 length = )
EWAS  R-HSA-2980987   GO:0005829      Q9UBK8  1       725                     MTRR [cytosol] (Q9UBK8 length = 698)
EWAS  R-HSA-913703    GO:0005796      Q8WXI7  1       22152                   MUC16 [Golgi lumen] (Q8WXI7 length = 14507)
EWAS  R-HSA-2586609   GO:0005654      A6NNF4  1       738                     ZNF726 [nucleoplasm] (A6NNF4 length = 616)
EWAS  R-HSA-388592    GO:0005886      P41968  1       360                     MC3R(1-360) [plasma membrane] (P41968 length = 323)
EWAS  R-HSA-5623379   GO:0005829      O60645  1       756                     EXOC3 [cytosol] (O60645 length = 745)
EWAS  R-HSA-6803747   GO:0005829      P51606  1       427                     RENBP [cytosol] (P51606 length = 417)
EWAS  R-HSA-49927     GO:0005829      Q01433  1       879                     AMPD2 [cytosol] (Q01433 length = 825)
EWAS  R-HSA-9714330   GO:0005829      P42167  1       633                     TMPO [cytosol] (P42167 length = )
EWAS  R-HSA-419522    GO:0005654      Q9BXW9  1       1471                    FANCD2 [nucleoplasm] (Q9BXW9 length = 1451)
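
For anyone who wants to triage the listing above, here is a minimal sketch of how the lines could be parsed to compare the stored end coordinate with the reported UniProtKB length. It assumes the whitespace-separated layout shown in the listing; the regular expression, the `Mismatch` type, and the `parse_line` and `excess` helpers are hypothetical names for illustration and are not part of pathways2GO or Reactome. Entries whose reported length is blank are kept with a length of `None` rather than guessed.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout: EWAS <stable id> <GO id> <accession> <start> <end> <name> (<accession> length = <n>)
LINE_RE = re.compile(
    r"^EWAS\s+(?P<stable_id>R-HSA-\d+)\s+(?P<go>GO:\d{7})\s+(?P<acc>\S+)\s+"
    r"(?P<start>\d+)\s+(?P<end>\d+)\s+(?P<name>.+?)\s+"
    r"\((?P=acc) length = (?P<length>\d*)\)\s*$"
)


class Mismatch(NamedTuple):
    stable_id: str
    accession: str
    reactome_end: int
    uniprot_length: Optional[int]  # None when the log reports no length


def parse_line(line: str) -> Optional[Mismatch]:
    """Parse one log line; return None if it does not match the assumed layout."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    length = m.group("length")
    return Mismatch(
        stable_id=m.group("stable_id"),
        accession=m.group("acc"),
        reactome_end=int(m.group("end")),
        uniprot_length=int(length) if length else None,
    )


def excess(mismatch: Mismatch) -> Optional[int]:
    """Residues by which the stored end coordinate exceeds the reported length."""
    if mismatch.uniprot_length is None:
        return None
    return mismatch.reactome_end - mismatch.uniprot_length


example = (
    "EWAS  R-HSA-419522    GO:0005654      Q9BXW9  1       1471"
    "                    FANCD2 [nucleoplasm] (Q9BXW9 length = 1451)"
)
print(excess(parse_line(example)))
```

Sorting the parsed entries by `excess` would put the largest gaps (for example IAPP and MUC16) first, and the `None` cases could be reviewed separately.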

deustp01 commented 8 months ago

Done - fixes logged here - mismatch list.xlsx

deustp01 commented 8 months ago

This issue arose because, when we updated the Reactome local copies of UniProt records to conform to UniProt, we did not include a check for changed chain lengths in the UniProt records (with an alert to re-edit affected entityWithAccessionedSequence instances). We still need to implement that update check feature, so I'm re-opening the ticket to track progress there.
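
As a rough illustration of the kind of update check described here, the sketch below compares sequence lengths before and after a UniProt refresh and lists the instances that refer to a changed record. This is not Reactome's actual implementation: the `Ewas` class, the `find_ewas_to_review` function, and the dictionary inputs are all hypothetical, and the "old" length in the example is assumed purely for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Ewas:
    """Hypothetical stand-in for an entityWithAccessionedSequence instance."""
    stable_id: str
    accession: str
    end_coordinate: int


def find_ewas_to_review(
    ewas_list: List[Ewas],
    old_lengths: Dict[str, int],
    new_lengths: Dict[str, int],
) -> List[Ewas]:
    """Return instances whose UniProt record changed length in the update."""
    changed = {
        acc
        for acc, new_len in new_lengths.items()
        if acc in old_lengths and old_lengths[acc] != new_len
    }
    return [e for e in ewas_list if e.accession in changed]


# Illustrative values only: the old length of 1471 is an assumption.
instances = [
    Ewas("R-HSA-419522", "Q9BXW9", 1471),
    Ewas("R-HSA-0000000", "P00000", 100),  # made-up unchanged record
]
old = {"Q9BXW9": 1471, "P00000": 100}
new = {"Q9BXW9": 1451, "P00000": 100}

for ewas in find_ewas_to_review(instances, old, new):
    print(f"re-edit {ewas.stable_id}: {ewas.accession} is now {new[ewas.accession]} residues")
```

Flagging every instance that refers to a changed record, rather than only those whose end coordinate now exceeds the new length, errs on the side of review; a stricter filter could be added if the alert list turns out to be too long.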

deustp01 commented 7 months ago

The update check feature should now be implemented at Reactome, but the ticket stays open until the feature is confirmed to work as expected.