PHI-base / data

Archives of PHI-base data releases, and other data.
Creative Commons Attribution 4.0 International

Are PHI-base 4 interaction IDs violating their own uniqueness rules? #14

Open jseager7 opened 2 years ago

jseager7 commented 2 years ago

The PHI-base curator guidelines state the following about PHI-base accession IDs in PHI-base 4:

One number corresponding to one gene of one organism (even if there is more than one interaction) from one paper.

My understanding is that there is a one-to-one mapping between a PHI-base accession ID (PHI ID) and the following triple of data types:

(PMID, Pathogen NCBI Taxonomy ID, UniProtKB accession number)

So, when grouping by the above data types, I would expect there to be one PHI ID for each group. Instead, there are over 100 groups where there is more than one PHI ID (a one-to-many mapping), as shown in the table below.
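
The grouping described above can be sketched roughly as follows. This is a minimal illustration, not the script used to produce the table: the flat record layout and the `find_duplicate_groups` helper are assumptions. The first four records are copied from the table in this issue, and the last one is made up for contrast.

```python
from collections import defaultdict

# Assumed flat layout: (PHI ID, PMID, pathogen NCBI Taxonomy ID, UniProtKB accession).
records = [
    ("PHI:2574", "PMID:10383767", 5693, "D3JLB9"),
    ("PHI:2595", "PMID:10383767", 5693, "D3JLB9"),
    ("PHI:162", "PMID:10807578", 5476, "Q9P8U9"),
    ("PHI:485", "PMID:10807578", 5476, "Q9P8U9"),
    ("PHI:999999", "PMID:00000000", 0, "X0X0X0"),  # made-up record with a unique triple
]


def phi_number(phi_id):
    """Return the numeric part of an ID such as 'PHI:2574' for sorting."""
    return int(phi_id.split(":")[1])


def find_duplicate_groups(records):
    """Map each (PMID, taxon ID, UniProtKB) triple to its PHI IDs,
    keeping only the triples that have more than one PHI ID."""
    groups = defaultdict(set)
    for phi_id, pmid, taxon_id, uniprot in records:
        groups[(pmid, taxon_id, uniprot)].add(phi_id)
    return {
        key: sorted(ids, key=phi_number)
        for key, ids in groups.items()
        if len(ids) > 1
    }


duplicates = find_duplicate_groups(records)
for (pmid, taxon_id, uniprot), phi_ids in duplicates.items():
    print(pmid, taxon_id, uniprot, ", ".join(phi_ids))
```

If the mapping were one-to-one, `duplicates` would be empty. Here the two triples taken from the table are reported, and the made-up record is not.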

@martin2urban Are there other criteria for assigning PHI IDs that I'm missing, or is the data below the result of an error in the logic used to assign the IDs?

PMID Pathogen ID Protein ID PHI ID
PMID:10383767 5693 D3JLB9 PHI:2574, PHI:2595
PMID:10807578 5476 Q9P8U9 PHI:162, PHI:485
PMID:11038529 5116 Q00580 PHI:49, PHI:770
PMID:11310744 40559 Q9UW03 PHI:202, PHI:1160
PMID:12514128 318829 Q875L7 PHI:322, PHI:7226
PMID:12744465 40559 Q9C2Y1 PHI:278, PHI:1028
PMID:12828637 317 Q79LY0 PHI:992, PHI:7237
PMID:15306011 13684 Q5J4D6 PHI:365, PHI:2267
PMID:15811992 5518 I1RF62 PHI:712, PHI:3991
PMID:16113260 28901 H9L495 PHI:607, PHI:613
PMID:16113260 28901 Q8Z4N6 PHI:609, PHI:616
PMID:16113260 28901 Q8Z6M8 PHI:610, PHI:623
PMID:16272431 5270 Q705V7 PHI:526, PHI:1071
PMID:16278459 5518 I1RHS7 PHI:723, PHI:727
PMID:16278459 5518 I1RSU2 PHI:724, PHI:730
PMID:16353549 5507 Q2KN79 PHI:522, PHI:2808
PMID:16593517 317 P11437 PHI:3372, PHI:3442
PMID:16622070 13684 Q1L2E2 PHI:595, PHI:2248
PMID:17020577 13684 A9Z1V6 PHI:1083, PHI:2272
PMID:17020577 13684 Q00LS5 PHI:1082, PHI:2271
PMID:17189344 318829 Q3Y5V5 PHI:1018, PHI:2042
PMID:17250832 29003 A0ST42 PHI:737, PHI:2329
PMID:17353894 318829 G4MQ72 PHI:774, PHI:792
PMID:17353894 318829 G4MRZ0 PHI:773, PHI:791
PMID:17353894 318829 G4MSX7 PHI:772, PHI:797
PMID:17353894 318829 G4NF05 PHI:776, PHI:802
PMID:17353894 318829 G5EH19 PHI:780, PHI:787, PHI:808
PMID:17353894 318829 O42622 PHI:775, PHI:794
PMID:17379549 1047171 A5H456 PHI:867, PHI:1159
PMID:17511023 1047171 F9XG32 PHI:838, PHI:839, PHI:840, PHI:841, PHI:842, PHI:843, PHI:844, PHI:845, PHI:846
PMID:17555268 85558 Q7BT38 PHI:1001, PHI:1003
PMID:17560817 5111 A3KLI8 PHI:1038, PHI:1039
PMID:17624327 5693 H2DQH1 PHI:2576, PHI:3501
PMID:17722701 5507 A8QJI7 PHI:1020, PHI:1022, PHI:2362
PMID:18034832 318829 G4NDE1 PHI:1017, PHI:2067
PMID:18705871 5507 A6N6J8 PHI:1021, PHI:2981
PMID:19161356 5507 J9MAX2 PHI:1107, PHI:1108
PMID:19454732 318829 C4B8B9 PHI:2137, PHI:3498, PHI:3499
PMID:19459949 1047171 F9XG32 PHI:1149, PHI:1150, PHI:1152, PHI:1153, PHI:1154, PHI:1155, PHI:1156, PHI:1157, PHI:1158
PMID:19520179 1047171 C5J0G7 PHI:1075, PHI:2126
PMID:19520179 1047171 C5MK57 PHI:1072, PHI:2124
PMID:19520179 1047171 C6K2F1 PHI:1074, PHI:2125
PMID:19520179 1047171 C6KEF4 PHI:1073, PHI:2123
PMID:19520179 1047171 C6KEF5 PHI:1076, PHI:2127
PMID:19909822 5518 I1RKF3 PHI:2418, PHI:2419
PMID:20153837 5270 D2EAX7 PHI:2582, PHI:2603
PMID:20447276 5518 I1RJS9 PHI:2326, PHI:2491
PMID:20601497 272952 Q4VKJ6 PHI:4253, PHI:4254, PHI:4255, PHI:4256, PHI:4257
PMID:20618707 5518 I1RM09 PHI:2325, PHI:2502
PMID:20675574 318829 G4MS03 PHI:2006, PHI:2008
PMID:22028654 5518 A0A098D1L0 PHI:1499, PHI:1501
PMID:22028654 5518 A0A098DDX5 PHI:1648, PHI:1650
PMID:22028654 5518 A0A098E396 PHI:1505, PHI:1624
PMID:22028654 5518 A0A1C3YJ08 PHI:1544, PHI:1545
PMID:22028654 5518 A0A1C3YMR5 PHI:1903, PHI:1905
PMID:22416226 5270 Q4P380 PHI:2586, PHI:3510
PMID:22827542 5507 H9C592 PHI:2590, PHI:2611
PMID:22835272 5599 F8R4Y0 PHI:2587, PHI:2608
PMID:22841690 5037 B2CQJ9 PHI:2588, PHI:2609
PMID:22902811 5599 I3QHH8 PHI:2585, PHI:2606
PMID:23211925 272952 G3C9S3 PHI:4766, PHI:4767
PMID:23734779 272952 G3C9N8 PHI:2946, PHI:4774
PMID:23734779 272952 G3C9Q9 PHI:2945, PHI:4775
PMID:23734779 272952 G3C9T3 PHI:2947, PHI:4773
PMID:23734779 272952 G3C9T8 PHI:2944, PHI:4772
PMID:23883358 100787 S6G070 PHI:3706, PHI:3707
PMID:23937726 552 D4HUY4 PHI:3674, PHI:3680
PMID:23937726 552 D4HUY5 PHI:3673, PHI:3679
PMID:23937726 552 D4HX89 PHI:3675, PHI:3681
PMID:23937726 552 D4I0C5 PHI:3676, PHI:3682
PMID:23937726 552 D4IAW2 PHI:3672, PHI:3678
PMID:23937726 552 Q9X3T0 PHI:3671, PHI:3677
PMID:24261846 31870 C9W7X1 PHI:3933, PHI:9080
PMID:24473076 5270 G0X840 PHI:3130, PHI:4051
PMID:24722578 5518 I1RA07 PHI:4209, PHI:4236
PMID:25166864 287 Q9HWS6 PHI:3211, PHI:3212
PMID:25299517 318829 G4MZS3 PHI:3316, PHI:5605
PMID:25299517 318829 G4NGB1 PHI:3311, PHI:5661
PMID:25299517 318829 G4NII8 PHI:3307, PHI:3313
PMID:26368514 305 Q8XPQ6 PHI:5129, PHI:5166
PMID:26368514 305 Q8XRK9 PHI:5121, PHI:5163
PMID:26368514 305 Q8XYB9 PHI:5141, PHI:5177
PMID:26368514 305 Q8XYE3 PHI:5143, PHI:5179
PMID:26368514 305 Q8XYF8 PHI:5133, PHI:5172
PMID:26368514 305 Q8Y164 PHI:5139, PHI:5175
PMID:26764912 106654 A0A2T7FJE6 PHI:5507, PHI:5522
PMID:27226300 777 Q83FB9 PHI:6350, PHI:6355
PMID:27322386 34373 N1J7E2 PHI:6352, PHI:6357
PMID:27322386 34373 N1JJH4 PHI:6351, PHI:6356
PMID:27613851 317 Q887C1 PHI:6727, PHI:6728
PMID:27911947 632 P17778 PHI:6824, PHI:6830
PMID:28715477 287 A0A0H2ZGI5 PHI:7297, PHI:7304
PMID:28970272 813 A0A0H3MCG4 PHI:10157, PHI:10158
PMID:29109173 28901 D0ZWU0 PHI:9831, PHI:9832, PHI:9833
PMID:29970468 1311 Q8E3H1 PHI:8189, PHI:8190
PMID:30042200 287 A0A0H2Z8M3 PHI:8252, PHI:8254
PMID:30370586 27334 A0A0A2JZB1 PHI:8722, PHI:8724
PMID:30379939 1314 D4QE70 PHI:8604, PHI:8605
PMID:30642903 1280 W8U4S5 PHI:10293, PHI:10292
PMID:30828283 347 A0A0K0GGA9 PHI:9012, PHI:9013
PMID:30828283 347 A0A0K0GHK1 PHI:9016, PHI:9017
PMID:30833360 1781 B2HE54 PHI:10154, PHI:10155
PMID:31802604 5270 A0A0D1E1M6 PHI:11413, PHI:11415
PMID:32678853 5476 Q5ANH2 PHI:10624, PHI:11159
PMID:33200669 318829 G4N713 PHI:10874, PHI:10925
PMID:33475797 746128 B0XMW7 PHI:11453, PHI:11454
PMID:34151378 287 Q9HX66 PHI:11622, PHI:11623
PMID:9100386 317 O08243 PHI:971, PHI:972
PMID:9724634 5693 Q94795 PHI:2580, PHI:2601
PMID:9768518 40559 O94100 PHI:103, PHI:1027

martin2urban commented 8 months ago

PHI-base 4 is literature-centric. This means that if one gene is reported in two or more articles, then there could be many PHI IDs per UniProtKB ID.
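
As a small sketch of how the grouping key changes the picture, the snippet below counts PHI IDs per protein and per (PMID, protein) pair. The record layout is an assumption. The four rows are a subset of the entries for F9XG32 in the table above, all with pathogen taxon ID 1047171, so the taxon is left out of the key.

```python
from collections import defaultdict

# Assumed layout: (PHI ID, PMID, UniProtKB accession); rows copied from the issue's table.
records = [
    ("PHI:838", "PMID:17511023", "F9XG32"),
    ("PHI:839", "PMID:17511023", "F9XG32"),
    ("PHI:1149", "PMID:19459949", "F9XG32"),
    ("PHI:1150", "PMID:19459949", "F9XG32"),
]

by_protein = defaultdict(set)            # key: UniProtKB accession only
by_paper_and_protein = defaultdict(set)  # key: (PMID, UniProtKB accession)

for phi_id, pmid, uniprot in records:
    by_protein[uniprot].add(phi_id)
    by_paper_and_protein[(pmid, uniprot)].add(phi_id)

# Several PHI IDs per protein across papers fits a literature-centric scheme.
print(len(by_protein["F9XG32"]))

# Once the PMID is part of the key, each group here still holds more than one PHI ID.
for key, ids in by_paper_and_protein.items():
    print(key, len(ids))
```

Grouping by protein alone gives four PHI IDs for this subset, which the literature-centric rule accounts for. Grouping by paper and protein still gives two PHI IDs per group, which is the case the original question asks about.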