alkem-io / server

Core server in the Alkemio platform, offering a GraphQL api for interacting with the logical domain model.
http://alkem.io
European Union Public License 1.2
24 stars 4 forks source link

BUG: Deadlocks on StorageBucket updates #3938

Open valentinyanakiev opened 2 months ago

valentinyanakiev commented 2 months ago

Describe the bug Storage Bucket updates lead to deadlocks.

To Reproduce This is not easy to reproduce, it happens very rarely.

Expected behavior No deadlocks / managed deadlocks.

Additional context Deadlock Analysis Transaction 1:

Transaction ID: 6450668 Duration: Active for 12 seconds Query: UPDATE document SET storageBucketId = 'ceeb1185-1477-4587-9f80-4f669fbd0788', version = version + 1, updatedDate = CURRENT_TIMESTAMP WHERE id IN ('6881b4ef-43d3-4fbd-a6f6-9dd4a4754d37') Holds Lock On: PRIMARY index of document table, record 05ab7276-a50e-451c-a22e-6379bd. Waiting For: PRIMARY index of document table, record 6881b4ef-43d3-4fbd-a6f6-9dd4a4754d37. Transaction 2:

Transaction ID: 6450455 Duration: Active for 84 seconds Query: UPDATE document SET storageBucketId = NULL, version = version + 1, updatedDate = CURRENT_TIMESTAMP WHERE id = '05ab7276-a50e-451c-a22e-6379bd3071b2' Holds Lock On: PRIMARY index of document table, record 6881b4ef-43d3-4fbd-a6f6-9dd4a4754d37. Waiting For: PRIMARY index of document table, record 05ab7276-a50e-451c-a22e-6379bd.

=====================================
2024-05-17 07:49:20 139671936792320 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 8 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 7668821 srv_active, 0 srv_shutdown, 166306 srv_idle
srv_master_thread log flush and writes: 0
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 1293530
OS WAIT ARRAY INFO: signal count 547073
RW-shared spins 0, rounds 0, OS waits 0
RW-excl spins 0, rounds 0, OS waits 0
RW-sx spins 0, rounds 0, OS waits 0
Spin rounds per wait: 0.00 RW-shared, 0.00 RW-excl, 0.00 RW-sx
------------------------
LATEST DETECTED DEADLOCK
------------------------
2024-05-17 07:26:20 139672000919296
*** (1) TRANSACTION:
TRANSACTION 6450668, ACTIVE 12 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 12 lock struct(s), heap size 1128, 10 row lock(s), undo log entries 8
MySQL thread id 20528600, OS thread handle 139669807732480, query id 308166904 10.244.7.213 root updating
UPDATE `document` SET `storageBucketId` = 'ceeb1185-1477-4587-9f80-4f669fbd0788', `version` = `version` + 1, `updatedDate` = CURRENT_TIMESTAMP WHERE `id` IN ('6881b4ef-43d3-4fbd-a6f6-9dd4a4754d37')

*** (1) HOLDS THE LOCK(S):
RECORD LOCKS space id 35 page no 1146 n bits 112 index PRIMARY of table `alkemio`.`document` trx id 6450668 lock_mode X locks rec but not gap
Record lock, heap no 45 PHYSICAL RECORD: n_fields 14; compact format; info bits 0
 0: len 30; hex 30356162373237362d613530652d343531632d613232652d363337396264; asc 05ab7276-a50e-451c-a22e-6379bd; (total 36 bytes);
 1: len 6; hex 000000626dec; asc    bm ;;
 2: len 7; hex 02000001a01abb; asc        ;;
 3: len 8; hex 99b36275e30d0e60; asc   bu   `;;
 4: len 8; hex 99b3627688000000; asc   bv    ;;
 5: len 30; hex 38336336646139642d333034652d343861322d613964612d356535376538; asc 83c6da9d-304e-48a2-a9da-5e57e8; (total 36 bytes);
 6: len 4; hex 80000004; asc     ;;
 7: len 30; hex 39373262393465382d633737642d346637392d386539652d356137346538; asc 972b94e8-c77d-4f79-8e9e-5a74e8; (total 36 bytes);
 8: len 30; hex 63656562313138352d313437372d343538372d396638302d346636363966; asc ceeb1185-1477-4587-9f80-4f669f; (total 36 bytes);
 9: len 9; hex 696d6167652e706e67; asc image.png;;
 10: len 30; hex 61623461383531382d306335342d346133612d396238392d316139623163; asc ab4a8518-0c54-4a3a-9b89-1a9b1c; (total 36 bytes);
 11: len 9; hex 696d6167652f706e67; asc image/png;;
 12: len 4; hex 8000a774; asc    t;;
 13: len 30; hex 356236316639353132646630636261653337633537396238336132303566; asc 5b61f9512df0cbae37c579b83a205f; (total 64 bytes);

*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 35 page no 172 n bits 120 index PRIMARY of table `alkemio`.`document` trx id 6450668 lock_mode X locks rec but not gap waiting
Record lock, heap no 47 PHYSICAL RECORD: n_fields 14; compact format; info bits 0
 0: len 30; hex 36383831623465662d343364332d346662642d613666362d396464346134; asc 6881b4ef-43d3-4fbd-a6f6-9dd4a4; (total 36 bytes);
 1: len 6; hex 000000626d36; asc    bm6;;
 2: len 7; hex 01000001a207c2; asc        ;;
 3: len 8; hex 99b3627313001c46; asc   bs   F;;
 4: len 8; hex 99b36274b8000000; asc   bt    ;;
 5: len 30; hex 38336336646139642d333034652d343861322d613964612d356535376538; asc 83c6da9d-304e-48a2-a9da-5e57e8; (total 36 bytes);
 6: len 4; hex 80000004; asc     ;;
 7: len 30; hex 62343763333730372d646634352d343231372d626638312d373330353064; asc b47c3707-df45-4217-bf81-73050d; (total 36 bytes);
 8: SQL NULL;
 9: len 30; hex 53636865726d61666265656c64696e6720323032342d30352d3037206f6d; asc Schermafbeelding 2024-05-07 om; (total 43 bytes);
 10: len 30; hex 36313365653839312d626332342d343965302d616234392d623138313563; asc 613ee891-bc24-49e0-ab49-b1815c; (total 36 bytes);
 11: len 9; hex 696d6167652f706e67; asc image/png;;
 12: len 4; hex 8003eda5; asc     ;;
 13: len 30; hex 323434636565396661613263653933333035303464356262343262366333; asc 244cee9faa2ce9330504d5bb42b6c3; (total 64 bytes);

*** (2) TRANSACTION:
TRANSACTION 6450455, ACTIVE 84 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 33 lock struct(s), heap size 3488, 29 row lock(s), undo log entries 16
MySQL thread id 20528603, OS thread handle 139669821470464, query id 308167492 10.244.7.213 root updating
UPDATE `document` SET `storageBucketId` = NULL, `version` = `version` + 1, `updatedDate` = CURRENT_TIMESTAMP WHERE `id` = '05ab7276-a50e-451c-a22e-6379bd3071b2'

*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 35 page no 172 n bits 120 index PRIMARY of table `alkemio`.`document` trx id 6450455 lock_mode X locks rec but not gap
Record lock, heap no 47 PHYSICAL RECORD: n_fields 14; compact format; info bits 0
 0: len 30; hex 36383831623465662d343364332d346662642d613666362d396464346134; asc 6881b4ef-43d3-4fbd-a6f6-9dd4a4; (total 36 bytes);
 1: len 6; hex 000000626d36; asc    bm6;;
 2: len 7; hex 01000001a207c2; asc        ;;
 3: len 8; hex 99b3627313001c46; asc   bs   F;;
 4: len 8; hex 99b36274b8000000; asc   bt    ;;
 5: len 30; hex 38336336646139642d333034652d343861322d613964612d356535376538; asc 83c6da9d-304e-48a2-a9da-5e57e8; (total 36 bytes);
 6: len 4; hex 80000004; asc     ;;
 7: len 30; hex 62343763333730372d646634352d343231372d626638312d373330353064; asc b47c3707-df45-4217-bf81-73050d; (total 36 bytes);
 8: SQL NULL;
 9: len 30; hex 53636865726d61666265656c64696e6720323032342d30352d3037206f6d; asc Schermafbeelding 2024-05-07 om; (total 43 bytes);
 10: len 30; hex 36313365653839312d626332342d343965302d616234392d623138313563; asc 613ee891-bc24-49e0-ab49-b1815c; (total 36 bytes);
 11: len 9; hex 696d6167652f706e67; asc image/png;;
 12: len 4; hex 8003eda5; asc     ;;
 13: len 30; hex 323434636565396661613263653933333035303464356262343262366333; asc 244cee9faa2ce9330504d5bb42b6c3; (total 64 bytes);

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 35 page no 1146 n bits 112 index PRIMARY of table `alkemio`.`document` trx id 6450455 lock_mode X locks rec but not gap waiting
Record lock, heap no 45 PHYSICAL RECORD: n_fields 14; compact format; info bits 0
 0: len 30; hex 30356162373237362d613530652d343531632d613232652d363337396264; asc 05ab7276-a50e-451c-a22e-6379bd; (total 36 bytes);
 1: len 6; hex 000000626dec; asc    bm ;;
 2: len 7; hex 02000001a01abb; asc        ;;
 3: len 8; hex 99b36275e30d0e60; asc   bu   `;;
 4: len 8; hex 99b3627688000000; asc   bv    ;;
 5: len 30; hex 38336336646139642d333034652d343861322d613964612d356535376538; asc 83c6da9d-304e-48a2-a9da-5e57e8; (total 36 bytes);
 6: len 4; hex 80000004; asc     ;;
 7: len 30; hex 39373262393465382d633737642d346637392d386539652d356137346538; asc 972b94e8-c77d-4f79-8e9e-5a74e8; (total 36 bytes);
 8: len 30; hex 63656562313138352d313437372d343538372d396638302d346636363966; asc ceeb1185-1477-4587-9f80-4f669f; (total 36 bytes);
 9: len 9; hex 696d6167652e706e67; asc image.png;;
 10: len 30; hex 61623461383531382d306335342d346133612d396238392d316139623163; asc ab4a8518-0c54-4a3a-9b89-1a9b1c; (total 36 bytes);
 11: len 9; hex 696d6167652f706e67; asc image/png;;
 12: len 4; hex 8000a774; asc    t;;
 13: len 30; hex 356236316639353132646630636261653337633537396238336132303566; asc 5b61f9512df0cbae37c579b83a205f; (total 64 bytes);

*** WE ROLL BACK TRANSACTION (1)
------------
TRANSACTIONS
------------
Trx id counter 6452226
Purge done for trx's n:o < 6452225 undo n:o < 0 state: running but idle
History list length 0
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 421147443209696, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421147443206464, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421147443235552, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421147443229896, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421147443212928, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421147443207272, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421147443256560, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421147443255752, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421147443254944, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421147443251712, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421147443247672, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421147443234744, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421147443230704, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421147443222624, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421147443254136, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421147443237168, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421147443269488, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421147443268680, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421147443267872, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421147443267064, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421147443253328, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421147443252520, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421147443250904, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421147443248480, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421147443208888, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421147443250096, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421147443249288, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421147443246864, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421147443246056, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421147443245248, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421147443244440, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421147443243632, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421147443242824, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421147443242016, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421147443241208, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421147443240400, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421147443239592, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421147443238784, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421147443236360, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421147443233936, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421147443233128, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421147443232320, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421147443231512, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421147443229088, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421147443228280, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421147443227472, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421147443226664, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421147443225856, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421147443225048, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421147443224240, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421147443223432, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421147443221816, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421147443221008, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421147443220200, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421147443219392, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421147443218584, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421147443217776, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421147443216968, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421147443216160, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421147443214544, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421147443213736, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421147443212120, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421147443211312, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421147443210504, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421147443208080, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421147443200808, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421147443199192, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421147443200000, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421147443205656, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421147443204848, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421147443204040, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421147443203232, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421147443202424, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421147443201616, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421147443198384, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421147443197576, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
--------
FILE I/O
--------
I/O thread 0 state: waiting for completed aio requests ((null))
I/O thread 1 state: waiting for completed aio requests (insert buffer thread)
I/O thread 2 state: waiting for completed aio requests (read thread)
I/O thread 3 state: waiting for completed aio requests (read thread)
I/O thread 4 state: waiting for completed aio requests (read thread)
I/O thread 5 state: waiting for completed aio requests (read thread)
I/O thread 6 state: waiting for completed aio requests (write thread)
I/O thread 7 state: waiting for completed aio requests (write thread)
I/O thread 8 state: waiting for completed aio requests (write thread)
Pending normal aio reads: [0, 0, 0, 0] , aio writes: [0, 0, 0, 0] ,
 ibuf aio reads:
Pending flushes (fsync) log: 0; buffer pool: 0
23209995 OS file reads, 23319692 OS file writes, 15839746 OS fsyncs
0.00 reads/s, 0 avg bytes/read, 1.34 writes/s, 1.00 fsyncs/s
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 1, free list len 11, seg size 13, 95404 merges
merged operations:
 insert 81250, delete mark 30544, delete 1530
discarded operations:
 insert 0, delete mark 0, delete 0
Hash table size 34679, node heap has 11 buffer(s)
Hash table size 34679, node heap has 44 buffer(s)
Hash table size 34679, node heap has 6 buffer(s)
Hash table size 34679, node heap has 53 buffer(s)
Hash table size 34679, node heap has 8 buffer(s)
Hash table size 34679, node heap has 2 buffer(s)
Hash table size 34679, node heap has 7 buffer(s)
Hash table size 34679, node heap has 12 buffer(s)
550.56 hash searches/s, 79.99 non-hash searches/s
---
LOG
---
Log sequence number          6125963308
Log buffer assigned up to    6125963308
Log buffer completed up to   6125963308
Log written up to            6125963308
Log flushed up to            6125963308
Added dirty pages up to      6125963308
Pages flushed up to          6125907862
Last checkpoint at           6125907862
Log minimum file id is       1868
Log maximum file id is       1870
13995296 log i/o's done, 1.00 log i/o's/second
----------------------
BUFFER POOL AND MEMORY
----------------------
Total large memory allocated 0
Dictionary memory allocated 2775211
Buffer pool size   8192
Free buffers       1024
Database pages     7025
Old database pages 2573
Modified db pages  84
Pending reads      0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 14837885, not young 119755565
0.11 youngs/s, 0.00 non-youngs/s
Pages read 23209175, created 156774, written 7237299
0.00 reads/s, 0.11 creates/s, 0.22 writes/s
Buffer pool hit rate 1000 / 1000, young-making rate 0 / 1000 not 0 / 1000
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 7025, unzip_LRU len: 0
I/O sum[45]:cur[0], unzip sum[0]:cur[0]
--------------
ROW OPERATIONS
--------------
0 queries inside InnoDB, 0 queries in queue
0 read views open inside InnoDB
Process ID=1, Main thread ID=139671975741184 , state=sleeping
Number of rows inserted 631686, updated 2619742, deleted 40638, read 3698410391
0.00 inserts/s, 0.12 updates/s, 0.00 deletes/s, 491.19 reads/s
Number of system rows inserted 1707965, updated 95372, deleted 1702786, read 6272092
0.12 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
----------------------------
END OF INNODB MONITOR OUTPUT
============================
me-andre commented 2 weeks ago

I'm not an expert in SQL but the query looks like a good candidate for an explicit lock with SELECT FOR UPDATE. Any thoughts anyone?