1.6.1偶发性全局锁死锁

xqdd commented 1 year ago

[x] I have searched the issues of this repository and believe that this is not a duplicate.

Ⅰ. Issue Description

释放全局锁时有死锁发生，偶现

Ⅱ. Describe what happened

2023-06-03T16:25:39.576102Z 0 [Note] [MY-012468] [InnoDB] Transactions deadlock detected, dumping detailed information. (lock0lock.cc:6482)
2023-06-03T16:25:39.576135Z 0 [Note] [MY-012469] [InnoDB]  *** (1) TRANSACTION:  (lock0lock.cc:6496)
TRANSACTION 1421161786, ACTIVE 0 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1136, 2 row lock(s), undo log entries 3
MySQL thread id 11664211, OS thread handle 70369984151280, query id 4436721816 10.103.104.197 xxx_db update
insert into lock_table(xid, transaction_id, branch_id, resource_id, table_name, pk, row_key, gmt_create, gmt_modified,status) values ('10.103.104.197:8091:2594469194175440306', 2594469194175440306, 2594469194175440312, 'jdbc:mysql://xxx:3306/xxx_db', 'task', '3620248418881662466', 'jdbc:mysql://xxx:3306/xxx_db^^^task^^^3620248418881662466', now(), now(), 0),('10.103.104.197:8091:2594469194175440306', 2594469194175440306, 2594469194175440312, 'jdbc:mysql://xxx:3306/xxx_db', 'table_c', '3620248418881662468', 'jdbc:mysql://xxx:3306/xxx_db^
2023-06-03T16:25:39.576182Z 0 [Note] [MY-012469] [InnoDB]  *** (1) HOLDS THE LOCK(S):  (lock0lock.cc:6496)
RECORD LOCKS space id 1178 page no 4 n bits 96 index PRIMARY of table `production_seata_server`.`lock_table` trx id 1421161786 lock_mode X locks rec but not gap
Record lock, heap no 29 PHYSICAL RECORD: n_fields 12; compact format; info bits 0
 0: len 30; hex 6a6462633a6d7973716c3a2f2f61692d6261636b656e642d6175726f7261; asc jdbc:mysql://xxx; (total 161 bytes);
 1: len 6; hex 000054b5353a; asc   T 5:;;
 2: len 7; hex 810000011901a9; asc        ;;
 3: len 30; hex 31302e3130332e3130342e3139373a383039313a32353934343639313934; asc 10.103.104.197:8091:2594469194; (total 39 bytes);
 4: len 8; hex a40167fc6b8565b2; asc   g k e ;;
 5: len 8; hex a40167fc6b8565b8; asc   g k e ;;
 6: len 30; hex 6a6462633a6d7973716c3a2f2f61692d6261636b656e642d6175726f7261; asc jdbc:mysql://xxx; (total 114 bytes);
 7: len 22; hex 72656c696162696c6974795f73656e645f7374617465; asc table_c;;
 8: len 19; hex 33363230323438343138383831363632343638; asc 3620248418881662468;;
 9: len 1; hex 80; asc  ;;
 10: len 5; hex 99b0480667; asc   H g;;
 11: len 5; hex 99b0480667; asc   H g;;

2023-06-03T16:25:39.576707Z 0 [Note] [MY-012469] [InnoDB]  *** (1) WAITING FOR THIS LOCK TO BE GRANTED:  (lock0lock.cc:6496)
RECORD LOCKS space id 1178 page no 4 n bits 96 index PRIMARY of table `production_seata_server`.`lock_table` trx id 1421161786 lock_mode X locks gap before rec insert intention waiting
Record lock, heap no 24 PHYSICAL RECORD: n_fields 12; compact format; info bits 32
 0: len 30; hex 6a6462633a6d7973716c3a2f2f61692d6261636b656e642d6175726f7261; asc jdbc:mysql://xxx; (total 163 bytes);
 1: len 6; hex 000054b5353b; asc   T 5;;;
 2: len 7; hex 02000005c50f8a; asc        ;;
 3: len 30; hex 31302e3130332e3130342e3139373a383039313a32353934343639313934; asc 10.103.104.197:8091:2594469194; (total 39 bytes);
 4: len 8; hex a40167fc6b8565af; asc   g k e ;;
 5: len 8; hex a40167fc6b8565b6; asc   g k e ;;
 6: len 30; hex 6a6462633a6d7973716c3a2f2f61692d6261636b656e642d6175726f7261; asc jdbc:mysql://xxx; (total 114 bytes);
 7: len 24; hex 72656c696162696c6974795f73656e645f6d657373616765; asc table_b;;
 8: len 19; hex 33363230323438343139373438303835383130; asc 3620248419748085810;;
 9: len 1; hex 80; asc  ;;
 10: len 5; hex 99b0480667; asc   H g;;
 11: len 5; hex 99b0480667; asc   H g;;

2023-06-03T16:25:39.577229Z 0 [Note] [MY-012469] [InnoDB]  *** (2) TRANSACTION:  (lock0lock.cc:6496)
TRANSACTION 1421161787, ACTIVE 0 sec fetching rows
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1136, 11 row lock(s), undo log entries 3
MySQL thread id 11660753, OS thread handle 70369968439024, query id 4436721817 10.103.104.197 xxx_db updating
delete from lock_table where xid = '10.103.104.197:8091:2594469194175440303'
2023-06-03T16:25:39.577260Z 0 [Note] [MY-012469] [InnoDB]  *** (2) HOLDS THE LOCK(S):  (lock0lock.cc:6496)
RECORD LOCKS space id 1178 page no 4 n bits 96 index PRIMARY of table `production_seata_server`.`lock_table` trx id 1421161787 lock_mode X
Record lock, heap no 4 PHYSICAL RECORD: n_fields 12; compact format; info bits 32
 0: len 30; hex 6a6462633a6d7973716c3a2f2f61692d6261636b656e642d6175726f7261; asc jdbc:mysql://xxx; (total 161 bytes);
 1: len 6; hex 000054b5345c; asc   T 4\;;
 2: len 7; hex 01000017722c6e; asc     r,n;;
 3: len 30; hex 31302e3130332e302e36343a383039313a35303839343633333837343538; asc 10.103.0.64:8091:5089463387458; (total 36 bytes);
 4: len 8; hex c6a167fc5ad71b67; asc   g Z  g;;
 5: len 8; hex c6a167fc5ad71b6b; asc   g Z  k;;
 6: len 30; hex 6a6462633a6d7973716c3a2f2f61692d6261636b656e642d6175726f7261; asc jdbc:mysql://xxx; (total 114 bytes);
 7: len 22; hex 666163655f737761705f7461736b5f73657474696e67; asc table_a;;
 8: len 19; hex 33363230323438343138323539323032303234; asc 3620248418259202024;;
 9: len 1; hex 80; asc  ;;
 10: len 5; hex 99b0480666; asc   H f;;
 11: len 5; hex 99b0480666; asc   H f;;

Record lock, heap no 5 PHYSICAL RECORD: n_fields 12; compact format; info bits 32
 0: len 30; hex 6a6462633a6d7973716c3a2f2f61692d6261636b656e642d6175726f7261; asc jdbc:mysql://xxx; (total 163 bytes);
 1: len 6; hex 000054b5345c; asc   T 4\;;
 2: len 7; hex 01000017722f98; asc     r/ ;;
 3: len 30; hex 31302e3130332e302e36343a383039313a35303839343633333837343538; asc 10.103.0.64:8091:5089463387458; (total 36 bytes);
 4: len 8; hex c6a167fc5ad71b67; asc   g Z  g;;
 5: len 8; hex c6a167fc5ad71b6b; asc   g Z  k;;
 6: len 30; hex 6a6462633a6d7973716c3a2f2f61692d6261636b656e642d6175726f7261; asc jdbc:mysql://xxx; (total 114 bytes);
 7: len 24; hex 72656c696162696c6974795f73656e645f6d657373616765; asc table_b;;
 8: len 19; hex 33363230323438343138323539323032303236; asc 3620248418259202026;;
 9: len 1; hex 80; asc  ;;
 10: len 5; hex 99b0480666; asc   H f;;
 11: len 5; hex 99b0480666; asc   H f;;

Record lock, heap no 6 PHYSICAL RECORD: n_fields 12; compact format; info bits 32
 0: len 30; hex 6a6462633a6d7973716c3a2f2f61692d6261636b656e642d6175726f7261; asc jdbc:mysql://xxx; (total 161 bytes);
 1: len 6; hex 000054b5345c; asc   T 4\;;
 2: len 7; hex 010000177232c2; asc     r2 ;;
 3: len 30; hex 31302e3130332e302e36343a383039313a35303839343633333837343538; asc 10.103.0.64:8091:5089463387458; (total 36 bytes);
 4: len 8; hex c6a167fc5ad71b67; asc   g Z  g;;
 5: len 8; hex c6a167fc5ad71b6b; asc   g Z  k;;
 6: len 30; hex 6a6462633a6d7973716c3a2f2f61692d6261636b656e642d6175726f7261; asc jdbc:mysql://xxx; (total 114 bytes);
 7: len 22; hex 72656c696162696c6974795f73656e645f7374617465; asc table_c;;
 8: len 19; hex 33363230323438343138323539323032303236; asc 3620248418259202026;;
 9: len 1; hex 80; asc  ;;
 10: len 5; hex 99b0480666; asc   H f;;
 11: len 5; hex 99b0480666; asc   H f;;

Record lock, heap no 7 PHYSICAL RECORD: n_fields 12; compact format; info bits 32
 0: len 30; hex 6a6462633a6d7973716c3a2f2f61692d6261636b656e642d6175726f7261; asc jdbc:mysql://xxx; (total 163 bytes);
 1: len 6; hex 000054b5345c; asc   T 4\;;
 2: len 7; hex 01000017722e01; asc     r. ;;
 3: len 30; hex 31302e3130332e302e36343a383039313a35303839343633333837343538; asc 10.103.0.64:8091:5089463387458; (total 36 bytes);
 4: len 8; hex c6a167fc5ad71b67; asc   g Z  g;;
 5: len 8; hex c6a167fc5ad71b6b; asc   g Z  k;;
 6: len 30; hex 6a6462633a6d7973716c3a2f2f61692d6261636b656e642d6175726f7261; asc jdbc:mysql://xxx; (total 114 bytes);
 7: len 24; hex 72656c696162696c6974795f73656e645f6d657373616765; asc table_b;;
 8: len 19; hex 33363230323438343138323539323032303235; asc 3620248418259202025;;
 9: len 1; hex 80; asc  ;;
 10: len 5; hex 99b0480666; asc   H f;;
 11: len 5; hex 99b0480666; asc   H f;;

Record lock, heap no 8 PHYSICAL RECORD: n_fields 12; compact format; info bits 32
 0: len 30; hex 6a6462633a6d7973716c3a2f2f61692d6261636b656e642d6175726f7261; asc jdbc:mysql://xxx; (total 161 bytes);
 1: len 6; hex 000054b5345c; asc   T 4\;;
 2: len 7; hex 0100001772312f; asc     r1/;;
 3: len 30; hex 31302e3130332e302e36343a383039313a35303839343633333837343538; asc 10.103.0.64:8091:5089463387458; (total 36 bytes);
 4: len 8; hex c6a167fc5ad71b67; asc   g Z  g;;
 5: len 8; hex c6a167fc5ad71b6b; asc   g Z  k;;
 6: len 30; hex 6a6462633a6d7973716c3a2f2f61692d6261636b656e642d6175726f7261; asc jdbc:mysql://xxx; (total 114 bytes);
 7: len 22; hex 72656c696162696c6974795f73656e645f7374617465; asc table_c;;
 8: len 19; hex 33363230323438343138323539323032303235; asc 3620248418259202025;;
 9: len 1; hex 80; asc  ;;
 10: len 5; hex 99b0480666; asc   H f;;
 11: len 5; hex 99b0480666; asc   H f;;

Record lock, heap no 15 PHYSICAL RECORD: n_fields 12; compact format; info bits 32
 0: len 30; hex 6a6462633a6d7973716c3a2f2f61692d6261636b656e642d6175726f7261; asc jdbc:mysql://xxx; (total 161 bytes);
 1: len 6; hex 000054b53476; asc   T 4v;;
 2: len 7; hex 0100000e0122cd; asc      " ;;
 3: len 30; hex 31302e3130332e302e36343a383039313a35303839343633333837343538; asc 10.103.0.64:8091:5089463387458; (total 36 bytes);
 4: len 8; hex c6a167fc5ad71b6d; asc   g Z  m;;
 5: len 8; hex c6a167fc5ad71b71; asc   g Z  q;;
 6: len 30; hex 6a6462633a6d7973716c3a2f2f61692d6261636b656e642d6175726f7261; asc jdbc:mysql://xxx; (total 114 bytes);
 7: len 22; hex 72656c696162696c6974795f73656e645f7374617465; asc table_c;;
 8: len 19; hex 33363230323438343138323539323032303238; asc 3620248418259202028;;
 9: len 1; hex 80; asc  ;;
 10: len 5; hex 99b0480666; asc   H f;;
 11: len 5; hex 99b0480666; asc   H f;;

Record lock, heap no 16 PHYSICAL RECORD: n_fields 12; compact format; info bits 32
 0: len 30; hex 6a6462633a6d7973716c3a2f2f61692d6261636b656e642d6175726f7261; asc jdbc:mysql://xxx; (total 163 bytes);
 1: len 6; hex 000054b53476; asc   T 4v;;
 2: len 7; hex 0100000e012136; asc      !6;;
 3: len 30; hex 31302e3130332e302e36343a383039313a35303839343633333837343538; asc 10.103.0.64:8091:5089463387458; (total 36 bytes);
 4: len 8; hex c6a167fc5ad71b6d; asc   g Z  m;;
 5: len 8; hex c6a167fc5ad71b71; asc   g Z  q;;
 6: len 30; hex 6a6462633a6d7973716c3a2f2f61692d6261636b656e642d6175726f7261; asc jdbc:mysql://xxx; (total 114 bytes);
 7: len 24; hex 72656c696162696c6974795f73656e645f6d657373616765; asc table_b;;
 8: len 19; hex 33363230323438343138323539323032303238; asc 3620248418259202028;;
 9: len 1; hex 80; asc  ;;
 10: len 5; hex 99b0480666; asc   H f;;
 11: len 5; hex 99b0480666; asc   H f;;

Record lock, heap no 24 PHYSICAL RECORD: n_fields 12; compact format; info bits 32
 0: len 30; hex 6a6462633a6d7973716c3a2f2f61692d6261636b656e642d6175726f7261; asc jdbc:mysql://xxx; (total 163 bytes);
 1: len 6; hex 000054b5353b; asc   T 5;;;
 2: len 7; hex 02000005c50f8a; asc        ;;
 3: len 30; hex 31302e3130332e3130342e3139373a383039313a32353934343639313934; asc 10.103.104.197:8091:2594469194; (total 39 bytes);
 4: len 8; hex a40167fc6b8565af; asc   g k e ;;
 5: len 8; hex a40167fc6b8565b6; asc   g k e ;;
 6: len 30; hex 6a6462633a6d7973716c3a2f2f61692d6261636b656e642d6175726f7261; asc jdbc:mysql://xxx; (total 114 bytes);
 7: len 24; hex 72656c696162696c6974795f73656e645f6d657373616765; asc table_b;;
 8: len 19; hex 33363230323438343139373438303835383130; asc 3620248419748085810;;
 9: len 1; hex 80; asc  ;;
 10: len 5; hex 99b0480667; asc   H g;;
 11: len 5; hex 99b0480667; asc   H g;;

Record lock, heap no 26 PHYSICAL RECORD: n_fields 12; compact format; info bits 32
 0: len 30; hex 6a6462633a6d7973716c3a2f2f61692d6261636b656e642d6175726f7261; asc jdbc:mysql://xxx; (total 161 bytes);
 1: len 6; hex 000054b5353b; asc   T 5;;;
 2: len 7; hex 02000005c50df4; asc        ;;
 3: len 30; hex 31302e3130332e3130342e3139373a383039313a32353934343639313934; asc 10.103.104.197:8091:2594469194; (total 39 bytes);
 4: len 8; hex a40167fc6b8565af; asc   g k e ;;
 5: len 8; hex a40167fc6b8565b6; asc   g k e ;;
 6: len 30; hex 6a6462633a6d7973716c3a2f2f61692d6261636b656e642d6175726f7261; asc jdbc:mysql://xxx; (total 114 bytes);
 7: len 22; hex 666163655f737761705f7461736b5f73657474696e67; asc table_a;;
 8: len 19; hex 33363230323438343139373438303835383039; asc 3620248419748085809;;
 9: len 1; hex 80; asc  ;;
 10: len 5; hex 99b0480667; asc   H g;;
 11: len 5; hex 99b0480667; asc   H g;;

Record lock, heap no 27 PHYSICAL RECORD: n_fields 12; compact format; info bits 32
 0: len 30; hex 6a6462633a6d7973716c3a2f2f61692d6261636b656e642d6175726f7261; asc jdbc:mysql://xxx; (total 163 bytes);
 1: len 6; hex 000054b5353b; asc   T 5;;;
 2: len 7; hex 02000005c51124; asc       $;;
 3: len 30; hex 31302e3130332e3130342e3139373a383039313a32353934343639313934; asc 10.103.104.197:8091:2594469194; (total 39 bytes);
 4: len 8; hex a40167fc6b8565af; asc   g k e ;;
 5: len 8; hex a40167fc6b8565b6; asc   g k e ;;
 6: len 30; hex 6a6462633a6d7973716c3a2f2f61692d6261636b656e642d6175726f7261; asc jdbc:mysql://xxx; (total 114 bytes);
 7: len 24; hex 72656c696162696c6974795f73656e645f6d657373616765; asc table_b;;
 8: len 19; hex 33363230323438343139373438303835383131; asc 3620248419748085811;;
 9: len 1; hex 80; asc  ;;
 10: len 5; hex 99b0480667; asc   H g;;
 11: len 5; hex 99b0480667; asc   H g;;

2023-06-03T16:25:39.582068Z 0 [Note] [MY-012469] [InnoDB]  *** (2) WAITING FOR THIS LOCK TO BE GRANTED:  (lock0lock.cc:6496)
RECORD LOCKS space id 1178 page no 4 n bits 96 index PRIMARY of table `production_seata_server`.`lock_table` trx id 1421161787 lock_mode X waiting
Record lock, heap no 29 PHYSICAL RECORD: n_fields 12; compact format; info bits 0
 0: len 30; hex 6a6462633a6d7973716c3a2f2f61692d6261636b656e642d6175726f7261; asc jdbc:mysql://xxx; (total 161 bytes);
 1: len 6; hex 000054b5353a; asc   T 5:;;
 2: len 7; hex 810000011901a9; asc        ;;
 3: len 30; hex 31302e3130332e3130342e3139373a383039313a32353934343639313934; asc 10.103.104.197:8091:2594469194; (total 39 bytes);
 4: len 8; hex a40167fc6b8565b2; asc   g k e ;;
 5: len 8; hex a40167fc6b8565b8; asc   g k e ;;
 6: len 30; hex 6a6462633a6d7973716c3a2f2f61692d6261636b656e642d6175726f7261; asc jdbc:mysql://xxx; (total 114 bytes);
 7: len 22; hex 72656c696162696c6974795f73656e645f7374617465; asc table_c;;
 8: len 19; hex 33363230323438343138383831363632343638; asc 3620248418881662468;;
 9: len 1; hex 80; asc  ;;
 10: len 5; hex 99b0480667; asc   H g;;
 11: len 5; hex 99b0480667; asc   H g;;

2023-06-03T16:25:39.582564Z 0 [Note] [MY-012469] [InnoDB] *** WE ROLL BACK TRANSACTION (2)  (lock0lock.cc:6496)

Ⅲ. Describe what you expected to happen

框架不应该出现死锁

Ⅳ. How to reproduce it (as minimally and precisely as possible)

暂未验证

Ⅴ. Anything else we need to know?

server端1.5.2遇到过，升到1.6.1问题依旧表结构

# 生产表结构
# Schema: production_seata_server Table: lock_table  
-- auto-generated definition
create table lock_table
(
    row_key        varchar(512)      not null
        primary key,
    xid            varchar(128)      null,
    transaction_id bigint            null,
    branch_id      bigint            not null,
    resource_id    varchar(256)      null,
    table_name     varchar(32)       null,
    pk             varchar(36)       null,
    status         tinyint default 0 not null comment '0:locked ,1:rollbacking',
    gmt_create     datetime          null,
    gmt_modified   datetime          null
);

create index idx_branch_id
    on lock_table (branch_id);

create index idx_status
    on lock_table (status);

create index idx_xid
    on lock_table (xid);

Ⅵ. Environment:

JDK version(e.g. java -version): jdk8
Seata client/server version: 1.5.2/1.6.1
Database version: mysql8
OS(e.g. uname -a): linux

funky-eyes commented 1 year ago

seata server日志发出来

funky-eyes commented 1 year ago

把performance_schema打开看看更详细的内容，这两条sql没有看出有什么锁定冲突，但实际上产生了死锁，怀疑数据库bug，修改为rc隔离级别可能会解决问题，执行show index from lock_table 截图给我看下结果

Bughue commented 1 year ago

是不是频繁删除和插入？S锁和X锁打架，mysql的5.7和8本身都有这种问题 https://bugs.mysql.com/bug.php?id=68021 https://bugs.mysql.com/bug.php?id=21356

xqdd commented 1 year ago

把performance_schema打开看看更详细的内容，这两条sql没有看出有什么锁定冲突，但实际上产生了死锁，怀疑数据库bug，修改为rc隔离级别可能会解决问题，执行show index from lock_table 截图给我看下结果

Table	Non_unique	Key_name	Seq_in_index	Column_name	Collation	Sub_part	Packed	Null	Index_type	Visible	Expression
lock_table	0	PRIMARY	1	row_key	A	null	null		BTREE	YES	null
lock_table	1	idx_status	1	status	A	null	null		BTREE	YES	null
lock_table	1	idx_branch_id	1	branch_id	A	null	null		BTREE	YES	null
lock_table	1	idx_xid	1	xid	A	null	null	YES	BTREE	YES	null

xqdd commented 1 year ago

seata server日志发出来

这个日志得申请一下，暂时还导不出来

xqdd commented 1 year ago

是不是频繁删除和插入？S锁和X锁打架，mysql的5.7和8本身都有这种问题 https://bugs.mysql.com/bug.php?id=68021 https://bugs.mysql.com/bug.php?id=21356

最高TPS=2左右，每次全局事务涉及10个表，这样算插入或删除lock_table的QPS=20，不算特别频繁吧？

funky-eyes commented 1 year ago

索引看着没问题，大量用户反馈该问题的后续跟进反馈结果是正常了，可能跟mysql版本有关，这个sql是常见的不能再常见的sql了，delete只有一个条件xid，之前的死锁是xid 和in branchid导致的，拆分开后给xid单独加上索引就没这个问题了，你可以先改成rc读已提交为数据库的默认隔离级别来观察一段时间

xqdd commented 1 year ago

索引看着没问题，大量用户反馈该问题的后续跟进反馈结果是正常了，可能跟mysql版本有关，这个sql是常见的不能再常见的sql了，delete只有一个条件xid，之前的死锁是xid 和in branchid导致的，拆分开后给xid单独加上索引就没这个问题了，你可以先改成rc读已提交为数据库的默认隔离级别来观察一段时间

好的，感谢，我再研究一下

xqdd commented 1 year ago

@a364176773 复现了，mysql数据库版本8.0.31，事务隔离级别RR

建表语句

CREATE TABLE `lock_table` (
  `row_key` varchar(512) NOT NULL,
  `xid` varchar(128) DEFAULT NULL,
  PRIMARY KEY (`row_key`),
  KEY `idx_xid` (`xid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci

tx:1分支事务注册成功（本地测试数据需要大于等于9条）

insert into lock_table (row_key, xid)
values
       ('lock:10', 'tx:1'),
       ('lock:20', 'tx:1'),
       ('lock:30', 'tx:1'),
       ('lock:40', 'tx:1'),
       ('lock:50', 'tx:1'),
       ('lock:60', 'tx:1'),
       ('lock:70', 'tx:1'),
       ('lock:80', 'tx:1'),
       ('lock:90', 'tx:1');

tx:2分支事务注册，涉及多个表，即有多个锁记录，先锁第一条

insert into lock_table (row_key, xid) values('lock:21', 'tx:2');

https://github.com/seata/seata/blob/cd339cde06308a5692762a101303efcbdde32da8/server/src/main/java/io/seata/server/storage/db/lock/LockStoreDataBaseDAO.java#L354

在tx:2分支添加锁记录期间，tx:1全局事务提交

delete
from lock_table
where xid = 'tx:1';

https://github.com/seata/seata/blob/cd339cde06308a5692762a101303efcbdde32da8/server/src/main/java/io/seata/server/storage/db/lock/LockStoreDataBaseDAO.java#L244

tx:2分支事务注册，涉及多个表，即有多个锁记录，再锁第二条，死锁触发

insert into lock_table (row_key, xid) values('lock:19', 'tx:2');

funky-eyes commented 1 year ago

我也是8.0.31 没有复现

funky-eyes commented 1 year ago

隔离级别索引也完全与你一致

xqdd commented 1 year ago

@a364176773 用一下我给的数据

delete
from lock_table
where xid = 'tx:1';

insert into lock_table (row_key, xid, branch_id)
values ('lock:10', 'tx:1', '1'),
       ('lock:20', 'tx:1', '1'),
       ('lock:30', 'tx:1', '1'),
       ('lock:40', 'tx:1', '1'),
       ('lock:50', 'tx:1', '1'),
       ('lock:60', 'tx:1', '1'),
       ('lock:70', 'tx:1', '1'),
       ('lock:80', 'tx:1', '1'),
       ('lock:90', 'tx:1', '1');

------分隔线---

insert into lock_table (row_key, xid,branch_id) values('lock:21', 'tx:2','2');

insert into lock_table (row_key, xid,branch_id) values('lock:19', 'tx:2','2');

https://github.com/seata/seata/assets/20871761/947b9291-826a-4503-9666-d413767990ae

xqdd commented 1 year ago

应该是间隙锁或next key锁导致的，具体原因我还在研究

funky-eyes commented 1 year ago

应该是间隙锁或next key锁导致的，具体原因我还在研究

我也怀疑是这个问题，你试着改成rc隔离级别，是不是就复现不出来了

funky-eyes commented 1 year ago

@a364176773 用一下我给的数据

delete
from lock_table
where xid = 'tx:1';

insert into lock_table (row_key, xid, branch_id)
values ('lock:10', 'tx:1', '1'),
       ('lock:20', 'tx:1', '1'),
       ('lock:30', 'tx:1', '1'),
       ('lock:40', 'tx:1', '1'),
       ('lock:50', 'tx:1', '1'),
       ('lock:60', 'tx:1', '1'),
       ('lock:70', 'tx:1', '1'),
       ('lock:80', 'tx:1', '1'),
       ('lock:90', 'tx:1', '1');

------分隔线---

insert into lock_table (row_key, xid,branch_id) values('lock:21', 'tx:2','2');

insert into lock_table (row_key, xid,branch_id) values('lock:19', 'tx:2','2');

Clipchamp.mp4

好，晚点我试下

xqdd commented 1 year ago

应该是间隙锁或next key锁导致的，具体原因我还在研究

我也怀疑是这个问题，你试着改成rc隔离级别，是不是就复现不出来了

是的，RC没这个问题

xqdd commented 1 year ago

@a364176773 原因应该知道了：

当表中索引区分度很低并重复值达到一定条数时候，delete操作不会走索引，走的是全表扫描

根据InnoDB事务锁系统及其实现这里给出的结论

RR+无索引

在Repeatable Read隔离级别下，如果进行全表扫描的当前读（可能是没有索引，也可能是有索引，但是查询计划并没有走索引），那么会对主键索引上的所有记录加上next key locks，对supremum加上next key locks，对满足条件的记录的索引加上记录锁（lock_x | lock_rec_not_gap），杜绝所有的并发更新/删除/插入操作。

此时的delete操作会尝试将整个表数据都加上next key锁，并且观察到加这个锁是按顺序的，那么可以复现我们的场景：

数据库中已存在数据

insert into lock_table (row_key) values(1); insert into lock_table (row_key) values(5); insert into lock_table (row_key) values(10);

全局事务a注册分支时，需要添加多个全局锁，假设row_key分别是3，7，并先添加了7

insert into lock_table (row_key) values(7);

这时候事务a持有主键值为7的记录锁

全局事务a释放全局锁

此时区分度过低，delete操作走的全表扫描，会尝试全表加next key锁，按顺序的话，会锁(-∞,1]、(1,5]，然后遇到了事务a持有的7的记录锁，只会锁间隙(5,7)，这时候会等待事务a释放记录锁

全局事务b注册分支时，插入row_key为3的记录

这时候发现事务b持有(1,5]的next key锁，触发死锁

xqdd commented 1 year ago

RC隔离级别下没问题的原因是此时的delete操作不会锁间隙

funky-eyes commented 1 year ago

我们计划将server的connection全部设置为rc，因为rr的特效对server其实是没有用到，rr效率也比rc低，如果你感兴趣的话可以尝试认领这个任务去提交pr，这样就对这个问题的死锁基本上一劳永逸了

xqdd commented 1 year ago

我们计划将server的connection全部设置为rc，因为rr的特效对server其实是没有用到，rr效率也比rc低，如果你感兴趣的话可以尝试认领这个任务去提交pr，这样就对这个问题的死锁基本上一劳永逸了

周末可以试试

funky-eyes commented 1 year ago

我们计划将server的connection全部设置为rc，因为rr的特效对server其实是没有用到，rr效率也比rc低，如果你感兴趣的话可以尝试认领这个任务去提交pr，这样就对这个问题的死锁基本上一劳永逸了

周末可以试试

好，那先把这个任务交给你啦，有问题再沟通

apache / incubator-seata