codership / mysql-wsrep

wsrep API patch for MySQL server
Other
64 stars 34 forks source link

Multi-table OPTIMIZE, REPAIR leads to BF-BF conflict, node abort #198

Open philip-galera opened 9 years ago

philip-galera commented 9 years ago

With multi-table OPTIMIZE TABLE:

Sep 23 19:36:13 mysqld: 150923 19:36:13 [Note] WSREP: MDL BF-BF conflict
Sep 23 19:36:13 mysqld: request: (3 #011seqno 2188616 #011wsrep (1, 1, 0) cmd 0 45 #011OPTIMIZE TABLE t1,t2,t3)
Sep 23 19:36:13 mysqld: granted: (1 #011seqno 2188623 #011wsrep (1, 0, 0) cmd 0 146 #011(null))
Sep 23 19:36:13 mysqld: 150923 19:36:13 [ERROR] WSREP: Trx 2188616 tries to abort slave trx 2188623. This could be caused by:

Teemu says:

[13:30:09] Teemu Ollakka: so, optimize is executed in TOI
[13:30:14] Teemu Ollakka: and there is BF-BF conflict
[13:30:36] Teemu Ollakka: which suggests that it does not take all table names as keys

[13:34:32] Teemu Ollakka:   WSREP_TO_ISOLATION_BEGIN(first_table->db, first_table->table_name, NULL)
[13:34:45] Teemu Ollakka: in Sql_cmd_optimize_table::execute()
[13:34:57] Teemu Ollakka: so full table list is not passed to OPTIMIZE TOI
ayurchen commented 9 years ago

What was the conclusion from our prior discussion? Is OPTIMIZE TABLES binlogged? Do we even want to replicate it?

philip-galera commented 9 years ago

Optimize table is binlogged, so should be replicated.

philip-galera commented 9 years ago

Test case:

--source include/galera_cluster.inc
--source include/have_innodb.inc
--source include/have_binlog_format_row.inc

CREATE TABLE t1 (id INT PRIMARY KEY) ENGINE=InnoDB;
CREATE TABLE t2 (id INT PRIMARY KEY) ENGINE=InnoDB;

--connection node_2
LOCK TABLE t2 WRITE;

--connect node_2a, 127.0.0.1, root, , test, $NODE_MYPORT_2
--connection node_2a
--send OPTIMIZE TABLE t1,t2;

--connection node_2
SET SESSION wsrep_sync_wait = 0;
--let $wait_condition = SELECT COUNT(*) = 1 FROM INFORMATION_SCHEMA.PROCESSLIST WHERE STATE = 'Waiting for table metadata lock'
--source include/wait_condition.inc

--connection node_1
INSERT INTO t2 VALUES (1);

--connection node_2
UNLOCK TABLES;

--connection node_2a
--reap

DROP TABLE t1;
DROP TABLE t2;
2015-09-25 14:31:02 2148 [Note] WSREP: MDL BF-BF conflict
request: (2     seqno 4         wsrep (1, 1, 0) cmd 0 133       (null))
granted: (7     seqno 3         wsrep (2, 1, 0) cmd 3 45        OPTIMIZE TABLE t1,t2)
2015-09-25 14:31:02 2148 [Note] WSREP: MDL BF-BF conflict
request: (7     seqno 3         wsrep (2, 1, 0) cmd 3 45        OPTIMIZE TABLE t1,t2)
granted: (2     seqno 4         wsrep (1, 1, 0) cmd 0 133       (null))
2015-09-25 14:31:02 2148 [Note] WSREP: MDL BF-BF conflict
request: (7     seqno 3         wsrep (2, 1, 0) cmd 3 45        OPTIMIZE TABLE t1,t2)
granted: (2     seqno 4         wsrep (1, 1, 0) cmd 0 133       (null))
2015-09-25 14:31:02 2148 [Note] WSREP: MDL BF-BF conflict
request: (7     seqno 3         wsrep (2, 1, 0) cmd 3 45        OPTIMIZE TABLE t1,t2)
granted: (2     seqno 4         wsrep (1, 0, 0) cmd 0 133       (null))
2015-09-25 14:31:02 2148 [Note] WSREP: MDL BF-BF conflict
request: (7     seqno 3         wsrep (2, 1, 0) cmd 3 45        OPTIMIZE TABLE t1,t2)
granted: (2     seqno 4         wsrep (1, 0, 0) cmd 0 133       (null))
2015-09-25 14:31:02 2148 [ERROR] WSREP: Trx 3 tries to abort slave trx 4. This could be caused by:
#6  0x0000000000b8052c in wsrep_abort_slave_trx (bf_seqno=3, victim_seqno=4) at /home/philips/git/mysql-wsrep/storage/innobase/handler/ha_innodb.cc:16812
#7  0x0000000000b81418 in wsrep_innobase_kill_one_trx (bf_thd_ptr=0x1e45ad0, bf_trx=0x7f9c48033f68, victim_trx=0x7f9c6c0106c8, signal=1) at /home/philips/git/mysql-wsrep/storage/innobase/handler/ha_innodb.cc:16973
#8  0x0000000000bbe4fc in wsrep_kill_victim (trx=0x7f9c48033f68, lock=0x7f9c6c012dd8) at /home/philips/git/mysql-wsrep/storage/innobase/lock/lock0lock.cc:1680
#9  0x0000000000bc4eec in lock_table_other_has_incompatible (trx=0x7f9c48033f68, wait=256, table=0x7f9c6c042db8, mode=LOCK_X) at /home/philips/git/mysql-wsrep/storage/innobase/lock/lock0lock.cc:4739
#10 0x0000000000bc501f in lock_table (flags=0, table=0x7f9c6c042db8, mode=LOCK_X, thr=0x7f9c480414a8) at /home/philips/git/mysql-wsrep/storage/innobase/lock/lock0lock.cc:4795
#11 0x0000000000c54033 in row_merge_lock_table (trx=0x7f9c48033f68, table=0x7f9c6c042db8, mode=LOCK_X) at /home/philips/git/mysql-wsrep/storage/innobase/row/row0merge.cc:2626
#12 0x0000000000b931db in ha_innobase::commit_inplace_alter_table (this=0x7f9c4803a0e0, altered_table=0x7f9c48035c40, ha_alter_info=0x7f9c8814b4d0, commit=true) at /home/philips/git/mysql-wsrep/storage/innobase/handler/handler0alter.cc:5422
#13 0x0000000000653d49 in handler::ha_commit_inplace_alter_table (this=0x7f9c4803a0e0, altered_table=0x7f9c48035c40, ha_alter_info=0x7f9c8814b4d0, commit=true) at /home/philips/git/mysql-wsrep/sql/handler.cc:4497
#14 0x000000000084b210 in mysql_inplace_alter_table (thd=0x1e45ad0, table_list=0x7f9c480052d0, table=0x7f9c48033500, altered_table=0x7f9c48035c40, ha_alter_info=0x7f9c8814b4d0, inplace_supported=HA_ALTER_INPLACE_NO_LOCK_AFTER_PREPARE, target_mdl_request=0x7f9c8814af20, alter_ctx=0x7f9c8814ba30) at /home/philips/git/mysql-wsrep/sql/sql_table.cc:6683
#15 0x000000000084f3b5 in mysql_alter_table (thd=0x1e45ad0, new_db=0x0, new_name=0x0, create_info=0x7f9c8814c8d0, table_list=0x7f9c480052d0, alter_info=0x7f9c8814c840, order_num=0, order=0x0, ignore=false) at /home/philips/git/mysql-wsrep/sql/sql_table.cc:8518
#16 0x00000000008513fa in mysql_recreate_table (thd=0x1e45ad0, table_list=0x7f9c480052d0, table_copy=false) at /home/philips/git/mysql-wsrep/sql/sql_table.cc:9230
#17 0x0000000000980264 in mysql_admin_table(THD *, TABLE_LIST *, HA_CHECK_OPT *, const char *, thr_lock_type, bool, bool, uint, int (*)(THD *, TABLE_LIST *, HA_CHECK_OPT *), struct {...}, int (*)(THD *, TABLE_LIST *)) (thd=0x1e45ad0, tables=0x7f9c48004d30, check_opt=0x1e48838, operator_name=0x1022530 "optimize", lock_type=TL_WRITE, open_for_modify=false, repair_table_use_frm=false, extra_open_options=0, prepare_func=0x0, operator_func=(int (handler::*)(handler * const, THD *, HA_CHECK_OPT *)) 0x653940 <handler::ha_optimize(THD*, st_ha_check_opt*)>, view_operator_func=0x0) at /home/philips/git/mysql-wsrep/sql/sql_admin.cc:798
#18 0x00000000009812ac in Sql_cmd_optimize_table::execute (this=0x7f9c48005838, thd=0x1e45ad0) at /home/philips/git/mysql-wsrep/sql/sql_admin.cc:1133
#19 0x00000000007e3e25 in mysql_execute_command (thd=0x1e45ad0) at /home/philips/git/mysql-wsrep/sql/sql_parse.cc:5467
#20 0x00000000007e7e4f in mysql_parse (thd=0x1e45ad0, rawbuf=0x7f9c48004c50 "OPTIMIZE TABLE t1,t2", length=20, parser_state=0x7f9c8814e6c0) at /home/philips/git/mysql-wsrep/sql/sql_parse.cc:7043
#21 0x00000000007e728c in wsrep_mysql_parse (thd=0x1e45ad0, rawbuf=0x7f9c48004c50 "OPTIMIZE TABLE t1,t2", length=20, parser_state=0x7f9c8814e6c0) at /home/philips/git/mysql-wsrep/sql/sql_parse.cc:6791
#22 0x00000000007d8d13 in dispatch_command (command=COM_QUERY, thd=0x1e45ad0, packet=0x1e49841 "", packet_length=21) at /home/philips/git/mysql-wsrep/sql/sql_parse.cc:1552
#23 0x00000000007d76c3 in do_command (thd=0x1e45ad0) at /home/philips/git/mysql-wsrep/sql/sql_parse.cc:1125
#24 0x00000000007a2feb in do_handle_one_connection (thd_arg=0x1e45ad0) at /home/philips/git/mysql-wsrep/sql/sql_connect.cc:1007
#25 0x00000000007a2b21 in handle_one_connection (arg=0x1e45ad0) at /home/philips/git/mysql-wsrep/sql/sql_connect.cc:916
#26 0x0000000000ad05b6 in pfs_spawn_thread (arg=0x1cc06f0) at /home/philips/git/mysql-wsrep/storage/perfschema/pfs.cc:1860
#27 0x0000003e90c0753a in start_thread () from /lib64/libpthread.so.0
#28 0x0000003e905045fd in clone () from /lib64/libc.so.6
philip-galera commented 9 years ago

REPAIR TABLE is also affected, using the same test case as above, but the assertion is different:

mysqld: /home/philips/git/mysql-wsrep/sql/table_cache.cc:372: void Table_cache_manager::free_table(THD*, enum_tdc_remove_table_type, TABLE_SHARE*): Assertion `0' failed.

#7  0x0000003be5e2dd82 in __assert_fail () from /lib64/libc.so.6
#8  0x00000000008a1d32 in Table_cache_manager::free_table (this=0x17ef680 <table_cache_manager>, thd=0x3b30e30, remove_type=TDC_RT_REMOVE_NOT_OWN, share=0x7fa374089e00) at /home/philips/git/mysql-wsrep/sql/table_cache.cc:372
#9  0x000000000077d87c in tdc_remove_table (thd=0x3b30e30, remove_type=TDC_RT_REMOVE_NOT_OWN, db=0x7fa37408a120 "test", table_name=0x7fa37408a125 "t2", has_lock=false) at /home/philips/git/mysql-wsrep/sql/sql_base.cc:9341
#10 0x0000000000770078 in wait_while_table_is_used (thd=0x3b30e30, table=0x7fa3680330d0, function=HA_EXTRA_PREPARE_FOR_RENAME) at /home/philips/git/mysql-wsrep/sql/sql_base.cc:2305
#11 0x000000000098178a in mysql_admin_table(THD *, TABLE_LIST *, HA_CHECK_OPT *, const char *, thr_lock_type, bool, bool, uint, int (*)(THD *, TABLE_LIST *, HA_CHECK_OPT *), struct {...}, int (*)(THD *, TABLE_LIST *)) (thd=0x3b30e30, tables=0x7fa368004d28, check_opt=0x3b33b98, operator_name=0x101a6c5 "repair", lock_type=TL_WRITE, open_for_modify=false, repair_table_use_frm=false, extra_open_options=32, prepare_func=0x98020b <prepare_for_repair(THD*, TABLE_LIST*, HA_CHECK_OPT*)>, operator_func=(int (handler::*)(handler * const, THD *, HA_CHECK_OPT *)) 0x6505aa <handler::ha_repair(THD*, st_ha_check_opt*)>, view_operator_func=0x0) at /home/philips/git/mysql-wsrep/sql/sql_admin.cc:577
#12 0x0000000000983798 in Sql_cmd_repair_table::execute (this=0x7fa368005830, thd=0x3b30e30) at /home/philips/git/mysql-wsrep/sql/sql_admin.cc:1165
#13 0x00000000007e2002 in mysql_execute_command (thd=0x3b30e30) at /home/philips/git/mysql-wsrep/sql/sql_parse.cc:5467
#14 0x00000000007e5d18 in mysql_parse (thd=0x3b30e30, rawbuf=0x7fa368004c50 "REPAIR TABLE t1,t2", length=18, parser_state=0x7fa3a83be6e0) at /home/philips/git/mysql-wsrep/sql/sql_parse.cc:7043
#15 0x00000000007e517b in wsrep_mysql_parse (thd=0x3b30e30, rawbuf=0x7fa368004c50 "REPAIR TABLE t1,t2", length=18, parser_state=0x7fa3a83be6e0) at /home/philips/git/mysql-wsrep/sql/sql_parse.cc:6791
#16 0x00000000007d6ea7 in dispatch_command (command=COM_QUERY, thd=0x3b30e30, packet=0x3b34ba1 "", packet_length=19) at /home/philips/git/mysql-wsrep/sql/sql_parse.cc:1552
#17 0x00000000007d5899 in do_command (thd=0x3b30e30) at /home/philips/git/mysql-wsrep/sql/sql_parse.cc:1125
#18 0x00000000007a020f in do_handle_one_connection (thd_arg=0x3b30e30) at /home/philips/git/mysql-wsrep/sql/sql_connect.cc:1007
#19 0x000000000079fd06 in handle_one_connection (arg=0x3b30e30) at /home/philips/git/mysql-wsrep/sql/sql_connect.cc:916
#20 0x0000000000ad20de in pfs_spawn_thread (arg=0x39b9670) at /home/philips/git/mysql-wsrep/storage/perfschema/pfs.cc:1860
#21 0x0000003be66076ca in start_thread () from /lib64/libpthread.so.0
#22 0x0000003be5f0779d in clone () from /lib64/libc.so.6
philip-galera commented 9 years ago

ANALYZE, CHECK, CHECKSUM TABLE do not seem to be affected (though I guess it is best to confirm this by code inspection).

DROP TABLE, RENAME TABLE, DROP VIEW do not seem to be affected. The DML reports that the table is no longer there.

mmpower commented 9 years ago

We just got the same issue:

2015-10-13 16:27:32 3489 [Note] WSREP: MDL BF-BF conflict
request: (72    seqno 5604256   wsrep (1, 1, 0) cmd 0 48        OPTIMIZE TABLE wp_2314908723409blc_links, wp_2314908723409blc_instances, wp_2314908723409blc_synch)
granted: (41    seqno 5604261   wsrep (1, 0, 0) cmd 0 144       (null))
2015-10-13 16:27:32 3489 [Note] WSREP: MDL BF-BF conflict
request: (72    seqno 5604256   wsrep (1, 1, 0) cmd 0 48        OPTIMIZE TABLE wp_2314908723409blc_links, wp_2314908723409blc_instances, wp_2314908723409blc_synch)
granted: (41    seqno 5604261   wsrep (1, 0, 0) cmd 0 144       (null))
2015-10-13 16:27:32 3489 [Note] WSREP: MDL BF-BF conflict
request: (72    seqno 5604256   wsrep (1, 1, 0) cmd 0 48        OPTIMIZE TABLE wp_2314908723409blc_links, wp_2314908723409blc_instances, wp_2314908723409blc_synch)
granted: (41    seqno 5604261   wsrep (1, 0, 0) cmd 0 144       (null))
2015-10-13 16:27:32 3489 [Note] WSREP: MDL BF-BF conflict
request: (72    seqno 5604256   wsrep (1, 1, 0) cmd 0 48        OPTIMIZE TABLE wp_2314908723409blc_links, wp_2314908723409blc_instances, wp_2314908723409blc_synch)
granted: (41    seqno 5604261   wsrep (1, 0, 0) cmd 0 144       (null))
2015-10-13 16:27:32 3489 [Note] WSREP: MDL BF-BF conflict
request: (72    seqno 5604256   wsrep (1, 1, 0) cmd 0 48        OPTIMIZE TABLE wp_2314908723409blc_links, wp_2314908723409blc_instances, wp_2314908723409blc_synch)
granted: (41    seqno 5604261   wsrep (1, 0, 0) cmd 0 144       (null))
2015-10-13 16:27:32 3489 [ERROR] WSREP: Trx 5604256 tries to abort slave trx 5604261. This could be caused by:
        1) unsupported configuration options combination, please check documentation.
        2) a bug in the code.
        3) a database corruption.
 Node consistency compromized, need to abort. Restart the node to resync with cluster.
20:27:32 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Please help us make Percona XtraDB Cluster better by reporting any
bugs at https://bugs.launchpad.net/percona-xtradb-cluster

key_buffer_size=8388608
read_buffer_size=131072
max_used_connections=109
max_threads=4098
thread_count=85
connection_count=4
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 17376750 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7fc2bc000990
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7fc3ef4719a0 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x3b)[0x90243b]
/usr/sbin/mysqld(handle_fatal_signal+0x471)[0x676521]
/usr/lib64/libpthread.so.0(+0xf130)[0x7fc5f9734130]
/usr/lib64/libc.so.6(gsignal+0x37)[0x7fc5f7acb5d7]
/usr/lib64/libc.so.6(abort+0x148)[0x7fc5f7acccc8]
/usr/sbin/mysqld[0x97818b]
/usr/sbin/mysqld[0x97880c]
/usr/sbin/mysqld[0x9a6da5]
/usr/sbin/mysqld[0xa0c6f3]
/usr/sbin/mysqld[0x980f01]
/usr/sbin/mysqld(_Z17mysql_alter_tableP3THDPcS1_P24st_ha_create_informationP10TABLE_LISTP10Alter_infojP8st_orderb+0x1908)[0x7544c8]
/usr/sbin/mysqld(_Z20mysql_recreate_tableP3THDP10TABLE_LISTb+0x17c)[0x75646c]
/usr/sbin/mysqld[0x8370cb]
/usr/sbin/mysqld(_ZN22Sql_cmd_optimize_table7executeEP3THD+0xe6)[0x837976]
/usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x33ff)[0x6fdb2f]
/usr/sbin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_state+0x5b8)[0x702c68]
/usr/sbin/mysqld(_ZN15Query_log_event14do_apply_eventEPK14Relay_log_infoPKcj+0x625)[0x88f055]
/usr/sbin/mysqld(_ZN9Log_event11apply_eventEP14Relay_log_info+0x6a)[0x88d37a]
/usr/sbin/mysqld(_Z14wsrep_apply_cbPvPKvmjPK14wsrep_trx_meta+0x628)[0x5ad008]
/usr/lib64/libgalera_smm.so(_ZNK6galera9TrxHandle5applyEPvPF15wsrep_cb_statusS1_PKvmjPK14wsrep_trx_metaERS6_+0xd8)[0x7fc5b02f7da8]
/usr/lib64/libgalera_smm.so(+0x1e610d)[0x7fc5b033310d]
/usr/lib64/libgalera_smm.so(_ZN6galera13ReplicatorSMM9apply_trxEPvPNS_9TrxHandleE+0xb0)[0x7fc5b0335d90]
/usr/lib64/libgalera_smm.so(_ZN6galera13ReplicatorSMM11process_trxEPvPNS_9TrxHandleE+0x10e)[0x7fc5b0338bce]
/usr/lib64/libgalera_smm.so(_ZN6galera15GcsActionSource8dispatchEPvRK10gcs_actionRb+0x1b8)[0x7fc5b0316358]
/usr/lib64/libgalera_smm.so(_ZN6galera15GcsActionSource7processEPvRb+0x4c)[0x7fc5b0317aec]
/usr/lib64/libgalera_smm.so(_ZN6galera13ReplicatorSMM10async_recvEPv+0x6b)[0x7fc5b03390eb]
/usr/lib64/libgalera_smm.so(galera_recv+0x18)[0x7fc5b0347768]
/usr/sbin/mysqld[0x5adb44]
/usr/sbin/mysqld(start_wsrep_THD+0x293)[0x594293]
/usr/lib64/libpthread.so.0(+0x7df5)[0x7fc5f972cdf5]
/usr/lib64/libc.so.6(clone+0x6d)[0x7fc5f7b8c1ad]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7fc2bc008ff0): is an invalid pointer
Connection ID (thread ID): 72
Status: NOT_KILLED

You may download the Percona XtraDB Cluster operations manual by visiting
http://www.percona.com/software/percona-xtradb-cluster/. You may find information
in the manual which will help you identify the cause of the crash.

our database version is:

Percona-XtraDB-Cluster-56 version 5.6.24-25.11.1.el7; it's a 2 mysqld node + garbd cluster setup.

philip-galera commented 9 years ago

@mmpower . You will need to disable the Broken Link Checker WordPress plugin until this is fixed. Alternatively you can edit the source of the plugin to not perform multi-table OPTIMIZE TABLEs.

mmpower commented 9 years ago

thanks @philip-galera .

we just tested the newer percona xtradb version seems it does not happen any more....seem it's already fixed in percona release https://www.percona.com/doc/percona-xtradb-cluster/5.6/release-notes/Percona-XtraDB-Cluster-5.6.25-25.12.html?

temeo commented 9 years ago

Possibly related commit in PXC tree: https://github.com/percona/percona-xtradb-cluster/commit/5065122f94a8002d4da231528a46f8d9ddbffdc2