When I tested the MySQL crash failover scenario, I killed the mysql process inside the mysql container of a slave MySQL instance. The container restarted, but the init container did not re-execute. After the slave MySQL came back up, I found that it was receiving duplicate transactions through replication from the master (error '1062', for example).
I studied the recovery process of the operator. It follows these steps:
The slave MySQL boots, then executes operator-init.sql:
...
DROP TABLE IF EXISTS sys_operator.status;
CREATE TABLE IF NOT EXISTS sys_operator.status (
  name varchar(64) PRIMARY KEY,
  value varchar(8192) NOT NULL
);
REPLACE INTO sys_operator.status VALUES ('configured', '0');
REPLACE INTO sys_operator.status VALUES ('backup_gtid_purged', '93ab952a-b6b5-11ed-bcba-2677191faf66:1-75');
...
So the "sys_operator.status" table is dropped and recreated, and the two rows ("configured" and "backup_gtid_purged") are reset to their initial values.
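To make the effect of that reset concrete, here is a toy in-memory model of the table. It is only a sketch: `statusTable` and `runInitSQL` are hypothetical names for illustration, not operator code, and the map stands in for the real table.

```go
package main

// statusTable is a toy stand-in for sys_operator.status (name -> value).
type statusTable map[string]string

// runInitSQL mimics the operator-init.sql excerpt: DROP TABLE discards every
// existing row, then the two REPLACE statements write the initial rows.
func runInitSQL(backupGTIDPurged string) statusTable {
	return statusTable{
		"configured":         "0",
		"backup_gtid_purged": backupGTIDPurged,
	}
}
```

In this model, any row written after the first boot (for example the 'set_gtid_purged' row written by the SetPurgedGTID query in this report) is gone after the next call to `runInitSQL`, and 'configured' is back to '0'.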
Then the operator calls 'initializeMySQL' on the instance, which runs 'SetPurgedGTID': 'GLOBAL.GTID_PURGED' is set to the value of 'backup_gtid_purged':
query := fmt.Sprintf(`
SET @@SESSION.SQL_LOG_BIN = 0;
START TRANSACTION;
SELECT value INTO @gtid FROM %[1]s.%[2]s WHERE name='%[3]s';
RESET MASTER;
SET @@GLOBAL.GTID_PURGED = @gtid;
REPLACE INTO %[1]s.%[2]s VALUES ('%[4]s', @gtid);
COMMIT;
`, constants.OperatorDbName, constants.OperatorStatusTableName, "backup_gtid_purged", "set_gtid_purged")
'RESET MASTER' clears 'gtid_executed', and setting 'gtid_purged' afterwards sets 'gtid_executed' to the same value at the same time. So 'gtid_executed' on the slave is rewound to the old backup value, even though the slave had already executed later transactions.
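The rewind can be illustrated with a little GTID arithmetic. The following is a minimal sketch, not operator code: `gtidRange`, `parseGTIDRange` and `replayedAfterRewind` are hypothetical helpers, and they only handle the single-UUID, single-interval form that appears in this report (`uuid:first-last`).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// gtidRange is one interval of one server UUID, e.g. "uuid:1-75".
type gtidRange struct {
	UUID  string
	First int64
	Last  int64
}

// parseGTIDRange parses "uuid:first-last" or "uuid:n".
// Multi-interval and multi-UUID GTID sets are deliberately not supported.
func parseGTIDRange(s string) (gtidRange, error) {
	uuid, interval, ok := strings.Cut(s, ":")
	if !ok || uuid == "" {
		return gtidRange{}, fmt.Errorf("malformed GTID set %q", s)
	}
	firstStr, lastStr, hasDash := strings.Cut(interval, "-")
	if !hasDash {
		lastStr = firstStr
	}
	first, err := strconv.ParseInt(firstStr, 10, 64)
	if err != nil {
		return gtidRange{}, fmt.Errorf("bad interval start in %q: %w", s, err)
	}
	last, err := strconv.ParseInt(lastStr, 10, 64)
	if err != nil {
		return gtidRange{}, fmt.Errorf("bad interval end in %q: %w", s, err)
	}
	if first < 1 || last < first {
		return gtidRange{}, fmt.Errorf("invalid interval in %q", s)
	}
	return gtidRange{UUID: uuid, First: first, Last: last}, nil
}

// replayedAfterRewind returns the transaction numbers that the slave had
// applied before the crash but that are missing from gtid_executed once it
// has been rewound to the backup value. ok is false when nothing is missing
// or when the two ranges are not comparable in this simplified model.
func replayedAfterRewind(applied, rewound gtidRange) (first, last int64, ok bool) {
	if applied.UUID != rewound.UUID || applied.First != rewound.First {
		return 0, 0, false
	}
	if rewound.Last >= applied.Last {
		return 0, 0, false
	}
	return rewound.Last + 1, applied.Last, true
}
```

As an example with an assumed pre-crash state: if the slave had applied `93ab952a-b6b5-11ed-bcba-2677191faf66:1-120` (the 120 is made up for illustration) and 'gtid_executed' is rewound to `...:1-75`, `replayedAfterRewind` reports transactions 76 through 120. Those are the ones the master will send again, and they fail on data that is already present.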
The operator then executes "run CHANGE MASTER TO on pod" for the slave, and replication hits the duplicate transactions.