bitpoke / mysql-operator

Asynchronous MySQL Replication on Kubernetes using Percona Server and Openark's Orchestrator.
https://www.bitpoke.io/docs/mysql-operator/getting-started/
Apache License 2.0

Should operator-init.sql be executed every time the MySQL instance starts? #877

Open hit1943 opened 1 year ago

hit1943 commented 1 year ago

hello,

While testing the crash failover scenario, I killed the mysql process in the mysql container of a slave instance. The container restarted, but the init container did not run again. After the slave started, I found that it received duplicate transactions from the master during replication (error '1062', for example).

I studied the operator's recovery process. It follows these steps:

  1. The slave MySQL instance boots and executes operator-init.sql:

    ...
    DROP TABLE IF EXISTS sys_operator.status;
    CREATE TABLE IF NOT EXISTS sys_operator.status (
      name varchar(64) PRIMARY KEY,
      value varchar(8192) NOT NULL
    );
    REPLACE INTO sys_operator.status VALUES ('configured', '0');
    REPLACE INTO sys_operator.status VALUES ('backup_gtid_purged', '93ab952a-b6b5-11ed-bcba-2677191faf66:1-75');
    ...
  2. As a result, the "sys_operator.status" table is dropped and recreated, and the two values ("configured" and "backup_gtid_purged") are reset.

  3. The operator then calls 'initializeMySQL' on the instance, which runs 'SetPurgedGTID': 'GLOBAL.GTID_PURGED' is set to the value of 'backup_gtid_purged'.

    query := fmt.Sprintf(`
      SET @@SESSION.SQL_LOG_BIN = 0;
      START TRANSACTION;
        SELECT value INTO @gtid FROM %[1]s.%[2]s WHERE name='%[3]s';
        RESET MASTER;
        SET @@GLOBAL.GTID_PURGED = @gtid;
        REPLACE INTO %[1]s.%[2]s VALUES ('%[4]s', @gtid);
      COMMIT;
    `, constants.OperatorDbName, constants.OperatorStatusTableName, "backup_gtid_purged", "set_gtid_purged")
  4. The slave sets 'gtid_purged' and 'gtid_executed' at the same time, so 'gtid_executed' is rewound to the old value from the backup, even though the slave has already executed later transactions.

  5. The operator executes "run CHANGE MASTER TO on pod" for the slave, and the slave then encounters duplicate transactions during replication.
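
To make the effect of steps 4 and 5 concrete, here is a minimal sketch of the arithmetic. It is only a model, not operator code. The `interval` type and the `reapplied` function are hypothetical names, and the "already applied" range 1-120 is an invented number for illustration; only 1-75 comes from the 'backup_gtid_purged' value above. It assumes a single source UUID and GTID-based auto-positioning, where the slave asks the master for every transaction missing from its 'gtid_executed'.

```go
package main

import "fmt"

// interval is an inclusive range of transaction numbers for one server UUID.
// Hypothetical helper type, used only for this illustration.
type interval struct{ first, last int }

// reapplied returns the transactions the slave would fetch and apply a second
// time if gtid_executed is rewound from `applied` back to `purged`.
// Assumption: both ranges start at 1 and belong to the same source UUID.
func reapplied(purged, applied interval) (interval, bool) {
	if applied.last <= purged.last {
		// Nothing was applied beyond the backup point, so nothing repeats.
		return interval{}, false
	}
	return interval{purged.last + 1, applied.last}, true
}

func main() {
	purged := interval{1, 75}   // backup_gtid_purged from operator-init.sql
	applied := interval{1, 120} // assumed: what the slave had applied before the restart

	if r, ok := reapplied(purged, applied); ok {
		fmt.Printf("re-fetched after rewind: %d-%d\n", r.first, r.last)
	}
}
```

With these assumed numbers, transactions 76-120 are requested again even though their row changes are already in the data, which matches the duplicate-key ('1062') errors described above.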