bitpoke / mysql-operator

Asynchronous MySQL Replication on Kubernetes using Percona Server and Openark's Orchestrator.
https://www.bitpoke.io/docs/mysql-operator/getting-started/
Apache License 2.0

MySQL Cluster failover failure #362

Open liyongxian opened 5 years ago

liyongxian commented 5 years ago

Kubernetes info:

Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.5", GitCommit:"2166946f41b36dea2c4626f90a77706f426cdea2", GitTreeState:"clean", BuildDate:"2019-03-25T15:26:52Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.5", GitCommit:"2166946f41b36dea2c4626f90a77706f426cdea2", GitTreeState:"clean", BuildDate:"2019-03-25T15:19:22Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}

Chart version: v0.3.0-rc.3

Problem description: The MySQL cluster has 3 nodes. mysql-cluster-0 is the master node and the others are slave nodes. When I delete the pod mysql-cluster-0 (the cluster master node), the master fails over to mysql-cluster-2. Once mysql-cluster-0 is running again, it should rejoin as a slave, but its slave status is wrong. Details follow.

Node: mysql-cluster-0
MySQL command: show slave status\G

Slave_IO_State: 
                  Master_Host: mysql-cluster-db-mysql-2.mysql.mysql-operator
                  Master_User: sys_replication
                  Master_Port: 3306
                Connect_Retry: 10
              Master_Log_File: 
          Read_Master_Log_Pos: 4
               Relay_Log_File: mysql-cluster-db-mysql-0-relay-bin.000001
                Relay_Log_Pos: 4
        Relay_Master_Log_File: 
             Slave_IO_Running: No
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 0
              Relay_Log_Space: 154
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 1236
                Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.'
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 102
                  Master_UUID: 20554341-9d3f-11e9-ae4b-ae3830e16ff6
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 190703 03:41:00
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 
            Executed_Gtid_Set: 50d2e21a-9d44-11e9-a604-46a8049f1cfb:1-9
                Auto_Position: 1
         Replicate_Rewrite_DB: 
                 Channel_Name: 
           Master_TLS_Version: 

Node: mysql-cluster-2
MySQL command: show slave status\G

Slave_IO_State: Connecting to master
                  Master_Host: //mysql-cluster-db-mysql-0.mysql.mysql-operator
                  Master_User: sys_replication
                  Master_Port: 3306
                Connect_Retry: 10
              Master_Log_File: 
          Read_Master_Log_Pos: 4
               Relay_Log_File: mysql-cluster-db-mysql-2-relay-bin.000001
                Relay_Log_Pos: 4
        Relay_Master_Log_File: 
             Slave_IO_Running: Connecting
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 0
              Relay_Log_Space: 154
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 2005
                Last_IO_Error: error connecting to master 'sys_replication@//mysql-cluster-db-mysql-0.mysql.mysql-operator:3306' - retry-time: 10  retries: 111
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 0
                  Master_UUID: 
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 190703 03:58:40
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 
            Executed_Gtid_Set: 20554341-9d3f-11e9-ae4b-ae3830e16ff6:1-1106,
5dacb80e-9d3d-11e9-a8c1-7aaa9a0ae85c:1-2901
                Auto_Position: 1
         Replicate_Rewrite_DB: 
                 Channel_Name: 
           Master_TLS_Version: 

Thanks a lot.

AMecea commented 5 years ago

Hi @liyongxian, node-2 is OK. It is in detached mode, which is set by Orchestrator, the tool that we use for fast failovers.

Indeed, node-0 should connect successfully to node-2. Can you give me a little more context? Did you set any custom MySQL config?

Also, the resource description and controller logs would be very useful for debugging this.

Thank you!

lizhongxuan commented 3 years ago

@AMecea @liyongxian I have the same problem. Has it been resolved?
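
For anyone debugging the same symptom, the two Executed_Gtid_Set values in the original report can be compared directly. The following is a minimal Python sketch, not part of the operator or Orchestrator: the helper names `parse_gtid_set`, `gtid_subtract`, and `summarize` are made up for illustration, and the parser assumes the plain `uuid:interval[:interval...]` format shown in the report (no tagged GTIDs). MySQL's built-in `GTID_SUBTRACT()` function performs the same comparison on a live server.

```python
def parse_gtid_set(text):
    """Parse a MySQL GTID set string into {server_uuid: set of transaction numbers}."""
    result = {}
    # Entries are comma-separated; MySQL may wrap them across lines.
    for entry in text.replace("\n", "").split(","):
        entry = entry.strip()
        if not entry:
            continue
        uuid, *intervals = entry.split(":")
        numbers = result.setdefault(uuid.lower(), set())
        for interval in intervals:
            # An interval is either "N" or "N-M" (inclusive).
            start, _, end = interval.partition("-")
            numbers.update(range(int(start), int(end or start) + 1))
    return result


def gtid_subtract(a, b):
    """Return the transactions present in `a` but not in `b`, per server UUID."""
    diff = {}
    for uuid, numbers in a.items():
        missing = numbers - b.get(uuid, set())
        if missing:
            diff[uuid] = missing
    return diff


def summarize(gtids):
    """Map each server UUID to the number of transactions listed for it."""
    return {uuid: len(numbers) for uuid, numbers in gtids.items()}


# Executed_Gtid_Set values copied from the report above.
NODE0_EXECUTED = "50d2e21a-9d44-11e9-a604-46a8049f1cfb:1-9"
NODE2_EXECUTED = (
    "20554341-9d3f-11e9-ae4b-ae3830e16ff6:1-1106,\n"
    "5dacb80e-9d3d-11e9-a8c1-7aaa9a0ae85c:1-2901"
)

node0 = parse_gtid_set(NODE0_EXECUTED)
node2 = parse_gtid_set(NODE2_EXECUTED)

# Transactions node-0 would have to fetch from node-2.
print("missing on node-0:", summarize(gtid_subtract(node2, node0)))
# Transactions that exist only on node-0.
print("only on node-0:   ", summarize(gtid_subtract(node0, node2)))
```

With the values from the report, node-0 has none of node-2's 1106 + 2901 transactions, so with MASTER_AUTO_POSITION = 1 it has to request all of them. That matches error 1236 if node-2 has already purged the binary logs containing the earliest ones. node-0 also carries 9 transactions under a server UUID that node-2 has never seen.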