mysql-master-ha fails to disable slave on a new master

GoogleCodeExporter commented 9 years ago

Hi.

While testing mysql-master-ha (one master and three slaves), I found that after
failover the new master is still seen as a slave, so masterha_manager refuses to
start. It also doesn't remove the failed master from the config when I run:
# masterha_manager --remove_dead_master_conf --conf=/etc/mha/app1.cnf

This part of the log shows that mysql-master-ha failed to remove the slave
configuration from the new master, which still runs as a slave:

Tue Sep 25 14:25:45 2012 - [info] * Phase 5: New master cleanup phease..
Tue Sep 25 14:25:45 2012 - [info]
Tue Sep 25 14:25:45 2012 - [info] Resetting slave info on the new master..
Tue Sep 25 14:25:45 2012 - [error][/usr/share/perl5/vendor_perl/MHA/Server.pm, ln674]  SHOW SLAVE STATUS shows new master replicates from somewhere. Check for details!
Tue Sep 25 14:25:45 2012 - [error][/usr/share/perl5/vendor_perl/MHA/Server.pm, ln688]  db02.db.cert.fronter.net: Resetting slave info failed.
Tue Sep 25 14:25:45 2012 - [error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln1537] Master failover to db02.mynetwork.net(11.22.33.2:3306) done, but recovery on slave partially failed.
Tue Sep 25 14:25:45 2012 - [info]

This is the output of SHOW SLAVE STATUS on the new master:

mysql> show slave status\G;
*************************** 1. row ***************************
               Slave_IO_State:
                  Master_Host: db01.mynetwork.net
                  Master_User: replica
                  Master_Port: 3306
                Connect_Retry: 10
              Master_Log_File: mysql-bin.000049
          Read_Master_Log_Pos: 107
               Relay_Log_File: mysqld-relay-bin.000004
                Relay_Log_Pos: 253
        Relay_Master_Log_File: mysql-bin.000049
             Slave_IO_Running: No
            Slave_SQL_Running: No
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 107
              Relay_Log_Space: 839
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:   
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 2003
                Last_IO_Error: error reconnecting to master 'replica@db01.mynetwork.net:3306' - retry-time: 10  retries: 86400
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 1
1 row in set (0.00 sec)

Finally, this is the error I get when running:
# masterha_manager --remove_dead_master_conf --conf=/etc/mha/app1.cnf

Tue Sep 25 15:28:10 2012 - [warning] SQL Thread is stopped(no error) on db02.mynetwork.net(11.22.33.2:3306)
Tue Sep 25 15:28:10 2012 - [error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln732] Multi-master configuration is detected, but two or more masters are either writable (read-only is not set) or dead! Check configurations for details. Master configurations are as below:
Master db01.mynetwork.net(11.22.33.1:3306), dead
Master db02.db.cert.fronter.net(11.22.33.2:3306), replicating from db01.mynetwork.net(11.22.33.1:3306)

Tue Sep 25 15:28:10 2012 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln383] Error happend on checking configurations.  at /usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm line 298
Tue Sep 25 15:28:10 2012 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln478] Error happened on monitoring servers.
Tue Sep 25 15:28:10 2012 - [info] Got exit code 1 (Not master dead).

Is this a known issue? Any idea why this fails?

Original issue reported on code.google.com by m.je...@gmail.com on 25 Sep 2012 at 1:31

GoogleCodeExporter commented 9 years ago

What MySQL version are you using?

Original comment by Yoshinor...@gmail.com on 25 Sep 2012 at 5:36

GoogleCodeExporter commented 9 years ago

# mysql -V
mysql  Ver 14.14 Distrib 5.5.24, for Linux (x86_64) using readline 5.1

Original comment by m.je...@gmail.com on 25 Sep 2012 at 5:40

GoogleCodeExporter commented 9 years ago

Is the MySQL server version also 5.5.24? mysql -V only reports the client
version; you can check the server version by connecting to the MySQL server.

Original comment by Yoshinor...@gmail.com on 25 Sep 2012 at 5:44

GoogleCodeExporter commented 9 years ago

All of them run CentOS release 6.3 (Final), and the MySQL server version is:
Server version: 5.5.24-log MySQL Community Server (GPL)

Original comment by m.je...@gmail.com on 25 Sep 2012 at 5:47

GoogleCodeExporter commented 9 years ago

The error shows that "RESET SLAVE /*!50516 ALL */" on the new master didn't
clear the master information: MHA checks whether SHOW SLAVE STATUS returns an
empty result. Failover itself succeeded, but the master information should in
general be removed; otherwise the new master still replicates from the original
master. RESET SLAVE ALL removes the master information.
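To make that emptiness check concrete, here is a minimal Python sketch that applies it to the text the mysql client prints for SHOW SLAVE STATUS\G. The function names are hypothetical, and this is not MHA's code (MHA is written in Perl); it only illustrates the rule "any row returned means a master is still configured".

```python
from __future__ import annotations

import re

# Row separator printed by the mysql client in \G (vertical) mode, e.g.
# "*************************** 1. row ***************************"
ROW_HEADER = re.compile(r"^\*+ \d+\. row \*+$")


def parse_vertical_output(text: str) -> list[dict[str, str]]:
    """Parse mysql client \\G output into one {column: value} dict per row."""
    rows: list[dict[str, str]] = []
    for line in text.splitlines():
        stripped = line.strip()
        if ROW_HEADER.match(stripped):
            rows.append({})  # start a new row
        elif rows and ":" in stripped:
            # Split on the first colon only; values such as Last_IO_Error
            # may contain further colons (host:port, "retry-time: 10").
            key, _, value = stripped.partition(":")
            rows[-1][key.strip()] = value.strip()
    return rows


def replicates_from_somewhere(show_slave_status_output: str) -> bool:
    """True if SHOW SLAVE STATUS returned a row, i.e. a master is still configured."""
    return len(parse_vertical_output(show_slave_status_output)) > 0


sample = """*************************** 1. row ***************************
               Slave_IO_State:
                  Master_Host: db01.mynetwork.net
             Slave_IO_Running: No
                Last_IO_Errno: 2003
1 row in set (0.00 sec)"""

print(parse_vertical_output(sample)[0]["Master_Host"])  # db01.mynetwork.net
print(replicates_from_somewhere(sample))                  # True
print(replicates_from_somewhere("Empty set (0.00 sec)"))  # False
```

After a successful RESET SLAVE ALL the client prints "Empty set", so the check passes; the output pasted above still has a row pointing at db01, so it fails.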

If possible, would you please check the following:
- Run "RESET SLAVE /*!50516 ALL */" on the new master, and check that SHOW SLAVE
STATUS returns an empty result
- Repeat the whole failover procedure and check whether this behavior recurs
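The /*!50516 ALL */ part is a MySQL version-specific comment: the server executes the text inside it only if its version is at least 5.5.16 (encoded as 50516), the release that introduced RESET SLAVE ALL. A rough Python sketch of that rule, with hypothetical helper names and handling only the five-digit /*!NNNNN ... */ form, shows why a 5.5.24 server should run RESET SLAVE ALL rather than plain RESET SLAVE:

```python
import re

# /*!NNNNN body */ : body is executed only if the server version id >= NNNNN.
# Only the five-digit form is handled here; a plain /*! ... */ is left as-is.
VERSIONED_COMMENT = re.compile(r"/\*!(\d{5})\s*(.*?)\s*\*/", re.DOTALL)


def version_id(version: str) -> int:
    """Convert '5.5.24-log' to 50524 (major * 10000 + minor * 100 + patch)."""
    m = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if m is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = (int(g) for g in m.groups())
    return major * 10000 + minor * 100 + patch


def effective_statement(sql: str, server_version: str) -> str:
    """Return the statement as a server of the given version would execute it."""
    vid = version_id(server_version)

    def keep_or_drop(m: re.Match) -> str:
        return m.group(2) if vid >= int(m.group(1)) else ""

    # Substitute each versioned comment, then normalize whitespace.
    return " ".join(VERSIONED_COMMENT.sub(keep_or_drop, sql).split())


print(effective_statement("RESET SLAVE /*!50516 ALL */", "5.5.24-log"))  # RESET SLAVE ALL
print(effective_statement("RESET SLAVE /*!50516 ALL */", "5.5.15"))      # RESET SLAVE
```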

Original comment by Yoshinor...@gmail.com on 25 Sep 2012 at 6:02

GoogleCodeExporter commented 9 years ago

Running "RESET SLAVE /*!50516 ALL */" on the new master seems to have solved
the problem.
- Did we catch a bug?
- And why didn't running masterha_manager --remove_dead_master_conf
--conf=/etc/mha/app1.cnf remove the old (failed) master from the config?

Original comment by m.je...@gmail.com on 25 Sep 2012 at 10:54

GoogleCodeExporter commented 9 years ago

> - Did we catch a bug? 
Could you repeat the whole failover procedure and check whether this behavior
recurs? If it does, it's highly likely an MHA bug.

> - And why running masterha_manager --remove_dead_master_conf --conf=/etc/mha/app1.cnf did not remove the old master (that failed) from the config ?

--remove_dead_master_conf does not remove the old master entry if the failover
error code is not 0, and the error code is 10 if RESET SLAVE fails. The logic is
implemented around lines 1537 and 1638 of MasterFailover.pm.
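The rule described above can be illustrated with a small Python sketch that edits an MHA-style app1.cnf only when the failover exit code is 0. It assumes the usual layout of a [server default] block plus [serverN] blocks keyed by hostname; the function name and the sample manager_workdir path are hypothetical, and this is not how MHA's Perl code is structured.

```python
import configparser
import io


def remove_dead_master_entry(conf_text: str, dead_host: str, failover_exit_code: int) -> str:
    """Drop the [serverN] block whose hostname is dead_host, but only after a clean failover."""
    if failover_exit_code != 0:
        # e.g. 10 when RESET SLAVE on the new master failed: leave the config untouched
        return conf_text

    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    for section in parser.sections():  # sections() returns a new list, so removal is safe
        if parser.get(section, "hostname", fallback=None) == dead_host:
            parser.remove_section(section)

    out = io.StringIO()
    parser.write(out)  # note: configparser does not preserve comments
    return out.getvalue()


conf = """[server default]
manager_workdir=/var/log/masterha/app1

[server1]
hostname=db01.mynetwork.net

[server2]
hostname=db02.mynetwork.net
"""

print("db01" in remove_dead_master_entry(conf, "db01.mynetwork.net", 10))  # True: kept
print("db01" in remove_dead_master_entry(conf, "db01.mynetwork.net", 0))   # False: removed
```

This matches what the reporter saw: with error code 10 the dead master's block stays in the config, and the next masterha_manager start then trips over it.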

Original comment by Yoshinor...@gmail.com on 25 Sep 2012 at 11:07

GoogleCodeExporter commented 9 years ago

I can reproduce this problem every time I run mysql-master-ha.
I tried with one master and two slaves, and with one master and three slaves,
and got the same error both times.
I sent you my config and logs by private email.

Original comment by m.je...@gmail.com on 26 Sep 2012 at 10:58

GoogleCodeExporter commented 9 years ago

I ran into this problem too. In my case, the cause was that my mha4mysql user
(the one specified in the masterha_manager config file) didn't have the
RELOAD privilege.
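RESET SLAVE requires the RELOAD privilege, so it is worth confirming that the MHA user has it before testing failover. Below is a small Python sketch that checks lines captured from SHOW GRANTS for that user; the function name and the 'mha'@'%' account in the example are hypothetical. If the check fails, the privilege can be added with a global grant of the form GRANT RELOAD ON *.* TO your MHA account.

```python
import re

# Matches e.g. "GRANT RELOAD, SUPER ON *.* TO 'user'@'host' ..."
GRANT_LINE = re.compile(r"^GRANT\s+(?P<privs>.+?)\s+ON\s+(?P<scope>\S+)\s+TO\s", re.IGNORECASE)


def has_global_reload(show_grants_lines: list) -> bool:
    """True if any global (*.*) grant includes RELOAD or ALL [PRIVILEGES]."""
    for line in show_grants_lines:
        m = GRANT_LINE.match(line.strip())
        if m is None or m.group("scope") != "*.*":
            continue  # RELOAD is a global privilege, so only *.* grants matter
        privs = [p.strip().upper() for p in m.group("privs").split(",")]
        if "RELOAD" in privs or "ALL PRIVILEGES" in privs or "ALL" in privs:
            return True
    return False


grants = [
    "GRANT USAGE ON *.* TO 'mha'@'%'",
    "GRANT SELECT, INSERT ON `app`.* TO 'mha'@'%'",
]
print(has_global_reload(grants))  # False
grants[0] = "GRANT RELOAD, SUPER, REPLICATION SLAVE ON *.* TO 'mha'@'%'"
print(has_global_reload(grants))  # True
```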

Original comment by l...@deviantart.com on 5 Mar 2014 at 8:30


mysql-master-ha fails to disable slave on a new master #34