What steps will reproduce the problem?

1. OS: Red Hat AS5; MySQL: 5.1.32. Conf file:

    # Mysql checker
    # (restarts after 10000 checks to prevent memory leaks)
    check mysql
        check_period 3
        trap_period 2
        timeout 2
        restart_after 10000

    # Mysql replication threads checker
    # (restarts after 10000 checks to prevent memory leaks)
    check rep_threads
        check_period 2
        #trap_period 5
        trap_period 2
        timeout 2
        restart_after 10000

2. I built an MMM (3M) HA database setup; db1, db2 and the monitor reside on different hosts. On the monitor side:

    [root@Filesvr1 sbin]# mmm_control show
    MySQL Multi-Master Replication Manager
    Version: 1.2.6
    Config file: mmm_mon.conf
    Daemon is running!
    ===============================
    Cluster failover method: AUTO
    ===============================
    Servers status:
    btspreport0(192.168.35.41): master/ONLINE. Roles: reader(192.168.35.147;)
    btspreport1(192.168.35.42): master/ONLINE. Roles: reader(192.168.35.146;), writer(192.168.35.148;)

    [root@Filesvr1 sbin]# mmm_control move_role writer btspreport0
    MySQL Multi-Master Replication Manager
    Version: 1.2.6
    Config file: mmm_mon.conf
    Daemon is running!
    Command sent to monitoring host. Result: ERROR: Unknown role name (write)! Valid roles are: reader, _count, writer

    [root@Filesvr1 sbin]# mmm_control move_role writer btspreport0
    MySQL Multi-Master Replication Manager
    Version: 1.2.6
    Config file: mmm_mon.conf
    Daemon is running!
    Command sent to monitoring host. Result: OK: Role 'writer' has been moved from 'btspreport1' to 'btspreport0'. Now you can wait some time and check new roles info!

    [root@Filesvr1 sbin]# mmm_control show
    MySQL Multi-Master Replication Manager
    Version: 1.2.6
    Config file: mmm_mon.conf
    Daemon is running!
    ===============================
    Cluster failover method: AUTO
    ===============================
    Servers status:
    btspreport0(192.168.35.41): master/ONLINE. Roles: reader(192.168.35.147;), writer(192.168.35.148;)
    btspreport1(192.168.35.42): master/ONLINE. Roles: reader(192.168.35.146;)

But after about 5 minutes:

    [root@Filesvr1 sbin]# mmm_control show
    MySQL Multi-Master Replication Manager
    Version: 1.2.6
    Config file: mmm_mon.conf
    Daemon is running!
    ===============================
    Cluster failover method: AUTO
    ===============================
    Servers status:
    btspreport0(192.168.35.41): master/ONLINE. Roles: reader(192.168.35.147;)
    btspreport1(192.168.35.42): master/ONLINE. Roles: reader(192.168.35.146;), writer(192.168.35.148;)

The corresponding MMM log is:

    [2010-04-19 08:24:27]: 16779: Daemon: Admin Move role(writer): btspreport1 -> btspreport0
    [2010-04-19 08:43:04]: 16779: Check: CHECK_FAIL('btspreport0', 'mysql') Returned message: ERROR: Connect error (host = 192.168.35.41:3306, user = rep_monitor, pass = 'xxxxxxxxxx')! Can't connect to MySQL server on '192.168.35.41' (4)
    [2010-04-19 08:43:05]: 16779: Daemon: State change(btspreport0): ONLINE -> HARD_OFFLINE
    [2010-04-19 08:43:05]: 16779: Check: CHECK_OK('btspreport0', 'mysql')
    [2010-04-19 08:43:15]: 16779: Daemon: State change(btspreport0): HARD_OFFLINE -> AWAITING_RECOVERY
    [2010-04-19 08:43:16]: 16779: Daemon: State change(btspreport0): AWAITING_RECOVERY -> ONLINE. Uptime diff = 10.6200000010431 seconds; Status change diff = 1

I am sure that the network on the .41 host works well, and mysqld works well too. What is the problem? Why does the monitor switch the writer role back automatically?

Output of tail -20f /usr/local/mmm/var/mmm-debug.log:

    [2010-04-20 06:17:15]: 2119: Check: CHECK_FAIL('btspreport0', 'mysql') Returned message: ERROR: Connect error (host = 192.168.35.41:3306, user = rep_monitor, pass = 'xxxxxxxxxx')! Can't connect to MySQL server on '192.168.35.41' (4)
    [2010-04-20 06:17:16]: 2119: Daemon: State change(btspreport0): ONLINE -> HARD_OFFLINE
    [2010-04-20 06:17:24]: 2119: Check: CHECK_FAIL('btspreport0', 'rep_threads') Returned message: ERROR: Timeout
    [2010-04-20 06:17:24]: 2119: Check: CHECK_OK('btspreport0', 'mysql')
    [2010-04-20 06:17:24]: 2119: Daemon: State change(btspreport0): HARD_OFFLINE -> AWAITING_RECOVERY
    [2010-04-20 06:17:26]: 2119: Check: CHECK_OK('btspreport0', 'rep_threads')
    [2010-04-20 06:17:26]: 2119: Daemon: State change(btspreport0): AWAITING_RECOVERY -> ONLINE. Uptime diff = 10.5 seconds; Status change diff = 2
    [2010-04-20 06:23:40]: 2119: Check: CHECK_FAIL('btspreport0', 'mysql') Returned message: ERROR: Connect error (host = 192.168.35.41:3306, user = rep_monitor, pass = 'xxxxxxxxxx')! Can't connect to MySQL server on '192.168.35.41' (4)
    [2010-04-20 06:23:40]: 2119: Daemon: State change(btspreport0): ONLINE -> HARD_OFFLINE
    [2010-04-20 06:23:51]: 2119: Check: CHECK_FAIL('btspreport0', 'rep_threads') Returned message: ERROR: Timeout
    [2010-04-20 06:23:53]: 2119: Check: CHECK_OK('btspreport0', 'rep_threads')
    [2010-04-20 06:23:53]: 2119: Check: CHECK_OK('btspreport0', 'mysql')
    [2010-04-20 06:23:54]: 2119: Daemon: State change(btspreport0): HARD_OFFLINE -> AWAITING_RECOVERY
    [2010-04-20 06:23:56]: 2119: Daemon: State change(btspreport0): AWAITING_RECOVERY -> ONLINE. Uptime diff = 15.570000000298 seconds; Status change diff = 2
    [2010-04-20 06:47:05]: 2119: Check: CHECK_FAIL('btspreport0', 'mysql') Returned message: ERROR: Connect error (host = 192.168.35.41:3306, user = rep_monitor, pass = 'xxxxxxxxxx')! Can't connect to MySQL server on '192.168.35.41' (4)
    [2010-04-20 06:47:05]: 2119: Daemon: State change(btspreport0): ONLINE -> HARD_OFFLINE
    [2010-04-20 06:47:19]: 2119: Check: CHECK_FAIL('btspreport0', 'rep_threads') Returned message: ERROR: Timeout
    [2010-04-20 06:47:19]: 2119: Check: CHECK_OK('btspreport0', 'mysql')
    [2010-04-20 06:47:20]: 2119: Daemon: State change(btspreport0): HARD_OFFLINE -> AWAITING_RECOVERY
    [2010-04-20 06:47:21]: 2119: Check: CHECK_OK('btspreport0', 'rep_threads')
    [2010-04-20 06:47:22]: 2119: Daemon: State change(btspreport0): AWAITING_RECOVERY -> ONLINE. Uptime diff = 16.5800000019372 seconds; Status change diff = 2

Thank you in advance!
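The debug log follows a regular pattern, so the flapping can be quantified rather than eyeballed. Below is a minimal Python sketch that measures each ONLINE -> ... -> ONLINE excursion per host and decodes the number at the end of the "Can't connect" message. It makes several assumptions: the line format is inferred only from the excerpts in this report, not from MMM documentation; `outages` and `connect_error_names` are hypothetical helpers, not part of MMM; and the trailing "(4)" is treated as an OS errno value (on Linux, errno 4 is EINTR, "interrupted system call").

```python
import errno
import re
from datetime import datetime

# Hypothetical helpers: the line formats below are inferred from the
# mmm-debug.log excerpts in this report, not from MMM's documentation.
STATE_RE = re.compile(
    r"^\[(?P<ts>[\d-]+ [\d:]+)\]: \d+: Daemon: State change\((?P<host>[^)]+)\): "
    r"(?P<old>\w+) -> (?P<new>\w+)"
)
# Trailing "(N)" of the MySQL client's "Can't connect" message (assumed errno).
CONNECT_ERR_RE = re.compile(r"Can't connect to MySQL server on '[^']+' \((?P<code>\d+)\)")

# A few lines copied from the 06:17 excerpt above.
SAMPLE = [
    "[2010-04-20 06:17:15]: 2119: Check: CHECK_FAIL('btspreport0', 'mysql') Returned message: "
    "ERROR: Connect error (host = 192.168.35.41:3306, user = rep_monitor, pass = 'xxxxxxxxxx')! "
    "Can't connect to MySQL server on '192.168.35.41' (4)",
    "[2010-04-20 06:17:16]: 2119: Daemon: State change(btspreport0): ONLINE -> HARD_OFFLINE",
    "[2010-04-20 06:17:24]: 2119: Daemon: State change(btspreport0): HARD_OFFLINE -> AWAITING_RECOVERY",
    "[2010-04-20 06:17:26]: 2119: Daemon: State change(btspreport0): AWAITING_RECOVERY -> ONLINE. "
    "Uptime diff = 10.5 seconds; Status change diff = 2",
]


def outages(lines):
    """Return (host, left_online_at, seconds_until_online_again) per excursion."""
    left_online = {}
    result = []
    for line in lines:
        m = STATE_RE.match(line)
        if not m:
            continue
        ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S")
        host = m["host"]
        if m["old"] == "ONLINE":
            # Host just left ONLINE (e.g. ONLINE -> HARD_OFFLINE).
            left_online[host] = ts
        elif m["new"] == "ONLINE" and host in left_online:
            # Host is back ONLINE: close the excursion and record its length.
            start = left_online.pop(host)
            result.append((host, start, (ts - start).total_seconds()))
    return result


def connect_error_names(lines):
    """Translate the numeric code of each connect error into its errno name."""
    names = []
    for line in lines:
        m = CONNECT_ERR_RE.search(line)
        if m:
            code = int(m["code"])
            names.append(errno.errorcode.get(code, str(code)))
    return names


if __name__ == "__main__":
    for host, start, secs in outages(SAMPLE):
        print(f"{host} left ONLINE at {start}, back after {secs:.0f} s")
    print(connect_error_names(SAMPLE))
```

On the 06:17 excerpt this reports a 10-second excursion for btspreport0 and decodes the connect error as EINTR. Pointing it at the whole file (for example `outages(open("/usr/local/mmm/var/mmm-debug.log"))`) would list every flap with its duration, which may help correlate the failures with the `timeout 2` / `trap_period 2` settings.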
Original issue reported on code.google.com by wendywon...@gmail.com on 20 Apr 2010 at 3:51