chusiang / mysql-master-master

Automatically exported from code.google.com/p/mysql-master-master
GNU General Public License v2.0

monitor and agent #51

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. OS: Red Hat AS5; MySQL: 5.1.32
Configuration file:
# Mysql checker 
# (restarts after 10000 checks to prevent memory leaks)
check mysql
    check_period 3 
    trap_period  2
    timeout 2
    restart_after 10000
# Mysql replication threads checker 
# (restarts after 10000 checks to prevent memory leaks)
check rep_threads
    check_period 2
    #trap_period 5
    trap_period 2
    timeout 2
    restart_after 10000
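It helps to spell out how these settings combine. My reading comes from the MMM 2.x documentation and has not been verified against the 1.2.6 code. On that reading, check_period is the interval between probes, timeout bounds a single probe, and restart_after restarts the checker process. A check is only reported as failed once it has not succeeded for at least trap_period seconds. The Python sketch below models that reading. The function name and the rule it implements are hypothetical, not MMM's own code.

```python
from typing import Iterable, List, Optional, Tuple


def reported_failures(results: Iterable[Tuple[float, bool]],
                      trap_period: float) -> List[float]:
    """Return the timestamps at which a check failure would be reported.

    results: (timestamp_seconds, probe_ok) pairs in time order.
    Hypothetical rule (not taken from the MMM source): a failure is
    reported when a probe fails and no probe has succeeded for at least
    trap_period seconds. Only the first report per outage is returned.
    """
    last_ok: Optional[float] = None
    in_failure = False
    reports: List[float] = []
    for ts, ok in results:
        if ok:
            # A successful probe ends any outage.
            last_ok = ts
            in_failure = False
            continue
        # Seconds since the last success (unbounded if none was seen yet).
        failing_for = float("inf") if last_ok is None else ts - last_ok
        if not in_failure and failing_for >= trap_period:
            reports.append(ts)
            in_failure = True
    return reports


if __name__ == "__main__":
    # Probes every 3 s (check_period 3); only the probe at t=9 fails.
    probes = [(0, True), (3, True), (6, True), (9, False), (12, True)]
    print(reported_failures(probes, trap_period=2))  # [9]
    print(reported_failures(probes, trap_period=5))  # []
```

Under this model, trap_period 2 lets a single failed probe trigger CHECK_FAIL for both checks. A larger value, such as the commented-out trap_period 5, would require several consecutive failed probes.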

2. I built an MMM (MySQL Multi-Master Replication Manager) HA database setup. db1, db2 and the monitor reside on different hosts. On the monitor side:

[root@Filesvr1 sbin]# mmm_control show
MySQL Multi-Master Replication Manager
Version: 1.2.6
Config file: mmm_mon.conf
Daemon is running!
===============================
Cluster failover method: AUTO
===============================
Servers status:
  btspreport0(192.168.35.41): master/ONLINE. Roles: reader(192.168.35.147;)
  btspreport1(192.168.35.42): master/ONLINE. Roles: reader(192.168.35.146;), writer(192.168.35.148;)

[root@Filesvr1 sbin]# mmm_control  move_role write btspreport0
MySQL Multi-Master Replication Manager
Version: 1.2.6
Config file: mmm_mon.conf
Daemon is running!
Command sent to monitoring host. Result: ERROR: Unknown role name (write)! Valid roles are: reader, _count, writer

[root@Filesvr1 sbin]# mmm_control  move_role writer btspreport0
MySQL Multi-Master Replication Manager
Version: 1.2.6
Config file: mmm_mon.conf
Daemon is running!
Command sent to monitoring host. Result: OK: Role 'writer' has been moved from 'btspreport1' to 'btspreport0'. Now you can wait some time and check new roles info!

[root@Filesvr1 sbin]# mmm_control show
MySQL Multi-Master Replication Manager
Version: 1.2.6
Config file: mmm_mon.conf
Daemon is running!
===============================
Cluster failover method: AUTO
===============================
Servers status:
  btspreport0(192.168.35.41): master/ONLINE. Roles: reader(192.168.35.147;), writer(192.168.35.148;)
  btspreport1(192.168.35.42): master/ONLINE. Roles: reader(192.168.35.146;)

But about 5 minutes later:

[root@Filesvr1 sbin]# mmm_control show
MySQL Multi-Master Replication Manager
Version: 1.2.6
Config file: mmm_mon.conf
Daemon is running!
===============================
Cluster failover method: AUTO
===============================
Servers status:
  btspreport0(192.168.35.41): master/ONLINE. Roles: reader(192.168.35.147;)
  btspreport1(192.168.35.42): master/ONLINE. Roles: reader(192.168.35.146;), writer(192.168.35.148;)
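The reversion was only noticed minutes later, so it may help to poll the monitor and record which host holds the writer role. The sketch below parses the host lines of mmm_control show. It assumes the one-line-per-host format printed above; the line breaks inside this report come from the issue tracker, not from MMM. parse_show and writer_host are hypothetical helper names.

```python
import re
from typing import Dict, List, Optional

# One host line, e.g. "  btspreport0(192.168.35.41): master/ONLINE. Roles: reader(192.168.35.147;)"
HOST_RE = re.compile(
    r"^\s*([^\s(]+)\(([^)]+)\):\s*(\w+)/(\w+)\.\s*Roles:\s*(.*)$"
)
# One role entry, e.g. "writer(192.168.35.148;)"
ROLE_RE = re.compile(r"(\w+)\(([^)]*)\)")


def parse_show(output: str) -> Dict[str, dict]:
    """Map host name -> {"ip", "mode", "state", "roles"} from show output."""
    hosts: Dict[str, dict] = {}
    for line in output.splitlines():
        m = HOST_RE.match(line)
        if not m:
            continue  # banner, separator or other non-host line
        name, ip, mode, state, roles_text = m.groups()
        # Each role lists its IPs separated by ";" (with a trailing ";").
        roles: Dict[str, List[str]] = {
            role: [addr for addr in addrs.split(";") if addr]
            for role, addrs in ROLE_RE.findall(roles_text)
        }
        hosts[name] = {"ip": ip, "mode": mode, "state": state, "roles": roles}
    return hosts


def writer_host(output: str) -> Optional[str]:
    """Return the host currently holding the writer role, if any."""
    for name, info in parse_show(output).items():
        if "writer" in info["roles"]:
            return name
    return None
```

You could feed it the output of mmm_control show from a loop or cron job and log whenever writer_host changes. That would timestamp each switch so it can be matched against the monitor log.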

The corresponding MMM log is:
[2010-04-19 08:24:27]: 16779: Daemon: Admin Move role(writer): btspreport1 -> btspreport0
[2010-04-19 08:43:04]: 16779: Check: CHECK_FAIL('btspreport0', 'mysql')  Returned message: ERROR: Connect error (host = 192.168.35.41:3306, user = rep_monitor, pass = 'xxxxxxxxxx')! Can't connect to MySQL server on '192.168.35.41' (4)
[2010-04-19 08:43:05]: 16779: Daemon: State change(btspreport0): ONLINE -> HARD_OFFLINE
[2010-04-19 08:43:05]: 16779: Check: CHECK_OK('btspreport0', 'mysql')
[2010-04-19 08:43:15]: 16779: Daemon: State change(btspreport0): HARD_OFFLINE -> AWAITING_RECOVERY
[2010-04-19 08:43:16]: 16779: Daemon: State change(btspreport0): AWAITING_RECOVERY -> ONLINE. Uptime diff = 10.6200000010431 seconds; Status change diff = 1
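The number in parentheses at the end of "Can't connect to MySQL server on '192.168.35.41' (4)" is the operating-system errno that the MySQL client library reported for the failed connect. Decoding it shows that errno 4 is EINTR ("Interrupted system call"). In other words, the connect attempt was interrupted by a signal. A refused connection (ECONNREFUSED) or an unreachable host (EHOSTUNREACH) would show a different number. The log does not say which signal caused the interruption. A timeout alarm in the checker is one plausible cause, but that is an assumption. The helper name below is hypothetical.

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Return "NAME: message" for an OS errno value on this platform."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


if __name__ == "__main__":
    # The "(4)" at the end of the CHECK_FAIL message above.
    print(describe_errno(4))  # on Linux: EINTR: Interrupted system call
```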

I am sure that the network on the .41 host works well, and that mysqld on it works well too.
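One way to back up this claim is to probe port 3306 on 192.168.35.41 from the monitor host at the same cadence as the checker: every 3 seconds, with a 2-second timeout. Log only slow or failed attempts, then compare their timestamps with the CHECK_FAIL lines. This sketch works at the TCP level only. It does not log in as rep_monitor, so it cannot rule out problems in the MySQL handshake or authentication. The function names are hypothetical.

```python
import socket
import time
from typing import Optional, Tuple


def probe(host: str, port: int = 3306,
          timeout: float = 2.0) -> Tuple[bool, float, Optional[str]]:
    """Try one TCP connect; return (ok, elapsed_seconds, error_text)."""
    start = time.monotonic()
    try:
        # create_connection raises OSError (incl. socket.timeout) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            pass
        return True, time.monotonic() - start, None
    except OSError as exc:
        return False, time.monotonic() - start, repr(exc)


def watch(host: str, port: int = 3306, interval: float = 3.0,
          timeout: float = 2.0) -> None:
    """Probe forever and print only failed or slow (> timeout/2) connects."""
    while True:
        ok, elapsed, err = probe(host, port, timeout)
        if not ok or elapsed > timeout / 2:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            print(f"[{stamp}] ok={ok} elapsed={elapsed:.3f}s err={err}")
        time.sleep(interval)


if __name__ == "__main__":
    # Host/port from the log; interval/timeout mirror check_period 3 / timeout 2.
    watch("192.168.35.41", 3306, interval=3.0, timeout=2.0)
```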

What is the problem? Why does the monitor automatically switch the writer role back?

Output of tail -20f /usr/local/mmm/var/mmm-debug.log:

[2010-04-20 06:17:15]: 2119: Check: CHECK_FAIL('btspreport0', 'mysql')  Returned message: ERROR: Connect error (host = 192.168.35.41:3306, user = rep_monitor, pass = 'xxxxxxxxxx')! Can't connect to MySQL server on '192.168.35.41' (4)
[2010-04-20 06:17:16]: 2119: Daemon: State change(btspreport0): ONLINE -> HARD_OFFLINE
[2010-04-20 06:17:24]: 2119: Check: CHECK_FAIL('btspreport0', 'rep_threads')  Returned message: ERROR: Timeout
[2010-04-20 06:17:24]: 2119: Check: CHECK_OK('btspreport0', 'mysql')
[2010-04-20 06:17:24]: 2119: Daemon: State change(btspreport0): HARD_OFFLINE -> AWAITING_RECOVERY
[2010-04-20 06:17:26]: 2119: Check: CHECK_OK('btspreport0', 'rep_threads')
[2010-04-20 06:17:26]: 2119: Daemon: State change(btspreport0): AWAITING_RECOVERY -> ONLINE. Uptime diff = 10.5 seconds; Status change diff = 2
[2010-04-20 06:23:40]: 2119: Check: CHECK_FAIL('btspreport0', 'mysql')  Returned message: ERROR: Connect error (host = 192.168.35.41:3306, user = rep_monitor, pass = 'xxxxxxxxxx')! Can't connect to MySQL server on '192.168.35.41' (4)
[2010-04-20 06:23:40]: 2119: Daemon: State change(btspreport0): ONLINE -> HARD_OFFLINE
[2010-04-20 06:23:51]: 2119: Check: CHECK_FAIL('btspreport0', 'rep_threads')  Returned message: ERROR: Timeout
[2010-04-20 06:23:53]: 2119: Check: CHECK_OK('btspreport0', 'rep_threads')
[2010-04-20 06:23:53]: 2119: Check: CHECK_OK('btspreport0', 'mysql')
[2010-04-20 06:23:54]: 2119: Daemon: State change(btspreport0): HARD_OFFLINE -> AWAITING_RECOVERY
[2010-04-20 06:23:56]: 2119: Daemon: State change(btspreport0): AWAITING_RECOVERY -> ONLINE. Uptime diff = 15.570000000298 seconds; Status change diff = 2
[2010-04-20 06:47:05]: 2119: Check: CHECK_FAIL('btspreport0', 'mysql')  Returned message: ERROR: Connect error (host = 192.168.35.41:3306, user = rep_monitor, pass = 'xxxxxxxxxx')! Can't connect to MySQL server on '192.168.35.41' (4)
[2010-04-20 06:47:05]: 2119: Daemon: State change(btspreport0): ONLINE -> HARD_OFFLINE
[2010-04-20 06:47:19]: 2119: Check: CHECK_FAIL('btspreport0', 'rep_threads')  Returned message: ERROR: Timeout
[2010-04-20 06:47:19]: 2119: Check: CHECK_OK('btspreport0', 'mysql')
[2010-04-20 06:47:20]: 2119: Daemon: State change(btspreport0): HARD_OFFLINE -> AWAITING_RECOVERY
[2010-04-20 06:47:21]: 2119: Check: CHECK_OK('btspreport0', 'rep_threads')
[2010-04-20 06:47:22]: 2119: Daemon: State change(btspreport0): AWAITING_RECOVERY -> ONLINE. Uptime diff = 16.5800000019372 seconds; Status change diff = 2
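To compare outages over a longer period, extract the state-change lines from /usr/local/mmm/var/mmm-debug.log and pair them into down/up intervals. The sketch below assumes the unwrapped line format shown above. It treats each ONLINE -> HARD_OFFLINE transition as the start of an outage and the next transition to ONLINE as its end. outages is a hypothetical helper name.

```python
import re
import sys
from datetime import datetime
from typing import Iterable, List, Optional, Tuple

# e.g. "[2010-04-20 06:17:16]: 2119: Daemon: State change(btspreport0): ONLINE -> HARD_OFFLINE"
STATE_RE = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\]: \d+: Daemon: "
    r"State change\(([^)]+)\): (\w+) -> (\w+)"
)


def outages(lines: Iterable[str],
            host: str) -> List[Tuple[datetime, datetime, float]]:
    """Return (went_down, back_online, seconds) for each outage of host."""
    result: List[Tuple[datetime, datetime, float]] = []
    went_down: Optional[datetime] = None
    for line in lines:
        m = STATE_RE.match(line)
        if not m or m.group(2) != host:
            continue  # not a state change, or a different host
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        old, new = m.group(3), m.group(4)
        if old == "ONLINE" and new == "HARD_OFFLINE":
            went_down = ts
        elif new == "ONLINE" and went_down is not None:
            result.append((went_down, ts, (ts - went_down).total_seconds()))
            went_down = None
    return result


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "/usr/local/mmm/var/mmm-debug.log"
    with open(path) as log:
        for down, up, secs in outages(log, "btspreport0"):
            print(f"{down} -> {up}: {secs:.0f} s offline")
```

For the excerpt above, this yields outages of 10, 16 and 17 seconds, starting at 06:17:16, 06:23:40 and 06:47:05.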

Thank you in advance!

Original issue reported on code.google.com by wendywon...@gmail.com on 20 Apr 2010 at 3:51