andreaspe / mysql-master-ha

Automatically exported from code.google.com/p/mysql-master-ha

New problem, I have never seen anything like this... So strange #53

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Hi Yoshinor,

MySQL master-slave replication has been set up successfully.

There are 5 servers in the IDC room: one master, and the others are slaves.

master:10.10.1.109
slave1:10.10.1.110
slave2:10.10.1.193
slave3:10.10.1.194
slave4+mha_manage:10.10.1.195

For the past two days, I ran the command masterha_check_repl --conf=/etc/app1.cnf

and it reported OK.

But today it reports NOT OK, even though master-slave replication is in fact running well.

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
This is my /etc/app1.cnf
[server default]
log_level=debug
# mysql user and password
user=root
password=0ps.iz3n3
ssh_user=ops
repl_user=rep
repl_password=bm5123
master_binlog_dir=/var/lib/mysql
# working directory on the manager
manager_workdir=/var/log/masterha/app1
# manager log file
manager_log=/var/log/masterha/app1/app1.log
# working directory on MySQL servers
remote_workdir=/var/log/masterha/app1
[server1]
hostname=10.10.1.109
[server2]
hostname=10.10.1.110
[server3]
hostname=10.10.1.193
[server4]
hostname=10.10.1.194
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Now when I run masterha_check_repl --conf=/etc/app1.cnf

it displays the following:

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

ops@B5M-D5:/var/log/masterha/app1$ masterha_check_repl --conf=/etc/app1.cnf 
Wed Jan  9 11:12:57 2013 - [warning] Global configuration file 
/etc/masterha_default.cnf not found. Skipping.
Wed Jan  9 11:12:57 2013 - [info] Reading application default configurations 
from /etc/app1.cnf..
Wed Jan  9 11:12:57 2013 - [info] Reading server configurations from 
/etc/app1.cnf..
Wed Jan  9 11:12:57 2013 - [info] MHA::MasterMonitor version 0.53.
Wed Jan  9 11:12:57 2013 - [debug] Connecting to servers..
Wed Jan  9 11:12:57 2013 - [debug]  Connected to: 
10.10.1.109(10.10.1.109:3306), user=root
Wed Jan  9 11:12:57 2013 - [debug]  Connected to: 
10.10.1.110(10.10.1.110:3306), user=root
Wed Jan  9 11:12:57 2013 - [debug]  Connected to: 
10.10.1.193(10.10.1.193:3306), user=root
Wed Jan  9 11:12:57 2013 - [debug]  Connected to: 
10.10.1.194(10.10.1.194:3306), user=root
Wed Jan  9 11:12:57 2013 - [debug]  Comparing MySQL versions..
Wed Jan  9 11:12:57 2013 - [debug]   Comparing MySQL versions done.
Wed Jan  9 11:12:57 2013 - [debug] Connecting to servers done.
Wed Jan  9 11:12:57 2013 - [info] Dead Servers:
Wed Jan  9 11:12:57 2013 - [info] Alive Servers:
Wed Jan  9 11:12:57 2013 - [info]   10.10.1.109(10.10.1.109:3306)
Wed Jan  9 11:12:57 2013 - [info]   10.10.1.110(10.10.1.110:3306)
Wed Jan  9 11:12:57 2013 - [info]   10.10.1.193(10.10.1.193:3306)
Wed Jan  9 11:12:57 2013 - [info]   10.10.1.194(10.10.1.194:3306)
Wed Jan  9 11:12:57 2013 - [info] Alive Slaves:
Wed Jan  9 11:12:57 2013 - [info]   10.10.1.110(10.10.1.110:3306)  
Version=5.1.41-3ubuntu12.10-log (oldest major version between slaves) 
log-bin:enabled
Wed Jan  9 11:12:57 2013 - [debug]    Relay log info repository: FILE
Wed Jan  9 11:12:57 2013 - [info]     Replicating from 
10.10.1.109(10.10.1.109:3306)
Wed Jan  9 11:12:57 2013 - [info]   10.10.1.193(10.10.1.193:3306)  
Version=5.1.66-0+squeeze1-log (oldest major version between slaves) 
log-bin:enabled
Wed Jan  9 11:12:57 2013 - [debug]    Relay log info repository: FILE
Wed Jan  9 11:12:57 2013 - [info]     Replicating from 
10.10.1.109(10.10.1.109:3306)
Wed Jan  9 11:12:57 2013 - [info]   10.10.1.194(10.10.1.194:3306)  
Version=5.1.66-0+squeeze1-log (oldest major version between slaves) 
log-bin:enabled
Wed Jan  9 11:12:57 2013 - [debug]    Relay log info repository: FILE
Wed Jan  9 11:12:57 2013 - [info]     Replicating from 
10.10.1.109(10.10.1.109:3306)
Wed Jan  9 11:12:57 2013 - [info] Current Alive Master: 
10.10.1.109(10.10.1.109:3306)
Wed Jan  9 11:12:57 2013 - [info] Checking slave configurations..
Wed Jan  9 11:12:57 2013 - [info]  read_only=1 is not set on slave 
10.10.1.110(10.10.1.110:3306).
Wed Jan  9 11:12:57 2013 - [warning]  relay_log_purge=0 is not set on slave 
10.10.1.110(10.10.1.110:3306).
Wed Jan  9 11:12:57 2013 - [info]  read_only=1 is not set on slave 
10.10.1.193(10.10.1.193:3306).
Wed Jan  9 11:12:57 2013 - [warning]  relay_log_purge=0 is not set on slave 
10.10.1.193(10.10.1.193:3306).
Wed Jan  9 11:12:57 2013 - [info]  read_only=1 is not set on slave 
10.10.1.194(10.10.1.194:3306).
Wed Jan  9 11:12:57 2013 - [warning]  relay_log_purge=0 is not set on slave 
10.10.1.194(10.10.1.194:3306).
Wed Jan  9 11:12:57 2013 - [info] Checking replication filtering settings..
Wed Jan  9 11:12:57 2013 - [info]  binlog_do_db= , binlog_ignore_db= 
Wed Jan  9 11:12:57 2013 - [info]  Replication filtering check ok.
Wed Jan  9 11:12:57 2013 - [info] Starting SSH connection tests..
Wed Jan  9 11:12:59 2013 - [debug] 
Wed Jan  9 11:12:57 2013 - [debug]  Connecting via SSH from 
ops@10.10.1.109(10.10.1.109:22) to ops@10.10.1.110(10.10.1.110:22)..
Wed Jan  9 11:12:58 2013 - [debug]   ok.
Wed Jan  9 11:12:58 2013 - [debug]  Connecting via SSH from 
ops@10.10.1.109(10.10.1.109:22) to ops@10.10.1.193(10.10.1.193:22)..
Wed Jan  9 11:12:58 2013 - [debug]   ok.
Wed Jan  9 11:12:58 2013 - [debug]  Connecting via SSH from 
ops@10.10.1.109(10.10.1.109:22) to ops@10.10.1.194(10.10.1.194:22)..
Wed Jan  9 11:12:59 2013 - [debug]   ok.
Wed Jan  9 11:12:59 2013 - [debug] 
Wed Jan  9 11:12:58 2013 - [debug]  Connecting via SSH from 
ops@10.10.1.193(10.10.1.193:22) to ops@10.10.1.109(10.10.1.109:22)..
Wed Jan  9 11:12:59 2013 - [debug]   ok.
Wed Jan  9 11:12:59 2013 - [debug]  Connecting via SSH from 
ops@10.10.1.193(10.10.1.193:22) to ops@10.10.1.110(10.10.1.110:22)..
Wed Jan  9 11:12:59 2013 - [debug]   ok.
Wed Jan  9 11:12:59 2013 - [debug]  Connecting via SSH from 
ops@10.10.1.193(10.10.1.193:22) to ops@10.10.1.194(10.10.1.194:22)..
Wed Jan  9 11:12:59 2013 - [debug]   ok.
Wed Jan  9 11:13:00 2013 - [debug] 
Wed Jan  9 11:12:59 2013 - [debug]  Connecting via SSH from 
ops@10.10.1.194(10.10.1.194:22) to ops@10.10.1.109(10.10.1.109:22)..
Wed Jan  9 11:12:59 2013 - [debug]   ok.
Wed Jan  9 11:12:59 2013 - [debug]  Connecting via SSH from 
ops@10.10.1.194(10.10.1.194:22) to ops@10.10.1.110(10.10.1.110:22)..
Wed Jan  9 11:12:59 2013 - [debug]   ok.
Wed Jan  9 11:12:59 2013 - [debug]  Connecting via SSH from 
ops@10.10.1.194(10.10.1.194:22) to ops@10.10.1.193(10.10.1.193:22)..
Wed Jan  9 11:13:00 2013 - [debug]   ok.
Wed Jan  9 11:13:05 2013 - [error][/usr/share/perl5/MHA/SSHCheck.pm, ln63] 
Wed Jan  9 11:12:58 2013 - [debug]  Connecting via SSH from 
ops@10.10.1.110(10.10.1.110:22) to ops@10.10.1.109(10.10.1.109:22)..
Wed Jan  9 11:13:01 2013 - [debug]   ok.
Wed Jan  9 11:13:01 2013 - [debug]  Connecting via SSH from 
ops@10.10.1.110(10.10.1.110:22) to ops@10.10.1.193(10.10.1.193:22)..
ssh: connect to host 10.10.1.110 port 22: Connection timed out
Wed Jan  9 11:13:05 2013 - [error][/usr/share/perl5/MHA/SSHCheck.pm, ln107] SSH 
connection from ops@10.10.1.110(10.10.1.110:22) to 
ops@10.10.1.193(10.10.1.193:22) failed!
Wed Jan  9 11:13:05 2013 - [error][/usr/share/perl5/MHA/MasterMonitor.pm, 
ln383] Error happend on checking configurations. SSH Configuration Check Failed!
 at /usr/share/perl5/MHA/MasterMonitor.pm line 339
Wed Jan  9 11:13:05 2013 - [error][/usr/share/perl5/MHA/MasterMonitor.pm, 
ln478] Error happened on monitoring servers.
Wed Jan  9 11:13:05 2013 - [info] Got exit code 1 (Not master dead).

MySQL Replication Health is NOT OK!

How do I troubleshoot this? Thank you, Yoshinor.

Original issue reported on code.google.com by yanq...@b5m.com on 9 Jan 2013 at 3:16

GoogleCodeExporter commented 9 years ago
I think it may be a network problem.

Look at this: when I run masterha_check_ssh --conf=/etc/app1.cnf, it passes:
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

ops@B5M-D5:/var/log/masterha/app1$ masterha_check_ssh --conf=/etc/app1.cnf 
Wed Jan  9 11:21:05 2013 - [warning] Global configuration file 
/etc/masterha_default.cnf not found. Skipping.
Wed Jan  9 11:21:05 2013 - [info] Reading application default configurations 
from /etc/app1.cnf..
Wed Jan  9 11:21:05 2013 - [info] Reading server configurations from 
/etc/app1.cnf..
Wed Jan  9 11:21:05 2013 - [info] Starting SSH connection tests..
Wed Jan  9 11:21:07 2013 - [debug] 
Wed Jan  9 11:21:06 2013 - [debug]  Connecting via SSH from 
ops@10.10.1.193(10.10.1.193:22) to ops@10.10.1.109(10.10.1.109:22)..
Wed Jan  9 11:21:07 2013 - [debug]   ok.
Wed Jan  9 11:21:07 2013 - [debug]  Connecting via SSH from 
ops@10.10.1.193(10.10.1.193:22) to ops@10.10.1.110(10.10.1.110:22)..
Wed Jan  9 11:21:07 2013 - [debug]   ok.
Wed Jan  9 11:21:07 2013 - [debug]  Connecting via SSH from 
ops@10.10.1.193(10.10.1.193:22) to ops@10.10.1.194(10.10.1.194:22)..
Wed Jan  9 11:21:07 2013 - [debug]   ok.
Wed Jan  9 11:21:07 2013 - [debug] 
Wed Jan  9 11:21:05 2013 - [debug]  Connecting via SSH from 
ops@10.10.1.109(10.10.1.109:22) to ops@10.10.1.110(10.10.1.110:22)..
Wed Jan  9 11:21:06 2013 - [debug]   ok.
Wed Jan  9 11:21:06 2013 - [debug]  Connecting via SSH from 
ops@10.10.1.109(10.10.1.109:22) to ops@10.10.1.193(10.10.1.193:22)..
Wed Jan  9 11:21:07 2013 - [debug]   ok.
Wed Jan  9 11:21:07 2013 - [debug]  Connecting via SSH from 
ops@10.10.1.109(10.10.1.109:22) to ops@10.10.1.194(10.10.1.194:22)..
Wed Jan  9 11:21:07 2013 - [debug]   ok.
Wed Jan  9 11:21:08 2013 - [debug] 
Wed Jan  9 11:21:07 2013 - [debug]  Connecting via SSH from 
ops@10.10.1.194(10.10.1.194:22) to ops@10.10.1.109(10.10.1.109:22)..
Wed Jan  9 11:21:07 2013 - [debug]   ok.
Wed Jan  9 11:21:07 2013 - [debug]  Connecting via SSH from 
ops@10.10.1.194(10.10.1.194:22) to ops@10.10.1.110(10.10.1.110:22)..
Wed Jan  9 11:21:08 2013 - [debug]   ok.
Wed Jan  9 11:21:08 2013 - [debug]  Connecting via SSH from 
ops@10.10.1.194(10.10.1.194:22) to ops@10.10.1.193(10.10.1.193:22)..
Wed Jan  9 11:21:08 2013 - [debug]   ok.
Wed Jan  9 11:21:10 2013 - [debug] 
Wed Jan  9 11:21:06 2013 - [debug]  Connecting via SSH from 
ops@10.10.1.110(10.10.1.110:22) to ops@10.10.1.109(10.10.1.109:22)..
Wed Jan  9 11:21:06 2013 - [debug]   ok.
Wed Jan  9 11:21:06 2013 - [debug]  Connecting via SSH from 
ops@10.10.1.110(10.10.1.110:22) to ops@10.10.1.193(10.10.1.193:22)..
Wed Jan  9 11:21:07 2013 - [debug]   ok.
Wed Jan  9 11:21:07 2013 - [debug]  Connecting via SSH from 
ops@10.10.1.110(10.10.1.110:22) to ops@10.10.1.194(10.10.1.194:22)..
Wed Jan  9 11:21:10 2013 - [debug]   ok.
Wed Jan  9 11:21:10 2013 - [info] All SSH connection tests passed successfully.
ops@B5M-D5:/var/log/masterha/app1$ 
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
And when I run it again, it fails:

ops@B5M-D5:/var/log/masterha/app1$ masterha_check_ssh --conf=/etc/app1.cnf 
Wed Jan  9 11:27:54 2013 - [warning] Global configuration file 
/etc/masterha_default.cnf not found. Skipping.
Wed Jan  9 11:27:54 2013 - [info] Reading application default configurations 
from /etc/app1.cnf..
Wed Jan  9 11:27:54 2013 - [info] Reading server configurations from 
/etc/app1.cnf..
Wed Jan  9 11:27:54 2013 - [info] Starting SSH connection tests..
Wed Jan  9 11:27:56 2013 - [debug] 
Wed Jan  9 11:27:54 2013 - [debug]  Connecting via SSH from 
ops@10.10.1.109(10.10.1.109:22) to ops@10.10.1.110(10.10.1.110:22)..
Wed Jan  9 11:27:55 2013 - [debug]   ok.
Wed Jan  9 11:27:55 2013 - [debug]  Connecting via SSH from 
ops@10.10.1.109(10.10.1.109:22) to ops@10.10.1.193(10.10.1.193:22)..
Wed Jan  9 11:27:56 2013 - [debug]   ok.
Wed Jan  9 11:27:56 2013 - [debug]  Connecting via SSH from 
ops@10.10.1.109(10.10.1.109:22) to ops@10.10.1.194(10.10.1.194:22)..
Wed Jan  9 11:27:56 2013 - [debug]   ok.
Wed Jan  9 11:27:57 2013 - [debug] 
Wed Jan  9 11:27:56 2013 - [debug]  Connecting via SSH from 
ops@10.10.1.194(10.10.1.194:22) to ops@10.10.1.109(10.10.1.109:22)..
Wed Jan  9 11:27:56 2013 - [debug]   ok.
Wed Jan  9 11:27:56 2013 - [debug]  Connecting via SSH from 
ops@10.10.1.194(10.10.1.194:22) to ops@10.10.1.110(10.10.1.110:22)..
Wed Jan  9 11:27:57 2013 - [debug]   ok.
Wed Jan  9 11:27:57 2013 - [debug]  Connecting via SSH from 
ops@10.10.1.194(10.10.1.194:22) to ops@10.10.1.193(10.10.1.193:22)..
Wed Jan  9 11:27:57 2013 - [debug]   ok.
Wed Jan  9 11:27:59 2013 - [debug] 
Wed Jan  9 11:27:55 2013 - [debug]  Connecting via SSH from 
ops@10.10.1.193(10.10.1.193:22) to ops@10.10.1.109(10.10.1.109:22)..
Wed Jan  9 11:27:56 2013 - [debug]   ok.
Wed Jan  9 11:27:56 2013 - [debug]  Connecting via SSH from 
ops@10.10.1.193(10.10.1.193:22) to ops@10.10.1.110(10.10.1.110:22)..
Wed Jan  9 11:27:59 2013 - [debug]   ok.
Wed Jan  9 11:27:59 2013 - [debug]  Connecting via SSH from 
ops@10.10.1.193(10.10.1.193:22) to ops@10.10.1.194(10.10.1.194:22)..
Wed Jan  9 11:27:59 2013 - [debug]   ok.
Wed Jan  9 11:28:03 2013 - [error][/usr/share/perl5/MHA/SSHCheck.pm, ln63] 
Wed Jan  9 11:27:55 2013 - [debug]  Connecting via SSH from 
ops@10.10.1.110(10.10.1.110:22) to ops@10.10.1.109(10.10.1.109:22)..
Wed Jan  9 11:27:59 2013 - [debug]   ok.
Wed Jan  9 11:27:59 2013 - [debug]  Connecting via SSH from 
ops@10.10.1.110(10.10.1.110:22) to ops@10.10.1.193(10.10.1.193:22)..
Wed Jan  9 11:27:59 2013 - [debug]   ok.
Wed Jan  9 11:27:59 2013 - [debug]  Connecting via SSH from 
ops@10.10.1.110(10.10.1.110:22) to ops@10.10.1.194(10.10.1.194:22)..
ssh: connect to host 10.10.1.194 port 22: Connection timed out
Wed Jan  9 11:28:03 2013 - [error][/usr/share/perl5/MHA/SSHCheck.pm, ln107] SSH 
connection from ops@10.10.1.110(10.10.1.110:22) to 
ops@10.10.1.194(10.10.1.194:22) failed!
SSH Configuration Check Failed!
 at /usr/bin/masterha_check_ssh line 44

Hi Yoshinor,

Do you think it is a network problem?

Original comment by yanq...@b5m.com on 9 Jan 2013 at 3:29
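
One way to tell an intermittent network problem from a configuration problem is to repeat a single hop many times and count the failures. The sketch below is only an illustration: `count_failures` and `probe_hop` are hypothetical helper names, and the nested `ssh` is an assumption about how the check reaches the second host, not MHA's own code. The `ops` user and the addresses come from the logs above, and `BatchMode` and `ConnectTimeout` are standard OpenSSH options.

```shell
#!/bin/sh
# count_failures N CMD...: run CMD N times and print how many runs failed.
count_failures() {
    attempts=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # Discard output; only the exit status matters here.
        "$@" >/dev/null 2>&1 || failures=$((failures + 1))
        i=$((i + 1))
    done
    echo "$failures"
}

# probe_hop SRC DST: log in to SRC, then ssh from SRC to DST.
# The 5-second ConnectTimeout mirrors the default mentioned in this thread.
probe_hop() {
    src=$1
    dst=$2
    ssh -o BatchMode=yes -o ConnectTimeout=5 "ops@$src" \
        "ssh -o BatchMode=yes -o ConnectTimeout=5 ops@$dst exit"
}

# Example, run on the manager host (commented out so sourcing is harmless):
# count_failures 20 probe_hop 10.10.1.110 10.10.1.194
```

If the count is sometimes non-zero for hops that involve 10.10.1.110 and always zero for the others, that would point at the network path or the sshd on that host rather than at the MHA configuration.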

GoogleCodeExporter commented 9 years ago
The network may have problems. Check with your network admin.
The latest MHA (0.55) has a parameter to change the SSH connection timeout
(http://code.google.com/p/mysql-master-ha/wiki/Parameters). The default is 5
seconds. You may want to increase the timeout.

Original comment by Yoshinor...@gmail.com on 9 Jan 2013 at 3:43
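
For anyone landing here later, a raised timeout might look like the fragment below in /etc/app1.cnf. This is a sketch under assumptions: the parameter name `ssh_connection_timeout` is recalled from the Parameters wiki page and should be confirmed there for your MHA version, and the value 20 is only an illustrative choice, not a recommendation from this thread.

```ini
[server default]
# Assumed parameter name; confirm it on the Parameters wiki page.
# Raises the SSH connection timeout above the 5-second default.
ssh_connection_timeout=20
```

A longer timeout only hides slow connections; if hops still fail intermittently, the network itself still needs checking.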