blueloveyu / mysql-master-ha

Automatically exported from code.google.com/p/mysql-master-ha
Dead slave during the switch #43

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Stop the original master.
2. While the MHA monitor is electing slave A as the new master, power down slave A.
3. Check the log.
-------------------
What is the expected output? What do you see instead?
I am not sure whether this behaviour is by design, but I would expect the
manager, once it detects that the slave is not reachable via SSH, to try another
slave (my test environment is 1 master and 3 slaves).

What version of the product are you using? On what operating system?
Ubuntu Linux 12.04, MHA 0.53 (the log below reports manager_version=0.53)

Please provide any additional information below.
Please see the log below. At the line "Fri Dec  7 17:05:33 2012 - [warning]
HealthCheck: SSH to ip-10-0-1-248 is NOT reachable." the manager knows that the
elected master is not reachable, and it fails the switch. (Would it make sense
to run a second check at that point?)

Thanks 

root@ip-10-0-1-45:/var/log/masterha# tail -f app1.log
Fri Dec  7 17:03:44 2012 - [debug]  Disconnected from 
ip-10-0-1-248(10.0.1.248:3306)
Fri Dec  7 17:03:44 2012 - [debug]  Disconnected from 
ip-10-0-1-49(10.0.1.49:3306)
Fri Dec  7 17:03:44 2012 - [debug]  Disconnected from 
ip-10-0-1-171(10.0.1.171:3306)
Fri Dec  7 17:03:44 2012 - [debug] SSH check command: save_binary_logs 
--command=test --start_pos=4 --binlog_dir=/var/log/mysql 
--output_file=/var/log/masterha/app1/save_binary_logs_test 
--manager_version=0.53 --binlog_prefix=mysql-bin --debug 
Fri Dec  7 17:03:44 2012 - [info] Set master ping interval 3 seconds.
Fri Dec  7 17:03:44 2012 - [warning] secondary_check_script is not defined. It 
is highly recommended setting it to check master reachability from two or more 
routes.
Fri Dec  7 17:03:44 2012 - [info] Starting ping health check on 
ip-10-0-1-149(10.0.1.149:3306)..
Fri Dec  7 17:03:44 2012 - [debug] Connected on master.
Fri Dec  7 17:03:44 2012 - [debug] Set short wait_timeout on master: 6 seconds
Fri Dec  7 17:03:44 2012 - [info] Ping(SELECT) succeeded, waiting until MySQL 
doesn't respond..
Fri Dec  7 17:05:17 2012 - [warning] Got error on MySQL select ping: 2006 
(MySQL server has gone away)
Fri Dec  7 17:05:17 2012 - [info] Executing SSH check script: save_binary_logs 
--command=test --start_pos=4 --binlog_dir=/var/log/mysql 
--output_file=/var/log/masterha/app1/save_binary_logs_test 
--manager_version=0.53 --binlog_prefix=mysql-bin --debug 
Fri Dec  7 17:05:18 2012 - [info] HealthCheck: SSH to ip-10-0-1-149 is 
reachable.
Fri Dec  7 17:05:20 2012 - [warning] Got error on MySQL connect: 2003 (Can't 
connect to MySQL server on '10.0.1.149' (111))
Fri Dec  7 17:05:20 2012 - [warning] Connection failed 1 time(s)..
Fri Dec  7 17:05:23 2012 - [warning] Got error on MySQL connect: 2003 (Can't 
connect to MySQL server on '10.0.1.149' (111))
Fri Dec  7 17:05:23 2012 - [warning] Connection failed 2 time(s)..
Fri Dec  7 17:05:26 2012 - [warning] Got error on MySQL connect: 2003 (Can't 
connect to MySQL server on '10.0.1.149' (111))
Fri Dec  7 17:05:26 2012 - [warning] Connection failed 3 time(s)..
Fri Dec  7 17:05:26 2012 - [warning] Master is not reachable from health 
checker!
Fri Dec  7 17:05:26 2012 - [warning] Master ip-10-0-1-149(10.0.1.149:3306) is 
not reachable!
Fri Dec  7 17:05:26 2012 - [warning] SSH is reachable.
Fri Dec  7 17:05:26 2012 - [info] Connecting to a master server failed. Reading 
configuration file /etc/masterha_default.cnf and /etc/app1.cnf again, and 
trying to connect to all servers to check server status..
Fri Dec  7 17:05:26 2012 - [info] Reading default configuratoins from 
/etc/masterha_default.cnf..
Fri Dec  7 17:05:26 2012 - [info] Reading application default configurations 
from /etc/app1.cnf..
Fri Dec  7 17:05:26 2012 - [info] Reading server configurations from 
/etc/app1.cnf..
Fri Dec  7 17:05:26 2012 - [debug] Skipping connecting to dead master 
ip-10-0-1-149(10.0.1.149:3306).
Fri Dec  7 17:05:26 2012 - [debug] Connecting to servers..
Fri Dec  7 17:05:26 2012 - [debug]  Connected to: 
ip-10-0-1-248(10.0.1.248:3306), user=root
Fri Dec  7 17:05:26 2012 - [debug]  Connected to: ip-10-0-1-49(10.0.1.49:3306), 
user=root
Fri Dec  7 17:05:26 2012 - [debug]  Connected to: 
ip-10-0-1-171(10.0.1.171:3306), user=root
Fri Dec  7 17:05:26 2012 - [debug]  Comparing MySQL versions..
Fri Dec  7 17:05:26 2012 - [debug]   Comparing MySQL versions done.
Fri Dec  7 17:05:26 2012 - [debug] Connecting to servers done.
Fri Dec  7 17:05:26 2012 - [info] Dead Servers:
Fri Dec  7 17:05:26 2012 - [info]   ip-10-0-1-149(10.0.1.149:3306)
Fri Dec  7 17:05:26 2012 - [info] Alive Servers:
Fri Dec  7 17:05:26 2012 - [info]   ip-10-0-1-248(10.0.1.248:3306)
Fri Dec  7 17:05:26 2012 - [info]   ip-10-0-1-49(10.0.1.49:3306)
Fri Dec  7 17:05:26 2012 - [info]   ip-10-0-1-171(10.0.1.171:3306)
Fri Dec  7 17:05:26 2012 - [info] Alive Slaves:
Fri Dec  7 17:05:26 2012 - [info]   ip-10-0-1-248(10.0.1.248:3306)  
Version=5.5.28-0ubuntu0.12.04.2-log (oldest major version between slaves) 
log-bin:enabled
Fri Dec  7 17:05:26 2012 - [debug]    Relay log info repository: FILE
Fri Dec  7 17:05:26 2012 - [info]     Replicating from 
10.0.1.149(10.0.1.149:3306)
Fri Dec  7 17:05:26 2012 - [info]   ip-10-0-1-49(10.0.1.49:3306)  
Version=5.5.28-0ubuntu0.12.04.2-log (oldest major version between slaves) 
log-bin:enabled
Fri Dec  7 17:05:26 2012 - [debug]    Relay log info repository: FILE
Fri Dec  7 17:05:26 2012 - [info]     Replicating from 
10.0.1.149(10.0.1.149:3306)
Fri Dec  7 17:05:26 2012 - [info]   ip-10-0-1-171(10.0.1.171:3306)  
Version=5.5.28-0ubuntu0.12.04.2-log (oldest major version between slaves) 
log-bin:enabled
Fri Dec  7 17:05:26 2012 - [debug]    Relay log info repository: FILE
Fri Dec  7 17:05:26 2012 - [info]     Replicating from 
10.0.1.149(10.0.1.149:3306)
Fri Dec  7 17:05:26 2012 - [info] Checking slave configurations..
Fri Dec  7 17:05:26 2012 - [info]  read_only=1 is not set on slave 
ip-10-0-1-248(10.0.1.248:3306).
Fri Dec  7 17:05:26 2012 - [warning]  relay_log_purge=0 is not set on slave 
ip-10-0-1-248(10.0.1.248:3306).
Fri Dec  7 17:05:26 2012 - [info]  read_only=1 is not set on slave 
ip-10-0-1-49(10.0.1.49:3306).
Fri Dec  7 17:05:26 2012 - [warning]  relay_log_purge=0 is not set on slave 
ip-10-0-1-49(10.0.1.49:3306).
Fri Dec  7 17:05:26 2012 - [info]  read_only=1 is not set on slave 
ip-10-0-1-171(10.0.1.171:3306).
Fri Dec  7 17:05:26 2012 - [warning]  relay_log_purge=0 is not set on slave 
ip-10-0-1-171(10.0.1.171:3306).
Fri Dec  7 17:05:26 2012 - [info] Checking replication filtering settings..
Fri Dec  7 17:05:26 2012 - [info]  Replication filtering check ok.
Fri Dec  7 17:05:26 2012 - [info] Master is down!
Fri Dec  7 17:05:26 2012 - [info] Terminating monitoring script.
Fri Dec  7 17:05:26 2012 - [info] Got exit code 20 (Master dead).
Fri Dec  7 17:05:26 2012 - [info] MHA::MasterFailover version 0.53.
Fri Dec  7 17:05:26 2012 - [info] Starting master failover.
Fri Dec  7 17:05:26 2012 - [info] 
Fri Dec  7 17:05:26 2012 - [info] * Phase 1: Configuration Check Phase..
Fri Dec  7 17:05:26 2012 - [info] 
Fri Dec  7 17:05:26 2012 - [debug] Skipping connecting to dead master 
ip-10-0-1-149.
Fri Dec  7 17:05:26 2012 - [debug] Connecting to servers..
Fri Dec  7 17:05:26 2012 - [debug]  Connected to: 
ip-10-0-1-248(10.0.1.248:3306), user=root
Fri Dec  7 17:05:26 2012 - [debug]  Connected to: ip-10-0-1-49(10.0.1.49:3306), 
user=root
Fri Dec  7 17:05:26 2012 - [debug]  Connected to: 
ip-10-0-1-171(10.0.1.171:3306), user=root
Fri Dec  7 17:05:26 2012 - [debug]  Comparing MySQL versions..
Fri Dec  7 17:05:26 2012 - [debug]   Comparing MySQL versions done.
Fri Dec  7 17:05:26 2012 - [debug] Connecting to servers done.
Fri Dec  7 17:05:26 2012 - [info] Dead Servers:
Fri Dec  7 17:05:26 2012 - [info]   ip-10-0-1-149(10.0.1.149:3306)
Fri Dec  7 17:05:26 2012 - [info] Checking master reachability via mysql(double 
check)..
Fri Dec  7 17:05:26 2012 - [info]  ok.
Fri Dec  7 17:05:26 2012 - [info] Alive Servers:
Fri Dec  7 17:05:26 2012 - [info]   ip-10-0-1-248(10.0.1.248:3306)
Fri Dec  7 17:05:26 2012 - [info]   ip-10-0-1-49(10.0.1.49:3306)
Fri Dec  7 17:05:26 2012 - [info]   ip-10-0-1-171(10.0.1.171:3306)
Fri Dec  7 17:05:26 2012 - [info] Alive Slaves:
Fri Dec  7 17:05:26 2012 - [info]   ip-10-0-1-248(10.0.1.248:3306)  
Version=5.5.28-0ubuntu0.12.04.2-log (oldest major version between slaves) 
log-bin:enabled
Fri Dec  7 17:05:26 2012 - [debug]    Relay log info repository: FILE
Fri Dec  7 17:05:26 2012 - [info]     Replicating from 
10.0.1.149(10.0.1.149:3306)
Fri Dec  7 17:05:26 2012 - [info]   ip-10-0-1-49(10.0.1.49:3306)  
Version=5.5.28-0ubuntu0.12.04.2-log (oldest major version between slaves) 
log-bin:enabled
Fri Dec  7 17:05:26 2012 - [debug]    Relay log info repository: FILE
Fri Dec  7 17:05:26 2012 - [info]     Replicating from 
10.0.1.149(10.0.1.149:3306)
Fri Dec  7 17:05:26 2012 - [info]   ip-10-0-1-171(10.0.1.171:3306)  
Version=5.5.28-0ubuntu0.12.04.2-log (oldest major version between slaves) 
log-bin:enabled
Fri Dec  7 17:05:26 2012 - [debug]    Relay log info repository: FILE
Fri Dec  7 17:05:26 2012 - [info]     Replicating from 
10.0.1.149(10.0.1.149:3306)
Fri Dec  7 17:05:26 2012 - [info] ** Phase 1: Configuration Check Phase 
completed.
Fri Dec  7 17:05:26 2012 - [info] 
Fri Dec  7 17:05:26 2012 - [info] * Phase 2: Dead Master Shutdown Phase..
Fri Dec  7 17:05:26 2012 - [info] 
Fri Dec  7 17:05:26 2012 - [debug]  Stopping IO thread on 
ip-10-0-1-248(10.0.1.248:3306)..
Fri Dec  7 17:05:26 2012 - [debug]  Stopping IO thread on 
ip-10-0-1-49(10.0.1.49:3306)..
Fri Dec  7 17:05:26 2012 - [debug]  Stop IO thread on 
ip-10-0-1-248(10.0.1.248:3306) done.
Fri Dec  7 17:05:26 2012 - [info] Forcing shutdown so that applications never 
connect to the current master..
Fri Dec  7 17:05:26 2012 - [debug]  Stopping IO thread on 
ip-10-0-1-171(10.0.1.171:3306)..
Fri Dec  7 17:05:26 2012 - [debug]  Stop IO thread on 
ip-10-0-1-49(10.0.1.49:3306) done.
Fri Dec  7 17:05:26 2012 - [info] Executing master IP deactivatation script:
Fri Dec  7 17:05:26 2012 - [info]   /opt/scripts/master_ip_failover 
--orig_master_host=ip-10-0-1-149 --orig_master_ip=10.0.1.149 
--orig_master_port=3306 --command=stopssh --ssh_user=root  
Fri Dec  7 17:05:26 2012 - [debug]  Stop IO thread on 
ip-10-0-1-171(10.0.1.171:3306) done.
Fri Dec  7 17:05:27 2012 - [info]  done.
Fri Dec  7 17:05:27 2012 - [warning] shutdown_script is not set. Skipping 
explicit shutting down of the dead master.
Fri Dec  7 17:05:27 2012 - [info] * Phase 2: Dead Master Shutdown Phase 
completed.
Fri Dec  7 17:05:27 2012 - [info] 
Fri Dec  7 17:05:27 2012 - [info] * Phase 3: Master Recovery Phase..
Fri Dec  7 17:05:27 2012 - [info] 
Fri Dec  7 17:05:27 2012 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Fri Dec  7 17:05:27 2012 - [info] 
Fri Dec  7 17:05:27 2012 - [debug] Fetching current slave status..
Fri Dec  7 17:05:27 2012 - [debug]  Fetching current slave status done.
Fri Dec  7 17:05:27 2012 - [info] The latest binary log file/position on all 
slaves is mysql-bin.000009:82776781
Fri Dec  7 17:05:27 2012 - [info] Latest slaves (Slaves that received relay log 
files to the latest):
Fri Dec  7 17:05:27 2012 - [info]   ip-10-0-1-248(10.0.1.248:3306)  
Version=5.5.28-0ubuntu0.12.04.2-log (oldest major version between slaves) 
log-bin:enabled
Fri Dec  7 17:05:27 2012 - [debug]    Relay log info repository: FILE
Fri Dec  7 17:05:27 2012 - [info]     Replicating from 
10.0.1.149(10.0.1.149:3306)
Fri Dec  7 17:05:27 2012 - [info]   ip-10-0-1-49(10.0.1.49:3306)  
Version=5.5.28-0ubuntu0.12.04.2-log (oldest major version between slaves) 
log-bin:enabled
Fri Dec  7 17:05:27 2012 - [debug]    Relay log info repository: FILE
Fri Dec  7 17:05:27 2012 - [info]     Replicating from 
10.0.1.149(10.0.1.149:3306)
Fri Dec  7 17:05:27 2012 - [info]   ip-10-0-1-171(10.0.1.171:3306)  
Version=5.5.28-0ubuntu0.12.04.2-log (oldest major version between slaves) 
log-bin:enabled
Fri Dec  7 17:05:27 2012 - [debug]    Relay log info repository: FILE
Fri Dec  7 17:05:27 2012 - [info]     Replicating from 
10.0.1.149(10.0.1.149:3306)
Fri Dec  7 17:05:27 2012 - [info] The oldest binary log file/position on all 
slaves is mysql-bin.000009:82776781
Fri Dec  7 17:05:27 2012 - [info] Oldest slaves:
Fri Dec  7 17:05:27 2012 - [info]   ip-10-0-1-248(10.0.1.248:3306)  
Version=5.5.28-0ubuntu0.12.04.2-log (oldest major version between slaves) 
log-bin:enabled
Fri Dec  7 17:05:27 2012 - [debug]    Relay log info repository: FILE
Fri Dec  7 17:05:27 2012 - [info]     Replicating from 
10.0.1.149(10.0.1.149:3306)
Fri Dec  7 17:05:27 2012 - [info]   ip-10-0-1-49(10.0.1.49:3306)  
Version=5.5.28-0ubuntu0.12.04.2-log (oldest major version between slaves) 
log-bin:enabled
Fri Dec  7 17:05:27 2012 - [debug]    Relay log info repository: FILE
Fri Dec  7 17:05:27 2012 - [info]     Replicating from 
10.0.1.149(10.0.1.149:3306)
Fri Dec  7 17:05:27 2012 - [info]   ip-10-0-1-171(10.0.1.171:3306)  
Version=5.5.28-0ubuntu0.12.04.2-log (oldest major version between slaves) 
log-bin:enabled
Fri Dec  7 17:05:27 2012 - [debug]    Relay log info repository: FILE
Fri Dec  7 17:05:27 2012 - [info]     Replicating from 
10.0.1.149(10.0.1.149:3306)
Fri Dec  7 17:05:27 2012 - [info] 
Fri Dec  7 17:05:27 2012 - [info] * Phase 3.2: Saving Dead Master's Binlog 
Phase..
Fri Dec  7 17:05:27 2012 - [info] 
Fri Dec  7 17:05:27 2012 - [info] Fetching dead master's binary logs..
Fri Dec  7 17:05:27 2012 - [info] Executing command on the dead master ip-10-0-1-149(10.0.1.149:3306): save_binary_logs --command=save --start_file=mysql-bin.000009  --start_pos=82776781 --binlog_dir=/var/log/mysql --output_file=/var/log/masterha/app1/saved_master_binlog_from_ip-10-0-1-149_3306_20121207170526.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.53 --debug
  Creating /var/log/masterha/app1 if not exists..    ok.
 Concat binary/relay logs from mysql-bin.000009 pos 82776781 to mysql-bin.000009 EOF into /var/log/masterha/app1/saved_master_binlog_from_ip-10-0-1-149_3306_20121207170526.binlog ..
parse_init_headers: file=mysql-bin.000009 event_type=15 server_id=10 length=103 
nextmpos=107 prevrelay=4 cur(post)relay=107
parse_init_headers: file=mysql-bin.000009 event_type=2 server_id=10 length=78 
nextmpos=185 prevrelay=107 cur(post)relay=185
  Dumping binlog format description event, from position 0 to 107.. ok.
  Dumping effective binlog data from /var/log/mysql/mysql-bin.000009 position 82776781 to tail(82777069).. ok.
parse_init_headers: 
file=saved_master_binlog_from_ip-10-0-1-149_3306_20121207170526.binlog 
event_type=15 server_id=10 length=103 nextmpos=107 prevrelay=4 
cur(post)relay=107
parse_init_headers: 
file=saved_master_binlog_from_ip-10-0-1-149_3306_20121207170526.binlog 
event_type=2 server_id=10 length=78 nextmpos=82776859 prevrelay=107 
cur(post)relay=185
 Concat succeeded.
Fri Dec  7 17:05:29 2012 - [info] scp from root@10.0.1.149:/var/log/masterha/app1/saved_master_binlog_from_ip-10-0-1-149_3306_20121207170526.binlog to local:/var/log/masterha/app1/saved_master_binlog_from_ip-10-0-1-149_3306_20121207170526.binlog succeeded.
Fri Dec  7 17:05:33 2012 - [warning] HealthCheck: SSH to ip-10-0-1-248 is NOT 
reachable.
Fri Dec  7 17:05:34 2012 - [info] HealthCheck: SSH to ip-10-0-1-49 is reachable.
Fri Dec  7 17:05:35 2012 - [info] HealthCheck: SSH to ip-10-0-1-171 is 
reachable.
Fri Dec  7 17:05:35 2012 - [info] 
Fri Dec  7 17:05:35 2012 - [info] * Phase 3.3: Determining New Master Phase..
Fri Dec  7 17:05:35 2012 - [info] 
Fri Dec  7 17:05:35 2012 - [info] Finding the latest slave that has all relay 
logs for recovering other slaves..
Fri Dec  7 17:05:35 2012 - [info] All slaves received relay logs to the same 
position. No need to resync each other.
Fri Dec  7 17:05:35 2012 - [info] Dead Servers:
Fri Dec  7 17:05:35 2012 - [info]   ip-10-0-1-149(10.0.1.149:3306)
Fri Dec  7 17:05:35 2012 - [info]   ip-10-0-1-248(10.0.1.248:3306) Not 
reachable via SSH  Version=5.5.28-0ubuntu0.12.04.2-log (oldest major version 
between slaves) log-bin:enabled
Fri Dec  7 17:05:35 2012 - [debug]    Relay log info repository: FILE
Fri Dec  7 17:05:35 2012 - [info]     Replicating from 
10.0.1.149(10.0.1.149:3306)
Fri Dec  7 17:05:35 2012 - [info] Alive Slaves:
Fri Dec  7 17:05:35 2012 - [info]   ip-10-0-1-49(10.0.1.49:3306)  
Version=5.5.28-0ubuntu0.12.04.2-log (oldest major version between slaves) 
log-bin:enabled
Fri Dec  7 17:05:35 2012 - [debug]    Relay log info repository: FILE
Fri Dec  7 17:05:35 2012 - [info]     Replicating from 
10.0.1.149(10.0.1.149:3306)
Fri Dec  7 17:05:35 2012 - [info]   ip-10-0-1-171(10.0.1.171:3306)  
Version=5.5.28-0ubuntu0.12.04.2-log (oldest major version between slaves) 
log-bin:enabled
Fri Dec  7 17:05:35 2012 - [debug]    Relay log info repository: FILE
Fri Dec  7 17:05:35 2012 - [info]     Replicating from 
10.0.1.149(10.0.1.149:3306)
Fri Dec  7 17:05:35 2012 - 
[error][/usr/local/share/perl/5.14.2/MHA/ServerManager.pm, ln443]  Server 
ip-10-0-1-248(10.0.1.248:3306) is dead, but must be alive! Check server 
settings.
Fri Dec  7 17:05:35 2012 - 
[error][/usr/local/share/perl/5.14.2/MHA/ManagerUtil.pm, ln178] Got ERROR:  at 
/usr/local/share/perl/5.14.2/MHA/MasterFailover.pm line 1456
Fri Dec  7 17:05:35 2012 - [debug]  Disconnected from 
ip-10-0-1-49(10.0.1.49:3306)
Fri Dec  7 17:05:35 2012 - [debug]  Disconnected from 
ip-10-0-1-171(10.0.1.171:3306)
Fri Dec  7 17:05:35 2012 - [info] 

----- Failover Report -----

app1: MySQL Master failover ip-10-0-1-149

Master ip-10-0-1-149 is down!

Check MHA Manager logs at ip-10-0-1-45:/var/log/masterha/app1.log for details.

Started automated(non-interactive) failover.
Invalidated master IP address on ip-10-0-1-149.
The latest slave ip-10-0-1-248(10.0.1.248:3306) has all relay logs for recovery.
Got Error so couldn't continue failover from here.
Andrea Ceresoni

Original issue reported on code.google.com by andrea.c...@gmail.com on 7 Dec 2012 at 5:26
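
For anyone repeating this test, the "check the log" step can be automated. The following is a minimal sketch, not part of MHA: it assumes every log entry sits on a single line in the format seen in this report ("Fri Dec  7 17:05:33 2012 - [level] message"), and the names `parse_entry` and `unreachable_hosts` are hypothetical helpers introduced only for illustration. It pulls out every host whose SSH health check failed, together with the time of the warning.

```python
import re
from datetime import datetime

# Hypothetical helpers, not part of MHA. They assume one log entry per line,
# shaped like: "Fri Dec  7 17:05:33 2012 - [warning] message".
ENTRY = re.compile(r"^(\w{3} \w{3} +\d+ \d\d:\d\d:\d\d \d{4}) - \[(\w+)\]\s*(.*)$")
SSH_DOWN = re.compile(r"HealthCheck: SSH to (\S+) is NOT reachable")


def parse_entry(line):
    """Return (timestamp, level, message), or None for lines that are not entries."""
    match = ENTRY.match(line)
    if match is None:
        return None
    # Single-digit days are padded with an extra space; collapse it first.
    stamp = " ".join(match.group(1).split())
    # %a and %b are locale-dependent; this assumes English day/month names.
    when = datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y")
    return when, match.group(2), match.group(3)


def unreachable_hosts(lines):
    """Collect (timestamp, host) for every failed SSH health check."""
    hits = []
    for line in lines:
        entry = parse_entry(line)
        if entry is None:
            continue
        when, _level, message = entry
        found = SSH_DOWN.search(message)
        if found:
            hits.append((when, found.group(1)))
    return hits


sample = [
    "Fri Dec  7 17:05:17 2012 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away)",
    "Fri Dec  7 17:05:33 2012 - [warning] HealthCheck: SSH to ip-10-0-1-248 is NOT reachable.",
    "Fri Dec  7 17:05:34 2012 - [info] HealthCheck: SSH to ip-10-0-1-49 is reachable.",
]

for when, host in unreachable_hosts(sample):
    print(when.isoformat(), host)
```

To scan a real log, pass an open file object instead of `sample`, for example `unreachable_hosts(open("/var/log/masterha/app1.log"))`.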

GoogleCodeExporter commented 9 years ago
This is expected behavior. MHA checks the status of all configured instances in
Phase 3.1, and if the elected new master goes down after that, MHA does not
retry by promoting another slave. MHA master promotion is an automated process,
so your test situation is a very rare case, and retrying with another slave
would not solve the problem entirely (what if the newly elected slave also goes
down after being elected?). So I think stopping the failover with an error is
fine.

Original comment by Yoshinor...@gmail.com on 7 Dec 2012 at 6:22
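
The trade-off described in the reply above can be illustrated with a toy model. This is only a sketch under stated assumptions, not MHA's implementation: `elect_fail_fast`, `elect_with_retry`, and `ElectionError` are hypothetical names, and the liveness check is a plain set lookup standing in for a real SSH or MySQL probe.

```python
class ElectionError(Exception):
    """Raised when no new master can be chosen (hypothetical, for illustration)."""


def elect_fail_fast(candidates, is_alive):
    """Model of the behavior in this report: pick once, abort if the pick died."""
    if not candidates:
        raise ElectionError("no candidates")
    chosen = candidates[0]
    if not is_alive(chosen):
        raise ElectionError(f"{chosen} is dead, but must be alive")
    return chosen


def elect_with_retry(candidates, is_alive):
    """Model of the requested behavior: fall through to the next live slave."""
    for host in candidates:
        if is_alive(host):
            return host
    raise ElectionError("no live candidate left")


# Slaves as listed in the log; the first one was powered down during failover.
slaves = ["ip-10-0-1-248", "ip-10-0-1-49", "ip-10-0-1-171"]
dead = {"ip-10-0-1-248"}


def is_alive(host):
    return host not in dead


try:
    elect_fail_fast(slaves, is_alive)
except ElectionError as err:
    print("fail-fast:", err)

print("retry:", elect_with_retry(slaves, is_alive))
```

Retrying picks a live slave in this scenario, but the host it returns can still go down right after the check, so the retry narrows the window without closing it. That is the point the reply makes.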

GoogleCodeExporter commented 9 years ago
Thanks for the clarification. 

Original comment by andrea.c...@gmail.com on 10 Dec 2012 at 11:52

GoogleCodeExporter commented 9 years ago

Original comment by Yoshinor...@gmail.com on 12 Dec 2012 at 8:40