Appdynamics / HA-toolkit

AppDynamics Controller High Availability Toolkit
https://docs.appdynamics.com/display/PRO42/Using+the+High+Availability+(HA)+Toolkit
Apache License 2.0

replicate.sh exits with exit code 1 when old replication logs do not exist on secondary #68

Closed psrpardhasaradhi closed 7 years ago

psrpardhasaradhi commented 7 years ago

HA version: 3.24
Tested with: sh, bash and tcsh

In replicate.sh:

# clean out the old relay and bin-logs
#
message "Removing old replication logs"
runcmd ssh $secondary "find $datadir -print | grep bin-log | xargs rm  -f"
runcmd ssh $secondary "find $datadir -print | grep relay-log | xargs rm  -f"
runcmd ssh $secondary rm -f $datadir/master.info
function runcmd {
 local cmd="$*"
 if ! $cmd ; then
 fatal 1 "\"$cmd\" command failed"
 fi
}

replicate.sh exits when runcmd sees a non-zero exit code. When the old replication logs do not exist on the secondary controller, the commands that delete them return exit code 1, as the shell session below shows.

Failure on secondary because the replication logs do not exist:

bash-4.1$ find /local/mnt/AppD/Controller/db/data -print | grep bin-log

bash-4.1$ echo $?
1

bash-4.1$ touch /local/mnt/AppD/Controller/db/data/bin-log

bash-4.1$ find /local/mnt/AppD/Controller/db/data -print | grep bin-log
/local/mnt/AppD/Controller/db/data/bin-log

bash-4.1$ echo $?
0

bash-4.1$ find /local/mnt/AppD/Controller/db/data -print | grep bin-log | xargs rm -f
bash-4.1$ echo $?
0
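
The session above checks the status of `grep` on its own. To see which stage of the full cleanup pipeline reports a failure, a small bash diagnostic along these lines may help. `pipeline_statuses` is a hypothetical helper, not part of HA-toolkit, and the status of the `xargs` stage can differ between platforms when there is no input:

```bash
#!/bin/bash
# Hypothetical diagnostic helper (not part of HA-toolkit): run the same
# cleanup pipeline as replicate.sh and report each stage's exit status.
pipeline_statuses() {
    local dir="$1" pattern="$2"
    find "$dir" -print | grep -- "$pattern" | xargs rm -f
    # PIPESTATUS is overwritten by the next command, so it is read
    # immediately; all three values are expanded before echo runs.
    echo "find=${PIPESTATUS[0]} grep=${PIPESTATUS[1]} xargs=${PIPESTATUS[2]}"
}
```

Running `pipeline_statuses /local/mnt/AppD/Controller/db/data bin-log` on the secondary prints one line with the three statuses. Note that it deletes any matching files, exactly as the original pipeline does.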

runcmd should not treat this as fatal. For example, when replication is being set up for the first time, there may be no old replication logs on the secondary.

17:48:36.503 §+ message 'Removing old replication logs' 
17:48:36.503 §+ local out=/dev/tty 
17:48:36.504 §+ tty -s 
17:48:36.504 §+ echo ' -- ' 'Removing old replication logs' 
17:48:36.505 § -- Removing old replication logs 
17:48:36.505 §+ logmsg 'Removing old replication logs' 
17:48:36.505 §+ echo ' -- ' 'Removing old replication logs' 
17:48:36.505 §+ runcmd ssh secondary_host 'find /local/mnt/AppD/Controller/db/data -print | grep bin-log | xargs rm -f' 
17:48:36.505 §+ local 'cmd=ssh secondary_host find /local/mnt/AppD/Controller/db/data -print | grep bin-log | xargs rm -f' 
17:48:36.507 §+ ssh secondary_host find /local/mnt/AppD/Controller/db/data -print '|' grep bin-log '|' xargs rm -f 
17:48:36.747 §+ fatal 1 '"ssh secondary_host find /local/mnt/AppD/Controller/db/data -print | grep bin-log | xargs rm -f" command failed' 
17:48:36.747 §+ local exitcode=1 
17:48:36.747 §+ shift 
17:48:36.747 §+ gripe '"ssh secondary_host find /local/mnt/AppD/Controller/db/data -print | grep bin-log | xargs rm -f" command failed' 
17:48:36.747 §+ local out=/dev/tty 
17:48:36.749 §+ tty -s 
17:48:36.775 §+ echo '"ssh secondary_host find /local/mnt/AppD/Controller/db/data -print | grep bin-log | xargs rm -f" command failed' 
17:48:36.775 §"ssh secondary_host find /local/mnt/AppD/Controller/db/data -print | grep bin-log | xargs rm -f" command failed 
17:48:36.775 §+ gripe 'exit code 1' 
17:48:36.775 §+ local out=/dev/tty 
17:48:36.775 §+ tty -s 
17:48:36.775 §+ echo 'exit code 1' 
17:48:36.776 §exit code 1 
17:48:36.776 §+ kill -INT 16236 
17:48:36.777 §++ handle_interrupt 
17:48:36.777 §++ echo 'Caught interrupt.' 
17:48:36.778 §Caught interrupt. 
17:48:36.778 §+++ jobs -p 
17:48:36.779 §++ [[ -n '' ]] 
17:48:36.779 §++ echo Exiting 
17:48:36.779 §Exiting 
17:48:36.779 §++ exit 
17:48:36.780 §+ cleanup 
17:48:36.780 §+ rm -rf /tmp/ha.16236 
17:48:36.780 §+ kill_rsyncd 
17:48:36.782 §++ ssh secondary_host cat /tmp/replicate.rsync.pid 
17:48:36.996 §+ rsyncd_pid= 
17:48:36.997 §+ '[' '!' -z '' ']' 
17:48:36.998 §+ ssh secondary_host rm -f /tmp/replicate.rsync.pid
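
The trace above shows runcmd escalating a non-zero status straight to fatal. One possible alternative for best-effort steps is a wrapper that reports the status and returns it to the caller. This is only a sketch: `trycmd` and `warn` are hypothetical names that do not exist in HA-toolkit, and `warn` stands in for whatever logging the toolkit would use.

```bash
#!/bin/bash
# Stand-in logger for this sketch; the toolkit has its own helpers.
warn() { echo "WARNING: $*" >&2; }

# Hypothetical non-fatal counterpart to runcmd: run the command, report a
# non-zero status, and return it to the caller instead of exiting.
trycmd() {
    "$@"
    local rc=$?
    if [ "$rc" -ne 0 ]; then
        warn "\"$*\" exited with status $rc (continuing)"
    fi
    return "$rc"
}
```

The caller then decides whether a given status matters, for example `trycmd ssh $secondary "..." || true` for a cleanup step that may find nothing to delete.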

As a temporary fix, I removed runcmd from these commands in replicate.sh, and replication now succeeds. A proper fix may still need to distinguish two cases: the file exists but cannot be deleted (for example, because of permissions), and the file does not exist, which is what produces exit code 1 here.

Changed from:

message "Removing old replication logs"
runcmd ssh $secondary "find $datadir -print | grep bin-log | xargs rm  -f"
runcmd ssh $secondary "find $datadir -print | grep relay-log | xargs rm  -f"
runcmd ssh $secondary rm -f $datadir/master.info

To:

message "Removing old replication logs"
ssh $secondary "find $datadir -print | grep bin-log | xargs rm  -f"
ssh $secondary "find $datadir -print | grep relay-log | xargs rm  -f"
ssh $secondary rm -f $datadir/master.info
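
Dropping runcmd also hides real deletion failures. One way to keep the check while tolerating missing logs might be to let `find` do the matching and the deletion itself. This is a sketch under assumptions, not the fix that was staged: `remove_matching` is a hypothetical helper, and restricting the match to regular files with `-type f` is an added assumption.

```bash
#!/bin/bash
# Hypothetical helper (not part of HA-toolkit): remove files under a
# directory whose path contains a pattern, treating "nothing to remove"
# as success.
remove_matching() {
    local dir="$1" pattern="$2"
    # find exits 0 when nothing matches. With "-exec ... +", rm only runs
    # when at least one path was found, and a failing rm makes find
    # exit non-zero. "-type f" (an assumption) skips directories.
    find "$dir" -type f -path "*${pattern}*" -exec rm -f {} +
}
```

With this shape, an empty data directory yields status 0, while a file that cannot be deleted still surfaces as a non-zero status that runcmd could treat as fatal.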

cmayer68 commented 7 years ago

Fix staged for 3.26.