Open claflico opened 6 years ago
Hey, it looks like the cause is the use of the wait command. I remember having issues with this a while back on a different OS. I will update tomorrow.
Thanks for your patience. Any chance you can try replacing lines 100-103 in the keepalived.sh file with what follows, rebuilding the container and seeing if that works better:
while true; do
    # Check if Keepalived is STILL running by recording its PID (if it's not running, $pid will be empty):
    pid=$(pidof keepalived)
    # If it is not, let's end our PID 1 process (this script) by breaking out of this while loop.
    # This ensures Docker 'sees' the failure and handles it as necessary.
    if [ -z "$pid" ]; then
        echo "Keepalived is no longer running, exiting so Docker can restart the container..."
        break
    fi
    # If it is, give the CPU a rest
    sleep 0.5
done
I can do so myself and test accordingly but it might be a couple of days.
Hey Cory, thanks again for your patience, I've made the necessary changes. Please rebuild, test as appropriate and let me know if you have any further issues. I've tested and it works for me.
Spun up some new load balancer Docker hosts last night and attempted to migrate the keepalived service to them, but the VIP would never come up.
This is a snippet of the logs:
I saw that the new hosts were using an image that was created 5 weeks ago. I went to the previous host, which had the image that was created 13 months ago, tagged it and pushed it to our Docker image server. I then configured the service to use that tagged image, and the VIP came up on the new hosts. The image is the only thing that changed, so something in the new image must be causing this.
Also, the check port script should probably be changed from
grep ':${CHECK_PORT}'"
to
grep 'dpt:${CHECK_PORT} '"
because otherwise the script could show a false positive when something is also running on port 8000 (i.e. traefik) on that host:
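A minimal sketch of the difference between the two patterns. The sample rule lines are hypothetical, loosely modelled on iptables NAT output and not taken from the affected host; CHECK_PORT=80 is likewise an assumption, as are the helper names loose_match and strict_match. Double quotes are used here so that ${CHECK_PORT} expands when the snippet is run on its own.

```shell
#!/bin/sh
# Assumed value for illustration; the real script reads CHECK_PORT from its environment.
CHECK_PORT=80

# Hypothetical sample lines (not real output from the issue).
only_8000='DNAT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:8000 to:172.17.0.3:8000'
real_80='DNAT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:80 to:172.17.0.2:80'

# Current pattern: matches any ":80" substring, including the start of ":8000".
loose_match() { printf '%s\n' "$1" | grep -q ":${CHECK_PORT}"; }

# Proposed pattern: requires "dpt:" before the port and a space right after it.
strict_match() { printf '%s\n' "$1" | grep -q "dpt:${CHECK_PORT} "; }

if loose_match "$only_8000"; then
    echo "loose pattern: port-8000 line matches (false positive)"
fi
if ! strict_match "$only_8000"; then
    echo "strict pattern: port-8000 line does not match"
fi
if strict_match "$real_80"; then
    echo "strict pattern: port-80 line matches"
fi
```

The trailing space in the strict pattern is what stops port 80 from matching port 8000; it relies on the rule listing putting a space after the destination port, which is an assumption about the output format being grepped.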