NeoAssist / docker-keepalived

Dockerized keepalived to ease HA in deployments with multiple hosts. Provides failover for Virtual IPs (VIPs) so that they stay online even if a host fails. Initially aimed at helping Rancher HA deployments.
MIT License

Not working for Rancher v1.2.1? #16

Status: Closed (closed by wilsontayar 7 years ago)

wilsontayar commented 7 years ago

Hey guys,

I've just updated my Rancher server to the latest version (v1.2.1) on a CoreOS host (stable 1185.5.0), and now my keepalived container doesn't seem to be working.

Maybe this has something to do with Rancher's new network service?

Here are the container logs:

12/20/2016 9:57:43 PM Name = vethbba019de
12/20/2016 9:57:43 PMKeepalived_vrrp[21]:  Name = vethbba019de
12/20/2016 9:57:43 PM index = 20
12/20/2016 9:57:43 PMKeepalived_vrrp[21]:  index = 20
12/20/2016 9:57:43 PM IPv4 address = 0.0.0.0
12/20/2016 9:57:43 PMKeepalived_vrrp[21]:  IPv4 address = 0.0.0.0
12/20/2016 9:57:43 PM IPv6 address = fe80::e8db:b8ff:fee2:57b6
12/20/2016 9:57:43 PMKeepalived_vrrp[21]:  IPv6 address = fe80::e8db:b8ff:fee2:57b6
12/20/2016 9:57:43 PM MAC = ea:db:b8:e2:57:b6
12/20/2016 9:57:43 PMKeepalived_vrrp[21]:  MAC = ea:db:b8:e2:57:b6
12/20/2016 9:57:43 PM is UP
12/20/2016 9:57:43 PMKeepalived_vrrp[21]:  is UP
12/20/2016 9:57:43 PM is RUNNING
12/20/2016 9:57:43 PMKeepalived_vrrp[21]:  is RUNNING
12/20/2016 9:57:43 PM MTU = 1500
12/20/2016 9:57:43 PMKeepalived_vrrp[21]:  MTU = 1500
12/20/2016 9:57:43 PM HW Type = ETHERNET
12/20/2016 9:57:43 PMKeepalived_vrrp[21]:  HW Type = ETHERNET
12/20/2016 9:57:43 PM Enabling NIC ioctl refresh polling
12/20/2016 9:57:43 PMKeepalived_vrrp[21]:  Enabling NIC ioctl refresh polling
12/20/2016 9:57:43 PM------< NIC >------
12/20/2016 9:57:43 PMKeepalived_vrrp[21]: ------< NIC >------
12/20/2016 9:57:43 PM Name = vethc273286f
12/20/2016 9:57:43 PMKeepalived_vrrp[21]:  Name = vethc273286f
12/20/2016 9:57:43 PM index = 21
12/20/2016 9:57:43 PMKeepalived_vrrp[21]:  index = 21
12/20/2016 9:57:43 PM IPv4 address = 0.0.0.0
12/20/2016 9:57:43 PMKeepalived_vrrp[21]:  IPv4 address = 0.0.0.0
12/20/2016 9:57:43 PM IPv6 address = fe80::1827:9dff:fe00:81a4
12/20/2016 9:57:43 PMKeepalived_vrrp[21]:  IPv6 address = fe80::1827:9dff:fe00:81a4
12/20/2016 9:57:43 PM MAC = 1a:27:9d:00:81:a4
12/20/2016 9:57:43 PMKeepalived_vrrp[21]:  MAC = 1a:27:9d:00:81:a4
12/20/2016 9:57:43 PM is UP
12/20/2016 9:57:43 PMKeepalived_vrrp[21]:  is UP
12/20/2016 9:57:43 PM is RUNNING
12/20/2016 9:57:43 PMKeepalived_vrrp[21]:  is RUNNING
12/20/2016 9:57:43 PM MTU = 1500
12/20/2016 9:57:43 PMKeepalived_vrrp[21]:  MTU = 1500
12/20/2016 9:57:43 PM HW Type = ETHERNET
12/20/2016 9:57:43 PMKeepalived_vrrp[21]:  HW Type = ETHERNET
12/20/2016 9:57:43 PM Enabling NIC ioctl refresh polling
12/20/2016 9:57:43 PMKeepalived_vrrp[21]:  Enabling NIC ioctl refresh polling
12/20/2016 9:57:43 PMUsing LinkWatch kernel netlink reflector...
12/20/2016 9:57:43 PMKeepalived_vrrp[21]: Using LinkWatch kernel netlink reflector...
12/20/2016 9:57:43 PMVRRP_Instance(lb-vips) Entering BACKUP STATE
12/20/2016 9:57:43 PMKeepalived_vrrp[21]: VRRP_Instance(lb-vips) Entering BACKUP STATE
12/20/2016 9:57:43 PMVRRP sockpool: [ifindex(2), proto(112), unicast(0), fd(10,11)]
12/20/2016 9:57:43 PMKeepalived_vrrp[21]: VRRP sockpool: [ifindex(2), proto(112), unicast(0), fd(10,11)]
12/20/2016 9:57:43 PMpid 22 exited with status 256
12/20/2016 9:57:43 PMKeepalived_vrrp[21]: pid 22 exited with status 256
12/20/2016 9:57:44 PMpid 26 exited with status 256
12/20/2016 9:57:44 PMKeepalived_vrrp[21]: pid 26 exited with status 256
12/20/2016 9:57:45 PMDisplaying resulting /etc/keepalived/keepalived.conf contents...
12/20/2016 9:57:45 PM    global_defs {
12/20/2016 9:57:45 PM        router_id your_hostname
12/20/2016 9:57:45 PM        vrrp_version 2
12/20/2016 9:57:45 PM        vrrp_garp_master_delay 1
12/20/2016 9:57:45 PM        vrrp_garp_master_refresh
12/20/2016 9:57:45 PM        #Uncomment the next line if you'd like to use unique multicast groups
12/20/2016 9:57:45 PM        #vrrp_mcast_group4 224.0.0.150 
12/20/2016 9:57:45 PM    }   
12/20/2016 9:57:45 PM
12/20/2016 9:57:45 PM    vrrp_script chk_haproxy {
12/20/2016 9:57:45 PM        script       "ss -ltn 'src any' | grep 80"
12/20/2016 9:57:45 PM        timeout 1
12/20/2016 9:57:45 PM        interval 1   # check every 1 second
12/20/2016 9:57:45 PM        fall 2       # require 2 failures for KO
12/20/2016 9:57:45 PM        rise 2       # require 2 successes for OK
12/20/2016 9:57:45 PM    }   
12/20/2016 9:57:45 PM
12/20/2016 9:57:45 PM    vrrp_instance lb-vips {
12/20/2016 9:57:45 PM        state BACKUP
12/20/2016 9:57:45 PM        interface eth0
12/20/2016 9:57:45 PM        virtual_router_id 150
12/20/2016 9:57:45 PM        priority 100
12/20/2016 9:57:45 PM        advert_int 1
12/20/2016 9:57:45 PM        nopreempt
12/20/2016 9:57:45 PM        track_script {
12/20/2016 9:57:45 PM            chk_haproxy
12/20/2016 9:57:45 PM        }
12/20/2016 9:57:45 PM        authentication {
12/20/2016 9:57:45 PM            auth_type PASS
12/20/2016 9:57:45 PM            auth_pass blahblah
12/20/2016 9:57:45 PM        }
12/20/2016 9:57:45 PM        virtual_ipaddress {
12/20/2016 9:57:45 PM            192.168.32.74/24 dev eth0
12/20/2016 9:57:45 PM        }
12/20/2016 9:57:45 PM    } 
12/20/2016 9:57:45 PMStarting Keepalived in the background...
12/20/2016 9:57:45 PMStarting Keepalived v1.2.24 (11/19,2016), git commit v3.5.0_rc2-45-g813ce7d+
12/20/2016 9:57:45 PMKeepalived[31]: Starting Keepalived v1.2.24 (11/19,2016), git commit v3.5.0_rc2-45-g813ce7d+
12/20/2016 9:57:45 PMOpening file '/etc/keepalived/keepalived.conf'.
12/20/2016 9:57:45 PMKeepalived[31]: Opening file '/etc/keepalived/keepalived.conf'.
12/20/2016 9:57:45 PMdaemon is already running
12/20/2016 9:57:45 PMKeepalived[31]: daemon is already running
12/20/2016 9:57:45 PM/usr/bin/keepalived.sh: line 93: wait: pid 21 is not a child of this shell
12/20/2016 9:57:45 PMpid 33 exited with status 256
12/20/2016 9:57:45 PMKeepalived_vrrp[21]: pid 33 exited with status 256
12/20/2016 9:57:46 PMpid 37 exited with status 256
12/20/2016 9:57:46 PMKeepalived_vrrp[21]: pid 37 exited with status 256
12/20/2016 9:57:47 PMVRRP_Instance(lb-vips) Now in FAULT state
12/20/2016 9:57:47 PMKeepalived_vrrp[21]: VRRP_Instance(lb-vips) Now in FAULT state
12/20/2016 9:57:47 PMpid 41 exited with status 256
12/20/2016 9:57:47 PMKeepalived_vrrp[21]: pid 41 exited with status 256
12/20/2016 9:57:48 PMpid 45 exited with status 256
12/20/2016 9:57:48 PMKeepalived_vrrp[21]: pid 45 exited with status 256
12/20/2016 9:57:49 PMpid 49 exited with status 256
12/20/2016 9:57:49 PMKeepalived_vrrp[21]: pid 49 exited with status 256
12/20/2016 9:57:50 PMpid 53 exited with status 256
12/20/2016 9:57:50 PMKeepalived_vrrp[21]: pid 53 exited with status 256
12/20/2016 9:57:51 PMpid 57 exited with status 256
12/20/2016 9:57:51 PMKeepalived_vrrp[21]: pid 57 exited with status 256
12/20/2016 9:57:52 PMpid 61 exited with status 256
12/20/2016 9:57:52 PMKeepalived_vrrp[21]: pid 61 exited with status 256
12/20/2016 9:57:53 PMpid 65 exited with status 256
12/20/2016 9:57:53 PMKeepalived_vrrp[21]: pid 65 exited with status 256
12/20/2016 9:57:54 PMpid 69 exited with status 256
...
sjiveson commented 7 years ago

Assuming you're running this container in host networking mode (which you should be), the changes in v1.2.x shouldn't have an impact, but hey, you never know.

Could you post the beginning of the logs, please? From the above, it looks like keepalived is already running before the script starts it. That shouldn't happen, but perhaps some of the looping logic in the entrypoint script is subtly flawed.
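One way to rule out the "daemon is already running" case is to check for a live PID before launching a second copy. The following is only a minimal sketch, not the container's actual entrypoint logic: the function name pidfile_alive is hypothetical, and the PID file path in the usage note is an assumption about where keepalived writes it.

```shell
#!/bin/sh
# pidfile_alive FILE
# Succeeds (exit 0) if FILE contains a numeric PID and a process with
# that PID can be signalled by the current user; fails otherwise.
# Note: kill -0 also fails when the process exists but belongs to a
# user we are not allowed to signal.
pidfile_alive() {
    [ -r "$1" ] || return 1            # no readable PID file
    pid=$(cat "$1")
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;       # empty or not a plain number
    esac
    kill -0 "$pid" 2>/dev/null         # signal 0 only tests for existence
}
```

A start-up script could then skip the launch when pidfile_alive /var/run/keepalived.pid succeeds (path assumed), and remove the stale file and start the daemon when it fails.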

wilsontayar commented 7 years ago

For some reason, chk_haproxy wasn't seeing port 80 when running ss -ltn 'src any'. I didn't have time to investigate further, but I guess Rancher's new LB might have something to do with it.

Anyway, I've changed the check to a port that did appear when running ss -ltn from inside the container (e.g. 22), and everything seems to be working just as it did before the update.
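When debugging a check like this, note that a bare grep 80 matches any line containing "80", so a listener on 8080 would also satisfy it. Below is a hedged sketch of a stricter test that compares the port field exactly. The helper name port_listening is hypothetical, and it assumes the usual ss -ltn column layout in which the fourth column is the local address and port.

```shell
#!/bin/sh
# port_listening PORT
# Reads `ss -ltn`-style output on stdin and succeeds (exit 0) only if
# some LISTEN row has a local address whose port is exactly PORT.
port_listening() {
    awk -v port="$1" '
        $1 == "LISTEN" {
            # Column 4 is "Local Address:Port"; the port is whatever
            # follows the last colon (works for *:80, [::]:80, :::80).
            n = split($4, parts, ":")
            if (parts[n] == port) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}
```

Run inside the container as ss -ltn | port_listening 80; the exit status can be used the same way as the original pipeline's, and running plain ss -ltn first shows which ports are actually visible from there.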

Thanks for the assistance, @sjiveson. I'm closing the issue now. :)

sjiveson commented 7 years ago

OK, good stuff and you're welcome. Hope you get to the bottom of why the port is missing.