evrardjp / ansible-keepalived

Keepalived role for ansible deployment
Apache License 2.0

virtual IP not coming up #18

Closed; a1git closed this 8 years ago

a1git commented 8 years ago

Hi,

I have been testing stable/newton, and one thing I found is that the VIP does not come up (I test on 3 controllers). This makes the repo build fail during setup-infrastructure, and the workaround is to manually log in to all 3 controllers and restart keepalived. That brings one of them up, and the rest of the process can continue.

jardleex commented 8 years ago

Hello @a1git, I had a similar issue on Ubuntu 16.04. If you also have systemd as init system on your side check:

:~$ systemctl status keepalived
● keepalived.service - LVS and VRRP High Availability Monitor
   Loaded: loaded (/etc/init.d/keepalived; bad; vendor preset: enabled)
   Active: active (exited) since Wed 2016-10-19 13:15:41 UTC; 3min 27s ago

Oct 19 13:15:41 dnsmasq02 systemd[1]: Starting LSB: Starts keepalived...
Oct 19 13:15:41 dnsmasq02 keepalived[13396]:  * Starting keepalived keepalived
Oct 19 13:15:41 dnsmasq02 keepalived[13396]:    ...fail!
Oct 19 13:15:41 dnsmasq02 systemd[1]: Started LSB: Starts keepalived.

Since version 1.2.23 does not include a systemd unit file, systemd's systemd-sysv-generator kicks in, which seems to lead to the issue above.

To fix this, I fetched the original systemd unit file from the package maintainers (https://github.com/acassen/keepalived/blob/master/keepalived/keepalived.service), placed it under /etc/systemd/system/keepalived.service, ran systemctl daemon-reload, and then systemctl restart keepalived.

Afterwards keepalived behaves as expected.

Maybe this helps you.

evrardjp commented 8 years ago

I'll investigate this further.

evrardjp commented 8 years ago

Hello,

I just created 3 new 16.04 instances, with UCA enabled and disabled, and I don't see this happening with the latest code.

Could you update to the latest version (the current version of the role is 2.2.1) and try again?

Note: I'd prefer to work it out with the package maintainer to fix the package instead of implementing a workaround here.

jardleex commented 8 years ago

@evrardjp, I haven't used UCA, only the PPA packages. @a1git, could you check this on your side?

I agree with you that using the latest package version from the maintainer would be the nicer way of doing it. But since keepalived 1.2.24 is not in the PPA yet, I had to use this bypass. Building it from source won't be an option here.

Let's see what @a1git tests will show.

evrardjp commented 8 years ago

@jardleex Could you still check with the latest version of the role?

jardleex commented 8 years ago

@evrardjp, the issue is still there in role version 2.2.1. I tested it in a 16.04 Vagrant box.

Variables

---
keepalived_instances:
  vip1:
    interface: "{{ ansible_default_ipv4.interface }}"
    state: MASTER
    virtual_router_id: 11
    priority: 100
    authentication_password: 'foo'
    vips:
      - "10.0.2.16/24 dev eth0"

Box output

root@vagrantbox:~# systemctl status keepalived
● keepalived.service - LSB: Starts keepalived
   Loaded: loaded (/etc/init.d/keepalived; bad; vendor preset: enabled)
   Active: active (exited) since Fr 2016-10-21 09:20:45 UTC; 28s ago
     Docs: man:systemd-sysv-generator(8)
  Process: 13568 ExecStop=/etc/init.d/keepalived stop (code=exited, status=0/SUCCESS)
  Process: 13580 ExecStart=/etc/init.d/keepalived start (code=exited, status=0/SUCCESS)

Okt 21 09:20:45 vagrantbox systemd[1]: Starting LSB: Starts keepalived...
Okt 21 09:20:45 vagrantbox keepalived[13580]:  * Starting keepalived keepalived
Okt 21 09:20:45 vagrantbox keepalived[13580]:    ...fail!
Okt 21 09:20:45 vagrantbox systemd[1]: Started LSB: Starts keepalived.
root@vagrantbox:~# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 08:00:27:ce:dd:78 brd ff:ff:ff:ff:ff:ff
    inet 10.0.2.15/24 brd 10.0.2.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::a00:27ff:fece:dd78/64 scope link 
       valid_lft forever preferred_lft forever
root@vagrantbox:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 16.04.1 LTS
Release:    16.04
Codename:   xenial
root@vagrantbox:~# uname -a
Linux vagrantbox 4.4.0-31-generic #50-Ubuntu SMP Wed Jul 13 00:07:12 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

Bypass/Solution

root@vagrantbox:~# wget -q https://raw.githubusercontent.com/acassen/keepalived/v1.2.24/keepalived/keepalived.service
root@vagrantbox:~# mv keepalived.service /etc/systemd/system/
root@vagrantbox:~# systemctl daemon-reload
root@vagrantbox:~# systemctl restart keepalived
root@vagrantbox:~# systemctl status keepalived
● keepalived.service - LVS and VRRP High Availability Monitor
   Loaded: loaded (/etc/systemd/system/keepalived.service; disabled; vendor preset: enabled)
   Active: active (running) since Fr 2016-10-21 09:21:55 UTC; 3min 49s ago
  Process: 13673 ExecStart=/usr/sbin/keepalived $KEEPALIVED_OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 13676 (keepalived)
    Tasks: 3
   Memory: 780.0K
      CPU: 66ms
   CGroup: /system.slice/keepalived.service
           ├─13676 /usr/sbin/keepalived -D
           ├─13677 /usr/sbin/keepalived -D
           └─13678 /usr/sbin/keepalived -D

Okt 21 09:21:57 vagrantbox Keepalived_vrrp[13678]: Sending gratuitous ARP on eth0 for 10.0.2.16
Okt 21 09:21:57 vagrantbox Keepalived_vrrp[13678]: Sending gratuitous ARP on eth0 for 10.0.2.16
Okt 21 09:21:57 vagrantbox Keepalived_vrrp[13678]: Sending gratuitous ARP on eth0 for 10.0.2.16
Okt 21 09:21:57 vagrantbox Keepalived_vrrp[13678]: Sending gratuitous ARP on eth0 for 10.0.2.16
Okt 21 09:22:02 vagrantbox Keepalived_vrrp[13678]: Sending gratuitous ARP on eth0 for 10.0.2.16
root@vagrantbox:~# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 08:00:27:ce:dd:78 brd ff:ff:ff:ff:ff:ff
    inet 10.0.2.15/24 brd 10.0.2.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet 10.0.2.16/24 scope global secondary eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::a00:27ff:fece:dd78/64 scope link 
       valid_lft forever preferred_lft forever

So from my side this is not a problem with the role. The latest keepalived version, or more specifically the systemd unit file shipped with it, would fix my issue.
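The before/after `ip a` comparison above can be reduced to a one-line check for the VIP on the interface. A minimal sketch, using the interface name and VIP from this setup; the output is simulated from the excerpt above so the snippet is self-contained (on a live box you would pipe `ip -o -4 addr show dev eth0` instead):

```shell
# Check whether the VIP is configured as a secondary address.
# Sample output copied from the `ip a` excerpt in this issue.
ip_output='2: eth0    inet 10.0.2.15/24 brd 10.0.2.255 scope global eth0
2: eth0    inet 10.0.2.16/24 scope global secondary eth0'
if printf '%s\n' "$ip_output" | grep -q 'inet 10\.0\.2\.16/'; then
  echo "VIP 10.0.2.16 present"
else
  echo "VIP missing"
fi
```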

evrardjp commented 8 years ago

@jardleex I'll respawn 3 new 16.04 instances. I didn't get that on my new instances, so I am concerned there could be something in the VirtualBox image. Let me double check.

a1git commented 8 years ago

stable/newton latest changeID: Ibaffd0c1a51a72a789de9b0c7c8d9c4ef6009b8e

I tried on 3 different multi-node installations/different clusters (all greenfield). It worked out of the box on all 3 different installations without any issues for me.

/me happy \o/

evrardjp commented 8 years ago

@a1git stable/newton doesn't have the latest version of the keepalived role so I suspect the cause of your issue was something else. But I'm glad it works.

@jardleex I can't reproduce this on a clean 16.04 from Gandi, Rackspace, or my CI. None of them are using vbox, though. Could you drop your keepalived configuration, version, and source? Do you run with keepalived_use_latest_stable: True or False? Does it change anything for you?

jardleex commented 8 years ago

@evrardjp

root@vagrantbox:~# keepalived --version
Keepalived v1.2.23 (07/26,2016)

Copyright (C) 2001-2016 Alexandre Cassen, <acassen@gmail.com>

Build options: KRNL_2_6 WITH_LVS HAVE_IPVS_SYNCD WITH_VRRP HAVE_VRRP_VMAC HAVE_ADDR_GEN_MODE WITHOUT_SNMP WITHOUT_SNMP_KEEPALIVED WITHOUT_SNMP_CHECKER WITHOUT_SNMP_RFC WITHOUT_SNMP_RFCV2 WITHOUT_SNMP_RFCV3 LIBIPVS_USE_NL WITHOUT_LIBNL WITH_VRRP_AUTH WITH_SO_MARK WITHOUT_LIBIPTC WITHOUT_LIBIPSET WITHOUT_IPV4_DEVCONF WITHOUT_IF_H_LINK_H_COLLISION HAVE_LINUX_NET_IF_H_COLLISION HAVE_SOCK_NONBLOCK HAVE_SOCK_CLOEXEC HAVE_FIB_ROUTING NO_MEM_CHECK NO_MEM_CHECK_LOG

I used keepalived_use_latest_stable: False in the first attempt (as defined in defaults/main.yml). Changing this to True didn't change the keepalived version, as 1.2.23 is the latest one in the PPA.

From my side it's fine. I'll just wait for 1.2.24 to land in the PPA. You don't need to spend time searching here; it seems to be an issue only in my Vagrant box.

james-portman commented 6 years ago

Hi, just to say I have exactly the same issue in Vagrant/VirtualBox with Ubuntu 16.04. I have to manually log into the box and restart keepalived, which works fine; it just seems to fail the first time when Ansible tries to restart it. Also, systemd fails to notice that the service didn't start properly:

Dec 21 10:10:19 ubuntu-xenial systemd[1]: Starting LSB: Starts keepalived...
Dec 21 10:10:19 ubuntu-xenial keepalived[4542]:  * Starting keepalived keepalived
Dec 21 10:10:19 ubuntu-xenial keepalived[4542]:    ...fail!
Dec 21 10:10:19 ubuntu-xenial systemd[1]: Started LSB: Starts keepalived.
Dec 21 10:10:20 ubuntu-xenial Keepalived_vrrp[4277]: Stopped
Dec 21 10:10:20 ubuntu-xenial Keepalived[4275]: Stopped Keepalived v1.2.23 (07/26,2016)
Dec 21 10:12:46 ubuntu-xenial systemd[1]: Started LSB: Starts keepalived.
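Because the LSB wrapper exits 0 even when the daemon fails, systemd reports the unit as started; the only hint is the init script's `...fail!` marker in the journal. A minimal sketch of detecting that, using the journal lines quoted above as sample input (on a live box you would grep `journalctl -u keepalived` instead):

```shell
# The sysv-generator unit reports success even on failure, so scan the
# journal output for the init script's failure marker instead.
# Sample lines copied from the log excerpt in this comment.
journal='Dec 21 10:10:19 ubuntu-xenial keepalived[4542]:  * Starting keepalived keepalived
Dec 21 10:10:19 ubuntu-xenial keepalived[4542]:    ...fail!'
if printf '%s\n' "$journal" | grep -q 'fail!'; then
  echo "keepalived init script reported failure"
fi
```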