Closed: Kariton closed this 2 years ago.
TASK [ansible-keepalived : ensure keepalived is enabled] ***********************
fatal: [keepalived-centos7]: FAILED! => {"changed": true, "cmd": ["systemctl", "enable", "keepalived", "--now"], "delta": "0:00:00.605824", "end": "2022-07-12 17:45:22.240112", "msg": "non-zero return code", "rc": 1, "start": "2022-07-12 17:45:21.634288", "stderr": "Created symlink from /etc/systemd/system/multi-user.target.wants/keepalived.service to /usr/lib/systemd/system/keepalived.service.\nJob for keepalived.service failed because the control process exited with error code. See \"systemctl status keepalived.service\" and \"journalctl -xe\" for details.", "stderr_lines": ["Created symlink from /etc/systemd/system/multi-user.target.wants/keepalived.service to /usr/lib/systemd/system/keepalived.service.", "Job for keepalived.service failed because the control process exited with error code. See \"systemctl status keepalived.service\" and \"journalctl -xe\" for details."], "stdout": "", "stdout_lines": []}
The current main branch does indeed NOT produce the same error on my side. I initially thought it was an error on my end that looked like a "docker problem", but my VMs do work as expected.
I will see what I can find.
oh well... old keepalived version FTW:
[root@keepalived-centos7 /]# systemctl status keepalived
● keepalived.service - LVS and VRRP High Availability Monitor
Loaded: loaded (/usr/lib/systemd/system/keepalived.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/keepalived.service.d
└─override.conf
Active: failed (Result: exit-code) since Tue 2022-07-12 18:09:00 UTC; 22s ago
Process: 1447 ExecStart=/usr/sbin/keepalived $KEEPALIVED_OPTIONS (code=exited, status=3)
Jul 12 18:09:00 keepalived-centos7 systemd[1]: Starting LVS and VRRP High Availability Monitor...
Jul 12 18:09:00 keepalived-centos7 Keepalived[1447]: Starting Keepalived v1.3.5 (03/19,2017), git commit v1.3.5-6-g6fa32f2
Jul 12 18:09:00 keepalived-centos7 Keepalived[1447]: Opening file '/etc/keepalived/keepalived.conf'.
Jul 12 18:09:00 keepalived-centos7 Keepalived[1447]: Unable to find config file(s) '/etc/keepalived/scripts/*.conf'.
Jul 12 18:09:00 keepalived-centos7 systemd[1]: keepalived.service: control process exited, code=exited status=3
Jul 12 18:09:00 keepalived-centos7 systemd[1]: Failed to start LVS and VRRP High Availability Monitor.
Jul 12 18:09:00 keepalived-centos7 systemd[1]: Unit keepalived.service entered failed state.
Jul 12 18:09:00 keepalived-centos7 systemd[1]: keepalived.service failed.
[root@keepalived-centos7 /]# ll /etc/keepalived/scripts/
total 0
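So the log points at the empty include directory: keepalived 1.3.5 logs "Unable to find config file(s) '/etc/keepalived/scripts/*.conf'" and exits with status 3 while /etc/keepalived/scripts/ is empty. One possible workaround, shown here only as a sketch (the handler name is an assumption, and this is not the solution that was eventually used), is to make sure the glob always matches at least one file:

# Sketch only: ship a placeholder so the 'include /etc/keepalived/scripts/*.conf'
# glob in keepalived.conf resolves even when no script configs are defined.
- name: Ensure the scripts include glob matches at least one file
  ansible.builtin.copy:
    content: "# intentionally empty placeholder\n"
    dest: /etc/keepalived/scripts/placeholder.conf
    owner: root
    group: root
    mode: "0644"
  notify: Restart keepalived  # handler name assumed, adjust to the role's handler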
Thanks for continuing the work on this.
I still have trouble wrapping my head around the reason for the split into different config files.
You mention more granularity/idempotency, but I am not sure I really understand it. Would you mind clarifying? For me, adding variables to do the cleanup is by far the biggest pain point of the split (it means people have to read the code instead of just editing their vars).
Sure:
here is a direct playbook example:
inventory
[squiddev]
proxydev01.example.tld
proxydev02.example.tld
[loadbalancers]
loadbalancer01.example.tld
loadbalancer02.example.tld
playbook/squiddev-first.yml
---
- hosts: loadbalancers, proxydev01.example.tld
vars:
activeconn_threshold: 5
pre_tasks:
- name: "exclude {{ groups['squiddev'][0] }} from loadbalancer"
ansible.builtin.include_role:
name: ansible-keepalived
apply:
tags:
- keepalived-config
when: "'loadbalancers' in group_names"
vars:
keepalived_virtual_server_groups:
- name: proxy
vips:
- ip: '172.16.10.30'
port: 3128
delay_loop: 5
protocol: TCP
lvs_sched: wrr
lvs_method: DR
persistence_timeout: 120
real_servers:
- ip: '172.28.20.31'
port: 3128
weight: 0
tcp_checks:
- connect_port: 3128
connect_timeout: 1
retry: 2
delay_before_retry: 2
- ip: '172.28.20.32'
port: 3128
weight: 1
tcp_checks:
- connect_port: 3128
connect_timeout: 1
retry: 2
delay_before_retry: 2
- name: Force all notified handlers to run at this point, not waiting for normal sync points
ansible.builtin.meta: flush_handlers
- name: Verify weight is set to zero
ansible.builtin.shell:
cmd: "ipvsadm -L | grep {{ groups['squiddev'][0] }} | awk '{ print $4 }'"
register: keepalived_weight
run_once: true
changed_when: false
failed_when: keepalived_weight.stdout | int != 0
delegate_to: "{{ groups['loadbalancers'][0] }}"
- name: Wait until 'ActiveConn' are below threshold
ansible.builtin.shell:
cmd: "ipvsadm -L | grep {{ groups['squiddev'][0] }} | awk '{ print $5 }'"
register: keepalived_activeconn
until: keepalived_activeconn.stdout | int <= activeconn_threshold
retries: 300
delay: 5
run_once: true
changed_when: false
delegate_to: "{{ groups['loadbalancers'][0] }}"
tasks:
- name: "configure squid {{ groups['squiddev'][0] }}"
ansible.builtin.include_role:
name: squid
when: "'squiddev' in group_names"
- name: Force all notified handlers to run at this point, not waiting for normal sync points
ansible.builtin.meta: flush_handlers
post_tasks:
- name: "include {{ groups['squiddev'][0] }} in loadbalancer"
ansible.builtin.include_role:
name: ansible-keepalived
apply:
tags:
- keepalived-config
when: "'loadbalancers' in group_names"
vars:
keepalived_virtual_server_groups:
- name: proxy
vips:
- ip: '172.16.10.30'
port: 3128
delay_loop: 5
protocol: TCP
lvs_sched: wrr
lvs_method: DR
persistence_timeout: 120
real_servers:
- ip: '172.28.20.31'
port: 3128
weight: 1
tcp_checks:
- connect_port: 3128
connect_timeout: 1
retry: 2
delay_before_retry: 2
- ip: '172.28.20.32'
port: 3128
weight: 1
tcp_checks:
- connect_port: 3128
connect_timeout: 1
retry: 2
delay_before_retry: 2
This is just the result from my personal lab and the use case I target with this PR, but the role now offers a few different configurations that can be applied in a more "ad-hoc" way.
A series of playbooks will be able to define the desired state, with a lot of flexibility and granularity. If you just want a floating IP, this will be overkill.
If you configure an entire IPVS router, it might be needed; at least for me it is.
You will also be able to define everything within the group vars (or wherever) and just update portions as needed for updates or other kinds of maintenance; my current example works this way, see the sketch below.
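For illustration, the same structure used in the playbook above could live in the inventory instead. This is only a sketch: the file name and nesting are illustrative, the keys simply mirror the example.

# group_vars/loadbalancers.yml (illustrative): baseline LVS definition kept in
# inventory; maintenance playbooks like the one above then override
# keepalived_virtual_server_groups at include time, e.g. to drain one real server.
keepalived_virtual_server_groups:
  - name: proxy
    vips:
      - ip: '172.16.10.30'
        port: 3128
    delay_loop: 5
    protocol: TCP
    lvs_sched: wrr
    lvs_method: DR
    persistence_timeout: 120
    real_servers:
      - ip: '172.28.20.31'
        port: 3128
        weight: 1
        tcp_checks:
          - connect_port: 3128
            connect_timeout: 1
            retry: 2
            delay_before_retry: 2
      - ip: '172.28.20.32'
        port: 3128
        weight: 1
        tcp_checks:
          - connect_port: 3128
            connect_timeout: 1
            retry: 2
            delay_before_retry: 2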
For me, adding variables to do the cleanup is by far the biggest pain point of the split (it means people have to read the code instead of just editing their vars)
I understand your concerns. This cleanup is not there to delete the configurations in general. Every dict can be deleted on its own, like here: tests/keepalived_haproxy_combined_edit_example.yml
But if, in some case, there are leftovers that did not get handled correctly, what will happen? In the worst case keepalived will refuse to start (hopefully not on the entire keepalived cluster).
With the purge you can wipe everything and quickly start over again.
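To illustrate the idea only: the key names below (a per-dict state and a purge toggle) are placeholders I made up, the referenced test file shows the actual syntax of this PR.

# Placeholder keys for illustration only; see
# tests/keepalived_haproxy_combined_edit_example.yml for the real syntax.
keepalived_virtual_server_groups:
  - name: proxy
    state: absent          # hypothetical: remove just this one definition, keep the rest

# Hypothetical full-purge switch: wipe all generated fragments and re-render.
keepalived_purge_configs: true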
I was inspired by the linux-system-roles/logging role for this.
It saved me hours of debugging.
We will surely find a way to describe that clearly within the README.md and defaults/main.yml, with examples (the squid one in particular) and such.
If you don't mind another PR for full IPVS configuration capabilities, it will follow soon™. It's mostly sysctl related / kernel parameters.
This would push the potential of this role even further.
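For context, the kind of kernel tuning meant here is roughly the following. This is only a sketch; the parameter selection is an example, not the final scope of that future PR.

# Example only: a few IPVS/LVS-related kernel parameters one might manage;
# the actual PR may cover a different set.
- name: Tune IPVS-related kernel parameters
  ansible.posix.sysctl:
    name: "{{ item.name }}"
    value: "{{ item.value }}"
    sysctl_set: true
    state: present
  loop:
    - { name: net.ipv4.ip_forward, value: "1" }
    - { name: net.ipv4.vs.expire_nodest_conn, value: "1" }
    - { name: net.ipv4.conf.all.arp_ignore, value: "1" }
    - { name: net.ipv4.conf.all.arp_announce, value: "2" }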
I somehow missed the state handling of the keepalived instances.
Since be37c73c0661685fe0a4688a5710121c9f6faa18 (removal of official RHEL 7 support), my task to treat RHEL 7 in a special manner (d53e31c0781533e20d5d41fb90d2af6eb88436aa) is no longer needed.
Solved and not needed anymore, as I found another solution which is sufficient: https://github.com/evrardjp/ansible-keepalived/issues/200#issuecomment-1184949648
If you don't mind another PR for full IPVS configuration capabilities, it will follow soon™. It's mostly sysctl related / kernel parameters.
This would push the potential of this role even further.
That sounds awesome! If we introduce testing around it, it will also be reliable for you in the long run.
Great example! I am thinking of building a collection for HA; would it make sense to include these kinds of examples in the collection? WDYT? Let's discuss this in the "Discussions" on GitHub!
As discussed in https://github.com/evrardjp/ansible-keepalived/issues/200 and drafted in https://github.com/evrardjp/ansible-keepalived/pull/203
I hopefully didn't mess anything up.