evrardjp / ansible-keepalived

Keepalived role for ansible deployment
Apache License 2.0

Add more execution granularity / idempotency #200

Closed · Kariton closed this 2 years ago

Kariton commented 2 years ago

Hey folks,

What do you think about splitting the templates into more granular single files? We do a similar thing with our old-school, hand-managed keepalived installations.

You already have a nice base to split from, IMO.

I'm thinking of the following deployed structure:

.
├── keepalive.conf
├── script
│   ├── {{ keepalived_scripts.items() }}.conf # DRAFT - don't know the exact filter; one file per item via loop
├── sync_groups
│   ├── {{ keepalived_sync_groups.items() }}.conf # DRAFT - don't know the exact filter; one file per item via loop
├── instances
│   ├── {{ keepalived_instances.items() }}.conf # DRAFT - don't know the exact filter; one file per item via loop
├── groups
│   ├── {{ keepalived_virtual_server_groups.name }}.conf # vips
├── services
│   ├── {{ keepalived_virtual_server_groups.name }}.conf # real_servers

Where the corresponding content is like this:

keepalive.conf:
- keepalived_global_defs
include /etc/keepalived/script/*.conf
include /etc/keepalived/sync_groups/*.conf
include /etc/keepalived/instances/*.conf
include /etc/keepalived/groups/*.conf
include /etc/keepalived/services/*.conf

script/*.conf
- keepalived_scripts

sync_groups/*.conf
- keepalived_sync_groups

instances/*.conf
- keepalived_instances

groups/*.conf
- keepalived_virtual_server_groups # EXCLUDE real_servers

services/*.conf
- keepalived_virtual_server_groups # ONLY real_servers
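
To generate one file per item, a loop over the dict could work; a rough sketch (template names, destination paths, and the handler name are my assumptions, not the role's current tasks):

```yaml
# Sketch only: render one file per dict entry. Template names and
# destination paths are illustrative, not the role's actual files.
- name: Deploy one config file per keepalived_scripts entry
  ansible.builtin.template:
    src: script.conf.j2
    dest: "/etc/keepalived/script/{{ item.key }}.conf"
    mode: "0644"
  loop: "{{ keepalived_scripts | dict2items }}"
  notify: reload keepalived

- name: Deploy one config file per keepalived_instances entry
  ansible.builtin.template:
    src: instance.conf.j2
    dest: "/etc/keepalived/instances/{{ item.key }}.conf"
    mode: "0644"
  loop: "{{ keepalived_instances | dict2items }}"
  notify: reload keepalived
```

For idempotency, stale files from removed items would also need cleanup, e.g. comparing the directory contents against the expected filenames and removing the rest with `ansible.builtin.file`.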

Why?

The goal is simpler configuration management. This might also include the templates, even if some tasks get a bit messy.

This would add the ability to manage subsets of the keepalived configuration as needed - especially the ability to edit keepalived_virtual_server_groups for automated backend maintenance.

A simple playbook / pre_tasks etc. could disable real_servers. keepalived_virtual_server_groups.name and every other (sub-)configuration has to be in the desired state, as it will replace /etc/keepalived/services/httpd.conf:

vars:
  keepalived_virtual_server_groups:
    - name: 'httpd'
      real_servers:
        - ip: '1.1.2.1'
          port: 443
          weight: 0  # <- disabled
          tcp_checks:
            - connect_port: 443
              connect_timeout: 2
              retry: 3
              delay_before_retry: 2
        - ip: '1.1.2.2'
          port: 443
          weight: 1
          tcp_checks:
            - connect_port: 443
              connect_timeout: 2
              retry: 3
              delay_before_retry: 2
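
For illustration, a full maintenance play around these vars might look like this (whether the role could limit itself to the services files via a tag is an assumption here, not something the role does today):

```yaml
# Hypothetical maintenance play: drain one backend by weighting it to 0.
# Run with: ansible-playbook drain.yml --tags services
- hosts: loadbalancers
  vars:
    keepalived_virtual_server_groups:
      - name: 'httpd'
        real_servers:
          - ip: '1.1.2.1'
            port: 443
            weight: 0   # <- drained for maintenance
  roles:
    - role: keepalived
```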

A simple playbook / post_tasks etc. could replace VIPs. keepalived_virtual_server_groups.name and every other (sub-)configuration has to be in the desired state, as it will replace /etc/keepalived/groups/httpd.conf:

vars:
  keepalived_virtual_server_groups:
    - name: 'httpd'
      vips:
#        - ip: '1.1.1.1' # <- old
#          port: 443
        - ip: '1.1.1.2' # <- new
          port: 443

What is needed?

Intensive planning and testing to not break anything.

Many dicts would now need a "state" value, along with corresponding new tasks.

I may have missed something entirely; this was written from what I know and remember.

What do you think? Or do you know something I don't? :)

Greetings, Kariton


EDIT1: added service restart vs. reload to "What is needed?"

evrardjp commented 2 years ago

hello!

I am back from vacation, I will read this as soon as possible.

evrardjp commented 2 years ago

There are a few points to remember that can clarify why we do it the current way (which doesn't mean we shouldn't improve!):

  1. I have never heard of someone needing to replace a VIP or disable a server using keepalived (I generally use an LB on top of keepalived, like haproxy, to do what I need).
  2. If you intend to use the editing of a template for maintenance, it means that keepalived will be restarted. That is by far more intrusive than having something on top of it, especially if people don't run this playbook in serial.

I am not against the feature, but I am not seeing its value yet, beyond the fact that you get to deal with smaller files (nice!) but also more ACLs (less nice), and that the run gets slower due to more tasks (less nice).

Am I missing something obvious? Or do you mean we can change sub-parts of the configuration for these maintenances without restarting keepalived?

Kariton commented 2 years ago

Thanks for the feedback,

We are using keepalived and IPVS as our load balancer (plain routing). I think this makes the difference... We configure things within keepalived.

replace a VIP

That was just a quick idea as an example. But you are right, it's not ideal.

The "will be restarted" part is not totally wrong - the current handler would do that. But a restart is not needed for keepalived. I was not thinking long enough about this... I just thought "yeah, just a correct notify / handler and done" - but the real question is when to restart vs. reload.

More tasks will definitely slow down execution.

Or do you mean we can changes sub parts of the configuration like these maintenances without restarting keepalived?

A reload is possible based on the changed state of the corresponding task. So, for example: keepalive.conf changed -> restart; */*.conf changed -> reload.
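
Wired into handlers, that split might look like this (handler names are illustrative; `ansible.builtin.service` supports `state: reloaded`):

```yaml
# Sketch: a change to the base file notifies the restart handler,
# changes to the included per-item files only notify the reload handler.
handlers:
  - name: restart keepalived
    ansible.builtin.service:
      name: keepalived
      state: restarted

  - name: reload keepalived
    ansible.builtin.service:
      name: keepalived
      state: reloaded
```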

I don't know if a restart is needed for any particular configuration change. I can't even remember ever restarting a keepalived service at all.

Kariton commented 2 years ago

Here they talk about a similar thing.

https://github.com/voxpupuli/puppet-keepalived/issues/52

Restarting keepalived can be a bit disruptive. For the VRRP part, all IPs are removed and added back. For the IPVS part, all rules are removed, then added back. When using reload instead, keepalived tries to be smart and only adds/removes the IPs that need to be added/removed. The same goes for IPVS. Therefore, it is better to use reload if possible.

evrardjp commented 2 years ago

Yeah, reload is definitely better; it's just that it was not working with the Ansible modules long ago. This is something we might want to try, to improve the quality of the role.

Would you be okay to create a PR, and we iterate together on it? There are many things that deserve a refresh -- I would be happy to start it, but would need your help on making sure it fits your needs...

Maybe you can start by writing the tests?

Kariton commented 2 years ago

Hey, sorry for my late reply.

I will create a fork with a general "kind of rework" and we can see what those changes would look like. But it will take some time.

And about the testing part... I just started with ansible / molecule pipelines and such in my GitLab instance. I'll definitely have a look at those. But this is totally new stuff for me.

evrardjp commented 2 years ago

hey, don't worry, we can work on this together :) Starting with a fork gives us ideas on how to improve. I will also work on this on the side.

This way we can come with the greatest ideas and make this role even better!

Kariton commented 2 years ago

Hey ho,

I finally managed to test some stuff and got this: https://github.com/Kariton/ansible-keepalived/pull/1

would love to get some feedback.

Greetings, Kariton

Kariton commented 2 years ago

Hey,

My tox test seems to work - so here is my (draft) PR https://github.com/evrardjp/ansible-keepalived/pull/203

But I'm not that sure about the molecule verify part.

evrardjp commented 2 years ago

I propose we discuss this in the relevant PRs. Thank you for your contributions!

Kariton commented 2 years ago

@evrardjp I got a sudden inspiration: by utilizing combine() or community.general.lists_mergeby() I may be able to get around my entire previous approach and achieve what I need without "rewriting" your awesome role.

But to be honest, I don't know how to do that in detail.

What I've got so far - playbook:

- hosts: loadbalancers, proxy01.example.tld
  vars:
    server_groups_name: 'proxy'

  tasks:
    - name: get real_server configuration
      ansible.builtin.debug:
        var: hostvars[groups['loadbalancers'][0]]['keepalived_virtual_server_groups'] | selectattr('name', 'equalto', server_groups_name) | map(attribute='real_servers') | first | selectattr('ip', 'equalto', hostvars[groups['squiddev'][0]]['ansible_default_ipv4']['address'])
      run_once: true

result

ok: [loadbalancer01.example.tld] => {
    "hostvars[groups['loadbalancers'][0]]['keepalived_virtual_server_groups'] | selectattr('name', 'equalto', server_groups_name) | map(attribute='real_servers') | first | selectattr('ip', 'equalto', hostvars[groups['squiddev'][0]]['ansible_default_ipv4']['address'])": [
        {
            "ip": "172.28.20.31",
            "port": 3128,
            "tcp_checks": [
                {
                    "connect_port": 3128,
                    "connect_timeout": 1,
                    "delay_before_retry": 2,
                    "retry": 2
                }
            ],
            "weight": 1
        }
    ]
}

But how can I create a fact to update the weight?

It will fit my goals if I can use my example playbook https://github.com/evrardjp/ansible-keepalived/pull/209#issuecomment-1183672535 and replace the vars: part with something.

Would you mind lending me a hand on this matter, as I'm stuck at the moment?

Kariton commented 2 years ago

I think I actually got it.

         {% set virtual_server_group_list = [] %}
         {% for virtual_server_group in keepalived_virtual_server_groups %}
         {%   set real_server_list = [] %}
         {%   for real_server in virtual_server_group.real_servers %}
         {%     if real_server.ip == hostvars[groups['squiddev'][0]]['ansible_default_ipv4']['address'] %}
         {%       set real_server = real_server | combine({'weight': 0}) %}
         {%     endif %}
         {%     set _ = real_server_list.append(real_server) %}
         {%   endfor %}
         {%   set virtual_server_group = virtual_server_group | combine({'real_servers': real_server_list}) %}
         {%   set _ = virtual_server_group_list.append(virtual_server_group) %}
         {% endfor %}
         {{ virtual_server_group_list }}

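
For reference, one way to consume such a snippet could be a `set_fact` with a template lookup before the role runs (the template filename is hypothetical, and without `jinja2_native` enabled the lookup result is a string rather than a list, so treat this as a sketch only):

```yaml
# Hypothetical wiring: override keepalived_virtual_server_groups with
# the merged list (backend weight set to 0) for this run only.
- name: Drain this host's backend
  ansible.builtin.set_fact:
    keepalived_virtual_server_groups: "{{ lookup('template', 'drain_real_server.j2') }}"
```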
Kariton commented 2 years ago

I will create a PR for documentation purposes - this is no longer needed.

evrardjp commented 2 years ago

I think I actually got it.

         {% set virtual_server_group_list = [] %}
         {% for virtual_server_group in keepalived_virtual_server_groups %}
         {%   set real_server_list = [] %}
         {%   for real_server in virtual_server_group.real_servers %}
         {%     if real_server.ip == hostvars[groups['squiddev'][0]]['ansible_default_ipv4']['address'] %}
         {%       set real_server = real_server | combine({'weight': 0}) %}
         {%     endif %}
         {%     set _ = real_server_list.append(real_server) %}
         {%   endfor %}
         {%   set virtual_server_group = virtual_server_group | combine({'real_servers': real_server_list}) %}
         {%   set _ = virtual_server_group_list.append(virtual_server_group) %}
         {% endfor %}
         {{ virtual_server_group_list }}

Wow you were fast, I didn't get the chance to help you yet!

Congrats! I approve the idea of documenting it :)