Icinga / ansible-collection-icinga

Collection to setup and manage components of the Icinga software stack
Apache License 2.0

Improve performance of icinga2_object action plugin #339

Open lucagubler opened 3 weeks ago

cla-bot[bot] commented 3 weeks ago

Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Luca Gubler. This is most likely caused by a git client misconfiguration; please make sure to:

  1. Check whether your git client is configured with an email to sign commits: git config --list | grep email
  2. If not, set it up using: git config --global user.email email@example.com
  3. Make sure that the git commit email is configured in your GitHub account settings; see https://github.com/settings/emails
Donien commented 3 weeks ago

Thank you very much @lucagubler!

Could you also share an example playbook?
I've just tested your PR with my playbook (which works with current main) but my objects that go to local.d/hosts.conf are never deployed.
In fact, nothing is deployed.

- name: Install Icinga
  become: true
  hosts: ansible-debian12

  vars:
    icinga2_confd: "local.d"
    icinga2_config_directories:
      - "local.d"
    icinga2_purge_features: true
    icinga2_features:
      - name: api 
        ca_host: none
        endpoints:
          - name: NodeName
        zones:
          - name: ZoneName
            endpoints:
              - NodeName
      - name: notification
        state: absent
    icinga2_objects:
      - name: Host1
        type: Host
        file: "local.d/hosts.conf"
        check_command: dummy

  roles:
    - icinga.icinga.repos
    - icinga.icinga.icinga2
lucagubler commented 3 weeks ago

That's strange. The only difference I can see is that you're using the local.d/ directory while I'm using zones.d/

Here are the vars I use to set up Icinga2

icinga2_constants:
  TicketSalt: "{{ vault_ticket_salt }}"
  NodeName: "{{ ansible_fqdn }}"
  ZoneName: "main"
  ScriptDir: /etc/icinga2/scripts
icinga2_confd: false
icinga2_purge_features: true
icinga2_config_directories:
  - zones.d/main/commands
  - zones.d/main/hosts
  - zones.d/main/services
icinga2_features:
  - name: icingadb
    host: 127.0.0.1
  - name: notification
  - name: checker
  - name: api
    ca_host: none
    endpoints:
      - name: NodeName
    zones:
      - name: ZoneName
        endpoints:
          - NodeName
  - name: mainlog
    severity: information

icinga2_objects:
  - name: Host1
    type: Host
    file: "zones.d/main/hosts.conf"
    check_command: dummy
Donien commented 3 weeks ago

I thought I might be going mad. So I decided to use podman to make sure I don't pull any plugins / roles / collections from the wrong place.

If anyone sees where I'm messing up, please let me know!

My complete test setup looks as follows, and it still does not end with my host being deployed:

podman run --rm -it --name ansible-test debian:12 bash

Anything below is run inside that container.

apt update
apt install -y vim git ansible

ansible-galaxy collection install git+https://github.com/lucagubler/ansible-collection-icinga.git,fix/improve-icinga2-object-action-plugin-performance-338

hosts.ini:

[all]
testhost ansible_host=localhost ansible_connection=local ansible_user=root

play.yml:

- name: Install Icinga
  become: true
  hosts: testhost

  vars:
    icinga2_constants:
      TicketSalt: "123"
      NodeName: "{{ ansible_fqdn }}"
      ZoneName: "main"
      ScriptDir: /etc/icinga2/scripts
    icinga2_confd: false
    icinga2_purge_features: true
    icinga2_config_directories:
      - zones.d/main/commands
      - zones.d/main/hosts
      - zones.d/main/services
    icinga2_features:
      - name: icingadb
        host: 127.0.0.1
      - name: notification
      - name: checker
      - name: api
        ca_host: none
        endpoints:
          - name: NodeName
        zones:
          - name: ZoneName
            endpoints:
              - NodeName
      - name: mainlog
        severity: information

    icinga2_objects:
      - name: Host1
        type: Host
        file: "zones.d/main/hosts.conf"
        check_command: dummy

  roles:
    - icinga.icinga.repos
    - icinga.icinga.icinga2
ansible-playbook -i hosts.ini play.yml 

icinga2 daemon -C --dump-objects
icinga2 object list --type host

I think the interesting part is this:

TASK [icinga.icinga.icinga2 : collect empty config dirs] **************************************************************************************************************************************
changed: [testhost]

TASK [icinga.icinga.icinga2 : remove empty config dirs] ***************************************************************************************************************************************
changed: [testhost] => (item=/var/tmp/icinga/features-available/checker.conf)
changed: [testhost] => (item=/var/tmp/icinga/features-available/notification.conf)
changed: [testhost] => (item=/var/tmp/icinga/features-available/mainlog.conf)
changed: [testhost] => (item=/var/tmp/icinga/features-available/api.conf)
changed: [testhost] => (item=/var/tmp/icinga/features-available/icingadb.conf)
changed: [testhost] => (item=/var/tmp/icinga/zones.d/main/hosts.conf)
changed: [testhost] => (item=/var/tmp/icinga/zones.conf)

...

TASK [icinga.icinga.icinga2 : remove empty config files] **************************************************************************************************************************************
skipping: [testhost] => (item=/var/tmp/icinga/features-available/checker.conf)
skipping: [testhost] => (item=/var/tmp/icinga/features-available/notification.conf)
skipping: [testhost] => (item=/var/tmp/icinga/features-available/mainlog.conf)
skipping: [testhost] => (item=/var/tmp/icinga/features-available/api.conf)
skipping: [testhost] => (item=/var/tmp/icinga/features-available/icingadb.conf)
ok: [testhost] => (item=/var/tmp/icinga/zones.d/main/hosts.conf)
skipping: [testhost] => (item=/var/tmp/icinga/zones.conf)

/var/tmp/icinga/zones.d/main/hosts.conf gets removed since, according to the task name, it is an empty config file. Of course, it is already gone by that point because of remove empty config dirs here.

lucagubler commented 3 weeks ago

@Donien I think I found the issue. The Ansible task Process Icinga2 objects locally creates all objects on the Ansible control node. So if you go to /var/tmp/icinga/ on the host where you executed the playbook, you should see all objects...

If you remove the delegate_to it should create all files on the remote host. However, it has to copy all files again and will take much longer (9 minutes instead of 4 minutes in my case).

Do we need to keep the /var/tmp/icinga files on the remote host?

We could do the following: render all objects locally as before, then remove the old files on the remote host, copy a single archive containing the rendered config over, and unarchive it there.

This way we only have to copy a single file, which may improve performance. However, I haven't tested that yet, and I think it will lead to idempotency issues, since removing files, copying an archive and unarchiving will always report changes...
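
Roughly, something like this could be done from inside the action plugin (untested sketch only; the paths are examples and /var/tmp/icinga is assumed to already exist on both ends):

import os
import tarfile
import tempfile

# Untested sketch: tar the locally rendered config once, push the single
# archive to the target host, then unpack it there.
fd, local_archive = tempfile.mkstemp(suffix='.tar.gz')
os.close(fd)
with tarfile.open(local_archive, 'w:gz') as tar:
    tar.add('/var/tmp/icinga', arcname='.')

self._transfer_file(local_archive, '/var/tmp/icinga.tar.gz')
os.unlink(local_archive)

self._execute_module(
    module_name='unarchive',
    module_args=dict(
        src='/var/tmp/icinga.tar.gz',
        dest='/var/tmp/icinga',
        remote_src=True,
    ),
    task_vars=task_vars,
)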

Donien commented 3 weeks ago

Well, in my case the ansible controller and the target host were both the same container. So it should have worked there.

I'm just throwing in my current work for reference since it exists.

The problem I see with delegation to the ansible controller is that we could have multiple hosts.
If the task (that creates stuff in /var/tmp/icinga) runs once per host - they can and most likely will have different icinga2_objects - then we would have to create /var/tmp/icinga/{{ inventory_hostname }} on the ansible controller to differentiate the hosts.
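
Purely for illustration (not existing code), that per-host layout on the Controller could look like:

import os

# Hypothetical per-host staging layout on the Ansible Controller.
staging_root = os.path.join('/var/tmp/icinga', task_vars['inventory_hostname'])
hosts_file = os.path.join(staging_root, 'zones.d/main/hosts.conf')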

And yes, idempotency is a big concern. With my approach I had to resort to sorting every (Icinga) object in my list of objects that go to the same file because sometimes magic would give me a different order than the last 4 runs.

I think bulk writing objects that go to the same file would already greatly increase performance without having to work around quirky ansible module idempotency issues.
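
To make that concrete, here is a rough and purely hypothetical sketch (function names and the rendering are made up, this is not the plugin's actual code): group objects by destination file, sort them for a stable order, and build each file's content in one go.

from collections import defaultdict

def render_object(obj):
    # Stand-in for the real rendering done by the action plugin.
    return 'object %s "%s" {\n  check_command = "%s"\n}\n' % (
        obj['type'], obj['name'], obj.get('check_command', 'dummy'))

def build_config_files(icinga2_objects):
    """Return a dict mapping each destination file to its full content."""
    grouped = defaultdict(list)
    for obj in icinga2_objects:
        grouped[obj['file']].append(obj)

    files = {}
    for path, objects in grouped.items():
        # Sort so repeated runs always produce identical file content.
        objects.sort(key=lambda o: (o['type'], o['name']))
        files[path] = ''.join(render_object(o) for o in objects)
    return files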

lucagubler commented 3 weeks ago

Thanks, I'll have a look at it. I agree, writing a single file should be faster. And also idempotency shouldn't be an issue anymore.

Right now icinga2_object is an action plugin. As far as I know, action plugins are executed on localhost. Couldn't we just convert it to a custom library module? Modules are executed on the remote host by default, so we wouldn't have to copy the files anymore.
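
Just to illustrate the idea (invented name and options, not part of the collection), a minimal module that writes pre-rendered config on the remote host could look roughly like this:

#!/usr/bin/python
# Hypothetical sketch of an 'icinga2_object_file' module; not part of the
# collection. It receives pre-rendered config text and writes it on the
# remote host, reporting changed only when the content actually differs.
from ansible.module_utils.basic import AnsibleModule


def main():
    module = AnsibleModule(
        argument_spec=dict(
            path=dict(type='path', required=True),
            content=dict(type='str', required=True),
        ),
        supports_check_mode=True,
    )
    path = module.params['path']
    content = module.params['content']

    try:
        with open(path) as existing:
            changed = existing.read() != content
    except IOError:
        changed = True

    if changed and not module.check_mode:
        with open(path, 'w') as target:
            target.write(content)

    module.exit_json(changed=changed)


if __name__ == '__main__':
    main()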

Donien commented 3 weeks ago

icinga2_object is an action plugin, yes.
Action plugins are executed on the Ansible Controller.

But: If you run another module using _execute_module, that module runs on the target host.

Example:

# Build the arguments for the 'file' module.
file_args = dict()
file_args['state'] = 'directory'
file_args['path'] = path

# Run the 'file' module on the target host to create the directory there.
file_module = self._execute_module(
    module_name='file',
    module_args=file_args,
    task_vars=task_vars,
    tmp=tmp
)

Basically, the calculations and creation of the config is delegated to the Ansible Controller (action plugin) but the deployment of a file is done on the target host (_execute_module).

When we talk about copying we actually copy twice:

  1. The rendered content (which only lives in RAM on the Controller) is written to /var/tmp/icinga/ on the target host.
  2. The files are then copied locally on the target host from /var/tmp/icinga/ to /etc/icinga2/.

So we don't copy files from the Controller. We copy content (which lives in RAM) to the target host and then copy files locally on the target host from /var/tmp/icinga/ to /etc/icinga2/ (no file writing on the Controller).
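
As a purely illustrative sketch of those two steps from inside an action plugin (the paths and the rendered_content variable are placeholders, not the plugin's actual names):

# Step 1: push the in-memory content to the staging path on the target host.
# _transfer_data() writes a local temp file and sends it over the connection.
staging = '/var/tmp/icinga/zones.d/main/hosts.conf'
self._transfer_data(staging, rendered_content)

# Step 2: copy the staged file into place locally on the target host
# (remote_src=True means nothing else is transferred from the Controller).
self._execute_module(
    module_name='copy',
    module_args=dict(
        src=staging,
        dest='/etc/icinga2/zones.d/main/hosts.conf',
        remote_src=True,
    ),
    task_vars=task_vars,
)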


The reason deployment is done in /var/tmp/icinga/ initially is that

I think keeping the logic that builds the configuration in an action plugin is beneficial since connections to the target host are only made once a write operation happens. So before doing a write operation the code can decide if writing is even necessary.
Oftentimes the Controller has sufficient resources to handle the load anyway.
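
One conceivable way to make that decision (only a sketch, at the cost of one cheap stat call, and not how the plugin currently works) is a checksum comparison before transferring anything:

import hashlib

# Placeholders; in the real plugin these would come from the rendered objects.
staging = '/var/tmp/icinga/zones.d/main/hosts.conf'
rendered_content = '...'

# Only push content when the remote file's checksum differs from the local one.
local_sum = hashlib.sha1(rendered_content.encode('utf-8')).hexdigest()
stat_result = self._execute_module(
    module_name='stat',
    module_args=dict(path=staging, get_checksum=True),
    task_vars=task_vars,
)
if stat_result.get('stat', {}).get('checksum') != local_sum:
    self._transfer_data(staging, rendered_content)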