Checkmk / ansible-collection-checkmk.general

The official Checkmk Ansible collection - brought to you by the Checkmk company.
https://galaxy.ansible.com/checkmk/general
GNU General Public License v3.0

[BUG] Agent install fails at "Timeout when waiting for 127.0.0.1:6556" #659

Closed. JordyEGNL closed this issue 2 days ago.

JordyEGNL commented 3 days ago


Describe the bug: When installing the agent on an Ubuntu Server 24.04 LTS host, the role fails at the task checkmk.general.agent : Linux: Verify Checkmk Agent Port is open.

This is my playbook:

- name: VPS setup
  hosts:
    - virtual
  remote_user: jordy
  become: true
  roles:
    - role: checkmk.general.agent
      tags: checkmk-agent

This is my variables file (password removed, of course). Everything is reachable from the local network.

checkmk_agent_edition: cre
checkmk_agent_version: "2.3.0p15"
checkmk_agent_server_protocol: https
checkmk_agent_server: checkmk.hoebergen.net
checkmk_agent_server_validate_certs: 'false'
checkmk_agent_server_port: "{% if checkmk_agent_server_protocol == 'https' %}443{% else %}80{% endif %}"

checkmk_agent_tls: 'false'
checkmk_agent_site: cmk
checkmk_agent_user: cmkadmin
checkmk_agent_pass: "<REDACTED>"
checkmk_agent_registration_server: "10.0.10.110:8000"
checkmk_agent_registration_site: "{{ checkmk_agent_site }}"

checkmk_agent_add_host: 'false'
checkmk_agent_delegate_api_calls: localhost
checkmk_agent_host_attributes:
    ipaddress: "{{ checkmk_agent_host_ip | default(omit) }}"
checkmk_agent_host_name: "{{ inventory_hostname }}"
checkmk_agent_host_ip: "{{ hostvars[inventory_hostname]['ansible_default_ipv4']['address'] }}"
checkmk_agent_port: 6556
checkmk_agent_auto_activate: 'true'
checkmk_agent_folder: "/"

checkmk_agent_configure_firewall: 'true'
checkmk_agent_mode: pull
checkmk_agent_no_log: 'true'
checkmk_agent_server_ips: ["10.0.10.110"]

It fails with the following error:

<10.0.10.102> (1, b'\r\n\r\n{"elapsed": 60, "failed": true, "msg": "Timeout when waiting for 127.0.0.1:6556", "invocation": {"module_args": {"port": 6556, "timeout": 60, "host": "127.0.0.1", "connect_timeout": 5, "delay": 0, "active_connection_states": ["ESTABLISHED", "FIN_WAIT1", "FIN_WAIT2", "SYN_RECV", "SYN_SENT", "TIME_WAIT"], "state": "started", "sleep": 1, "path": null, "search_regex": null, "exclude_hosts": null, "msg": null}}}\r\n', b"Warning: Permanently added '10.0.10.102' (ED25519) to the list of known hosts.\r\nConnection to 10.0.10.102 closed.\r\n")
<10.0.10.102> Failed to connect to the host via ssh: Warning: Permanently added '10.0.10.102' (ED25519) to the list of known hosts.
Connection to 10.0.10.102 closed.
<10.0.10.102> ESTABLISH SSH CONNECTION FOR USER: jordy
<10.0.10.102> SSH: EXEC ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o StrictHostKeyChecking=no -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o 'User="jordy"' -o ConnectTimeout=10 10.0.10.102 '/bin/sh -c '"'"'rm -f -r /home/jordy/.ansible/tmp/ansible-tmp-1726564773.72224-725300-272406614598573/ > /dev/null 2>&1 && sleep 0'"'"''
<10.0.10.103> (1, b'\r\n\r\n{"elapsed": 60, "failed": true, "msg": "Timeout when waiting for 127.0.0.1:6556", "invocation": {"module_args": {"port": 6556, "timeout": 60, "host": "127.0.0.1", "connect_timeout": 5, "delay": 0, "active_connection_states": ["ESTABLISHED", "FIN_WAIT1", "FIN_WAIT2", "SYN_RECV", "SYN_SENT", "TIME_WAIT"], "state": "started", "sleep": 1, "path": null, "search_regex": null, "exclude_hosts": null, "msg": null}}}\r\n', b"Warning: Permanently added '10.0.10.103' (ED25519) to the list of known hosts.\r\nConnection to 10.0.10.103 closed.\r\n")
<10.0.10.103> Failed to connect to the host via ssh: Warning: Permanently added '10.0.10.103' (ED25519) to the list of known hosts.
Connection to 10.0.10.103 closed.
<10.0.10.103> ESTABLISH SSH CONNECTION FOR USER: jordy
<10.0.10.103> SSH: EXEC ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o StrictHostKeyChecking=no -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o 'User="jordy"' -o ConnectTimeout=10 10.0.10.103 '/bin/sh -c '"'"'rm -f -r /home/jordy/.ansible/tmp/ansible-tmp-1726564773.8182316-725303-182244207396460/ > /dev/null 2>&1 && sleep 0'"'"''
<10.0.10.102> (0, b'', b"Warning: Permanently added '10.0.10.102' (ED25519) to the list of known hosts.\r\n")
fatal: [lab-vps-02]: FAILED! => {
    "changed": false,
    "elapsed": 60,
    "invocation": {
        "module_args": {
            "active_connection_states": [
                "ESTABLISHED",
                "FIN_WAIT1",
                "FIN_WAIT2",
                "SYN_RECV",
                "SYN_SENT",
                "TIME_WAIT"
            ],
            "connect_timeout": 5,
            "delay": 0,
            "exclude_hosts": null,
            "host": "127.0.0.1",
            "msg": null,
            "path": null,
            "port": 6556,
            "search_regex": null,
            "sleep": 1,
            "state": "started",
            "timeout": 60
        }
    },
    "msg": "Timeout when waiting for 127.0.0.1:6556"
}

cmk-agent-ctl is installed successfully by Ansible, but it does not connect to the Checkmk server. When I run the following command on the host, everything works fine:

sudo cmk-agent-ctl register --hostname LAB-VPS-01 \
    --server 10.0.10.110:8000 --site cmk \
    --user agent_registration --password '<REDACTED>' --trust-cert

Component Name: checkmk.general.agent

Ansible Version

$ ansible --version
ansible [core 2.16.3]
  config file = /home/jordy/ansible/ansible.cfg
  configured module search path = ['/home/jordy/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python3/dist-packages/ansible
  ansible collection location = /home/jordy/.ansible/collections:/usr/share/ansible/collections
  executable location = /usr/bin/ansible
  python version = 3.12.3 (main, Sep 11 2024, 14:17:37) [GCC 13.2.0] (/usr/bin/python3)
  jinja version = 3.1.2
  libyaml = True

Checkmk Version and Edition

2.3.0p15 (CRE)

Collection Version

$ ansible-galaxy collection list
Collection                               Version
---------------------------------------- -------
checkmk.general                          5.2.1

# /usr/lib/python3/dist-packages/ansible_collections
Collection                               Version
---------------------------------------- -------
amazon.aws                               7.2.0
ansible.netcommon                        5.3.0
ansible.posix                            1.5.4
ansible.utils                            2.12.0
ansible.windows                          2.2.0
arista.eos                               6.2.2
awx.awx                                  23.6.0
azure.azcollection                       1.19.0
check_point.mgmt                         5.2.2
chocolatey.chocolatey                    1.5.1
cisco.aci                                2.8.0
cisco.asa                                4.0.3
cisco.dnac                               6.10.2
cisco.intersight                         2.0.7
cisco.ios                                5.3.0
cisco.iosxr                              6.1.1
cisco.ise                                2.7.0
cisco.meraki                             2.17.2
cisco.mso                                2.5.0
cisco.nxos                               5.3.0
cisco.ucs                                1.10.0
cloud.common                             2.1.4
cloudscale_ch.cloud                      2.3.1
community.aws                            7.1.0
community.azure                          2.0.0
community.ciscosmb                       1.0.7
community.crypto                         2.17.1
community.digitalocean                   1.26.0
community.dns                            2.8.0
community.docker                         3.7.0
community.general                        8.3.0
community.grafana                        1.7.0
community.hashi_vault                    6.1.0
community.hrobot                         1.9.0
community.library_inventory_filtering_v1 1.0.0
community.libvirt                        1.3.0
community.mongodb                        1.6.3
community.mysql                          3.8.0
community.network                        5.0.2
community.okd                            2.3.0
community.postgresql                     3.3.0
community.proxysql                       1.5.1
community.rabbitmq                       1.2.3
community.routeros                       2.12.0
community.sap                            2.0.0
community.sap_libs                       1.4.2
community.sops                           1.6.7
community.vmware                         4.1.0
community.windows                        2.1.0
community.zabbix                         2.3.1
containers.podman                        1.11.0
cyberark.conjur                          1.2.2
cyberark.pas                             1.0.25
dellemc.enterprise_sonic                 2.4.0
dellemc.openmanage                       8.7.0
dellemc.powerflex                        2.1.0
dellemc.unity                            1.7.1
f5networks.f5_modules                    1.27.1
fortinet.fortimanager                    2.3.1
fortinet.fortios                         2.3.4
frr.frr                                  2.0.2
gluster.gluster                          1.0.2
google.cloud                             1.3.0
grafana.grafana                          2.2.4
hetzner.hcloud                           2.4.1
hpe.nimble                               1.1.4
ibm.qradar                               2.1.0
ibm.spectrum_virtualize                  2.0.0
ibm.storage_virtualize                   2.2.0
infinidat.infinibox                      1.3.12
infoblox.nios_modules                    1.6.1
inspur.ispim                             2.2.0
inspur.sm                                2.3.0
junipernetworks.junos                    5.3.1
kubernetes.core                          2.4.0
lowlydba.sqlserver                       2.2.2
microsoft.ad                             1.4.1
netapp.aws                               21.7.1
netapp.azure                             21.10.1
netapp.cloudmanager                      21.22.1
netapp.elementsw                         21.7.0
netapp.ontap                             22.9.0
netapp.storagegrid                       21.11.1
netapp.um_info                           21.8.1
netapp_eseries.santricity                1.4.0
netbox.netbox                            3.16.0
ngine_io.cloudstack                      2.3.0
ngine_io.exoscale                        1.1.0
openstack.cloud                          2.2.0
openvswitch.openvswitch                  2.1.1
ovirt.ovirt                              3.2.0
purestorage.flasharray                   1.26.0
purestorage.flashblade                   1.15.0
purestorage.fusion                       1.6.0
sensu.sensu_go                           1.14.0
splunk.es                                2.1.2
t_systems_mms.icinga_director            2.0.1
telekom_mms.icinga_director              1.35.0
theforeman.foreman                       3.15.0
vmware.vmware_rest                       2.3.1
vultr.cloud                              1.12.1
vyos.vyos                                4.1.0
wti.remote                               1.0.5

To Reproduce: Steps to reproduce the behavior:

  1. Create the playbook and the group_vars file for the hosts.
  2. Run ansible-playbook setup_vps.yml -i inventory/default.yml -K -vvv
  3. The run fails at the task checkmk.general.agent : Linux: Verify Checkmk Agent Port is open.

Expected behavior: The cmk-agent service should be started before the role checks whether its port is open.

Actual behavior: The task fails because the agent does not start.

Additional context: The hosts have already been added in Checkmk, and the changes have been activated. When the agent is set up manually on the hosts, everything works as expected.

checkmk.hoebergen.net is reachable from within my network and points to Traefik, which forwards requests to the Checkmk Docker container on port 5000. 10.0.10.110:8000 goes directly to the Checkmk server (bypassing Traefik).

JordyEGNL commented 2 days ago

It seems that the agent is only registered by the following tasks, which call cmk-update-agent register ... or cmk-agent-ctl register ...:

- name: "{{ ansible_system }}: Register Agent for automatic Updates using User Password."
  become: true
  ansible.builtin.shell: |
    cmk-update-agent register -H {{ checkmk_agent_host_name }} \
    -s {{ checkmk_agent_registration_server }} -i {{ checkmk_agent_registration_site }} -p {{ checkmk_agent_registration_server_protocol }} \
    -U {{ checkmk_agent_user }} -P {{ __checkmk_agent_auth }}
  no_log: "{{ checkmk_agent_no_log | bool }}"
  register: __checkmk_agent_update_state
  when: |
    checkmk_agent_edition | lower != "cre"
    and __checkmk_agent_updater_binary.stat.exists | bool
    and checkmk_agent_update | bool
    and (checkmk_agent_pass is defined and checkmk_agent_pass | length)
    and (checkmk_agent_secret is not defined)
    and not ((checkmk_agent_registration_server + '/' + checkmk_agent_registration_site in __checkmk_agent_updater_state.stdout)
    and ('"error": null' in __checkmk_agent_updater_state.stdout) )
  changed_when: "'Successfully registered agent of host' in __checkmk_agent_update_state.stderr"

- name: "{{ ansible_system }}: Register Agent for automatic Updates using Automation Secret."
  become: true
  ansible.builtin.shell: |
    cmk-update-agent register -H {{ checkmk_agent_host_name }} \
    -s {{ checkmk_agent_registration_server }} -i {{ checkmk_agent_registration_site }} -p {{ checkmk_agent_registration_server_protocol }} \
    -U {{ checkmk_agent_user }} -S {{ __checkmk_agent_auth }}
  no_log: "{{ checkmk_agent_no_log | bool }}"
  register: __checkmk_agent_update_state
  when: |
    checkmk_agent_edition | lower != "cre"
    and __checkmk_agent_updater_binary.stat.exists | bool
    and checkmk_agent_update | bool
    and (checkmk_agent_secret is defined and checkmk_agent_secret | length)
    and not ((checkmk_agent_registration_server + '/' + checkmk_agent_registration_site in __checkmk_agent_updater_state.stdout)
    and ('"error": null' in __checkmk_agent_updater_state.stdout) )
  changed_when: "'Successfully registered agent of host' in __checkmk_agent_update_state.stderr"

- name: "{{ ansible_system }}: Register Agent for TLS."
  become: true
  ansible.builtin.shell: |
    cmk-agent-ctl register -H {{ checkmk_agent_host_name }} \
    -s {{ checkmk_agent_registration_server }} -i {{ checkmk_agent_registration_site }} \
    -U {{ checkmk_agent_user }} -P {{ __checkmk_agent_auth }} --trust-cert
  no_log: "{{ checkmk_agent_no_log | bool }}"
  register: __checkmk_agent_tls_state
  when: |
    __checkmk_agent_controller_binary.stat.exists | bool
    and checkmk_agent_tls | bool
    and (__checkmk_agent_auth is defined and __checkmk_agent_auth | length)
    and not checkmk_agent_registration_server + '/' + checkmk_agent_registration_site in __checkmk_agent_registered_connections.stdout
  changed_when: "'Registration complete' in __checkmk_agent_tls_state.stdout"
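To see why nothing registered the agent with the original variables, the when: conditions of the three tasks above can be modeled in a few lines of Python. This is a simplified, hypothetical sketch, not the collection's code. It assumes that both the updater and controller binaries exist, collapses the separate "already registered" stdout checks into one flag, only approximates Ansible's bool filter, assumes checkmk_agent_update defaults to false, and assumes __checkmk_agent_auth resolves to the automation secret when one is set and to the password otherwise.

```python
# Hypothetical, simplified model of the `when:` conditions of the three
# registration tasks quoted above. Not the collection's code.

TRUTHY = ("yes", "on", "1", "true")


def ansible_bool(value):
    """Rough approximation of Ansible's `bool` filter for string/bool inputs."""
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() in TRUTHY


def registration_tasks_that_run(host_vars, already_registered=False):
    """Return the names of the registration tasks whose `when:` would hold.

    Assumptions: the updater and controller binaries exist, the
    "already registered" stdout checks are collapsed into one flag, and
    __checkmk_agent_auth is the secret if one is set, else the password.
    """
    edition = str(host_vars.get("checkmk_agent_edition", "")).lower()
    update = ansible_bool(host_vars.get("checkmk_agent_update", False))  # assumed default
    tls = ansible_bool(host_vars.get("checkmk_agent_tls", False))
    password = host_vars.get("checkmk_agent_pass") or ""
    secret_defined = "checkmk_agent_secret" in host_vars
    secret = host_vars.get("checkmk_agent_secret") or ""
    auth = secret or password  # hypothetical stand-in for __checkmk_agent_auth

    tasks = []
    if already_registered:
        return tasks
    # Both Agent Updater tasks are skipped entirely for the Raw Edition (cre).
    if edition != "cre" and update:
        if password and not secret_defined:
            tasks.append("Register Agent for automatic Updates using User Password.")
        if secret:
            tasks.append("Register Agent for automatic Updates using Automation Secret.")
    # The TLS task depends only on checkmk_agent_tls and on having credentials.
    if tls and auth:
        tasks.append("Register Agent for TLS.")
    return tasks


reporter_vars = {
    "checkmk_agent_edition": "cre",
    "checkmk_agent_tls": "false",
    "checkmk_agent_pass": "<REDACTED>",
}
print(registration_tasks_that_run(reporter_vars))  # → []
print(registration_tasks_that_run({**reporter_vars, "checkmk_agent_tls": "true"}))  # → ['Register Agent for TLS.']
```

With edition cre and checkmk_agent_tls set to 'false', no registration task qualifies, which matches the agent never being registered by the role; flipping checkmk_agent_tls to 'true' makes only the TLS task qualify.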

So for the Raw Edition, where both cmk-update-agent tasks are skipped (checkmk_agent_edition | lower != "cre"), the following variable needs to be set to true:

checkmk_agent_tls: 'true'

This also passes the --trust-cert option, as seen in the TLS task above :)

Everything is now working as expected.

(Suggestion: maybe skip the task checkmk.general.agent : Linux: Verify Checkmk Agent Port is open when checkmk_agent_tls or checkmk_agent_update is set to false.)
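The suggestion above can be read in more than one way. Taken literally ("tls or update is false"), it would also skip the check for the working configuration with checkmk_agent_tls: 'true' whenever checkmk_agent_update is false. A minimal Python sketch of one alternative reading, which skips the port check only when no registration path is enabled, follows. It is hypothetical and not the collection's behavior: the function name is made up, the default of false for checkmk_agent_update is an assumption, and the bool handling only approximates Ansible's bool filter.

```python
# Hypothetical guard for the "Linux: Verify Checkmk Agent Port is open" task,
# following one possible reading of the suggestion. Not the collection's code.


def ansible_bool(value):
    """Rough approximation of Ansible's `bool` filter for string/bool inputs."""
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() in ("yes", "on", "1", "true")


def should_verify_agent_port(host_vars):
    """Wait for port 6556 only if some registration path is enabled:
    TLS registration, or Agent Updater registration on a non-CRE edition."""
    edition = str(host_vars.get("checkmk_agent_edition", "")).lower()
    tls = ansible_bool(host_vars.get("checkmk_agent_tls", False))
    update = ansible_bool(host_vars.get("checkmk_agent_update", False))  # assumed default
    return tls or (update and edition != "cre")


print(should_verify_agent_port({"checkmk_agent_edition": "cre", "checkmk_agent_tls": "false"}))  # → False
print(should_verify_agent_port({"checkmk_agent_edition": "cre", "checkmk_agent_tls": "true"}))  # → True
```

Whether skipping the check is preferable to failing loudly on an unregistered agent is a design decision for the maintainers; the sketch only makes the proposed condition concrete.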