Checkmk / ansible-collection-checkmk.general

The official Checkmk Ansible collection - brought to you by the Checkmk company.
https://galaxy.ansible.com/checkmk/general
GNU General Public License v3.0

[BUG] Register Agent for TLS fails on the second run when using distributed monitoring #425

Closed hb9hnt closed 1 year ago

hb9hnt commented 1 year ago

Describe the bug

In a distributed monitoring setup, the registration task fails on the second run of the agent role, although it successfully registers the hosts on the first run. Setup: the hosts are placed in a folder whose monitoring site is set to a remote instance.

What works: When the role is run for the first time, the hosts are successfully registered and show up in WATO.

What does not work: The role fails when it is run a second time without any force option set to true.

What I would expect to happen: The role should succeed with no changes, since the host is already registered from the first run.

Component Name

Component Name: agent role. This task fails: https://github.com/Checkmk/ansible-collection-checkmk.general/blob/73f2475a39a511220b86a5aa1dd6ea0d709d8786/roles/agent/tasks/Linux.yml#L97

Ansible Version

 $ ansible --version
ansible [core 2.14.4]
  config file = /home/*****/.ansible.cfg
  configured module search path = ['/home/****/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python3.11/site-packages/ansible
  ansible collection location = /home/*****/.ansible/collections:/usr/share/ansible/collections
  executable location = /usr/bin/ansible
  python version = 3.11.3 (main, Apr  5 2023, 00:00:00) [GCC 12.2.1 20221121 (Red Hat 12.2.1-4)] (/usr/bin/python3)
  jinja version = 3.0.3
  libyaml = True

Checkmk Version and Edition

2.2.0p5 (CCE)

Collection Version

$ ansible-galaxy collection list

# /usr/lib/python3.11/site-packages/ansible_collections
Collection                    Version
----------------------------- -------
amazon.aws                    5.4.0  
ansible.netcommon             4.1.0  
ansible.posix                 1.5.1  
ansible.utils                 2.9.0  
ansible.windows               1.13.0 
arista.eos                    6.0.0  
awx.awx                       21.14.0
azure.azcollection            1.15.0 
check_point.mgmt              4.0.0  
chocolatey.chocolatey         1.4.0  
cisco.aci                     2.4.0  
cisco.asa                     4.0.0  
cisco.dnac                    6.6.4  
cisco.intersight              1.0.24 
cisco.ios                     4.4.0  
cisco.iosxr                   4.1.0  
cisco.ise                     2.5.12 
cisco.meraki                  2.15.1                 
cisco.mso                     2.2.1  
cisco.nso                     1.0.3  
cisco.nxos                    4.1.0  
cisco.ucs                     1.8.0  
cloud.common                  2.1.3  
cloudscale_ch.cloud           2.2.4  
community.aws                 5.4.0  
community.azure               2.0.0                 
community.ciscosmb            1.0.5  
community.crypto              2.11.1 
community.digitalocean        1.23.0 
community.dns                 2.5.2                                                                                             
community.docker              3.4.3                                                                                             
community.fortios             1.0.0  
community.general             6.5.0  
community.google              1.0.0     
community.grafana             1.5.4  
community.hashi_vault         4.2.0  
community.hrobot              1.8.0  
community.libvirt             1.2.0  
community.mongodb             1.5.1  
community.mysql               3.6.0  
community.network             5.0.0  
community.okd                 2.3.0  
community.postgresql          2.3.2  
community.proxysql            1.5.1  
community.rabbitmq            1.2.3  
community.routeros            2.8.0  
community.sap                 1.0.0  
community.sap_libs            1.4.1  
community.skydive             1.0.0  
community.sops                1.6.1  
community.vmware              3.5.0  
community.windows             1.12.0 
community.zabbix              1.9.2  
containers.podman             1.10.1 
cyberark.conjur               1.2.0  
cyberark.pas                  1.0.17 
dellemc.enterprise_sonic      2.0.0  
dellemc.openmanage            6.3.0  
dellemc.os10                  1.1.1  
dellemc.os6                   1.0.7  
dellemc.os9                   1.0.4  
dellemc.powerflex             1.5.0  
dellemc.unity                 1.5.0  
f5networks.f5_modules         1.23.0 
fortinet.fortimanager         2.1.7  
fortinet.fortios              2.2.3  
frr.frr                       2.0.0  
gluster.gluster               1.0.2  
google.cloud                  1.1.3  
grafana.grafana               1.1.1  
hetzner.hcloud                1.10.0 
hpe.nimble                    1.1.4  
ibm.qradar                    2.1.0  
ibm.spectrum_virtualize       1.11.0 
infinidat.infinibox           1.3.12 
infoblox.nios_modules         1.4.1  
inspur.ispim                  1.3.0  
inspur.sm                     2.3.0  
junipernetworks.junos         4.1.0  
kubernetes.core               2.4.0  
lowlydba.sqlserver            1.3.1  
mellanox.onyx                 1.0.0  
netapp.aws                    21.7.0 
netapp.azure                  21.10.0
netapp.cloudmanager           21.22.0
netapp.elementsw              21.7.0 
netapp.ontap                  22.4.1 
netapp.storagegrid            21.11.1
netapp.um_info                21.8.0 
netapp_eseries.santricity     1.4.0  
netbox.netbox                 3.11.0 
ngine_io.cloudstack           2.3.0  
ngine_io.exoscale             1.0.0  
ngine_io.vultr                1.1.3  
openstack.cloud               1.10.0 
openvswitch.openvswitch       2.1.0  
ovirt.ovirt                   2.4.1  
purestorage.flasharray        1.17.2 
purestorage.flashblade        1.10.0 
purestorage.fusion            1.4.1  
sensu.sensu_go                1.13.2 
splunk.es                     2.1.0  
t_systems_mms.icinga_director 1.32.2 
theforeman.foreman            3.9.0  
vmware.vmware_rest            2.3.1  
vultr.cloud                   1.7.0  
vyos.vyos                     4.0.1  
wti.remote                    1.0.4  

# /home/****/.ansible/collections/ansible_collections
Collection        Version
----------------- -------
ansible.posix     1.5.4  
ansilabnl.micetro 1.0.7  
checkmk.general   2.3.0  
community.general 7.1.0  

# /usr/share/ansible/collections/ansible_collections
Collection        Version
----------------- -------
community.general 6.5.0  

To Reproduce Setup:

Expected behavior

The role should also successfully run the second time but with no change for registration because the host is already registered.
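The expected idempotent behavior can be illustrated with a guard in front of the registration command. This is only a sketch, not the role's actual implementation: `registration_server`, `registration_site`, and `ctl_status` are hypothetical names, and the `when` condition assumes that `cmk-agent-ctl status` lists existing connections in a `server:port/site` form, which should be checked against the output of the installed agent version.

```yaml
# Sketch only, not the role's real tasks.
# registration_server, registration_site and ctl_status are hypothetical names.
- name: "Query existing agent controller connections."
  ansible.builtin.command: cmk-agent-ctl status
  register: ctl_status
  changed_when: false   # read-only query, never reports a change
  failed_when: false    # a non-zero exit must not abort the play

- name: "Register agent for TLS only if the site is not connected yet."
  ansible.builtin.command: >-
    cmk-agent-ctl register
    -H {{ inventory_hostname }}
    -s {{ registration_server }}
    -i {{ registration_site }}
    -U {{ checkmk_agent_user }}
    -P {{ checkmk_agent_pass }}
    --trust-cert
  no_log: true          # keep the password out of the task output
  # Assumption: status output contains "<server>:<port>/<site>" per connection.
  when: "('/' ~ registration_site) not in ctl_status.stdout"
```

The substring match is deliberately crude (a site named `remote_site2` would also match `/remote_site`), so a real check would need stricter parsing.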

Actual behavior

The task fails the second time with the following output (with some names redacted):

fatal: [hostname.example.com]: FAILED! => {"changed": true, "cmd": "cmk-agent-ctl register -H hostname.example.com -s checkmk.example.com -i main_site -U cmkadmin -P ******** --trust-cert\n", "delta": "0:00:00.493518", "end": "2023-08-24 09:18:10.624357", "msg": "non-zero return code", "rc": 1, "start": "2023-08-24 09:18:10.130839", "stderr": "ERROR [cmk_agent_ctl] Error registering existing host at https://checkmk.example.com:8000/main_site\n\nCaused by:\n    Request failed with code 405 Method Not Allowed: Wrong site - Details: This host is monitored on the site remote_site, but you tried to register it at the site main_site.", "stderr_lines": ["ERROR [cmk_agent_ctl] Error registering existing host at https://checkmk.example.com:8000/main_site", "", "Caused by:", "    Request failed with code 405 Method Not Allowed: Wrong site - Details: This host is monitored on the site remote_site, but you tried to register it at the site main_site."], "stdout": "", "stdout_lines": []}

Minimum reproduction example

It's not possible to provide a minimal reproduction example, because you need a distributed setup with a folder configured as described above. However, if you have such a setup, the following example should reproduce the problem:

- hosts: <some host> 
  become: true
  collections:
    - checkmk.general
  vars:
    checkmk_agent_server: "https://<hostname-of-main-site>" 
    checkmk_agent_site: "main_site"
    checkmk_agent_folder: <folder with setting to remote instance>
    checkmk_agent_user: "****"
    checkmk_agent_pass: "****"
    checkmk_agent_update: 'true'
    checkmk_agent_tls: 'true'
  roles:
    - agent

robin-checkmk commented 1 year ago

Hi @hb9hnt, and thanks for reporting. I actually realized that this is a regression, and the fix is already in #382. It will most likely be included in the next release.

If you want to do a hotfix, you can look at this commit: ef861c2bdb5c46dab4e2970e1579781ff6452a62

hb9hnt commented 1 year ago

Hey @robin-checkmk

Sorry for my late reaction, I was on holiday. I will test the new release this week. Thanks for the fix :)

robin-checkmk commented 1 year ago

No worries, @hb9hnt. Most of the people around here do this in their free time, so there is no rush at all. :) Could you verify the fix?

hb9hnt commented 1 year ago

Hey @robin-checkmk - we still ran into those problems. The main issue was that the variable checkmk_agent_server is used both for the server that runs the instance where the host should be registered and for API calls to the central instance (if the config is managed centrally). IMHO there should be two distinct variables for this.

I discussed the details with @lgetwan
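To make the two-variable idea concrete, here is a sketch of what such a split could look like in the playbook from the report. The names `checkmk_agent_registration_server` and `checkmk_agent_registration_site` are an assumption based on what later releases of the role appear to offer. Verify them against the agent role README of the installed collection version before relying on them. The angle-bracket values are placeholders, as in the original example.

```yaml
# Sketch under assumptions: the two *_registration_* variable names must be
# verified against the agent role README of your collection version.
- hosts: <some host>
  become: true
  collections:
    - checkmk.general
  vars:
    # Central site: used for API calls (host creation, folder, and so on).
    checkmk_agent_server: "https://<hostname-of-main-site>"
    checkmk_agent_site: "main_site"
    # Remote site: the instance that actually monitors the host and
    # therefore has to be the registration target (assumed variable names).
    checkmk_agent_registration_server: "<hostname-of-remote-site>"
    checkmk_agent_registration_site: "remote_site"
    checkmk_agent_folder: <folder with setting to remote instance>
    checkmk_agent_user: "****"
    checkmk_agent_pass: "****"
    checkmk_agent_tls: 'true'
  roles:
    - agent
```

Separating the API endpoint from the registration endpoint would avoid the "Wrong site" error above, because the registration request would go to the site that monitors the host while configuration changes would still go to the central site.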

robin-checkmk commented 1 year ago

There was a regression I fixed in #416. With that fix, registration on remote sites works again. It is available with the current release 3.2.0.