canonical / charm-openstack-service-checks

Collection of Nagios checks and other utilities that can be used to verify the operation of an OpenStack cluster

Automated removal of ignored UUIDs whose objects no longer exist #161

Closed LCVcode closed 1 month ago

LCVcode commented 4 months ago

This is a feature request to automate the removal of UUIDs listed after the --ignored flag in rendered NRPE files when the corresponding OpenStack object no longer exists. There are often legitimate reasons to ignore various OpenStack objects in these checks, but there is no good mechanism for removing stale UUIDs from these files, so the files can bloat over time.

As an example:

$ cat /etc/nagios/nrpe.d/check_octavia_loadbalancers.cfg 
<truncated>
command[check_octavia_loadbalancers]=/usr/local/lib/nagios/plugins/check_octavia.py --check loadbalancers --ignored 1a0f7f91-72ec-4b12-a48c-513e209f165d,61c4aa2a-a26c-437e-bb10-8957626e3ed8,f1bff879-2299-4c92-b65f-fc7340d5b791

When checking for the existence of that last loadbalancer:

$ openstack loadbalancer show f1bff879-2299-4c92-b65f-fc7340d5b791
Unable to locate f1bff879-2299-4c92-b65f-fc7340d5b791 in loadbalancers

In a case like this, it would make sense for the openstack-service-checks charm to re-render /etc/nagios/nrpe.d/check_octavia_loadbalancers.cfg such that it excludes f1bff879-2299-4c92-b65f-fc7340d5b791.
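A minimal sketch of the pruning step described above, not the charm's actual rendering code. The function name `prune_ignored` and the `exists` callback are hypothetical; `exists` stands in for whatever lookup the charm would use to ask OpenStack whether an object is still present. The sketch assumes the --ignored value is a single comma-separated token, as in the example file.

```python
import re
from typing import Callable

# Matches "--ignored <comma-separated UUIDs>" as it appears in the example cfg.
IGNORED_RE = re.compile(r"(--ignored\s+)(\S+)")


def prune_ignored(command_line: str, exists: Callable[[str], bool]) -> str:
    """Return command_line with UUIDs of missing objects dropped from --ignored.

    `exists` is a hypothetical callback: it takes a UUID and returns True if
    the corresponding OpenStack object is still present.
    """
    match = IGNORED_RE.search(command_line)
    if match is None:
        # No --ignored flag on this line, so there is nothing to prune.
        return command_line

    kept = [uuid for uuid in match.group(2).split(",") if uuid and exists(uuid)]
    head = command_line[: match.start()]
    tail = command_line[match.end():]

    if kept:
        return head + match.group(1) + ",".join(kept) + tail
    # Every ignored UUID was stale: drop the flag entirely.
    return head.rstrip() + tail


if __name__ == "__main__":
    line = (
        "command[check_octavia_loadbalancers]="
        "/usr/local/lib/nagios/plugins/check_octavia.py --check loadbalancers "
        "--ignored 1a0f7f91-72ec-4b12-a48c-513e209f165d,"
        "61c4aa2a-a26c-437e-bb10-8957626e3ed8,"
        "f1bff879-2299-4c92-b65f-fc7340d5b791"
    )
    # Pretend only the first two load balancers still exist.
    live = {
        "1a0f7f91-72ec-4b12-a48c-513e209f165d",
        "61c4aa2a-a26c-437e-bb10-8957626e3ed8",
    }
    print(prune_ignored(line, live.__contains__))
```

One design point worth keeping if something like this were implemented: `exists` should return False only on a definite "not found" answer. Treating an API error or timeout as "missing" would silently un-ignore objects that are still there.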

zxhdaze commented 1 month ago

Closing this issue because we are moving checks from openstack-service-checks to openstack-exporter. The new alerting will be based on whether Octavia as a whole is functional, not on whether individual load balancers are up or down.