canonical / charm-openstack-service-checks

Collection of Nagios checks and other utilities that can be used to verify the operation of an OpenStack cluster

[RFE] provide sanity checks for all endpoints #89

Closed: sudeephb closed this issue 7 months ago

sudeephb commented 7 months ago

During a recent outage, a certificate subject mismatch killed all functionality in the cloud, yet I received no alerts from Nagios about any malfunction, because the certificates themselves were valid.

In this specific case, check_nova_services could have warned about the lack of connectivity to the nova API (if not for LP#1829539), and one could imagine preventing a recurrence by making the SSL checks more sophisticated[*]. However, I would like to suggest a more generic approach instead: every service should have a corresponding sanity check, such as 'token issue' for keystone, 'image list' for glance, 'server list' for nova, and so on. I think that ensuring that the API can provide an answer (regardless of the content of the answer) would be very useful and relatively simple to implement.
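To illustrate the idea, here is a minimal sketch of such a per-service sanity check written as a Nagios-style plugin. It is not the charm's implementation. It assumes the `openstack` CLI is installed and that credentials are available in the environment (for example via the usual `OS_*` variables). The `PROBES` table and the `run_probe` and `main` names are hypothetical and exist only for this example. Only the exit status of each command is inspected, not its output, which matches the "regardless of the content of the answer" requirement.

```python
import subprocess

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Hypothetical table: one cheap, read-only request per service.
PROBES = {
    "keystone": ["openstack", "token", "issue"],
    "glance": ["openstack", "image", "list"],
    "nova": ["openstack", "server", "list"],
}


def run_probe(service, command, timeout=30, runner=subprocess.run):
    """Run one probe and map the outcome to a (status, message) pair.

    `runner` is injectable so the function can be exercised without a cloud.
    """
    try:
        result = runner(command, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return CRITICAL, f"CRITICAL: {service} did not answer within {timeout}s"
    except OSError as exc:
        # For example, the openstack CLI is not installed on this host.
        return UNKNOWN, f"UNKNOWN: could not run probe for {service}: {exc}"

    if result.returncode != 0:
        # Report the last line of stderr, which usually names the failure.
        lines = (result.stderr or "").strip().splitlines()
        reason = lines[-1] if lines else f"exit code {result.returncode}"
        return CRITICAL, f"CRITICAL: {service} API request failed: {reason}"
    return OK, f"OK: {service} API answered"


def main():
    """Run every probe, print one line per service, return the highest status."""
    highest = OK
    for service, command in PROBES.items():
        status, message = run_probe(service, command)
        print(message)
        highest = max(highest, status)
    return highest
```

A wrapper script would end with `sys.exit(main())` so that Nagios picks up the status. Because the CLI validates the server certificate by default, a subject mismatch like the one described above should surface as a failed request and therefore as CRITICAL.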

[*] an excerpt from the check_http plugin:

Please note that this plugin does not check if the presented server certificate matches the hostname of the server, or if the certificate has a valid chain of trust to one of the locally installed CAs.
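For comparison, a check that does perform both verifications mentioned in the excerpt can be sketched with Python's standard `ssl` module. This is only an illustration of the "more sophisticated SSL check" alternative, not something the charm ships. The `strict_context` and `check_tls` names are hypothetical.

```python
import socket
import ssl

# Nagios plugin exit codes used by this sketch.
OK, CRITICAL = 0, 2


def strict_context():
    """Return a context that checks the hostname and the chain of trust.

    ssl.create_default_context() sets check_hostname=True and
    verify_mode=CERT_REQUIRED, and loads the system CA certificates.
    """
    return ssl.create_default_context()


def check_tls(host, port=443, timeout=10.0, context=None):
    """Complete a TLS handshake with full verification; return (status, message)."""
    ctx = context or strict_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # server_hostname is what the certificate subject is matched against.
            with ctx.wrap_socket(sock, server_hostname=host):
                pass
    except ssl.SSLCertVerificationError as exc:
        # Raised for a hostname mismatch as well as for an untrusted chain.
        return CRITICAL, (
            f"CRITICAL: certificate verification failed for {host}: "
            f"{exc.verify_message}"
        )
    except OSError as exc:
        # Covers refused connections, timeouts, and other TLS errors.
        return CRITICAL, f"CRITICAL: cannot establish TLS with {host}:{port}: {exc}"
    return OK, f"OK: certificate for {host} is valid and matches the hostname"
```

It could be called as `check_tls("keystone.example.com", 5000)`, where the hostname is a placeholder. If the cloud uses a private CA, that CA has to be installed in the system trust store or loaded into the context with `load_verify_locations`.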


Imported from Launchpad using lp2gh.

sudeephb commented 7 months ago

(by eric-chen) This issue has been pending for years. We will migrate to the Canonical Observability Stack (COS), so we won't implement this feature.