canonical / charm-openstack-service-checks

Collection of Nagios checks and other utilities that can be used to verify the operation of an OpenStack cluster

Consoleauth not monitored for all instances down #113

Closed: sudeephb closed this issue 7 months ago

sudeephb commented 7 months ago

I believe that the openstack-service-checks charm ignores consoleauth because some environments do not run memcached for active-active consoleauth. Unfortunately, this means no alert is raised when every nova-consoleauth instance is down.

I'm receiving no alerts even though 'openstack compute service list|grep consoleauth' shows the following output:

| 90 | nova-consoleauth | juju-7cfc1d-18-lxd-5 | internal | enabled | down | 2018-07-18T08:06:25.000000 |
| 92 | nova-consoleauth | juju-7cfc1d-16-lxd-5 | internal | enabled | down | None |
| 95 | nova-consoleauth | juju-7cfc1d-19-lxd-5 | internal | enabled | down | None |

We should ensure that there's at least one active nova-consoleauth. We may also want a way to read the nova-cloud-controller charm config for the HA consoleauth setting, so that when all consoleauth services are supposed to be live, the check can also alert on any individual one that is down.
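A rough sketch of what such a check could look like, assuming the table output of 'openstack compute service list' is passed in as text and that its columns are ID, Binary, Host, Zone, Status, State, Updated At. The function names and the `require_all_up` flag are hypothetical and are not part of the charm. The flag stands in for the HA consoleauth setting mentioned above. Returning WARNING for a partial outage is likewise just one possible choice.

```python
# Nagios-style exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def parse_service_rows(text, binary="nova-consoleauth"):
    """Extract rows for one binary from 'openstack compute service list' table text.

    Assumes the column order: ID, Binary, Host, Zone, Status, State, Updated At.
    """
    rows = []
    for line in text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Skip borders, headers, and rows for other binaries.
        if len(cells) < 6 or cells[1] != binary:
            continue
        rows.append(
            {"id": cells[0], "host": cells[2], "status": cells[4], "state": cells[5]}
        )
    return rows


def check_consoleauth(rows, require_all_up=False):
    """Return (exit_code, message) for the parsed nova-consoleauth rows.

    require_all_up is a hypothetical flag standing in for the HA setting:
    when True, any enabled-but-down service raises a WARNING.
    """
    enabled = [r for r in rows if r["status"] == "enabled"]
    up = [r for r in enabled if r["state"] == "up"]
    if not enabled:
        return CRITICAL, "no enabled nova-consoleauth services found"
    if not up:
        return CRITICAL, "all %d nova-consoleauth services are down" % len(enabled)
    if require_all_up and len(up) < len(enabled):
        down_hosts = sorted(r["host"] for r in enabled if r["state"] != "up")
        return WARNING, "nova-consoleauth down on: " + ", ".join(down_hosts)
    return OK, "%d of %d nova-consoleauth services up" % (len(up), len(enabled))
```

Fed the three rows from the output above, `check_consoleauth` reports CRITICAL, since every enabled service is down.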


Imported from Launchpad using lp2gh.

sudeephb commented 7 months ago

(by afreiberger) Perhaps colocating openstack-service-checks with a nova-cloud-controller (n-c-c) unit would allow it to query the status of the res_nova_consoleauth corosync resource? Either that, or this check will need to be integrated into the nova-cloud-controller charm's nrpe checks.

sudeephb commented 7 months ago

(by afreiberger) Considering this is managed by corosync, the root issue in this instance is that nrpe is not related to the nova-cloud-controller hacluster charm. The corosync resource is the proper place to alert on this.
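
For illustration only, a check against the corosync resource could be sketched as below. It assumes Pacemaker's `crm_resource --resource NAME --locate` is available on the unit and prints a line containing "is running on" when the resource is active and "NOT running" otherwise. The function names are hypothetical, and this is not how the hacluster charm's own nrpe checks are implemented.

```python
import subprocess

# Nagios-style exit codes.
OK, CRITICAL, UNKNOWN = 0, 2, 3


def resource_is_running(locate_output):
    """Interpret crm_resource --locate output (assumed message format)."""
    text = locate_output.lower()
    return "is running on" in text and "not running" not in text


def check_corosync_resource(name="res_nova_consoleauth"):
    """Return (exit_code, message) for a single Pacemaker resource."""
    try:
        proc = subprocess.run(
            ["crm_resource", "--resource", name, "--locate"],
            capture_output=True,
            text=True,
            check=False,
        )
    except FileNotFoundError:
        return UNKNOWN, "crm_resource not found on this unit"
    output = proc.stdout + proc.stderr
    if resource_is_running(output):
        return OK, "%s is running" % name
    return CRITICAL, "%s is not running: %s" % (name, output.strip())
```

This only works where the check runs on a cluster member, which is why the nrpe relation to hacluster matters.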