canonical / charm-openstack-service-checks

Collection of Nagios checks and other utilities that can be used to verify the operation of an OpenStack cluster

Rocky services failing endpoint specific checks #35

Closed sudeephb closed 9 months ago

sudeephb commented 9 months ago

The following service endpoints return a 400 error on Rocky:

barbican designate gnocchi octavia

We should determine the proper endpoint test for each of these services and add it to the refined check URLs.


Imported from Launchpad using lp2gh.

sudeephb commented 9 months ago

(by afreiberger) It looks like the URL options added for barbican and gnocchi, which expect Unauthorized and x-openstack-request-id, are causing the failure:

$ /usr/lib/nagios/plugins/check_http -H barbican.myfqdn -p 9311 -u / -e Unauthorized -d x-openstack-request-id
HTTP CRITICAL - Invalid HTTP response received from host on port 9311: HTTP/1.1 400 Bad Request

Also, the barbican port uses SSL (https is specified in the openstack endpoint list), but the check is not running check_http with the -S SSL flag. With -S added, the response changes but the check still fails:

$ /usr/lib/nagios/plugins/check_http -H barbican.myfqdn -p 9311 -u / -e Unauthorized -d x-openstack-request-id -S
HTTP CRITICAL - Invalid HTTP response received from host on port 9311: HTTP/1.1 300 Multiple Choices

So it seems there may be a couple of bugs in these endpoint checks, as well as in the agent checks, after the rewrite for pytest.

sudeephb commented 9 months ago

(by marton-kiss) I experienced the same with SSL on a Bionic/Queens deployment with Vault enabled. The workaround was to force the nrpe checks to be re-rendered with:

$ juju remove-relation keystone:identity-credentials openstack-service-checks:identity-credentials
$ juju add-relation keystone:identity-credentials openstack-service-checks:identity-credentials

The initial (wrong) config looked like this:

$ cat /etc/nagios/nrpe.d/check_gnocchi_internal.cfg
...
command[check_gnocchi_internal]=/usr/lib/nagios/plugins/check_http -H gnocchi.XXXXXXX -p 8041 -u / -e Unauthorized -d x-openstack-request-id

Meanwhile, the newly rendered one:

$ cat /etc/nagios/nrpe.d/check_gnocchi_internal.cfg
...
command[check_gnocchi_internal]=/usr/lib/nagios/plugins/check_http -H gnocchi.XXXXXXX -p 8041 -u / -e Unauthorized -d x-openstack-request-id -S -C 30,14

The newly rendered configuration has the proper SSL parameters (-S and -C), and the health check now reports an OK status for the service.

sudeephb commented 9 months ago

(by xavpaice) It appears that the checks are deployed expecting http while the service listens on https. Interestingly, if we mangle the config-changed hook to rewrite the checks entirely, they are written correctly the second time around. I therefore suspect that the endpoints are not yet registered as https the first time round, and that the charm doesn't re-read the catalog once it has generated the checks.

https://code.launchpad.net/~xavpaice/charm-openstack-service-checks/+git/charm-openstack-service-checks/+merge/367178 addresses this; however, there are a couple of other changes which are likely to cause the same result.
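
To illustrate the scheme mismatch described in this comment, here is a minimal sketch of deriving check_http arguments from a catalog endpoint URL, so that the SSL flags follow whatever scheme the catalog currently reports. This is not the charm's code: the function name `check_http_args`, the example hostname, and the choice of defaults are assumptions for illustration only. The flag values (-e Unauthorized, -d x-openstack-request-id, -S, -C 30,14) are taken from the rendered configs quoted earlier in this thread.

```python
from urllib.parse import urlparse


def check_http_args(endpoint_url, expect="Unauthorized",
                    header="x-openstack-request-id"):
    """Build a check_http argument list from an endpoint URL.

    Hypothetical helper, not part of the charm. It only shows that the
    SSL flags depend on the URL scheme at the time of rendering.
    """
    parsed = urlparse(endpoint_url)
    is_https = parsed.scheme == "https"
    # Fall back to the scheme's default port when none is given.
    port = parsed.port or (443 if is_https else 80)
    args = [
        "-H", parsed.hostname,
        "-p", str(port),
        "-u", parsed.path or "/",
        "-e", expect,
        "-d", header,
    ]
    if is_https:
        # Same SSL parameters as in the correctly rendered config above.
        args += ["-S", "-C", "30,14"]
    return args


if __name__ == "__main__":
    # Example hostname is a placeholder.
    for url in ("http://gnocchi.example:8041/", "https://gnocchi.example:8041/"):
        print("/usr/lib/nagios/plugins/check_http " + " ".join(check_http_args(url)))
```

If the checks are rendered once while the catalog still lists http and never regenerated, the first form persists even after the endpoint switches to https, which would match the symptoms reported here.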

sudeephb commented 9 months ago

(by npochet) This behavior is still happening for gnocchi with TLS on Bionic Queens. The workaround is to remove the relation with keystone and re-add it. Could someone re-open the bug?

sudeephb commented 9 months ago

(by nikolay.vinogradov) I can confirm this. The problem still exists with Bionic Queens and the latest version of the charm. The workaround helped, thanks!