canonical / charm-openstack-service-checks

Collection of Nagios checks and other utilities that can be used to verify the operation of an OpenStack cluster

Nagios warning message with ceph-radosgw #78

Closed sudeephb closed 7 months ago

sudeephb commented 7 months ago

With the recent 20.05 charm release, the ceph-radosgw charm registers the object service in the keystone service catalog as "s3" instead of "swift" [1]. However, the openstack-service-checks charm always checks /healthcheck for s3 endpoints, and Nagios raises the following constant warning for the admin, internal, and public endpoints:

XXX-openstack-service-checks-0-s3_admin WARNING 2020-05-29 13:07:32 4d 22h 20m 28s 4/4 HTTP WARNING: HTTP/1.1 404 Not Found - 476 bytes in 0.012 second response time

XXX-openstack-service-checks-0-s3_internal WARNING 2020-05-29 13:03:05 4d 22h 19m 55s 4/4 HTTP WARNING: HTTP/1.1 404 Not Found - 476 bytes in 0.013 second response time

XXX-openstack-service-checks-0-s3_public WARNING 2020-05-29 13:03:39 4d 22h 19m 21s 4/4 HTTP WARNING: HTTP/1.1 404 Not Found - 476 bytes in 0.013 second response time

The rendered nrpe check looks like this:

cat check_s3_admin.cfg

check s3_admin

The following header was added automatically by juju

Modifying it will affect nagios monitoring and alerting

servicegroups: juju

command[check_s3_admin]=/usr/lib/nagios/plugins/check_http -H swift-internal.XXXX -p 443 -u /healthcheck -S

The s3 check endpoint has a hardcoded "/healthcheck" value in src/lib/lib_openstack_service_checks.py:

health_check_params = {
    ...
    's3': '/healthcheck',
    'swift': self.charm_config.get('swift_check_params', '/'),
}

A similar issue had been addressed before for swift endpoints, and the swift endpoint got a swift_check_params charm config option to make the health check endpoint configurable. [2]

I suggest doing the same for s3 endpoints to properly handle the change made in the ceph-radosgw charm.
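A minimal sketch of what that suggestion could look like, assuming a charm config option named s3_check_params (modelled on swift_check_params) that keeps "/healthcheck" as its default. The function name build_health_check_params and the plain dict standing in for the charm config are hypothetical and only for illustration; this is not the actual patch.

```python
def build_health_check_params(charm_config):
    """Map catalog service types to the URL parameters passed to check_http.

    `charm_config` is a plain dict standing in for the charm's config
    (assumption for illustration only).
    """
    return {
        # Assumed option name and default, mirroring swift_check_params.
        "s3": charm_config.get("s3_check_params", "/healthcheck"),
        "swift": charm_config.get("swift_check_params", "/"),
    }


# With no override, the s3 check keeps probing /healthcheck.
default_params = build_health_check_params({})

# An operator running ceph-radosgw could point the s3 check at "/" instead.
radosgw_params = build_health_check_params({"s3_check_params": "/"})
```

Keeping the old value as the default means existing deployments are unaffected unless the operator opts in.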

[1] Add S3 endpoint to service catalog https://github.com/openstack/charm-ceph-radosgw/commit/0667a64be650abb253ac871024a34901fa68befd

[2] Add swift_check_params config https://code.launchpad.net/~xavpaice/charm-openstack-service-checks/+git/charm-openstack-service-checks/+merge/363815


Imported from Launchpad using lp2gh.

sudeephb commented 7 months ago

(by marton-kiss) And this is a very simple patch to fix the issue:

https://paste.ubuntu.com/p/T9qRbVMD82/

sudeephb commented 7 months ago

(by marton-kiss) An improved version: https://pastebin.canonical.com/p/nfxvS4DtN3/

sudeephb commented 7 months ago

(by dparv) Could you create a merge proposal for the requested changes? Thanks!

sudeephb commented 7 months ago

(by marton-kiss) Sure, I already did, you can find it here:

https://code.launchpad.net/~marton-kiss/charm-openstack-service-checks/+git/charm-openstack-service-checks/+merge/384838

I did a test yesterday with the patched charm, and I got an all-green nagios this time. You can find the patched one released here:

https://jaas.ai/u/marton-kiss/openstack-service-checks/0

sudeephb commented 7 months ago

(by pguimaraes) Hi, this is a potential blocker for handovers since Nagios won't be ready. We have a fix in place waiting for merge, so I will raise it as field-high. Can anyone review it?

sudeephb commented 7 months ago

(by vultaire) This is available in cs:~llama-charmers-next/openstack-service-checks-7. Note that to avoid potentially breaking existing deployments, the API in question still defaults to "/healthcheck"; you need to set s3_check_params to "/" to support the case in this ticket.
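To illustrate the effect of that default, here is a rough sketch of how the rendered nrpe command would differ with and without the override. The helpers render_check_http and s3_check_command are hypothetical, not the charm's real functions; the command layout is copied from the check_s3_admin.cfg shown earlier in this ticket, and the host name is the redacted one from that file.

```python
def render_check_http(check_name, host, port, check_params, use_tls=True):
    """Render an nrpe command line in the same shape as check_s3_admin.cfg."""
    parts = [
        "/usr/lib/nagios/plugins/check_http",
        "-H", host,
        "-p", str(port),
        "-u", check_params,
    ]
    if use_tls:
        parts.append("-S")
    return "command[{}]={}".format(check_name, " ".join(parts))


def s3_check_command(charm_config, host, port=443):
    """Pick the s3 URL from config, falling back to the old default."""
    # Assumption: the option defaults to "/healthcheck", as described above.
    params = charm_config.get("s3_check_params", "/healthcheck")
    return render_check_http("check_s3_admin", host, port, params)


# Default: still probes /healthcheck, which returns 404 in this ticket.
default_cmd = s3_check_command({}, "swift-internal.XXXX")

# Override: probes "/" instead.
override_cmd = s3_check_command({"s3_check_params": "/"}, "swift-internal.XXXX")
```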

sudeephb commented 7 months ago

(by andre-ruiz) Interestingly enough the default shown in the charmstore for this particular release is:


swift_check_params

(string) URL to use with check_http if there is a Swift endpoint. Default is '/', but it's possible to add extra params, e.g. '/v3 -e Unauthorized -d x-openstack-request-id' or a different url, e.g. '/healthcheck'. Mitaka Swift typically needs '/healthcheck'.

Default: /

Note the default value.

https://jaas.ai/u/llama-charmers-next/openstack-service-checks/7

sudeephb commented 7 months ago

(by andre-ruiz)

A correction to my last comment: swift_check_params has actually defaulted to "/" since its inception. What defaults to "/healthcheck" is s3_check_params; I confused the two.

sudeephb commented 7 months ago

(by yoshikadokawa) I'm having the same issue, and I could only mitigate it by using the charm from llama-charmers-next. I believe the status should be "Fix committed", since the fix is still not available in llama-charmers.

sudeephb commented 7 months ago

(by gabor.meszaros) I'm facing the same.

Yoshi, fix committed will become fix released once it's available. Committed means it's in the master, under testing, pending release.

sudeephb commented 7 months ago

(by nobuto) I've requested cherry-picks in: https://code.launchpad.net/~nobuto/charm-openstack-service-checks/+git/charm-openstack-service-checks/+merge/387819 Please review and leave some feedback there. Thanks!

sudeephb commented 7 months ago

(by vlgrevtsev) FWIW, this issue is still present and reproducible with cs:~llama-charmers-next/openstack-service-checks-9.

The workaround is the same: juju config openstack-service-checks s3_check_params='/'

sudeephb commented 7 months ago

(by aieri) MP#384838 is in 20.10, so this is now fix-released