canonical / charm-openstack-service-checks

Collection of Nagios checks and other utilities that can be used to verify the operation of an OpenStack cluster

OSC getting lots of SSL_CERT CRITICAL errors looking for SCT #147

Closed sudeephb closed 6 months ago

sudeephb commented 6 months ago

I have deployed latest/edge (e8a92c1) to work around another issue and now I have tons of SSL errors in Nagios such as:

SSL_CERT CRITICAL aodh.os.internal: Cannot find Signed Certificate Timestamps(SCT)

I'm using Vault-issued certificates and, as I understand it, there will never be any SCT for certificates issued by Vault or for self-signed certificates. I think it makes sense to ignore SCT by default.

We can do this by adding --ignore-sct to the _render_https_endpoint_checks method.
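As a rough illustration of the idea, here is a minimal sketch of building a check_ssl_cert command line with --ignore-sct on by default. The helper name `build_ssl_cert_command`, the plugin path, and port 443 are assumptions made for the example; this is not the charm's actual _render_https_endpoint_checks code.

```python
import shlex

# Assumed install location of the plugin; adjust to match the deployment.
PLUGIN = "/usr/lib/nagios/plugins/check_ssl_cert"


def build_ssl_cert_command(host, port, ignore_sct=True):
    """Return a check_ssl_cert command line (hypothetical helper)."""
    args = [PLUGIN, "-H", host, "-p", str(port)]
    if ignore_sct:
        # Vault-issued and self-signed certificates carry no SCT,
        # so skip that check unless the caller asks for it.
        args.append("--ignore-sct")
    return " ".join(shlex.quote(a) for a in args)


if __name__ == "__main__":
    print(build_ssl_cert_command("aodh.os.internal", 443))
```

Keeping the flag behind a keyword argument would leave room to re-enable the SCT check later without touching the callers.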

Additionally, when the check_ssl_cert check script was added, it changed how the check_http check is generated. Previously, if the endpoint was https, the check_http config was rewritten to include -S. Now, instead of overwriting the check_http config, we add a new check_ssl_cert config. This leaves behind the check_http config, which fails because it tries to make a plain http connection to an https port. This causes one of the following two alerts in Nagios:

HTTP WARNING: HTTP/1.1 400 Bad Request - 628 bytes in 0.206 second response time
HTTP CRITICAL - Invalid HTTP response received from host on port 9312: HTTP/1.1 400 Bad Request

I think the code around lines 780 to 820 of lib_openstack_service_checks.py should wrap both the _render_http_endpoint_checks and _render_https_endpoint_checks in check_url.scheme conditional blocks so that we only write one or the other, not both.
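The scheme-based branching could look roughly like the sketch below. The function `render_endpoint_check` and the command strings it returns are hypothetical stand-ins for _render_http_endpoint_checks and _render_https_endpoint_checks, and the default ports are assumptions. The only point is that exactly one check is produced per endpoint.

```python
from urllib.parse import urlparse


def render_endpoint_check(url):
    """Return a single check command for the endpoint (hypothetical helper)."""
    check_url = urlparse(url)
    host = check_url.hostname
    if check_url.scheme == "https":
        # https endpoint: emit only the certificate check, never a
        # plain check_http that would hit the TLS port over http.
        port = check_url.port or 443
        return f"check_ssl_cert -H {host} -p {port} --ignore-sct"
    # http endpoint: emit only the plain HTTP check.
    port = check_url.port or 80
    path = check_url.path or "/"
    return f"check_http -H {host} -p {port} -u {path}"


if __name__ == "__main__":
    print(render_endpoint_check("https://aodh.os.internal:9312/"))
    print(render_endpoint_check("http://aodh.os.internal:9312/healthcheck"))
```

With this shape, an https endpoint can no longer leave a stale check_http definition behind, which is what produces the 400 Bad Request alerts above.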

The attached patch is an example fix for both these issues.


Imported from Launchpad using lp2gh.

sudeephb commented 6 months ago

(by vern)

sudeephb commented 6 months ago

(by vern) I looked at the two merge requests. They look like good solutions to me. Apologies for double-loading two issues in one bug.

sudeephb commented 6 months ago

(by raychan96) Hi Vern, thanks for suggesting the patches. What is your testing environment? Is it purely a test environment where security is not a concern, or is it a production environment?

sudeephb commented 6 months ago

(by vern) I've been testing during the deployment of a production environment. During the deployment iterations I'm using a Vault-generated root CA, but when the environment is handed over to the customer, they will provide a root CA for Vault.

In both scenarios, the certificates generated by Vault will not have Signed Certificate Timestamps, so either way the SCT check would fail.

sudeephb commented 6 months ago

(by raychan96) Thanks for the clarification. I think it's safe to disable the Signed Certificate Timestamps check by default for now, without adding another config option just for check_ssl_cert that would complicate the configuration. But since check_ssl_cert does more checking than before, it may need more tuning and feedback from field engineers. Please feel free to report bugs of this kind.