canonical / charm-openstack-service-checks

Collection of Nagios checks and other utilities that can be used to verify the operation of an OpenStack cluster

Charm upgrade results in "unable to get local issuer certificate" error #26

Closed sudeephb closed 9 months ago

sudeephb commented 9 months ago

Hello,

After upgrading the charm from the latest/stable to the latest/edge channel as a workaround for this bug: https://bugs.launchpad.net/charm-openstack-service-checks/+bug/1995243
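(For reference, a channel switch like this would be along the lines of the following; the exact command depends on the Juju client version:)

# Illustrative only: switch the charm to the latest/edge channel.
juju refresh openstack-service-checks --channel latest/edge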

Nagios started reporting new critical issues (for all OpenStack APIs) with this message: SSL_CERT CRITICAL aodh.api..com: Cannot verify certificate: unable to get local issuer certificate

When I ran the check manually I got this result:
root@juju-xxxx-lxd-8:/etc/nagios/nrpe.d# /usr/local/lib/nagios/plugins/check_ssl_cert -H glance.api.site.com -p 9292 -u /healthcheck -c 14 -w 30 --ignore-sct
SSL_CERT CRITICAL glance.api.site.com: Cannot verify certificate: unable to get local issuer certificate unable to verify the first certificate|days_chain_elem1=xx;xx;xx;;

Even if I remove and re-add the service-checks unit, I get the same issue.


Imported from Launchpad using lp2gh.

sudeephb commented 9 months ago

(by aym-frikha) subscribed ~field-high

sudeephb commented 9 months ago

(by marcusboden) Hi,

Can you check if the symlink at /etc/ssl/certs/openstack-service-checks.pem exists and points to the right cert, i.e.:
root@juju-b7b9b1-14:~# ls -l /etc/ssl/certs/openstack-service-checks.pem
lrwxrwxrwx 1 root root 61 Jan 20 15:46 /etc/ssl/certs/openstack-service-checks.pem -> /usr/local/share/ca-certificates/openstack-service-checks.crt

And can you test the following?
/usr/local/lib/nagios/plugins/check_ssl_cert -H glance.api.site.com -p 9292 -u /healthcheck -c 14 -w 30 --ignore-sct -r /usr/local/share/ca-certificates/openstack-service-checks.crt

sudeephb commented 9 months ago

(by marcusboden) Oh, and one more thing, just to be sure: did you configure the SSL cert for openstack-service-checks? That does not happen automatically yet (see here: https://bugs.launchpad.net/charm-openstack-service-checks/+bug/1999507)
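(For illustration, the manual configuration could look something like this, assuming the CA certificate is at hand locally as ca.crt — the file name is a placeholder:)

# Illustrative only: feed the base64-encoded CA into the charm's trusted_ssl_ca option.
juju config openstack-service-checks trusted_ssl_ca="$(base64 -w 0 ca.crt)"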

sudeephb commented 9 months ago

(by aym-frikha) Hi Marcus, thanks for the suggestions. Yes, I checked the symlink and everything looks fine. I also tested with the glance endpoint but still have this issue:
Cannot verify certificate: unable to get local issuer certificate unable to verify the first certificate|days_chain_elem1=206;30;14;;

Also, the trusted_ssl_ca config is there. I also tried to run this command:
juju run-action --wait vault/0 get-root-ca --format json \
    | jq -r '."unit-vault-0".results.output' \
    | base64 -w 0 \
    | xargs -I {} juju config openstack-service-checks trusted_ssl_ca={}

But still I have the same issue.

sudeephb commented 9 months ago

(by jpablo-norena) I am hitting the same issue.

$ /usr/local/lib/nagios/plugins/check_ssl_cert -H aodh.uy-south-1.mysite.com -p 8776 -u /healthcheck -c 14 -w 30 --ignore-sct -r /usr/local/share/ca-certificates/openstack-service-checks.crt
SSL_CERT CRITICAL aodh.uy-south-1.mysite.com: Connection refused

I checked the trusted_ssl_ca config in the charm, and the content of /usr/local/share/ca-certificates/openstack-service-checks.crt is the same root CA provided to the aodh charm as the ssl_ca option.

However, from the openstack-service-checks unit, the connection to the aodh public IP is refused on port 8776:

$ telnet aodh.uy-south-1.mysite.com 8776
Trying ...
telnet: Unable to connect to remote host: Connection refused

sudeephb commented 9 months ago

(by martin-kalcok) Juan, I'm sorry, but it looks like you are hitting a different issue. Yours seems to be network-related, while in the originally reported case the network connection works but certificate validation fails.

I can confirm that this issue occurs when an intermediate CA is used to issue certificates for OpenStack services. The openstack-service-checks charm can be configured with a certificate PEM chain (consisting of a root CA and an intermediate CA) in the trusted_ssl_ca option. It correctly stores it in /usr/local/share/ca-certificates/openstack-service-checks.pem and links it into the /etc/ssl/certs/ folder; however, the check /usr/local/lib/nagios/plugins/check_ssl_cert is unable to validate certificates using this chain file by default.

Workarounds and possible solutions:

I spoke with Aymen and he mentioned that logging into the openstack-service-checks unit and splitting the chain into two standalone CA certs solved the issue for him. However, I was not able to get it working using this workaround.
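(For illustration, that splitting step could look roughly like this on the unit — the osc-ca-N.crt names are placeholders and I have not verified this exact snippet:)

cd /usr/local/share/ca-certificates
# Write one file per certificate so the rehash step can index each CA individually.
awk '/-----BEGIN CERTIFICATE-----/ {n++} {print > ("osc-ca-" n ".crt")}' openstack-service-checks.crt
# Regenerate the links under /etc/ssl/certs.
update-ca-certificates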

What worked for me was explicitly specifying the CA chain file with the -r option. Then the check worked even though the file contained a CA chain. Example:

root@juju-d6c438-openstack-25:~# /usr/local/lib/nagios/plugins/check_ssl_cert -H 10.5.1.238 -p 8776 -u /v3 -c 14 -w 30 --ignore-sct
SSL_CERT CRITICAL juju-d6c438-openstack-7.project.serverstack: Cannot verify certificate: unable to get local issuer certificate|days_chain_elem1=364;30;14;; days_chain_elem2=3649;30;14;;

root@juju-d6c438-openstack-25:~# /usr/local/lib/nagios/plugins/check_ssl_cert -H 10.5.1.238 -p 8776 -u /v3 -c 14 -w 30 --ignore-sct -r /etc/ssl/certs/openstack-service-checks.pem
SSL_CERT OK - 10.5.1.238:8776, https, x509 certificate 'juju-d6c438-openstack-7.project.serverstack' from 'Vault Intermediate Certificate Authority (charm-pki-local)' valid until Feb 3 15:13:22 2024 GMT (expires in 364 days)|days_chain_elem1=364;30;14;; days_chain_elem2=3649;30;14;;

Moving forward, I think we need to figure out why the check is unable to process PEM chains implicitly and fix it. Alternatively, we can change the way we execute the checks to always include an explicit -r /etc/ssl/certs/openstack-service-checks.pem.
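(For illustration, a check definition with the explicit -r flag could look something like this — the check name and endpoint here are placeholders; the real files the charm generates live under /etc/nagios/nrpe.d/:)

# Hypothetical NRPE entry; the charm-generated file and check name may differ.
command[check_glance_ssl_cert]=/usr/local/lib/nagios/plugins/check_ssl_cert -H glance.api.site.com -p 9292 -u /healthcheck -c 14 -w 30 --ignore-sct -r /etc/ssl/certs/openstack-service-checks.pem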

sudeephb commented 9 months ago

(by marcusboden) Hi, sorry for not updating earlier; here are my thoughts from before I got sick:

I think the problem lies somewhere in update-ca-certificates, or rather in the openssl rehash called from that script, which will complain if there are multiple certs in one file (i.e. a cert chain).
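(For illustration, a quick way to check that on the unit is to compute the hash of the CA subject and look for the symlink a successful rehash would have created — note that openssl x509 reads only the first certificate of a multi-cert file:)

# Subject hash of the first certificate in the installed chain file.
HASH=$(openssl x509 -hash -noout -in /usr/local/share/ca-certificates/openstack-service-checks.crt)
# If no hash link shows up here, the rehash step likely skipped the chain file.
ls -l /etc/ssl/certs/"${HASH}".*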

I also thought about adding the -r flag, but I'd rather have a solid certificate setup system-wide instead of just fixing the one check.

So my idea would be to update the trusted_ssl_ca config handling so that it saves a certificate chain into multiple files.

sudeephb commented 9 months ago

(by raychan96) Aymen, I wonder if you still have the environment for this bug, or whether you have steps to reproduce the problem?

For me, I tried the following steps to reproduce this problem:

juju run-action --wait vault/leader upload-signed-csr \
    pem="$(cat ./ca_intermedate.crt | base64)" \
    root-ca="$(cat ./root_ca.crt | base64)" \
    allowed-domains='openstack.local'

(1): juju run-action --wait vault/0 get-root-ca --format json \
    | jq -r '."unit-vault-0".results.output' \
    | base64 -w 0 \
    | xargs -I {} juju config openstack-service-checks trusted_ssl_ca={}

And I couldn't reproduce it.

However, if I replace (1) with juju config openstack-service-checks trusted_ssl_ca=$(cat ./ca_intermedate.crt | base64), I get the same problem as you mentioned.
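(For illustration, one way to tell the two configurations apart on the unit is to inspect what the charm actually installed — for a root CA the subject and issuer match, for an intermediate they differ:)

# Show subject and issuer of the installed CA certificate.
juju ssh openstack-service-checks/0 'openssl x509 -noout -subject -issuer -in /usr/local/share/ca-certificates/openstack-service-checks.crt'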

sudeephb commented 9 months ago

(by aym-frikha) Hello Chi,

I don't have access to the environment anymore, but you reproduced the setup with the trusted_ssl_ca config. Basically, we are not using Vault to provide certificates; we are using the configs provided by the charms.

Thanks

sudeephb commented 9 months ago

(by raychan96) Hi Aymen, thanks for your input. I think using Vault or charm configs (e.g. cinder's ssl_cert) basically follows a similar process, and it's not the main concern here.

The error "unable to get local issuer certificate" occurs because check_ssl_cert needs to find all the certificates (root, intermediates, application certificate) to verify the entire certificate chain (check_ssl_cert was introduced recently). If all configurations are correct (on both the charm and o-s-c sides), then the complaint from check_ssl_cert is valid: the certificates have some problem, and the user should try to fix it. This is the purpose of this NRPE check.

For example, if you configure cinder with ssl_cert="$(cat app.crt ca_intermediate.crt | base64)", then o-s-c should be configured with trusted_ssl_ca="$(cat root_ca.crt)". That is, o-s-c should be able to find the chain: app.crt -> ca_intermediate.crt -> root_ca.crt (a root CA trusted by o-s-c). If you omit trusted_ssl_ca="$(cat root_ca.crt)" or omit ca_intermediate.crt, then you will see this error.
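(For illustration, the same chain check can be expressed with plain openssl, using the file names from the example above; this approximates what check_ssl_cert does rather than reproducing it exactly:)

# Verify app.crt against the trusted root, supplying the intermediate as untrusted.
openssl verify -CAfile root_ca.crt -untrusted ca_intermediate.crt app.crt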

However, that does not mean this is the problem you are encountering, nor that there is no bug in openstack-service-checks. I did spot a problem during the investigation that might lead to the problem you are seeing, and it's mentioned in comment #7 by Marcus. Basically, if someone initially configures trusted_ssl_ca with combined certificates, the symbolic links in /etc/ssl/certs/ will be broken/incorrect and cannot be fixed by re-configuring trusted_ssl_ca and running update-ca-certificates. It can only be fixed by re-configuring trusted_ssl_ca and running update-ca-certificates --fresh.
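(For illustration, that recovery path could look like this — the unit name and CA file name are placeholders; the key part is the --fresh flag, which rebuilds /etc/ssl/certs from scratch:)

# Re-set the option, then rebuild the certificate links so stale hash symlinks are dropped.
juju config openstack-service-checks trusted_ssl_ca="$(base64 -w 0 root_ca.crt)"
juju ssh openstack-service-checks/0 'sudo update-ca-certificates --fresh'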