BCDevOps / developer-experience

This repository is used to track all work for the BCGov Platform Services Team (This includes work for: 1. Platform Experience, 2. Developer Experience 3. Platform Operations/OCP 3)
Apache License 2.0

Explore additional automation around handling of replacement certificates for an Openshift cluster #3551

Open wmhutchison opened 1 year ago

wmhutchison commented 1 year ago

Describe the issue: At present we have automation via ansible playbooks that handles the installation/replacement of the API and wildcard APPS certificates for any given OpenShift cluster, using ansible vault to store the certs to be installed/replaced.

We do not have automation for formatting the raw certificate data to be injected into the ansible vault file for a given cluster, nor do we have automation to QA the stored certificates and confirm they are valid and not corrupt. This ticket will explore what additional automation is feasible.

Additional context: The certificate data and the supporting playbooks are all stored in the platform-ops repository. The new automation will likely reside there as well.

How does this benefit the users of our platform? Additional automation will reduce the possibility of human error between receiving signed certificates from IMS and injecting them into the ansible vault files, ideally without manual intervention. This reduces the time spent on QA or on dealing with introduced human errors, and allows more team involvement in the QA process.

Definition of done: TBD

wmhutchison commented 5 months ago

Copy/pasting below the QA process I've developed to date for how SSL certs are currently stored/managed. One thing to note is that the certificate bundle is split, with the root/intermediate certs separated from the signed cert. I will be seeing how much of this native modules can accomplish, versus having a bunch of ansible command-line executions of openssl to match. A lot of the ssl-related ansible modules I'm seeing are meant for dealing with file-stored certificates rather than focussing on the contents of a variable, which is what we're aiming for here.
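
For the "contents of a variable" case, a minimal shell sketch (not the ansible-native approach being investigated) is to feed the PEM text to openssl on stdin, since `openssl x509` reads stdin when `-in` is omitted. The helper name `cert_enddate` is hypothetical and not part of the existing playbooks.

```shell
# cert_enddate PEM_TEXT
# Hypothetical helper: prints the notAfter date of a certificate that is held
# in a shell variable, without writing the certificate to disk.
# If PEM_TEXT is a bundle, openssl x509 only reads the first certificate in it.
cert_enddate() {
  printf '%s\n' "$1" | openssl x509 -noout -enddate
}
```

It could be used as `cert_enddate "$(ansible-vault view inventory/group_vars/klab2/vault.yaml | yq .vault_api_crt)"`, assuming the same vault layout as in the QA steps.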

wmhutchison commented 5 months ago

==Methodology for Doing QA on PR branch regarding replacement certs for a cluster==

Requires installation of yq from https://github.com/mikefarah/yq/releases

==Check over New API certificate==

ansible-vault view inventory/group_vars/klab2/vault.yaml | yq .vault_api_crt > vault_api_crt_combined.txt
ansible-vault view inventory/group_vars/klab2/vault.yaml | yq .vault_api_key > vault_api_key.txt
sed '/-----END CERTIFICATE-----/q' vault_api_crt_combined.txt > vault_api_crt.txt
awk '/-----END CERTIFICATE-----/,0' vault_api_crt_combined.txt | tail -n+2 > vault_api_crt_root.txt
  1. Check the modulus. The output of the following two commands needs to be the same to ensure we've matched the correct key to the signed certificate.

    openssl rsa -noout -modulus -in vault_api_key.txt | openssl md5
    openssl x509 -noout -modulus -in vault_api_crt.txt | openssl md5
  2. Check to make sure we're using the correct signed cert with the new expiry date.

    openssl x509 -in vault_api_crt_combined.txt -text -noout | grep -A3 Validity
  3. Run a general sanity check with the signed certificate against the intermediate/root certs.

    openssl verify -verbose -CAfile vault_api_crt_root.txt vault_api_crt.txt
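
Steps 1 and 2 above are compared by eye. As a rough sketch of how they could become pass/fail checks, the two functions below return an exit status instead. The names `key_matches_cert` and `cert_valid_for` are hypothetical, the moduli are compared directly rather than through `openssl md5`, and like the manual steps this assumes RSA keys.

```shell
# key_matches_cert KEY_FILE CERT_FILE
# Returns 0 when the RSA private key and the certificate share a modulus,
# 1 when they differ, 2 when either file cannot be parsed.
key_matches_cert() {
  local key_mod cert_mod
  key_mod="$(openssl rsa -noout -modulus -in "$1" 2>/dev/null)" || return 2
  cert_mod="$(openssl x509 -noout -modulus -in "$2" 2>/dev/null)" || return 2
  [ -n "$key_mod" ] && [ "$key_mod" = "$cert_mod" ]
}

# cert_valid_for CERT_FILE DAYS
# Returns 0 when the certificate will still be valid DAYS days from now.
# -checkend takes seconds and makes openssl exit non-zero if the cert
# expires within that window.
cert_valid_for() {
  openssl x509 -noout -checkend "$(( $2 * 86400 ))" -in "$1" >/dev/null
}
```

For example, `key_matches_cert vault_api_key.txt vault_api_crt.txt && cert_valid_for vault_api_crt.txt 300` would fail if the key does not match or if an old, soon-to-expire cert was stored by mistake. The 300-day threshold is an arbitrary illustration.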

==Check over New APPS certificate==

ansible-vault view inventory/group_vars/klab2/vault.yaml | yq .vault_apps_crt > vault_apps_crt_combined.txt
ansible-vault view inventory/group_vars/klab2/vault.yaml | yq .vault_apps_key > vault_apps_key.txt
sed '/-----END CERTIFICATE-----/q' vault_apps_crt_combined.txt > vault_apps_crt.txt
awk '/-----END CERTIFICATE-----/,0' vault_apps_crt_combined.txt | tail -n+2 > vault_apps_crt_root.txt
  1. Check the modulus. The output of the following two commands needs to be the same to ensure we've matched the correct key to the signed certificate.

    openssl rsa -noout -modulus -in vault_apps_key.txt | openssl md5
    openssl x509 -noout -modulus -in vault_apps_crt.txt | openssl md5
  2. Check to make sure we're using the correct signed cert with the new expiry date.

    openssl x509 -in vault_apps_crt_combined.txt -text -noout | grep -A3 Validity
  3. Run a general sanity check with the signed certificate against the intermediate/root certs.

    openssl verify -verbose -CAfile vault_apps_crt_root.txt vault_apps_crt.txt
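
The API and APPS sections repeat the same sed/awk split with different file names. One possible way to avoid the copy/paste is a small helper keyed on the file prefix. This is only a sketch: `split_bundle` is a hypothetical name, and it assumes the `<prefix>_combined.txt` naming used above.

```shell
# split_bundle PREFIX
# Reads PREFIX_combined.txt and writes:
#   PREFIX.txt      - the first certificate in the bundle (the signed cert)
#   PREFIX_root.txt - everything after the first END CERTIFICATE line
# Returns non-zero when no intermediate/root certs follow the signed cert,
# since openssl verify would then have no CA file content to check against.
split_bundle() {
  local prefix="$1"
  sed '/-----END CERTIFICATE-----/q' "${prefix}_combined.txt" > "${prefix}.txt"
  awk '/-----END CERTIFICATE-----/,0' "${prefix}_combined.txt" | tail -n +2 > "${prefix}_root.txt"
  [ -s "${prefix}_root.txt" ]
}
```

Calling `split_bundle vault_api_crt` and `split_bundle vault_apps_crt` would then produce the same files as the manual commands in both sections.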
wmhutchison commented 5 months ago

While the current priority is reducing human error in both populating a vault file with signed certificate data and doing QA on it, another item to pursue later is to include the CSR creation process in ansible as well. We could then put the sensitive private key directly into ansible vault, rather than leaving it on the UTIL server where it needs to be removed later.

wmhutchison commented 4 months ago

Here are the general categories of things I want this automation to be able to handle. With this approach, we never have to worry about a sensitive key file being left somewhere it shouldn't be; it will always reside in, and be referenced from, ansible-vault.

Also, to "drink the Kool-Aid", any script/playbook development should be containerized, if only to allow full flexibility in terms of messing with various ansible modules that would otherwise be messy to install.

wmhutchison commented 1 month ago

This needs to be broken up into two tickets put under a single EPIC.

  1. Ticket to handle file management for key/certs/intermediates to/from ansible vault.
  2. Ticket to handle QA of the contents inside ansible vault.

For that breakdown, promote 1. for implementation sooner rather than later so that possible mitigation concerns with Entrust can be covered off. Manual QA is pretty simple to run in the meantime, and automating it (2.) is where we delve more into the bowels of various ansible modules.

IanKWatts commented 1 month ago

@wmhutchison FYI, I'm doing testing with cert-manager, too. #4224

wmhutchison commented 1 month ago

@IanKWatts cert-manager will not be a good fit for cluster cert management. The same automation we currently use to renew existing certs is also responsible for installing those certs into a brand new cluster, at which point not much of anything is on the cluster yet, including cert-manager.