canonical / microceph

MicroCeph is snap-deployed Ceph with built-in clustering
https://snapcraft.io/microceph
GNU Affero General Public License v3.0

Adding easy certificate rotation for Rados Gateway in combination with LetsEncrypt Certificates #421

Open maximsachs opened 2 months ago

maximsachs commented 2 months ago

Issue report

What version of MicroCeph are you using ?

I first tried 19.2.0~git+snap36f71d7700 to test the RGW SSL features, but it didn't have everything I needed, so I switched back to 18.2.4+snapc9f2b08f92. Since I currently need to modify the radosgw.conf file manually anyway, I prefer to stay on the stable version.

Use this section to describe the channel/revision which produces the unexpected behaviour. This information can be fetched from the installed: section of sudo snap info microceph output.

What are the steps to reproduce this issue ?

  1. Using certbot to create SSL Certificates
  2. Configuring rgw to use the certificates with the method from https://github.com/canonical/microceph/pull/355
  3. After 60 days the certificates are renewed by Certbot
  4. RGW doesn't use the new certs.

What happens (observed behaviour) ?

The command to add the certificates to rgw uses base64 instead of files: https://github.com/canonical/microceph/pull/355#issuecomment-2294154200

This passes the certificates as "$(sudo base64 -w0 ~/server.crt)", which breaks the link with the original Let's Encrypt certificate files, since it internally copies the files to the snap directory. As a result, when certbot renews the certificates, the copies used by RGW are not updated.
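To illustrate why the base64 approach goes stale, here is a minimal sketch that does not involve MicroCeph at all. It only shows that a command substitution captures the file contents at one point in time, so a later renewal of the file on disk has no effect on the captured value. The file name, the `cert-v1`/`cert-v2` contents, and the variable names are placeholders made up for this example.

```shell
#!/bin/sh
# Illustration only: a base64 snapshot does not follow later file changes.
workdir=$(mktemp -d)

# Stand-in for the certificate that exists when RGW is first configured.
printf 'cert-v1\n' > "$workdir/server.crt"

# Same pattern as in the issue: the contents are captured once, as a string.
snapshot=$(base64 -w0 "$workdir/server.crt")

# Stand-in for a later "certbot renew" rewriting the file on disk.
printf 'cert-v2\n' > "$workdir/server.crt"

# The captured value still decodes to the old certificate.
decoded=$(printf '%s' "$snapshot" | base64 -d)
current=$(cat "$workdir/server.crt")

echo "captured at configuration time: $decoded"
echo "on disk after renewal:          $current"

rm -rf "$workdir"
```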

What were you expecting to happen ?

It would be nice to have something like microceph update rgw --ssl-certificate=..... that can be placed into the certbot deploy hook, so that it fits into the automated renewal process. Ideally, MicroCeph would handle the certbot setup and renewal as well.

Relevant logs, error output, etc.

The command I generally use for certbot config is:

  certbot certonly \
      --dns-cloudflare \
      --dns-cloudflare-credentials cloudflare.ini \
      --dns-cloudflare-propagation-seconds 90 \
      --non-interactive --agree-tos \
      --email my@email.com \
      -d "your.example.domain" \
      --deploy-hook "certbot_deploy_hook.sh"

If it’s considerably long, please paste to https://gist.github.com/ and insert the link here.

Additional comments.

My current solution is a deploy hook that copies the certificates to the paths configured as ssl_certificate=/var/snap/microceph/common/server.crt and ssl_private_key=/var/snap/microceph/common/server.key, and then restarts the rgw service. However, we should not be manually editing snap content; this should go through the MicroCeph API instead.
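For reference, the workaround described above could look roughly like the sketch below. It is not an official MicroCeph mechanism. It assumes certbot's standard behaviour of exporting RENEWED_LINEAGE (the live directory containing fullchain.pem and privkey.pem) to deploy hooks, and it assumes the RGW service can be restarted with `snap restart microceph.rgw`. The `deploy_certs` function and the `DEST_DIR` and `RESTART_CMD` variables are names made up for this example.

```shell
#!/bin/sh
# certbot_deploy_hook.sh: hypothetical sketch of the manual workaround.

# Destination paths match the ssl_certificate / ssl_private_key settings above.
DEST_DIR="${DEST_DIR:-/var/snap/microceph/common}"
# Assumption: the snap exposes the gateway as the "microceph.rgw" service.
RESTART_CMD="${RESTART_CMD:-snap restart microceph.rgw}"

# Copy the renewed certificate chain and key from a certbot lineage directory.
deploy_certs() {
    src="$1"
    dest="$2"
    install -m 0644 "$src/fullchain.pem" "$dest/server.crt" || return 1
    install -m 0600 "$src/privkey.pem" "$dest/server.key" || return 1
}

# certbot sets RENEWED_LINEAGE only when it runs the hook after a renewal.
if [ -n "${RENEWED_LINEAGE:-}" ]; then
    deploy_certs "$RENEWED_LINEAGE" "$DEST_DIR" && $RESTART_CMD
fi
```

The copy is kept in a function so it can be tried against a scratch directory before pointing it at the snap's data directory. The restart only runs if both files were copied successfully.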

abasu0713 commented 1 month ago

+1 for this, please. I have an HA Ceph cluster (built using MicroCeph) with multiple RGW instances running on it and support for virtual bucket hosting. Currently I use cert-manager, which generates certificates through an external issuer (Cloudflare) for all my services and deployments from a single control plane. My certs are generated for Envoy at the load balancer level, where I currently terminate SSL for the RGW instances. I would like to change that and use SSL certs to secure traffic with strict/full encryption. I have a separate K8s job (running daily) that syncs my certs to a secure Ceph/S3 bucket, which I then use from the different nodes/machines where non-K8s services run. This is done with a simple systemd service (baked into the custom ISO I built) that periodically runs a bash script, syncs the target certs to a desired location on each machine (using S3 object tags), and then restarts the RGW service. I am currently in the process of testing this, but I would much prefer the idea proposed here, where a webhook can be used for refreshing/renewing SSL certs.