authzed / spicedb-operator

Kubernetes controller for managing instances of SpiceDB
Apache License 2.0

Provide a method for attaching sidecars (or patching both the Deployment and Job simultaneously) #249

Open jawnsy opened 11 months ago

jawnsy commented 11 months ago

Summary

Provide some means of attaching a sidecar to both the Deployment and Job.

Background

Google Cloud SQL supports encryption and IAM authentication using the Cloud SQL Proxy service, running as a sidecar container.

The recommended deployment methodology is to run the proxy as a sidecar container: the proxy does not perform authentication itself (anyone who can connect to it inherits whatever credentials the proxy holds), so a sidecar is the safest way to ensure that only the authorized workload can connect through it.

Workarounds

jawnsy commented 11 months ago

Even if you patch things, the migration job does not quite work correctly, because:

  1. The migration container expects the database to be reachable immediately, but the Cloud SQL Auth Proxy is not ready while it is still starting up. A workaround is for the migration process to retry every few seconds until it succeeds (though Kubernetes will detect the migration container as "crashed" and restart it anyway)
  2. The proxy container will still be running after the migration container exits, so the job will not complete

So, for now, perhaps the best option is to use a username/password for database authentication.

ecordell commented 11 months ago

This isn't an option for most kube clusters in the wild just yet, but I think the sidecar containers API (https://kubernetes.io/docs/concepts/workloads/pods/init-containers/#api-for-sidecar-containers) would at least make patching the Job work for this. Any chance you're on a cluster where you can enable alpha features?
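For context, a sketch of what that API could look like on the migration Job. This is illustrative only, not the operator's actual manifest: it assumes Kubernetes 1.28+ with the SidecarContainers feature gate, and the names, instance string, and migration command are hypothetical.

```yaml
# Sketch only: assumes the SidecarContainers feature gate is enabled
# (alpha in 1.28, on by default in 1.29).
apiVersion: batch/v1
kind: Job
metadata:
  name: spicedb-migrate   # hypothetical name
spec:
  template:
    spec:
      restartPolicy: Never
      initContainers:
        # restartPolicy: Always marks this init container as a sidecar:
        # it keeps running alongside the main containers, but the Job can
        # still complete once the migration container exits.
        - name: cloud-sql-proxy
          image: gcr.io/cloud-sql-connectors/cloud-sql-proxy:2.1.1
          restartPolicy: Always
          args: ["my-project:us-central1:my-instance"]  # hypothetical instance
      containers:
        - name: migrate
          image: authzed/spicedb
          command: ["spicedb", "migrate", "head"]  # illustrative migration command
```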

> The proxy container will still be running after the migration container exits, so the job will not complete

This is interesting. The https://github.com/GoogleCloudPlatform/cloud-sql-proxy-operator supports injecting into Jobs, but I don't see how anyone can use that feature.

A hacky way could be a timeout on the SQL proxy container? Give the proxy one minute for migrations to run and then exit 0, so that the migration container controls overall Job success (but if your data gets very large you might need to play with that number).
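Sketched out, that timeout hack might look like the following. This is hypothetical: the official proxy image is distroless and ships no shell, so it assumes an image or wrapper that provides one, and the 60-second window is a guess.

```yaml
# Give the proxy a fixed window, then exit 0 unconditionally, so the
# migration container alone determines overall Job success or failure.
- name: cloud-sql-proxy
  image: gcr.io/cloud-sql-connectors/cloud-sql-proxy:2.1.1  # assumes a shell is present
  command: ["sh", "-c"]
  args:
    # Tune the window: large datastores may need longer migrations.
    - timeout 60 /cloud-sql-proxy my-project:us-central1:my-instance; exit 0
```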

I did find this writeup: https://medium.com/teamsnap-engineering/properly-running-kubernetes-jobs-with-sidecars-ddc04685d0dc which suggests sharing the process namespace between the containers and killing the proxy process when the primary container completes. That's an option, but it seems like a lot of work to replace a thing that's already built in to newer versions of kube.
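A rough sketch of that writeup's approach, with hypothetical names; the migration image would need a shell and `pkill`, which real images may not ship:

```yaml
# Shared-process-namespace workaround: all containers in the pod can see
# (and signal) each other's processes.
spec:
  template:
    spec:
      shareProcessNamespace: true
      restartPolicy: Never
      containers:
        - name: migrate
          image: example/migrate-tools   # hypothetical image with sh + pkill
          command: ["sh", "-c"]
          args:
            # run-migrations is a stand-in for the real migration command.
            - run-migrations; pkill -SIGTERM cloud-sql-proxy
        - name: cloud-sql-proxy
          image: gcr.io/cloud-sql-connectors/cloud-sql-proxy:2.1.1
          args: ["my-project:us-central1:my-instance"]
```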

adamstrawson commented 10 months ago

If it helps at all, we use cloud-sql-proxy sidecars on various migrations, they use a quitquitquit standard (which is becoming more commonly used) as a way of issuing a SIGTERM to the container once the migration finishes.

In our case as an example, the sidecar container for cloud-sql-proxy looks like:

        - name: cloud-sql-proxy
          image: gcr.io/cloud-sql-connectors/cloud-sql-proxy:2.1.1
          args:
          <snip>
          - "--quitquitquit"

And then on the service (which is setup with Helm)

      automigration:
        enabled: true
        customCommand: [/bin/sh, -c]
        customArgs:
        - migrate sql -e --yes; wget --post-data '{}' -O /dev/null -q http://127.0.0.1:9091/quitquitquit

(Only wget is available in this container; not ideal, but we're working with what we have available.)

ecordell commented 3 weeks ago

@jawnsy @adamstrawson

In kube 1.29+ the sidecar containers feature is enabled by default. Have either of you successfully tried the cloud sql proxy with this?