authzed / spicedb-operator

Kubernetes controller for managing instances of SpiceDB
Apache License 2.0

Support Autoscaling #88

Open AyWa opened 1 year ago

AyWa commented 1 year ago

Hello,

I had an initial look at the operator, and I wonder whether there is any way to have autoscaling, like https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/ ?

ecordell commented 1 year ago

Right now, there's not a great way to use HPA with the operator. The operator enforces a replica count and writes it every time the config changes, so you will see HPA and the operator fight to change the replicas (not continuously, just when there is a config change to the cluster, but it's still not ideal).

We definitely intend to support autoscaling with the operator, though it may or may not involve HPA. Depending on what we do for https://github.com/authzed/spicedb-operator/issues/82, for example, we may be able to scale up by adding nodes and filling their cache before they start responding to traffic.

Frequent scale up / scale down is probably not ideal for performance, since by default we only store 1 copy of a cached item. We could bump the cache spread up, which would require more memory but may make scaling up and down quickly less disruptive (even without cache warming). This could make a lot of sense as a way to deploy SpiceDB, since it is frequently CPU bound.

If there's interest, we could do something short term, like adding a setting to keep the operator from writing replicas so that other tools (HPA) can take over.
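To make the short-term idea concrete, here is a minimal sketch of the behavior described above, not the operator's actual code. `ClusterConfig`, `Deployment`, `Reconcile`, and the `SkipReplicas` setting are all hypothetical names invented for illustration. It assumes the operator rewrites the replica count from its config on every config change unless told to leave it alone.

```go
package main

import "fmt"

// ClusterConfig is a hypothetical, simplified stand-in for the operator's
// view of a SpiceDBCluster. SkipReplicas is an assumed setting name, not an
// existing operator option.
type ClusterConfig struct {
	Replicas     int32
	SkipReplicas bool
}

// Deployment models only the one field that both the operator and HPA write.
type Deployment struct {
	Replicas int32
}

// Reconcile mimics what happens on a config change: the operator rewrites
// the Deployment from the cluster config. With SkipReplicas set, it keeps
// the current replica count so another controller (such as HPA) can own it.
func Reconcile(cfg ClusterConfig, current Deployment) Deployment {
	desired := current
	if !cfg.SkipReplicas {
		desired.Replicas = cfg.Replicas
	}
	return desired
}

func main() {
	// Suppose HPA has scaled the Deployment from 3 to 7 replicas.
	scaledByHPA := Deployment{Replicas: 7}

	// Default behavior: the next config change resets HPA's decision.
	fmt.Println(Reconcile(ClusterConfig{Replicas: 3}, scaledByHPA).Replicas)

	// With the hypothetical setting, HPA's replica count survives.
	fmt.Println(Reconcile(ClusterConfig{Replicas: 3, SkipReplicas: true}, scaledByHPA).Replicas)
}
```

In this sketch the first call returns 3 (the operator wins) and the second returns 7 (HPA's value is preserved), which is the "fight" and the proposed opt-out in miniature.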

@AyWa I'm going to rename this and keep it as a tracking issue for autoscaling support - thanks for kicking off the discussion!

tarjanik commented 1 year ago

Hi @ecordell What's the status on this? For us, scaling up and down likely makes sense, even with temporarily reduced performance. To test this, it would be nice to try the workaround until we have something more sophisticated.

ecordell commented 7 months ago

@tarjanik Nothing currently in-flight to support this, but ideas (and PRs!) are welcome.

The brute-force idea to expose this would be:

But there might be some better options. I'd need to double-check how HPA is implemented; if it directly uses the scale APIs, then this should be possible just by enabling the scale API, with no code changes, then deploying an HPA object and pointing it at the SpiceDBCluster.
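For context on what an HPA object would end up writing to the replica count, the Kubernetes documentation gives the scaling rule as desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), with small deviations from the target ignored. The sketch below only illustrates that rule; `DesiredReplicas` is a hypothetical helper, not a Kubernetes or operator API, and the tolerance is passed in as a plain parameter.

```go
package main

import (
	"fmt"
	"math"
)

// DesiredReplicas applies the scaling rule from the Kubernetes HPA docs:
// desired = ceil(current * currentMetric / targetMetric).
// If the ratio is within the tolerance band around 1.0, no scaling happens.
func DesiredReplicas(current int32, currentMetric, targetMetric, tolerance float64) int32 {
	ratio := currentMetric / targetMetric
	if math.Abs(ratio-1.0) <= tolerance {
		return current
	}
	return int32(math.Ceil(float64(current) * ratio))
}

func main() {
	// 3 replicas at 90% CPU against a 60% target: scale up.
	fmt.Println(DesiredReplicas(3, 90, 60, 0.1))

	// 3 replicas at 62% CPU against a 60% target: within tolerance, no change.
	fmt.Println(DesiredReplicas(3, 62, 60, 0.1))

	// 6 replicas at 30% CPU against a 60% target: scale down.
	fmt.Println(DesiredReplicas(6, 30, 60, 0.1))
}
```

Given SpiceDB is frequently CPU bound, a CPU utilization target is the obvious metric to try first, though the cache considerations mentioned earlier in the thread still apply to how aggressively it should scale.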