coder / coder

Provision remote development environments via Terraform
https://coder.com
GNU Affero General Public License v3.0

Horizontally scale provisioners on Kubernetes #8183

Closed: ammario closed this issue 7 months ago

ammario commented 1 year ago

From discussions with @coadler, it's clear that Terraform takes up the vast majority of CPU resources on the average Coder deployment. Provisioner demand is also highly variable, spiking when developers begin and end their work days. For reasons of cost, performance, and reliability, we should have a well-supported path to horizontally scale provisioners. Our largest deployments run Kubernetes, so it's the natural place to start.

I believe this scaling can be accomplished fairly simply with these product tweaks (a rough sketch of item 2 follows the list):

  1. Expose build-queue-length as a metric that the Horizontal Pod Autoscaler can target
  2. Add a provisionerd flag to run one build then exit, creating ephemerality and down-scaling
  3. Provide a provisionerd helm chart wrapping this all together
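
To make item 2 concrete, here's a minimal sketch of what a one-shot mode could look like in provisionerd's run loop; the `JobClient` interface, the `AcquireJob`/`RunJob` methods, and the flag itself are hypothetical illustrations, not existing Coder APIs:

```go
package provisioner // illustrative sketch, not the actual coder/coder package layout

import "context"

// JobClient is a hypothetical stand-in for coderd's provisioner job API.
type JobClient interface {
	AcquireJob(ctx context.Context) (Job, error) // blocks until a build is available
	RunJob(ctx context.Context, job Job) error   // executes the Terraform build
}

type Job struct{ ID string }

// runProvisioner sketches the proposed one-shot behavior: acquire a single
// build, run it to completion, then exit so the pod can be reaped and the
// replica count scaled back down.
func runProvisioner(ctx context.Context, client JobClient, oneShot bool) error {
	for {
		job, err := client.AcquireJob(ctx)
		if err != nil {
			return err
		}
		if err := client.RunJob(ctx, job); err != nil {
			return err
		}
		if oneShot {
			return nil // hypothetical "run one build then exit" flag
		}
	}
}
```
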
spikecurtis commented 1 year ago

Add a provisionerd flag to run one build then exit, creating ephemerality and down-scaling

This is sounding much closer to "serverless" / Lambda computing than Horizontal Pod Autoscaler.

If we want provisionerd to work with the Horizontal Pod Autoscaler, we should make sure that when it gets SIGTERM it finishes the current job(s) and does not request another. We'll also need to set a long termination grace period, or risk Kubernetes killing the pod in the middle of a build.

We should also define and support readiness checks on provisionerd if we don't already have them.
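
Hedging on the exact internals, a sketch of what that SIGTERM behavior could look like, with `acquireJob`/`runBuild` as illustrative stubs rather than real Coder functions:

```go
package main

import (
	"context"
	"os"
	"os/signal"
	"syscall"
	"time"
)

type job struct{ id string }

// acquireJob stands in for asking coderd for the next build (stub).
func acquireJob(ctx context.Context) (job, error) {
	select {
	case <-ctx.Done():
		return job{}, ctx.Err()
	case <-time.After(time.Second):
		return job{id: "example"}, nil
	}
}

// runBuild stands in for executing the Terraform build (stub).
func runBuild(ctx context.Context, j job) { _, _ = ctx, j }

func main() {
	// acquireCtx is cancelled on SIGTERM so we never request another job;
	// buildCtx stays alive so the in-flight build runs to completion. The
	// pod's terminationGracePeriodSeconds must exceed the longest build.
	acquireCtx, stop := signal.NotifyContext(context.Background(), syscall.SIGTERM, os.Interrupt)
	defer stop()
	buildCtx := context.Background()

	for {
		j, err := acquireJob(acquireCtx)
		if err != nil {
			return // shutting down: finish up, don't take new work
		}
		runBuild(buildCtx, j)
	}
}
```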

ammario commented 1 year ago

In a Kubernetes world where operators design their pods to be stateless, I think ephemeral/one-shot provisioners have far more advantages than disadvantages. Chief among them is security. Multiple teams / templates can re-use the same set of provisioners without fear of another group meddling with their data. This deployment mode is also required for us to provide provisioners via SaaS.

spikecurtis commented 1 year ago

Two related points here.

First, I'm concerned that ephemeral/one-shot provisioners will not play nicely with the Horizontal Pod Autoscaler, in particular. I do think they have nice properties overall, but we may need a different K8s control loop to get one-shot provisioners working.

HPA computes a ratio between an observed metric value O and a target value T. If the metric is queue length, then really, our target value is 0. But that makes the ratio divide by zero.

So, we'd have to set the target value to, say, 1.

HPA computes the desired replicas D in terms of the current replicas C: D = ceil(C × O / T). Ok, so say the queue is empty (O=0), the target is T=1, and we have a minimum replica count of 1, so C=1. Now 30 builds come in, so O=30, and HPA computes D=30 and fires up 29 more provisioners. Great!

They start working, and the queue is emptied, so O=0, and suddenly D=0. Uh oh, the HPA is going to start killing our provisioners! So, we need to disable scale-down on the HPA.

Ok, now we have 30 provisioners working on 30 builds, and let's say another 30 builds get queued. So, O=30, C=30, and thus D=900. That's not good! We don't want to start 870 more provisioners.
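
For anyone who wants to poke at the numbers, a quick sketch of that arithmetic, assuming the standard HPA formula desiredReplicas = ceil(currentReplicas × observedMetric / targetMetric):

```go
package main

import (
	"fmt"
	"math"
)

// desiredReplicas mirrors the standard HPA scaling formula:
// D = ceil(C * O / T), where C is current replicas, O the observed
// metric (queue length here), and T the target value.
func desiredReplicas(current int, observed, target float64) int {
	return int(math.Ceil(float64(current) * observed / target))
}

func main() {
	// 1 idle replica, 30 builds arrive: scale to 30. Good.
	fmt.Println(desiredReplicas(1, 30, 1)) // 30

	// Queue drains to 0 while builds run: HPA wants 0 replicas mid-build.
	fmt.Println(desiredReplicas(30, 0, 1)) // 0

	// 30 busy replicas, 30 more builds queue up: HPA asks for 900.
	fmt.Println(desiredReplicas(30, 30, 1)) // 900
}
```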

My high level point is that the HPA is designed to scale long-lived processes that continuously handle load, not one-off jobs.

spikecurtis commented 1 year ago

Second, I 100% agree that security is a great reason to run each provisioner job in its own isolated container. But, we don't need Kubernetes or even Docker to do this. We should be containerizing provisioner jobs any time we run them on Linux, so that even if you run on bare metal or a VM, we extend those security benefits to customers.

deansheather commented 1 year ago

First, I'm concerned that ephemeral/one-shot provisioners will not play nicely with the Horizontal Pod Autoscaler, in particular. I do think they have nice properties overall, but we may need a different K8s control loop to get one-shot provisioners working.

We could provide a K8s deployment that listens for jobs from the DB and then schedules pods on Kubernetes to handle each job; that approach wouldn't interact with the HPA at all.
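
A rough sketch of what that control loop could look like with client-go, where `pollPendingJobIDs`, the image name, and the `--one-shot` flag are all assumptions for illustration rather than existing Coder pieces:

```go
package main

import (
	"context"
	"fmt"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// pollPendingJobIDs is a stand-in for querying Coder's database for
// provisioner jobs that have not been claimed yet.
func pollPendingJobIDs(ctx context.Context) ([]string, error) {
	_ = ctx
	return nil, nil // placeholder
}

func main() {
	ctx := context.Background()
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	for {
		ids, err := pollPendingJobIDs(ctx)
		if err != nil {
			fmt.Println("poll:", err)
		}
		for _, id := range ids {
			// One ephemeral pod per queued build; it exits when the build is done.
			pod := &corev1.Pod{
				ObjectMeta: metav1.ObjectMeta{Name: "provisioner-" + id},
				Spec: corev1.PodSpec{
					RestartPolicy: corev1.RestartPolicyNever,
					Containers: []corev1.Container{{
						Name:  "provisionerd",
						Image: "ghcr.io/coder/coder:latest",           // assumed image
						Args:  []string{"provisionerd", "--one-shot"}, // hypothetical flag
					}},
				},
			}
			if _, err := clientset.CoreV1().Pods("coder").Create(ctx, pod, metav1.CreateOptions{}); err != nil {
				fmt.Println("create pod:", err)
			}
		}
		time.Sleep(5 * time.Second)
	}
}
```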

deansheather commented 1 year ago

I think the MVP of this is to just provide a setting in Helm to launch n provisioner daemon replicas, with x concurrency each, in a separate deployment, and have an automated registration process so these Helm-managed provisioners can immediately talk to the Coder API. This could be accomplished with a randomly generated shared secret (generated with Helm's template crypto utils) that gets mounted as an env var into both coder and the provisioners.

deansheather commented 1 year ago

I don't think most customers on K8s are going to bother with HPA, as evidenced by v1. So while a one-shot build option and a scheduler similar to GitLab's K8s CI runners would be a great feature, the Helm replica setting above would be more useful in the short term IMO.

ammario commented 1 year ago

Thanks for that detail @spikecurtis.

@deansheather curious to understand more about why v1 users didn't use HPA. Was it burdensome to use, difficult to understand, or all of the above? It's clear from our scale tests that Terraform is the vast majority of our CPU usage. It's also clear from our large customers that provisioning will be concentrated in a couple of hours each day, when engineers are starting and finishing work.

So, I don't see how we achieve good resource utilization without it.

deansheather commented 1 year ago

IDK, you'd have to ask the v1 customers. Most of the time customers just made sure they had enough coder replicas and didn't bother with HPA, as far as I could tell. Some did (using resource constraints) and their mileage varied a lot.

spikecurtis commented 1 year ago

In v1 we actively discourage customers from autoscaling coderd, because when you scale down, it drops connections. They tend to use a lot of browser-based IDEs, and very few of them enable STUN. Several of our larger customers were autoscaling, and we told them to stop because they were also getting complaints from users about disconnects.

Provisionerd doesn't handle this kind of traffic, so it can be autoscaled, and I do think it'll be worthwhile to get a good autoscaling solution for large deployments.