buildkite / agent-stack-k8s

Spin up an autoscaling stack of Buildkite Agents on Kubernetes
MIT License
80 stars 31 forks source link

Overhaul of MaxInFlightLimiter #370

Closed DrJosh9000 closed 2 months ago

DrJosh9000 commented 2 months ago

What

Basically rewrite the whole MaxInFlightLimiter.

Relevant to #302 and might fix it

Why

Despite Create taking a context parameter, under the hood, add could block while waiting for completions (a sync.Cond) to wake the goroutine. This is a consequence of sync.Cond itself not being context-aware. At minimum, after detecting context cancellation, completions.Broadcast() would need to be added (somewhere), and a check for context added to the completions.Wait() loop. But...

The mutex locking was probably overly coarse-grained. In add it somewhat makes sense to guard most of the method, since it needs to wrap completions.Wait(). Everywhere else the mutex was really only guarding the map tracking which jobs were in-flight. Reads on the map happened at the start of methods and writes towards the end of functions. But looking at the purpose of these reads and writes, the informer callbacks (OnAdd, OnDelete) exist to make "the model" more closely match reality when out of sync with it (i.e. track jobs that were started by a previous incarnation of the controller), and once add has decided that the job hasn't already been started, then it effectively has ownership of starting that job.

Finally, despite defining mu to be a sync.RWMutex, nothing (not even the sync.Cond or the InFlight method) used RLock/RUnlock, so it could have been a plain sync.Mutex.

So I decided some rethinking of the limiter mechanisms was in order.

How

wolfeidau commented 2 months ago

@DrJosh9000 would it be better to abstract this a little, rather than interacting with the scheduling channel ect directly in this code?

DrJosh9000 commented 2 months ago

@wolfeidau sounds good. I tweaked the tests a bit and discovered an edge case I had forgotten, that this would have helped with. So we now have tryTakeToken and tryReturnToken.