Open aressem opened 5 months ago
Hi @aressem, did you discover anything with your tests where the number is set to 0?
@DrJosh9000, the pipeline works as expected with `in-flight` set to 0. I don't know what that number might be now, but I suspect it is steadily increasing :)
Same issue when testing with `max-in-flight: 1` on `v0.11.0`. At some point the controller stops taking new jobs even though there are no jobs/pods running in the namespace besides the controller itself.
2024-05-21T21:31:57.923Z DEBUG limiter scheduler/limiter.go:79 max-in-flight reached {"in-flight": 1}
We have `agent-stack-k8s` up and running, and it works fine for a while. However, it suddenly stops accepting new jobs, and the last thing it outputs is (we turned on debug):

We currently only have a single pipeline, a single cluster and a single queue. When this happens there are no jobs or pods named `buildkite-${UUID}` in the k8s cluster. Executing `kubectl -n buildkite rollout restart deployment agent-stack-k8s` makes the controller happy again and it starts jobs from the queue.

I suspect that there is something that should decrement the `in-flight` number, but fails to do so. We are now running a test where this number is set to 0 to see if that works around the problem.