Open artem-zinnatullin opened 4 days ago
Thanks for raising this @artem-zinnatullin, it's a good point.
Would it be possible to modify the logic around `StaleCh <-chan` added in #389 so that:
wdyt @DrJosh9000?
As of right now, this issue seems to be the last missing bit before we can try to swap https://github.com/EmbarkStudios/k8s-buildkite-plugin for the official buildkite/agent-stack-k8s controller in production CI! 😅
Thanks @artem-zinnatullin, that could probably be made to work. But I would like to dedicate a solid block of time to think about it - if we tackle this, it's likely to land in v0.17.0, since I'm planning on getting v0.16.0 out the door today.
Testing a fix for #382 with the `controller:0.15.0-14-g68932d3` build, I found that `buildkite/agent-stack-k8s` apparently does not have any (?) logic to delete Pending Jobs/Pods for cancelled jobs/builds!

We heavily rely on the Cancel Intermediate Builds setting in Buildkite (see docs), which cancels in-flight builds on the same branch when a new commit is pushed to a PR.

Current behavior: the `buildkite/agent-stack-k8s` controller keeps Pending Jobs/Pods in K8S even after the Buildkite job/build is cancelled, flooding the K8S cluster with resource allocations. It then actually starts those jobs and consumes CPU time, leading to overspending $$$.

Expected behavior: the `buildkite/agent-stack-k8s` controller should send a Job/Pod "Delete" request to K8S for a cancelled Buildkite Job that is not in Running state on K8S.
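To make the expected behavior concrete, here is a rough sketch of the check I have in mind. It is not the controller's actual code: `TrackedJob`, `JobDeleter`, `shouldDelete`, and `reapCancelled` are all hypothetical names, and the deleter interface is only a stand-in for whatever Kubernetes client call the controller would really use. It assumes the controller already knows, per job, whether Buildkite has cancelled it and what phase the pod is in.

```go
package main

import "fmt"

// PodPhase mirrors the two Kubernetes pod phases this sketch cares about.
type PodPhase string

const (
	PodPending PodPhase = "Pending"
	PodRunning PodPhase = "Running"
)

// TrackedJob is a hypothetical record pairing a Buildkite job with its K8S Job.
type TrackedJob struct {
	K8sJobName string
	Phase      PodPhase
	Cancelled  bool // true once Buildkite reports the job or build as cancelled
}

// JobDeleter is a hypothetical stand-in for the Kubernetes client call
// that sends the "Delete" request for a Job.
type JobDeleter interface {
	DeleteJob(name string) error
}

// shouldDelete reports whether a job is cancelled in Buildkite
// and not in Running state on K8S.
func shouldDelete(j TrackedJob) bool {
	return j.Cancelled && j.Phase != PodRunning
}

// reapCancelled sends a delete for every matching job and returns
// the names it deleted. It stops at the first failed delete.
func reapCancelled(jobs []TrackedJob, d JobDeleter) ([]string, error) {
	var deleted []string
	for _, j := range jobs {
		if !shouldDelete(j) {
			continue
		}
		if err := d.DeleteJob(j.K8sJobName); err != nil {
			return deleted, fmt.Errorf("deleting %s: %w", j.K8sJobName, err)
		}
		deleted = append(deleted, j.K8sJobName)
	}
	return deleted, nil
}

// recordingDeleter records delete requests instead of talking to a cluster.
type recordingDeleter struct{ names []string }

func (r *recordingDeleter) DeleteJob(name string) error {
	r.names = append(r.names, name)
	return nil
}

func main() {
	jobs := []TrackedJob{
		{K8sJobName: "job-a", Phase: PodPending, Cancelled: true},  // should be deleted
		{K8sJobName: "job-b", Phase: PodRunning, Cancelled: true},  // already running, left alone
		{K8sJobName: "job-c", Phase: PodPending, Cancelled: false}, // still wanted
	}
	deleted, err := reapCancelled(jobs, &recordingDeleter{})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("deleted:", deleted)
}
```

Running jobs are deliberately left alone here, since the request above only covers jobs that have not started on K8S yet.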