buildkite / agent-stack-k8s

Spin up an autoscaling stack of Buildkite Agents on Kubernetes
MIT License
79 stars 30 forks source link

Skip checkout when there are parse errors #275

Closed triarius closed 7 months ago

triarius commented 7 months ago

When the controller attempts to parse the Kubernetes plugin config and encounters an error, it will run a "failure job" that echos the error message as the command so that it is surfaced in the job logs. However, for private repositories that need secrets for checkout, this will fail because the container is not given access to those secrets.

Rather than give the container access to the secrets, in this PR, we skip the checkout step entirely when there are such errors: it is not needed and can only slow down the job or cause some other error.

The error message container is run with the default agent image, which is public, so it should not need any more resources like imagePullSecrets to run. I guess this means that it won't necessarily work on k8s clusters that are air-gapped, but this image is needed for other containers like the agent start container and the init container, so no other jobs would work on such clusters anyway.

In any case, I don't think transferring things like gitEnvFrom and imagePullSecret are going to work if there was an error in the specification of those fields. We want this failure job to be as bare bones as possible to reduce the chance of errors in the Pipeline YAML from preventing it from running.

Fixes: #233, #273

Note to reviews

I've implemented this as a skipCheckout flag on the Build method. I'm aware that such practices don't align with our coding standards. But, I think the entire Build method needs to refactored, and I don't want to do that yet. There are a few PRs in flight (#262, #248, #258) that modify the Build method, so a significant refactor would introduce too many merge conflicts.

After they are merged, we should refactor this method into smaller, composable pieces so that when there is a parse error, the checkout container is omitted and an error message container is substituted for the job command containers.