buildkite / agent-stack-k8s

Spin up an autoscaling stack of Buildkite Agents on Kubernetes
MIT License

Make it possible to run init/systemd inside the build container #283

Open aressem opened 5 months ago

aressem commented 5 months ago

I would like to start the build container with a working systemd inside. This means having /sbin/init as PID 1.

Currently there is no good way to do this as the container command is hardcoded: https://github.com/buildkite/agent-stack-k8s/blob/2bf25843d9aaa1cbf2b0b0bb6b742cb1187b5e1c/internal/controller/scheduler/scheduler.go#L248

It would be nice to be able to override this command so that the container starts /sbin/init first and then the buildkite-agent with its arguments.

We could of course hack this into a container image derived from the official one, but would rather not.

Any help making this work would be highly appreciated.

moskyb commented 5 months ago

g'day @aressem! currently, we hardcode the container command for the exec container, as we need to explicitly control how the agent is started in that container; there's a bit of a dance involved in booting k8s agents. we don't anticipate changing this very soon, but we might look into it in the future.

i'd like to know more about your use case of systemd inside a k8s container, however. what are the problems you're trying to solve?

aressem commented 5 months ago

My use case is that I would like to start a set of services with systemd (e.g. dockerd) and also test systemd units as part of our builds. To successfully start systemd, /sbin/init must run as PID 1.

Currently the buildkite-agent takes PID 1, as it is the first command in the container. We could do some nasty stuff like swapping the PIDs after startup, if that is even possible, but it would be much cleaner to allow the command of the build container to be a wrapper. This wrapper would start the buildkite-agent with the provided arguments and then exec /sbin/init, so that init takes over PID 1.
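A minimal sketch of the wrapper idea described above, assuming the wrapper is installed as the container command and receives the original agent command line as its arguments. The names `INIT_PATH`, `split_argv` and `main` are hypothetical and not part of agent-stack-k8s, and the sketch assumes a Python interpreter is available in the image:

```python
import os
import subprocess
import sys

# Assumption: the image ships systemd's init at this path.
INIT_PATH = "/sbin/init"


def split_argv(argv):
    """Return the agent command line the wrapper was invoked with.

    argv[0] is the wrapper itself; everything after it is assumed to be
    the original container command (agent binary plus its arguments).
    """
    agent_cmd = list(argv[1:])
    if not agent_cmd:
        raise ValueError("expected the agent command after the wrapper name")
    return agent_cmd


def main(argv):
    agent_cmd = split_argv(argv)
    # Start the agent as a child process first. It keeps running across
    # the exec below and ends up as a child of init.
    subprocess.Popen(agent_cmd)
    # Replace this process (PID 1 in the container) with init.
    # os.execv does not return on success.
    os.execv(INIT_PATH, [INIT_PATH])


if __name__ == "__main__":
    main(sys.argv)
```

One consequence of this design is that the agent's exit status no longer determines when the container exits, since init stays alive as PID 1; a real wrapper would likely need to handle shutdown explicitly.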

Another approach would be to allow an init container to be inserted after the copy-agent init container. This way we could replace the /workspace/build-agent used by the build containers with the above-mentioned wrapper.

Any workarounds or ideas that would help us to have this wrapper in place would be helpful.

aressem commented 5 months ago

We have a workaround for this: we used Kyverno to add a MutatingAdmissionWebhook that fires up systemd as PID 1 and executes the build-agent with its arguments. This works well.

Just saw the new podSpecPatch feature, which uses strategic merge patches. Might that solve this issue as well?