buildkite / agent

The Buildkite Agent is an open-source toolkit written in Go for securely running build jobs on any device or network
https://buildkite.com/
MIT License

Support running steps as Kubernetes Jobs #420

Closed mikekap closed 3 years ago

mikekap commented 7 years ago

It would be very convenient to run steps as Kubernetes Jobs. This would enable autoscaling of machines based on Kubernetes resource utilization, as well as sizing each step's resources for maximal cluster utilization. As it is now, you can size the agent itself, but you have to give it enough resources to run your largest step.

keithpitt commented 7 years ago

Interesting... I've not used a lot of kubernetes myself, but I've been meaning to have a play with it! Have you had any thoughts on how you'd see this working? I'd love to throw around some ideas here and see what we come up with :)

mikekap commented 7 years ago

The approach I had in mind would be something like a two process model:

Getting this to work isn't terribly hard. The buildkite agent just needs a mode where it runs a specific job id and exits. The master is a bit more work, but should be pretty easy to do since it just listens for incoming jobs and schedules them - it doesn't even have to keep track of completion status, since Kube does that.

If those two are set up, just running something like https://github.com/openai/kubernetes-ec2-autoscaler will auto-scale instances as jobs come in, which would be pretty awesome :)
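The "master" half of this model can be sketched as a function that turns one job id into a Kubernetes Job manifest with per-step resource requests. This is only a rough sketch, not an official tool: it assumes the `--acquire-job` flag described later in this thread as the "run one job and exit" mode, and the function name, image name, secret name, and resource defaults are all hypothetical.

```python
def build_agent_job(job_id, cpu="1", memory="1Gi", image="buildkite/agent"):
    """Return a Kubernetes batch/v1 Job manifest (as a dict) for one Buildkite job.

    Assumptions (not from the thread): the image name, the secret called
    "buildkite-agent" with key "token", and the label key are placeholders.
    """
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {
            # Assumes job_id is safe to embed in an object name (e.g. a UUID).
            "name": f"buildkite-job-{job_id}".lower(),
            "labels": {"buildkite-job-id": job_id},
        },
        "spec": {
            # Do not let Kubernetes retry the pod; leave retries to the caller.
            "backoffLimit": 0,
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [
                        {
                            "name": "agent",
                            "image": image,
                            # One-shot agent: run exactly this job, then exit.
                            "command": [
                                "buildkite-agent",
                                "start",
                                "--acquire-job",
                                job_id,
                            ],
                            "env": [
                                {
                                    "name": "BUILDKITE_AGENT_TOKEN",
                                    "valueFrom": {
                                        "secretKeyRef": {
                                            "name": "buildkite-agent",
                                            "key": "token",
                                        }
                                    },
                                }
                            ],
                            # Size the pod for this step, not for the largest step.
                            "resources": {
                                "requests": {"cpu": cpu, "memory": memory}
                            },
                        }
                    ],
                }
            },
        },
    }
```

The returned dict could be serialized to YAML or JSON and submitted to the cluster; because each step gets its own resource requests, a cluster autoscaler can react to pending pods rather than to one oversized agent.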

lox commented 6 years ago

> The buildkite agent just needs a mode where it runs a specific job id and exits.

A while back we implemented --disconnect-after-job, which disconnects the agent after it runs a single job. That gets some of the way there.

We're hoping to get some time to focus on a Kubernetes stack, which will include some of this stuff. Thanks for your patience!

regner commented 5 years ago

Any update on progress for this? Is it on the roadmap at all?

lox commented 5 years ago

No progress yet, I'm afraid. Have you seen https://github.com/webflow/kubekite?

nullren commented 5 years ago

It would be really nice if there were an option to combine with --disconnect-after-job providing a way to target a specific job ID. This way when launching a buildkite-agent as a job in kubernetes, you would know specifically what job it is running.

@lox is there an API the agent can call to coerce ping into targeting a specific job? I mostly got the idea from looking at these two places, which seem like a good entry point for that: https://github.com/buildkite/agent/blob/ce69197180f2ff5b17a64259f46731d19a98c9ca/agent/agent_worker.go#L49 https://github.com/buildkite/agent/blob/ce69197180f2ff5b17a64259f46731d19a98c9ca/agent/agent_worker.go#L93

Thoughts?

Globegitter commented 5 years ago

@lox what is the priority of this? I did get kubekite to work, but I had to do some manual work on it, and it feels like this should be supported officially.

prestonvanloon commented 4 years ago

Any update on this?

yob commented 4 years ago

> It would be really nice if there were an option to combine with --disconnect-after-job providing a way to target a specific job ID. This way when launching a buildkite-agent as a job in kubernetes, you would know specifically what job it is running.

Version 3.17.0 of the agent (released December 2019) added an --acquire-job flag that does exactly this.

$ buildkite-agent start --help | grep acquire
   --acquire-job value                    Start this agent and only run the specified job, disconnecting after it's finished [$BUILDKITE_AGENT_ACQUIRE_JOB]

We're not using it in any official Buildkite tools yet. However, anecdotally we know of a few folks using it to create one-shot agents on Fargate and Kubernetes. Generally there's an operator running somewhere that detects new jobs and, for each one, creates a new agent pod that runs the job and then exits.
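The operator pattern described above boils down to a reconcile step: compare the jobs waiting to run with the ones already handed to a pod, and launch a one-shot agent for each new one. The sketch below is illustrative only. How new jobs are detected is not specified in this thread, so the list of scheduled job ids and the `create_pod` callback are hypothetical inputs supplied by the caller; `create_pod` would typically start a pod running `buildkite-agent start --acquire-job <id>`.

```python
from typing import Callable, Iterable, List, Set


def reconcile(
    scheduled_job_ids: Iterable[str],
    launched: Set[str],
    create_pod: Callable[[str], None],
) -> List[str]:
    """Launch one agent pod per job that has not been launched yet.

    scheduled_job_ids: ids of jobs currently waiting to run (source is up to
        the caller; hypothetical here).
    launched: ids already handed to a pod; updated in place.
    create_pod: hypothetical callback that creates the one-shot agent pod.
    Returns the ids for which a pod was created in this pass.
    """
    started = []
    for job_id in scheduled_job_ids:
        if job_id in launched:
            continue  # a pod already exists for this job
        create_pod(job_id)
        launched.add(job_id)
        started.append(job_id)
    return started


if __name__ == "__main__":
    launched_ids: Set[str] = set()
    pods: List[str] = []
    # Stand-in for a real pod creation call: just record the job id.
    print(reconcile(["a", "b"], launched_ids, pods.append))  # ['a', 'b']
    print(reconcile(["b", "c"], launched_ids, pods.append))  # ['c']
```

Tracking launched ids in memory keeps the sketch short; a real operator would more likely derive that set from the pods or Jobs that already exist in the cluster so that it survives restarts.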

If you're interested in developing the ideas further, there's some occasional discussion in the #kubernetes channel on the community slack. We'd love to build and release something official for Kubernetes, but it's not on our near-term roadmap for now (too much to do, not enough time).

keithduncan commented 3 years ago

I think I’m going to close this with the addition of --acquire-job. The agent itself supports this nicely, and anything else would be agent orchestration layered on top, like the Elastic CI Stack for AWS, buildkite/helm, etc. 😄 🎉