buildkite / agent-stack-k8s

Spin up an autoscaling stack of Buildkite Agents on Kubernetes
MIT License
79 stars 30 forks source link

Support resource requests/limits on checkout and agent containers #234

Open mbarrien opened 10 months ago

mbarrien commented 10 months ago

agent-stack-k8s allows us to set the resource requests/limits (e.g. cpu and memory) for the controller that launches pods, but does not provide a way to specify those requests/limits for the Buildkite agent pods it launches. Implicitly, you can provide resources for pods launched by the podSpec specified in a step, but not for the pods that the controller launches for checking out the Git repo or for monitoring the pods it launched.

This is especially needed if we are running the commands directly in the agent container (e.g. with no podSpec, just using a regular command step), where our commands could run some more cpu/memory intensive operations (in combination with our own custom agent containers with our own libraries installed). Proper requests/limits will allow us to combine agent-stack-k8s with an autoscaler like Karpenter to dynamically size our Buildkite compute appropriately.

Even better would be supporting different container-level resource limits/requests depending on if the agent is running the command block directly versus just using a podSpec and launching separate containers (where we can provide our own requests/limits). Even better on top of that would be having a concept of default requests/limits for containers specified in a podSpec.

DrJosh9000 commented 10 months ago

Hi @mbarrien, thanks for raising this. We've planned to add something to set these.

Do you have opinions or a suggestion for how the syntax might look? e.g.

steps:
  - plugins:
      - kubernetes:
          podSpec:
            containers:
              - image: blah/alpine
                command: ["echo", "hello!"]
                resources:
                  requests:
                    cpu: 100m
                    memory: 100Mi
          defaultResources:
            requests:
              cpu: 1000m
              memory: 1Gi
mbarrien commented 10 months ago

Ideally, I'd have something that can be done at the time I provision the helm chart, rather than having to weave this default resources all throughout every single Kubernetes workflow step. This feels like something that should be handled by an infra/platform team rather than something our application engineers would have to write themselves (since application engineers currently are in charge of their own workflow definitions for their products, and having them manage memory requirements just adds to the friction). It also reduces the readability of the workflows, having to mentally filter out all these resource requirements strewn about.

Doing it at Helm install time would mirror our current workflow of provisioning those resources from the deprecated buildkite agent chart from https://github.com/buildkite/charts

Similarly, I was about to write an issue asking for gitEnvFrom to be something that can be defaulted at Helm install time, for similar reasons.

DrJosh9000 commented 10 months ago

That's a good point, we'll look into it. If you feel motivated, we'll accept a PR adding a value for default resources to the Helm chart for the controller to use when creating new pods.

mbarrien commented 10 months ago

This also said, maybe having support for BOTH the Helm chart, and the defaultResources (or defaults: {resources: ...}}) would be useful, in case we do want to override the underlying default values provisioned in Helm for specific steps.

Would be open to implementing the PR, but cannot commit to it right now given other work.

IngCr3at1on commented 8 months ago

Hey y'all; I stumbled on this issue while looking for something else and I just wanted to throw a potential solution on here really quick.

Kubernetes already allows for setting defaults for containers and pods inside of a namespace, it could absolutely be added to the helm chart but even without it this can be accomplished super easily

https://kubernetes.io/docs/tasks/administer-cluster/manage-resources/

apiVersion: v1
kind: LimitRange
metadata:
  name: resource-ranges
spec:
  limits:
    - default:
        memory: 1000Mi
        cpu: 1000m
      defaultRequest:
        memory: 600Mi
        cpu: 500m
      max:
        memory: 6000Mi
        cpu: 4000m
      type: Container

kubectl -n <buildkite namespace> apply -f <path to the above yaml>


It might be nice to have a way to differentiate between "agent defaults" and "non-agent defaults" but I wanted to mention this regardless.

42atomys commented 7 months ago

Hi all, the pull request https://github.com/buildkite/agent-stack-k8s/pull/262 will allow you to customize any containers