buildkite / agent-stack-k8s

Spin up an autoscaling stack of Buildkite Agents on Kubernetes
MIT License
79 stars 30 forks source link

Enhancing PodSpec Customization with JSON Merge Patch Strategy (helm-controller and plugin) #247

Closed 42atomys closed 6 months ago

42atomys commented 8 months ago

Hello everyone :wave:,

I'd like to propose an enhancement to the buildkite/agent-stack-k8s project that would greatly increase the flexibility and customization capabilities for users. The idea is to leverage the JSON Merge Patch strategy (as outlined in RFC 7386, JSON Merge Patch) to allow comprehensive customization of the Kubernetes PodSpec used by the Buildkite agents.

My points:

Enhance Kubernetes plugin PodSpec Customization: Implement the jsonmerge strategy for PodSpec configuration. This would enable any step within the Buildkite pipeline to override or extend the default PodSpec settings, allowing for more granular control over the runtime environment of CI jobs (and by extension allow customization of any fields of checkout/agent/container-x containers)

Controller Helm Chart Configuration Enhancements: Modify the controller's Helm charts to provide an option for configuring the agent and podspec. This change would enable agents to override the PodSpec for all jobs they create, offering greater flexibility in job execution and environment setup.

Example Use Case:

Consider a scenario where a specific step in the Buildkite pipeline requires a specific serviceAccountName that aren't globally applicable, we can only add the podspec on each steps. By adopting the JSON Merge Patch strategy, these customizations can be seamlessly integrated into the job's PodSpec without impacting the configuration of other steps or requiring manual intervention.

Benefits:

Increased flexibility and control over job execution environments. Enhanced ability to meet specific security, networking, or performance requirements for individual steps within a pipeline. Streamlined configuration process, reducing the need for workarounds or manual adjustments.

Thank you for considering this enhancement. I believe it would significantly improve the usability and adaptability of the buildkite/agent-stack-k8s project for complex Kubernetes deployments and I can handle this enhancement due to a future huge usage of this stack on my compagnies :)

Best regards,

triarius commented 8 months ago

Hi @42atomys I think this is a great idea! Some way of adding a default PodSpec would be very useful.

I also think we could do something similar for the checkout container, which might address some of the needs in https://github.com/buildkite/agent-stack-k8s/issues/239 and https://github.com/buildkite/agent-stack-k8s/issues/227.

42atomys commented 8 months ago

Hi @triarius, thanks for the feedback, I currently work on my fork with our infrastructure to make it simple by the usage, after reading issues you have linked, I think my work can resolve one issue or both :)

I will come back with one or multiples PRs on few days after have a robust solution tested in pre-production 🚀

triarius commented 8 months ago

Thanks, that would be greatly appreciated!

42atomys commented 8 months ago

As first test are done in my side this is the proposition of configuration (work on agent config.yaml AND job kubernetes plugin). We use the Strategic Merge Patch from Kubernetes project.

# ....
tags:
  - queue=default
ssh-credentials-secret: buildkite-ssh-github-creds
pod-spec-patch:
  serviceAccountName: buildkite-agent-sa
  automountServiceAccountToken: true
  containers:
    - name: container-0
      env:
      - name: GITHUB_TOKEN
        valueFrom:
          secretKeyRef:
            name: github-secrets
            key: github-token

The following code attach the buildkite-agent-sa to the pod running the job, and add the GITHUB_TOKEN to the container container-0 without replace the already present environment variables defined by the controller 🎉

Fell free to give feedback !

42atomys commented 7 months ago

The first implementation are done in #262 and is used currently, this changes embed the #248 PR due to linked workflow