flux-iac / tofu-controller

A GitOps OpenTofu and Terraform controller for Flux
https://flux-iac.github.io/tofu-controller/
Apache License 2.0

tf-runner base images #1364

Open · milas opened this issue 1 month ago

milas commented 1 month ago

(Feel free to transfer this to the flux-iac/tf-runner-images repo if you think that's a better location.)

I am happy to help with this work and open a PR, but I want to get maintainer buy-in first.

Problem

The default/community-provided runner images are using old versions of the tf-runner binary from a discontinued source.

Context

Currently, the runner image that's used is ghcr.io/flux-iac/tf-runner, built from the https://github.com/flux-iac/tf-runner-images repo.

However, this repo itself is primarily meant for managing the Terraform binaries (see flux-iac/tf-runner-images#4 for an unreviewed PR that was working on adding tofu).

The tf-runner binary for the controller, on the other hand, is built from ./cmd/runner in this repo, but the runner images currently pull it from ghcr.io/weaveworks/tf-runner: https://github.com/flux-iac/tf-runner-images/blob/9e90f3199eb9a0e5098c0d04bc3f4fa4bf7fac51/.github/workflows/release-runner-images.yaml#L55

I can't find any source Git repo for that - I'm guessing it was lost with Weaveworks 😭

Proposal

I think the current approach of keeping the runner images repo separate makes sense, but the tf-runner-images repo needs a place to get the tf-runner binary from (or a source to build it from).

[That's why I opened the issue here: we should make the tf-runner binary artifact available for downstream use in the runner image.]

  1. Add a new target to the Dockerfile in this repo to build the tf-runner binary when building controller images. Simplified example (no caching/cross-compilation):

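    # The official Go image is `golang`; there is no `go` image
    FROM golang AS build-runner
    WORKDIR /src
    COPY . .
    # Static binary (no cgo) so it also runs on musl-based images like alpine
    RUN CGO_ENABLED=0 go build -o /out/tf-runner ./cmd/runner
    
    FROM scratch AS runner-bin
    COPY --link --from=build-runner /out/tf-runner /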
  2. Publish a "binary-only" runner OCI image at ghcr.io/flux-iac/tf-runner-bin
  3. Use it in tf-runner-images. Simplified example:

    ARG TF_CONTROLLER_VERSION=0.123.0
    ARG TOFU_VERSION=1.7.1
    
    FROM ghcr.io/flux-iac/tf-runner-bin:${TF_CONTROLLER_VERSION} AS tf-runner-bin
    FROM ghcr.io/opentofu/opentofu:${TOFU_VERSION} AS upstream-opentofu
    
    FROM alpine:3.20 AS base
    COPY --link --from=tf-runner-bin /tf-runner /usr/local/bin/tf-runner
    ...
    FROM base AS tofu
    COPY --link --from=upstream-opentofu /usr/local/bin/tofu /usr/local/bin/tofu
    # HACK: controller uses hardcoded `terraform` binary name
    RUN ln -s /usr/local/bin/tofu /usr/local/bin/terraform

This keeps the runner images loosely coupled to the controller, as they are now, and retains the relative ease of releasing runner images for a variety of OpenTofu versions independently of the tofu-controller release cycle.
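To illustrate that loose coupling, the same shared `base` stage could also back a Terraform variant, so only the IaC binary layer changes between images. This is a minimal sketch under assumptions: ghcr.io/flux-iac/tf-runner-bin is the image proposed above (not yet published), the official hashicorp/terraform image is assumed to ship its binary at /bin/terraform, and 1.5.7 is only an example version.

```dockerfile
# syntax=docker/dockerfile:1
# Build args declared before the first FROM can be used in FROM lines.
ARG TF_CONTROLLER_VERSION=0.123.0
# Example version only; any published hashicorp/terraform tag would do.
ARG TERRAFORM_VERSION=1.5.7

# Hypothetical: the binary-only image proposed in step 2.
FROM ghcr.io/flux-iac/tf-runner-bin:${TF_CONTROLLER_VERSION} AS tf-runner-bin
FROM hashicorp/terraform:${TERRAFORM_VERSION} AS upstream-terraform

# Shared base: identical to the tofu variant, only tf-runner is added here.
FROM alpine:3.20 AS base
COPY --link --from=tf-runner-bin /tf-runner /usr/local/bin/tf-runner

# Terraform variant: no symlink needed since the binary is already `terraform`.
# Assumption: the upstream image places the binary at /bin/terraform.
FROM base AS terraform
COPY --link --from=upstream-terraform /bin/terraform /usr/local/bin/terraform
```

Each variant is then selected with `--target` and versioned purely through build args, which is what lets runner images be released on their own schedule.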


ilithanos commented 2 weeks ago

@milas, thanks for creating this issue and for offering to help. We could use all the help we can get right now: the two original maintainers are out looking for work, which leaves some big shoes for the rest of us to fill.

This issue highlights some structural issues that I wasn't aware of.

Currently, the runner images that are actually built and released are built from the Dockerfiles and source code in the tofu-controller repository.

I remember talk about moving the runners out of this repo, but that move hasn't been done yet.

So the released images currently get the binary from within this repository.

When looking over the release-runners action, I did notice some old references to Weaveworks that could be causing errors.