bazel-contrib / rules_oci

Bazel rules for building OCI containers
Apache License 2.0
294 stars 156 forks source link

Feature Request: Promote multi_arch example to oci_multi_arch_image rule #228

Open AttilaTheFun opened 1 year ago

AttilaTheFun commented 1 year ago

Hi folks! I feel like building a multi-arch image is a common enough workflow that it merits being a proper rule in the repo instead of just an example: https://github.com/bazel-contrib/rules_oci/blob/main/examples/multi_arch/BUILD.bazel

This one seems pretty trivial to just pull in as a rule, though I understand it would slightly increase the surface area you have to maintain.

thesayyn commented 1 year ago

We have plans to make oci_image_index transition an image for multiple platforms but we avoiding it until the optional toolchains feature becomes stable and widespread. We have concerns over introducing transitions to rules_oci because of how many users suffered when rules_docker introduced them immaturely.

thesayyn commented 1 year ago

for now, the answer for this FR is no, unfortunately. we'll keep the issue open for tracking purposes, however.

AttilaTheFun commented 1 year ago

@thesayyn what would you think about adding something like aspect bazel lib's platform_transition_filegroup a first-party rule? As it stands IMHO it's a bit strange that you need to either write your own rule or use a third-party dependency to cover a fairly common use case (building an image index including an image built for a different platform).

Maybe we could call it "oci_image_transition" and it would have attributes for an image and a platform and it would transition the image to that platform. I actually prefer this approach to having oci_image_index do the transition since it's more explicit what's going on. It would look like:

    # Builds an image with the distroless base from the tar layer.
    oci_image(
        name = "_image",
        base = "@distroless_base",
        entrypoint = ["/binary"],
        tars = [":_layer"],
    )

    # Transitions the image to linux_arm64.
    oci_image_transition(
        name = "_linux_arm64_image",
        oci_image_transition = ":_image",
        target_platform = "@io_bazel_rules_go//go/toolchain:linux_arm64",
    )

    # Transitions the image to linux_amd64.
    oci_image_transition(
        name = "_linux_amd64_image",
        image = ":_image",
        target_platform = "@io_bazel_rules_go//go/toolchain:linux_amd64",
    )

    # Builds an image index from the linux_arm64 and linux_amd64 images.
    oci_image_index(
        name = "_image_index",
        images = [
            ":_linux_arm64_image",
            ":_linux_amd64_image",
        ],
    )

This does still introduce a transition but in a pretty locked-down manner. And once you have a better option it could be deprecated if you want. WDYT?

thesayyn commented 10 months ago

We want to support this but as I said this will create a lot of confusion around why their cc_binary or go_binary with cgo on doesn't work. I'd rather keep this out of rules_oci until bazel's toolchain support improves.

moroten commented 7 months ago

Optional toolchains are available in Bazel 6 and 7 but not in 5.4.1. What is the support range for rules_oci? Where are optional toolchains needed?

By the way, what is it that doesn't work with cc_binary, go_binary and cgo when using transitions? I don't have the background of the rules_docker problems, but maybe that background explains the problems.

Given architecture, I suggest having an oci_image bound to one platform and letting oci_index or oci_multi_arch_index being platform independent. Implementing this as a middleman multi_arch rule introduces an oci_image-like target that is multi-architecture, which sounds strange to me.

moroten commented 7 months ago

@thesayyn @alexeagle, do you have any input on direction for the multi arch issue?

thesayyn commented 7 months ago

Sorry, i was out sick.

It's not that optional toolchains are not supported but the bazel ecosystem hasn't adopted it yet, and adding the transition to the feature matrix here will increase number of help requests we get from rules_oci users.

background on rules_docker https://github.com/bazelbuild/rules_docker/pull/1963 (PR that introduced transitions to rules_docker) https://github.com/bazelbuild/rules_docker/issues/2275 https://github.com/bazelbuild/rules_docker/issues/2052 https://github.com/bazelbuild/rules_docker/issues/2036

Where are optional toolchains needed?

If you are on a mac and trying to use this with rules_go you'll get cgo related toolchain errors or if you have cc_binary somewhere in your dependency graph your build simply starts failing. One could debate that this is an issue with the ruleset failing but unfortunately that's not how majority of users think when their build fails because they used an attribute oci_image_index provided. (https://github.com/bazel-contrib/rules_oci/issues/472 is an example of this even though we have no transitions)

Another thing is that we can't just get away with implementing this in oci_image_index, as oci_image has to do the right thing with base architecture attributes, like making sure base image matches the target configuration down to os/arch/variant.

Currently this is all left to the user, by using select and platform_transition_filegroup which makes the user liable for breaking their builds.

What is the support range for rules_oci

We don't care about Bazel 5, it's a nice to have but "not too bad" if some things are broken.

When we started we left out the transition part on purpose because of whole set of toolchain issues around Bazel.

moroten commented 7 months ago

@thesayyn I hope you are better now and thank you for the background story. After reading a lot, I think I get the picture.

I think the root cause, in the transition code in rules_docker, is that the default behaviour is to pass operating_system=linux and architecture=amd64 and not to build an image for the user specified --platforms. I believe that the image targets should be produced according to the current target platform or that the rules_docker transition should at least have been a no-op unless architecture or operating_system was explicitly specified,

In #531, I suggest to let the default platforms=[] resolve in a no-op transition. This should mean that the user is aware when activating a multi-architecture image index. I also let all single-platform targets to follow the --platforms. My opinion is that oci_image is a single-platform target and oci_image_index is a (potential) multi-platform target.

moroten commented 7 months ago

I find the following example a bit counter intuitive:

js_image_layer(
    name = "layers",
    binary = ":binary",
    platform = ":amd64_linux",
    root = "/app"
)

oci_image(
    name = "image",
    cmd = ["/app/main"],
    entrypoint = ["bash"],
    tars = [
        ":layers"
    ]
)

Building bazel build --platforms=:windows //:js_image_layer will create a layer that is not for Windows, but rather for amd64_linux. The same applies for :image. Maybe an oci_image_index for a single platform should not do a transition and that the transition is only done for a multi-arch index. Either its platforms attribute disallows a single entry list or a single-arch index is one rule and a multi-arch a different rule.

alexeagle commented 7 months ago

I know the --platforms flag is one way that users can "hold" this. In our client repos we never use that, instead we figure that the target architecture of the image is decided by where it is deployed which is fixed in time, and therefore expect the choice is baked into the BUILD file. Users shouldn't have to explicitly pass any flag, just bazel run any deliverable targets and the right artifacts are shipped to a container registry.

How does your suggestion work for that kind of user?

moroten commented 7 months ago

When using bazel run, --platforms need to be the host platform, which is also the default. Without any transition, we will build the image for the host platform as well.

Where do we set platforms and perform the transition, the oci_push/oci_tarball, oci_image_index or oci_image rule?

Should there be a provider which tells that this dependency path has gone through a platform transition and disallow a second one? Then it would be possible to add the platforms, or platform for oci_tarball, attribute on all levels.

Thinking outside the box, what about bazel run --run_under=@rules_oci//:push --platforms=... //:my_image? This would remove the push targets and the executable from oci_tarball. Unfortunately, --run_under targets are not build for the host platform (why?), so maybe some transition hacking is needed to build it for the host platform and fake the output until "fixed" in Bazel.

My conclusion:

alexeagle commented 7 months ago

That's a lot of design space to explore, and to be honest I also think there's a high chance that we make a wrong decision and are then semver-committed to whatever we made. That's why @thesayyn and I have been very conservative about leaving this problem "in userspace" so far. It's already possible (if maybe inconvenient) for users to serve themselves with orthogonal transition rules. This is nice because whatever problems they encounter are their own making. Once we ship a built-in feature, we will get piles of user support requests. Only a very well-crafted feature and accompanying documentation will ensure that most users can get it right. If many users trip over this and get it wrong, then we're either on the hook for a lot of unpaid volunteer support time for them, or accept that "Bazel is too hard" and let them fail.

As it is, we don't have paying customers asking for this so we haven't prioritized it. Of course I would love to have your help to contribute, but we need a lot of confidence that we have correct answers to all the questions you're raising.