bazel-contrib / rules_oci

Bazel rules for building OCI containers
Apache License 2.0

How to push image to multiple repositories? #248

Open prestonvanloon opened 1 year ago

prestonvanloon commented 1 year ago

I am trying to replicate logic from rules_docker where I can have a container_bundle given to a docker_push.

container_bundle(
    name = "image_bundle",
    images = {
        "gcr.io/prysmaticlabs/prysm/beacon-chain:latest": ":image_with_creation_time",
        "index.docker.io/prysmaticlabs/prysm-beacon-chain:latest": ":image_with_creation_time",
    },
    tags = ["manual"],
    visibility = ["//beacon-chain:__pkg__"],
)

docker_push(
    name = "push_images",
    bundle = ":image_bundle",
    tags = ["manual"],
    visibility = ["//beacon-chain:__pkg__"],
)

The end result was that I pushed an image to multiple repositories with a single target.

thesayyn commented 1 year ago
# Target names must be unique within a package, so each push gets its own name.
oci_push(
    name = "push_gcr",
    image = ":image_with_creation_time",
    repository = "gcr.io/prysmaticlabs/prysm/beacon-chain",
    remote_tags = ["latest"],
)

oci_push(
    name = "push_dockerhub",
    image = ":image_with_creation_time",
    repository = "index.docker.io/prysmaticlabs/prysm-beacon-chain",
    remote_tags = ["latest"],
)

this can be done by simply having two oci_push targets.

prestonvanloon commented 1 year ago

this can be done by simply having two oci_push targets.

I understand that, but it doesn't scale well. In our case, we have 4 repository / tag variations per image. Even with macros to expand to multiple targets, it is not possible to push all from one target / bazel command. (https://github.com/bazelbuild/bazel/issues/10855)

I'm looking for feature parity with the functionality of docker_push from rules_docker.

thesayyn commented 1 year ago

Even with macros to expand to multiple targets, it is not possible to push all from one target / bazel command.

this could be done by running all oci_push targets in a sh_binary target.

I'm looking for feature parity with the functionality of docker_push from rules_docker.

this is essentially what container_bundle does as I said above.

Unfortunately, we are in -rc, so we cannot introduce breaking changes.

alexeagle commented 1 year ago

You can use https://github.com/keith/rules_multirun to make a single bazel runnable target.

You could write a macro that emulates container_bundle or even spell it out in a BUILD file:

load("@rules_oci//oci:defs.bzl", "oci_image", "oci_push")
load("@rules_multirun//:defs.bzl", "command", "multirun")

oci_image(
    name = "image",
    os = "linux",
    architecture = "amd64",
)

_REPOS = ["index.docker.io/alexeagle/test1", "ghcr.io/<OWNER>/image"]

[
    oci_push(
        name = "push{}".format(i),
        image = ":image",
        repository = repo,
    )
    for i, repo in enumerate(_REPOS)
]

[
    command(
        name = "cmd{}".format(i),
        command = ":push{}".format(i),
        arguments = ["--tag", "latest"],
    )
    for i in range(len(_REPOS))
]

multirun(
    name = "deliver",
    commands = [
        "cmd{}".format(i)
        for i in range(len(_REPOS))
    ],
    jobs = 0, # Set to 0 to run in parallel, defaults to sequential
)

WDYT?

As a design choice, we want rules_oci to only contain things that aren't already possible by layering with other rulesets, keeping it orthogonal and low-maintenance.
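
The "macro that emulates container_bundle" idea mentioned above could look roughly like the sketch below. This is only an illustration under stated assumptions: the file name push_bundle.bzl and the macro name oci_push_bundle are hypothetical and not part of rules_oci, and every key in images is assumed to end in ":<tag>". It reuses the same oci_push, command, and multirun calls as the BUILD example above.

```starlark
# push_bundle.bzl (hypothetical file; not shipped with rules_oci)
load("@rules_multirun//:defs.bzl", "command", "multirun")
load("@rules_oci//oci:defs.bzl", "oci_push")

def oci_push_bundle(name, images, **kwargs):
    """Creates one runnable target that pushes every entry in `images`.

    Args:
        name: name of the multirun target to `bazel run`.
        images: dict of "repository:tag" -> oci_image label, mirroring the
            `images` attribute of container_bundle. Every key is assumed
            to end in ":<tag>".
        **kwargs: common attributes (e.g. tags, visibility) applied to
            every generated target.
    """
    commands = []
    for i, item in enumerate(images.items()):
        ref, image = item

        # Split on the last ":" so a registry port (host:5000/...) stays intact.
        repository, _, tag = ref.rpartition(":")
        push = "{}_push{}".format(name, i)
        cmd = "{}_cmd{}".format(name, i)
        oci_push(
            name = push,
            image = image,
            repository = repository,
            **kwargs
        )

        # Wrap the push so the tag is supplied as a runtime argument.
        command(
            name = cmd,
            command = ":" + push,
            arguments = ["--tag", tag],
            **kwargs
        )
        commands.append(":" + cmd)

    multirun(
        name = name,
        commands = commands,
        jobs = 0,  # 0 runs the pushes in parallel
        **kwargs
    )
```

A BUILD file could then keep roughly the shape of the original container_bundle call:

```starlark
load(":push_bundle.bzl", "oci_push_bundle")

oci_push_bundle(
    name = "push_images",
    images = {
        "gcr.io/prysmaticlabs/prysm/beacon-chain:latest": ":image_with_creation_time",
        "index.docker.io/prysmaticlabs/prysm-beacon-chain:latest": ":image_with_creation_time",
    },
    tags = ["manual"],
)
```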

aignas commented 1 year ago

@alexeagle, thanks for the example. Having it in the rules_oci docs, as part of a guide on how things in rules_docker translate to rules_oci, would be useful. I agree that keeping it orthogonal may be the right approach here.

malt3 commented 1 year ago

Sorry for piggybacking on this issue, but I just ran into this while trying to upgrade from a pre-1.0 version. The API for oci_push in v1.0 is a regression for my use case. I discussed my use case back here: https://github.com/bazel-contrib/rules_oci/issues/69#issuecomment-1491537586

The original issue regarding stamping / file inputs for oci_push also included the idea that --repository should accept file paths as well: #46

There was also a pull request open to specifically implement this: https://github.com/bazel-contrib/rules_oci/pull/154

Is there any possibility of changing this API back to how it was previously (or adding another rule that makes the repository a file input again)? (There used to be repotags, which combined repository and remote_tags and took a file as input.)

I'd be happy to help out and contribute a patch if this is in line with the maintainers' plans.

EDIT: read here that making repository a label again is actually planned. So I will just wait. Thanks for keeping the use case in mind!

prestonvanloon commented 1 year ago

@alexeagle sorry for the late reply... Thanks for the suggestion. The use of multi-run works OK, but I am not able to use -- --tag latest in the command like I could with oci_push. In your suggestion, it's hard coded to "latest" but it won't always be "latest" in our CI.

Edit: My original example was also hardcoded, but we use an environment variable from workspace status, which worked in rules_docker but does not work here.

container_bundle(
    name = "image_bundle",
    images = {
        "gcr.io/prysmaticlabs/prysm/beacon-chain:{DOCKER_TAG}": ":image_with_creation_time",
        "index.docker.io/prysmaticlabs/prysm-beacon-chain:{DOCKER_TAG}": ":image_with_creation_time",
    },
    tags = ["manual"],
    visibility = ["//beacon-chain:__pkg__"],
)

docker_push(
    name = "push_images",
    bundle = ":image_bundle",
    tags = ["manual"],
    visibility = ["//beacon-chain:__pkg__"],
)

See: https://github.com/bazelbuild/rules_docker#stamping
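
For reference, one way the {DOCKER_TAG} stamping above might be approximated in rules_oci is to generate the tags file at build time and hand it to remote_tags, which accepts a file label. The sketch below is not a confirmed drop-in replacement. It assumes expand_template from aspect_bazel_lib is available, that the workspace status command emits a DOCKER_TAG key, and that the build runs with --stamp. The target names stamped_tags and push_gcr are made up for illustration.

```starlark
load("@aspect_bazel_lib//lib:expand_template.bzl", "expand_template")
load("@rules_oci//oci:defs.bzl", "oci_push")

# Writes a one-line tags file. With --stamp, the placeholder line is replaced
# by the DOCKER_TAG value from workspace status; without --stamp the file
# keeps the literal fallback "latest" (assumption: that fallback is acceptable).
expand_template(
    name = "stamped_tags",
    out = "_stamped.tags.txt",
    template = ["latest"],
    stamp_substitutions = {"latest": "{{DOCKER_TAG}}"},
)

oci_push(
    name = "push_gcr",
    image = ":image_with_creation_time",
    repository = "gcr.io/prysmaticlabs/prysm/beacon-chain",
    remote_tags = ":stamped_tags",  # file label, one tag per line
)
```

Because the tag is then baked into the push target, no -- --tag argument is needed at run time, and a multirun wrapper could list such push targets directly.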

wyattanderson commented 1 year ago

WDYT?

As a design choice, we want rules_oci to only contain things that aren't already possible by layering with other rulesets, keeping it orthogonal and low-maintenance.

An example of a "thing that isn't possible by layering other rulesets" might be efficiently pushing multiple images at once. For example, using the MultiWrite API from google/go-containerregistry (which I think crane uses under the hood) to push multiple images in an efficient fashion. We have a build process where we push potentially hundreds of images with new tags but very few (if any) actual layer changes, and it sounds like this would be the most performant way to push all of those images.

It feels like this should eventually be possible when this crane issue is resolved; if there are other feature additions that need to be made to crane to facilitate this, I'd be happy to lend a hand there.

I don't think go-containerregistry adequately handles rate limiting either at the moment, but that's another thing that I think would only be possible with an in-process implementation of parallel push, versus a naive approach of just spawning as many processes as there are images and hoping for the best. We currently run into issues with pushing to AWS ECR because container_push from rules_docker doesn't have any knobs for controlling concurrency.

SanjayVas commented 12 months ago

Building off of https://github.com/bazel-contrib/rules_oci/issues/248#issuecomment-1610078722, there are arguably two separate but related issues here:

  1. Pushing the same image to multiple registries/repositories.
  2. Efficiently bundling image pushes (e.g. pushing multiple images to the same registry).

Perhaps (2) should be split off into another issue, as that's the part that is difficult to do on top of rules_oci. The fact that a solution for it might also resolve (1) is just a bonus.

blackliner commented 4 months ago

IMHO oci_push should not be a run target, but something that happens during build time. This way, bazel will automatically handle the parallelism. For small scale the run approach is nice, but if you have a monorepo and potentially hundreds of container images that need to be released ...

alexeagle commented 4 months ago

bazel build shouldn't have side-effects, at least following the idiom you only expect it to result in updates to the bazel-out tree. It should be idempotent, but if you talk to a remote registry then building the same thing twice will do two different things. OTOH you can squint and see a Bazel remote cache as a CAS with an Action Cache in front of it, which is a lot like an OCI registry. So perhaps OCI artifacts are just 'cached intermediate artifacts' and populating that cache is a fine side-effect for bazel build to have.

There's some discussion of this on some other thread that I'm having trouble finding right now. TBH the maintainers here just don't have time and effort to make progress on a design for that right now. I'm also not clear whether such an experiment necessarily has to be performed in rules_oci, or if you could prove the concept in a separate derivative ruleset to start with.

blackliner commented 4 months ago

Agree with your reasoning. It is all about reproducibility, and in my book this includes oci_images and honestly any pkg_tar that ends up synced to S3.

But with the current (design) limitation of being able to only run a single "run target" per invocation (need to take a crack at rules_multirun at some point), it is really tricky to scale nicely. We are really just at the beginning of migrating to rules_oci, but we have a plethora of container images in our monorepo to migrate, most of them serve as CI containers, others are being released to customers. Having to specify each target individually is somewhat more annoying than just saying bazel build --config=some_release_config //... (the oci_image and s3_sync target would be manual by default, and the config would make them part of the :all build tree)

alexeagle commented 4 months ago

There shouldn't be a need to list them explicitly. A bazel query command can be piped in a shell one-liner

blackliner commented 4 months ago

Correct, but I would now need to maintain this additional "job executor" (even if it is just some xargs and parallel). I would rather see bazel handling this from the get-go.

Don't get me wrong please, I know it is doable, I am just questioning if it is the right way/design.

SanjayVas commented 4 months ago

IMO the issue here isn't really about how to push to multiple repositories with a single target, as that's reasonably simple to do with your own wrapper rules or something like rules_multirun. Also, I don't think changing something as fundamental to Bazel as whether bazel build may have side effects outside of Bazel output trees/caches is in scope, or even makes sense.

The focus should be on functionality that either can only be provided by implementing it in rules_oci or would be significantly more difficult to provide elsewhere. I believe that's primarily making pushes more efficient.

There is additionally an argument for having common functionality be in rules_oci so it can be maintained by those who already have experience writing/maintaining Bazel rules. That is to say there has been a general philosophy thus far that only maintainers of Bazel rules repos should need to know how to write a rule. For better or worse, I believe that's an argument that the rules_oci maintainers have rejected.