docker / buildx

Docker CLI plugin for extended build capabilities with BuildKit

Pushing cache to "registry" cache with multi-node builder only uploads cache from one node #1044

Open potiuk opened 2 years ago

potiuk commented 2 years ago

Originally reported at https://github.com/moby/buildkit/issues/2758

It's been confirmed by @tonistiigi that this is a problem with the buildx multi-node builder.

When you are building a multi-platform image with multiple builders (to avoid emulation) and use --cache-to type=registry, the resulting registry cache only contains the cache for the platform that was built last.

I tried to use buildkit to build Apache Airflow (https://github.com/apache/airflow) multi-platform images. I am using the latest buildkit and the latest docker.

Hosts used for the multi-platform builds

I have two builder hosts:

1) AMD builder (Linux Mint 20.3) with the buildx plugin installed (github.com/docker/buildx v0.7.1 05846896d149da05f3d6fd1e7770da187b52a247) - the docker builder was created there

2) ARM builder (Mac Pro M1 late 2021) with Docker Desktop 4.6.0 (buildx pre-installed) - with the new Virtualization framework enabled.

Builder configuration

I configured my buildx builds to use both builders. I connected the macOS builder to the Linux host via a forwarded docker socket, and I am running all my multi-platform builds from the Linux host.
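
For reference, a minimal sketch of how such a two-node builder can be created (the endpoints and builder name are taken from the docker buildx ls output below):

# Create the builder on the Linux host, using its local docker socket
docker buildx create --name airflow_cache --driver docker-container unix:///var/run/docker.sock
# Append the macOS node via the forwarded docker socket
docker buildx create --name airflow_cache --append tcp://127.0.0.1:2375
docker buildx use airflow_cache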

These are the builders I see with docker buildx ls:

airflow_cache       docker-container                     
  airflow_cache0    unix:///var/run/docker.sock running  linux/amd64, linux/amd64/v2, linux/amd64/v3, linux/ppc64le, linux/s390x, linux/386, linux/mips64le, linux/mips64
  airflow_cache1    tcp://127.0.0.1:2375        running  linux/arm64, linux/amd64, linux/amd64/v2, linux/riscv64, linux/ppc64le, linux/s390x, linux/386, linux/mips64le, linux/mips64, linux/arm/v7, linux/arm/v6

Build command

I want to build a multi-platform image for both ARM and AMD, and I want to do it in a single buildx command. Additionally, I want to store the cache for both platforms in the same image, but under a :cache tag.

My image is multi-stage, so I want to push cache for all stages (hence mode=max).

The (simplified) command to run the build is:

docker buildx build --progress=auto --pull --platform linux/amd64,linux/arm64 \
    --cache-from=ghcr.io/potiuk/airflow/main/ci/python3.7:cache \
    --cache-to=type=registry,ref=ghcr.io/potiuk/airflow/main/ci/python3.7:cache,mode=max \
    --push \
    -t ghcr.io/potiuk/airflow/main/ci/python3.7:latest --target main . -f Dockerfile.ci

While the ghcr.io/potiuk/airflow/main/ci/python3.7:latest image is perfectly fine (a nice, multi-platform image), the ghcr.io/potiuk/airflow/main/ci/python3.7:cache image only contains the cache from the "last" built image - i.e. if the AMD image was faster to build and push its cache, the cache from the ARM builder, pushed later, overrides the AMD cache stored there. I could not find any way to merge those two caches (especially since I cannot specify two different cache destinations, one per platform). This renders --cache-to type=registry essentially useless for multi-platform builds.

I reverted to "inline" mode and it seems to work, but I would really love to keep the latest cache in a separate tag of the image.
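
For reference, the inline fallback looks roughly like this (a sketch reusing the refs from this report; note that the inline cache exporter only supports mode=min, so intermediate stages of a multi-stage build are not cached):

docker buildx build --platform linux/amd64,linux/arm64 \
    --cache-to=type=inline \
    --cache-from=ghcr.io/potiuk/airflow/main/ci/python3.7:latest \
    --push -t ghcr.io/potiuk/airflow/main/ci/python3.7:latest \
    --target main . -f Dockerfile.ci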

Nick-0314 commented 2 years ago

The cache of arm64 overwrites the cache of amd64, so only one of the two platforms is available. Using inline mode was a bit expensive for us in terms of managing local storage: we frequently ran out of disk space, and the cache would disappear after the BuildKit container was restarted.

Nick-0314 commented 2 years ago

@tonistiigi Can this problem be circumvented by adding a default platform suffix to the cache-to registry ref? For example: "repo/ubuntu:cache-linux-arm64". Is this easy to develop? It is currently possible to define multiple cache-from entries, so once cache-to can be suffixed, I can cache-from multiple suffixed tags.
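
A rough sketch of the idea (the suffixed cache destination is hypothetical and is exactly the part that does not exist; multiple --cache-from entries already work today):

# Works today: read from several per-platform cache tags
docker buildx build --platform linux/amd64,linux/arm64 \
    --cache-from=repo/ubuntu:cache-linux-amd64 \
    --cache-from=repo/ubuntu:cache-linux-arm64 \
    --push -t repo/ubuntu:latest .
# Missing piece: writing each platform's cache to its own suffixed tag,
# e.g. a hypothetical --cache-to=type=registry,ref=repo/ubuntu:cache-<platform>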

potiuk commented 2 years ago

This is what I am planning to do - but then such a multi-platform image cannot be prepared with a single buildx command, because you can specify only one --cache-to when you run a single multi-platform build, even with remote builders.

Which renders the buildx feature of preparing a multi-platform image with remote builders in a single command pretty useless.

potiuk commented 2 years ago

What I actually plan to do is do it in two steps (until it is fixed):

1) Build a single multi-platform image and push it without cache.
2) Run two separate steps to AGAIN build and push (only cache), in two separate commands, one per platform.

This is quite an overhead, though the build cache in the builders will be reused, so the overhead of running 3 commands instead of one should be bearable.
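
In concrete terms, the plan looks roughly like this (a sketch; the per-platform cache tag names are an assumption, not from the report):

# Step 1: build and push the multi-platform image itself, without
# exporting any cache
docker buildx build --platform linux/amd64,linux/arm64 \
    --push -t ghcr.io/potiuk/airflow/main/ci/python3.7:latest \
    --target main . -f Dockerfile.ci

# Steps 2 and 3: re-run once per platform, exporting only the cache to a
# platform-specific tag; the builders reuse the local cache from step 1,
# so these runs are cheap
for arch in amd64 arm64; do
  docker buildx build --platform "linux/${arch}" \
      --cache-to=type=registry,ref=ghcr.io/potiuk/airflow/main/ci/python3.7:cache-${arch},mode=max \
      --target main . -f Dockerfile.ci
done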

Rongronggg9 commented 2 years ago

@potiuk There is another workaround that does not require building twice.

Node *: build the Docker image on its own and push it to a standalone repo, with cache.
Main node: concatenate these images together:

docker manifest create USERNAME/REPOSITORY:TAG --amend USERNAME/REPOSITORY-NODE1:TAG --amend USERNAME/REPOSITORY-NODE2:TAG --amend USERNAME/REPOSITORY-NODE*:TAG
docker manifest push USERNAME/REPOSITORY:TAG

Refer to https://github.com/knatnetwork/github-runner/blob/399a888e5c9de2a38854a07570df661d59749284/.github/workflows/build.yml#L116 if you need an actual use case.

I think it is possible to use only one repo by just using a standalone image tag and cache tag for each node.

I think docker manifest may also be able to operate on registry-cache tags, not just image tags, so there are probably other workarounds. If you give it a try, could you please comment and let me know?
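
A sketch of that single-repo variant (the per-arch tag names are hypothetical):

# Each node pushes its own per-arch image tag into the same repo, then
# the main node stitches them into one multi-arch tag
docker manifest create USERNAME/REPOSITORY:TAG \
    --amend USERNAME/REPOSITORY:TAG-amd64 \
    --amend USERNAME/REPOSITORY:TAG-arm64
docker manifest push USERNAME/REPOSITORY:TAG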

potiuk commented 2 years ago

Yeah. That's what I wanted to avoid - manually manipulating manifests. I prefer to rely on buildx behaviour.

This way I do not have to rely on or look up the "nodes", and can nicely use a multi-node builder just by knowing its name (and then pushing the cache can be done from any node).

Also, I think separating the caches out into different tags has some nice properties. We have our own "development environment" called breeze, which hides the complexity of where (and when) the cache is used from the user, and it makes it easy to decide which cache to use based on the platform. It also makes it super easy to track and diagnose user issues, as users can copy&paste the verbose command they used, and it's a bit easier to track the history of that particular cache. So I will stick to that.

The overhead is actually very little, because both steps use the same builders (ARM and AMD hardware based): the first step just builds a single multi-platform image with --push, and the two subsequent steps run single-platform cache-only builds that reuse the local cache already built in the first step.

jobcespedes commented 2 years ago

What I actually plan to do is do it in two steps (until it is fixed):

1. Build a single multiplatform image and push it without cache

2. Run separate two steps to AGAIN build and push (only cache) in two separate commands for two platforms separately

Trying this approach, I found that the manifest generated in step 1 comes out with one of two digests, at random. The reason for this, I believe, is that it randomly orders the manifest list. This is an additional issue when trying to design an idempotent pipeline.

Attached is an example showing the diff of two manifests it randomly generated for the two architectures:

--- /tmp/meta-538b4.json      2022-06-20 22:39:33.302897680 -0600
+++ /tmp/meta-80e8a.json      2022-06-20 22:39:57.467873367 -0600
@@ -3,24 +3,24 @@
   "manifest": {
     "schemaVersion": 2,
     "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
-    "digest": "sha256:538b4667e072b437a5ea1e0cd97c2b35d264fd887ef686879b0a20c777940c02",
+    "digest": "sha256:80e8a68eb9363d64eabdeaceb1226ae8b1794e39dd5f06b700bae9d8b1f356d5",
     "size": 743,
     "manifests": [
       {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
-        "digest": "sha256:cef1b67558700a59f4a0e616d314e05dc8c88074c4c1076fbbfd18cc52e6607b",
+        "digest": "sha256:2bc150cfc0d4b6522738b592205d16130f2f4cde8742cd5434f7c81d8d1b2908",
         "size": 1367,
         "platform": {
-          "architecture": "arm64",
+          "architecture": "amd64",
           "os": "linux"
         }
       },
       {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
-        "digest": "sha256:2bc150cfc0d4b6522738b592205d16130f2f4cde8742cd5434f7c81d8d1b2908",
+        "digest": "sha256:cef1b67558700a59f4a0e616d314e05dc8c88074c4c1076fbbfd18cc52e6607b",
         "size": 1367,
         "platform": {
-          "architecture": "amd64",
+          "architecture": "arm64",
           "os": "linux"
         }
       }
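
One way to reproduce this comparison (docker buildx imagetools inspect --raw is an existing buildx command; the ref and file names here are just examples):

# Dump the raw manifest list after each of two otherwise-identical
# builds, then diff them - the top-level digest differs because the
# per-arch entries are ordered differently
docker buildx imagetools inspect --raw REPO/IMAGE:TAG > /tmp/meta-a.json
# ...rebuild and push again...
docker buildx imagetools inspect --raw REPO/IMAGE:TAG > /tmp/meta-b.json
diff -u /tmp/meta-a.json /tmp/meta-b.json
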
potiuk commented 2 years ago

What I actually ended up with: I simply run two separate steps to push each cache separately. It turned out that I do not "really" need a combined cache image for development. The only difficulty is that in our automation scripts we derive the cache name from the platform we run on, but since we have it all encapsulated in breeze, that development environment of ours, it was actually pretty easy:

https://github.com/apache/airflow/blob/88363b543f6f963247c332e9d7830bc782ed6e2d/dev/breeze/src/airflow_breeze/params/common_build_params.py#L104

https://github.com/apache/airflow/blob/88363b543f6f963247c332e9d7830bc782ed6e2d/dev/breeze/src/airflow_breeze/params/common_build_params.py#L139

jedevc commented 2 years ago

Currently, buildx has support for merging manifest outputs from the builder results. I think it should be possible to implement similar support for merging cache manifests; it should be very similar to the existing logic.

However, we don't have support for push-by-digest (pushing content without a tag) in the registry exporter, which would need to be fixed separately in buildkit first.
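
For comparison, the existing image-manifest merging is exposed on the CLI as docker buildx imagetools create; something analogous for cache manifests is what would be needed here (whether this works on cache tags today is not verified in this thread, and the refs are placeholders):

# Existing manifest-merging path for images: combine two per-arch tags
# into a single multi-arch tag
docker buildx imagetools create -t REPO/IMAGE:combined \
    REPO/IMAGE:amd64 REPO/IMAGE:arm64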

lorenzogrv commented 1 year ago

Same problem here. We build in CI using a dual-remote-builder strategy; partial code to illustrate:

  - docker buildx create --name buildx --driver docker-container --use --platform linux/amd64 --bootstrap ssh://$AMD64_HOST
  - docker buildx create --name buildx --append --platform linux/arm64 --bootstrap ssh://$ARM64_HOST
  - docker buildx build
      --push
      --platform linux/amd64,linux/arm64
      --cache-from=registry.example.null/image-name:buildcache
      --cache-to=type=registry,mode=max,ref=registry.example.null/image-name:buildcache
      --tag registry.example.null/image-name:example-tag
    # ...

The :buildcache image will only store the cache from the last completed build. As the cache isn't for both platforms, it "rotates" between the two platforms each time CI builds.

I will attempt to adapt the @Rongronggg9 workaround (thanks for sharing <3) and report here for reference

lorenzogrv commented 1 year ago

We noticed the problem no longer persists after bumping our CI jobs to use docker:20.10.23 with the docker:20.10.23-dind service.

Both the exported and imported cache seem correct, and build times dropped to a range similar to local usage with a local cache.

digglife commented 1 year ago

Hmm, bumped into this today. Seems I have to do the manifest manually.

andrey-bondar commented 8 months ago

Hey! Any news about this issue?