potiuk opened this issue 2 years ago
The arm64 cache overwrites the amd64 cache, so only one of the two platforms is available. Using inline mode was a bit expensive for us in terms of managing local storage: we frequently ran out of disk space, and the cache would disappear after the BuildKit container was restarted.
@tonistiigi Can this problem be circumvented by adding a default platform suffix to the cache-to registry tag? For example: "repo/ubuntu:cache-linux-arm64". Would this be easy to implement? It is already possible to define multiple cache-from entries, so once cache-to can be suffixed, I could cache-from multiple platforms.
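Sketched out, the proposed behaviour might look like this (not currently implemented; image and tag names are illustrative):

```shell
# Hypothetical: cache-to would fan out into one tag per platform,
# e.g. repo/ubuntu:cache-linux-amd64 and repo/ubuntu:cache-linux-arm64
docker buildx build --platform linux/amd64,linux/arm64 --push \
  --tag repo/ubuntu:latest \
  --cache-to=type=registry,ref=repo/ubuntu:cache,mode=max \
  --cache-from=repo/ubuntu:cache-linux-amd64 \
  --cache-from=repo/ubuntu:cache-linux-arm64 .
```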
This is what I am planning to do - but then such a multi-platform image cannot be prepared with a single buildx command, because you can specify only one --cache-to when you run a single multi-platform build, even with remote builders.
That renders buildx's feature of preparing a multi-platform image with remote builders in a single command pretty useless.
What I actually plan to do is do it in two steps (until it is fixed):
1. Build a single multi-platform image and push it without cache.
2. Run two separate steps that AGAIN build and push (cache only), in two separate commands, one per platform.

This is quite an overhead, though the build cache in the builders will be reused, so the overhead of running 3 commands instead of one should be bearable (see the sketch below).
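A sketch of the three commands (image and cache tag names here are illustrative, not from the original):

```shell
# Step 1: build and push the multi-platform image, without exporting cache
docker buildx build --platform linux/amd64,linux/arm64 --push \
  --tag repo/image:latest .

# Step 2: re-run once per platform, exporting only the cache to a
# platform-specific tag; the builders reuse the local cache from step 1
docker buildx build --platform linux/amd64 \
  --cache-to=type=registry,ref=repo/image:cache-amd64,mode=max .
docker buildx build --platform linux/arm64 \
  --cache-to=type=registry,ref=repo/image:cache-arm64,mode=max .
```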
@potiuk There is another workaround that does not require building twice.
Node *: build the Docker image on its own and push it to a standalone repo with cache. Main node: concatenate these images together:
```shell
docker manifest create USERNAME/REPOSITORY:TAG --amend USERNAME/REPOSITORY-NODE1:TAG --amend USERNAME/REPOSITORY-NODE2:TAG --amend USERNAME/REPOSITORY-NODE*:TAG
docker manifest push USERNAME/REPOSITORY:TAG
```
Refer to https://github.com/knatnetwork/github-runner/blob/399a888e5c9de2a38854a07570df661d59749284/.github/workflows/build.yml#L116 if you need an actual use case.
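For reference, each node's own step might look roughly like this (a sketch following the naming of the manifest example above; the cache tag and build context are illustrative):

```shell
# On node 1 (e.g. amd64): build, then push both the image and its cache
# to that node's standalone repo
docker buildx build --platform linux/amd64 --push \
  --tag USERNAME/REPOSITORY-NODE1:TAG \
  --cache-from=type=registry,ref=USERNAME/REPOSITORY-NODE1:buildcache \
  --cache-to=type=registry,ref=USERNAME/REPOSITORY-NODE1:buildcache,mode=max .
```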
I think it is possible to use only one repo by just using a standalone image tag and cache tag for each node.
I suspect `docker manifest` may also be able to operate on registry-cache tags instead of just image tags, so there are probably other workarounds. If you give it a try, could you please comment and let me know?
Yeah. That's what I wanted to avoid - manually manipulating manifests. I prefer to rely on buildx behaviour.
This way I do not have to rely on or even know the "nodes", and can nicely use a multi-node builder just by knowing its name (and then pushing the cache can be done from any node).
Also, I think separating the caches out into different tags has some nice properties. We have our own "development environment" called breeze,
which hides the complexity of where (and when) the cache is used, and it makes it easy to decide which cache to use based on platform. It also makes it super easy to track and diagnose user issues, as users can copy & paste the verbose command they used, and it's a bit easier to track the history of that particular cache. So I will stick to that.
The overhead is actually very little, because both steps use the same builders (ARM and AMD hardware based): the first step builds a single multi-platform image with --push, and the two subsequent steps run single-platform cache exports that reuse the local cache already built in the first step.
> What I actually plan to do is do it in two steps (until it is fixed):
>
> 1. Build a single multi-platform image and push it without cache.
> 2. Run two separate steps that AGAIN build and push (cache only), in two separate commands, one per platform.
Trying this approach, I found that the manifest generated in step 1 comes out with one of two digests, at random. The reason, I believe, is that the manifest list is randomly ordered. This is an additional issue when trying to design an idempotent pipeline.
Attached is an example with the diff of two manifests it randomly generates for two architectures:
```diff
--- /tmp/meta-538b4.json 2022-06-20 22:39:33.302897680 -0600
+++ /tmp/meta-80e8a.json 2022-06-20 22:39:57.467873367 -0600
@@ -3,24 +3,24 @@
 "manifest": {
 "schemaVersion": 2,
 "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
- "digest": "sha256:538b4667e072b437a5ea1e0cd97c2b35d264fd887ef686879b0a20c777940c02",
+ "digest": "sha256:80e8a68eb9363d64eabdeaceb1226ae8b1794e39dd5f06b700bae9d8b1f356d5",
 "size": 743,
 "manifests": [
 {
 "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
- "digest": "sha256:cef1b67558700a59f4a0e616d314e05dc8c88074c4c1076fbbfd18cc52e6607b",
+ "digest": "sha256:2bc150cfc0d4b6522738b592205d16130f2f4cde8742cd5434f7c81d8d1b2908",
 "size": 1367,
 "platform": {
- "architecture": "arm64",
+ "architecture": "amd64",
 "os": "linux"
 }
 },
 {
 "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
- "digest": "sha256:2bc150cfc0d4b6522738b592205d16130f2f4cde8742cd5434f7c81d8d1b2908",
+ "digest": "sha256:cef1b67558700a59f4a0e616d314e05dc8c88074c4c1076fbbfd18cc52e6607b",
 "size": 1367,
 "platform": {
- "architecture": "amd64",
+ "architecture": "arm64",
 "os": "linux"
 }
 }
```
What I actually ended up doing: I simply run two separate steps to push each cache separately. It turned out that I do not "really" need a combined image for development. The only difficulty is that in our automation scripts we derive the cache name from the platform we run on - but since we have it all encapsulated in breeze,
that development environment of ours, it was actually pretty easy:
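Something along these lines (a sketch, not Breeze's actual code; `repo/image` is illustrative):

```shell
# Derive a per-platform cache tag from the build platform
PLATFORM="linux/arm64"                      # or linux/amd64, set per runner
CACHE_TAG="cache-${PLATFORM//\//-}"         # -> cache-linux-arm64
docker buildx build --platform "$PLATFORM" \
  --cache-to=type=registry,ref=repo/image:"$CACHE_TAG",mode=max .
```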
Currently, buildx has support for merging manifest outputs from the builder results. I think it should be possible to implement similar support for merging cache manifests; it should be very similar to the existing logic.
However, we don't have support for push-by-digest (pushing content without a tag) for the registry exporter, which would need to be a separate fix in buildkit first.
Same problem here. We build in CI using a dual remote-builder strategy; partial code to exemplify:
```yaml
- docker buildx create --name buildx --driver docker-container --use --platform linux/amd64 --bootstrap ssh://$AMD64_HOST
- docker buildx create --name buildx --append --platform linux/arm64 --bootstrap ssh://$ARM64_HOST
- docker buildx build
    --push
    --platform linux/amd64,linux/arm64
    --cache-from=registry.example.null/image-name:buildcache
    --cache-to=type=registry,mode=max,ref=registry.example.null/image-name:buildcache
    --tag registry.example.null/image-name:example-tag
  # ...
```
The `:buildcache` image only stores the cache for the last completed build. Since the cache never covers both platforms, it "rotates" between them on each CI build.
I will attempt to adapt @Rongronggg9's workaround (thanks for sharing <3) and report back here for reference.
We noticed the problem no longer persists after bumping our CI jobs to use `docker:20.10.23` with the `docker:20.10.23-dind` service.
Both cache export and import seem correct, and build times dropped to a range similar to local builds with a local cache.
Hmm, bumped into this today. Seems I have to merge the manifests manually.
Hey! Any news about this issue?
Originally reported at https://github.com/moby/buildkit/issues/2758
It's been confirmed by @tonistiigi that this is a problem with the buildx multi-node builder.
When you build a multi-platform image with multiple builders (to avoid emulation) and use `--cache-to type=registry`, the resulting registry cache only contains the cache for the platform that was built last.

I tried to use buildkit to build multi-platform images for Apache Airflow (https://github.com/apache/airflow). I am using the latest buildkit and the latest docker.
Hosts used for the multi-platform builds

I have two builder hosts:

1) AMD builder (Linux Mint 20.3) with the `buildx` plugin installed (github.com/docker/buildx v0.7.1 05846896d149da05f3d6fd1e7770da187b52a247) - docker builder created there
2) ARM builder (Mac Pro M1 late 2021) with Docker Desktop 4.6.0 (buildx pre-installed) - with the new Virtualization framework enabled
Builder configuration

I configured my buildx builds to use both builders. I connected the macOS builder to the Linux host via a forwarded Docker socket, and I run all my multi-platform builds from the Linux host.
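A sketch of how such a two-node builder can be wired up (the socket path, context name, builder name, and host are illustrative, not the original setup):

```shell
# Forward the remote (macOS/ARM) Docker socket over SSH to a local unix socket
ssh -nNT -L /tmp/arm64.sock:/var/run/docker.sock user@arm64-host &

# Make the forwarded socket addressable as a Docker context
docker context create arm64-remote --docker "host=unix:///tmp/arm64.sock"

# Create a builder on the local (AMD64) node, then append the remote node
docker buildx create --name multi --use --platform linux/amd64
docker buildx create --name multi --append arm64-remote --platform linux/arm64
```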
These are the builders I see with `docker buildx ls`:

Build command
I want to build a multi-platform image for both ARM and AMD, and I want to do it in a single buildx command. Additionally, I want to store the cache for both platforms in the same image, but under a `:cache` tag.

My image is multi-stage, so I want to push cache for all stages (hence `mode=max`).

The (simplified) command to run the build is:
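The original command is not reproduced here; a sketch reconstructed from the image and cache tags discussed below would look roughly like:

```shell
docker buildx build \
  --platform linux/amd64,linux/arm64 \
  --push \
  --tag ghcr.io/potiuk/airflow/main/ci/python3.7:latest \
  --cache-from=ghcr.io/potiuk/airflow/main/ci/python3.7:cache \
  --cache-to=type=registry,ref=ghcr.io/potiuk/airflow/main/ci/python3.7:cache,mode=max \
  .
```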
While the ghcr.io/potiuk/airflow/main/ci/python3.7:latest image is perfectly fine (a nice multi-platform image), the `ghcr.io/potiuk/airflow/main/ci/python3.7:cache` image only contains the cache of the "last" built image - i.e. if the AMD image was faster to build and push its cache, the cache pushed later by the ARM builder seems to override the AMD cache stored there. I could not find any way to merge those two caches (especially since I cannot specify a different cache destination for each of the platforms). This renders `--cache-to type=registry` essentially useless for multi-platform builds.

I reverted to "inline" mode and it seems to work, but I would really love to keep the latest cache in a separate tag of the image.