When using a `docker-container` builder you have to set the registry configuration for the BuildKit daemon: https://github.com/docker/buildx/blob/master/docs/guides/custom-registry-config.md
In your case the configuration will look like this:
[registry."192.168.189.102:5000"]
http = true
insecure = true
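If that block is saved to a file (the file name and builder name below are just examples), it can be loaded when the builder is created, which mirrors the command used later in this thread:

```sh
# create a docker-container builder that uses the custom BuildKit registry config
docker buildx create --name insecure-builder --driver docker-container \
  --config ./buildkitd.toml --use

# builds that go through this builder can then use the plain-HTTP registry
docker buildx build -t 192.168.189.102:5000/image:test --push .
```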
@tonistiigi @jedevc I wonder if we could read `DockerAPI.Info(ctx).RegistryConfig.InsecureRegistryCIDRs` and automatically set the registry config (if not already populated) in the container when creating a `docker-container` builder?
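For what it's worth, the daemon's view of those settings is also exposed on the CLI, so a rough sketch of what such auto-population would consume could be (output shape varies by daemon version):

```sh
# dump the docker daemon's registry configuration, including insecure registries,
# which a docker-container builder could in principle translate into buildkitd.toml entries
docker info --format '{{json .RegistryConfig}}'
```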
I think the core issue in this issue is that `registry.insecure` isn't permitted on the `--cache-from`/`--cache-to` flags for the registry exporter (see here). So the buildx command line with `--cache-from=type=registry,ref=192.168.189.102:5000/image:test,registry.insecure=true` won't use the right config settings. Ideally, we should probably support the `registry.insecure` flag here as well (it's bitten me in the past too).
I think using the DockerAPI to automatically set the registry config does have the issue that we can get out of sync with the docker daemon's config - if a registry is changed from `insecure=true` to `insecure=false`, etc. I think that's a more persistent issue with other buildx options as well though :thinking:
> we can get out of sync with the docker daemon's config
Yes indeed
I think I am still missing something.
On the machine that hosts the registry I have this `buildkitd.toml` (`192.168.189.102` is that machine's IP address -- it should be roughly equivalent to `127.0.1.1`):
debug = true
insecure-entitlements = [ "network.host", "security.insecure" ]
[registry."192.168.189.102:5000"]
http = true
insecure = true
I then create a buildx instance with
docker buildx rm buildx_instance && docker buildx create --name buildx_instance --driver-opt env.BUILDKIT_STEP_LOG_MAX_SIZE=-1 --driver-opt env.BUILDKIT_STEP_LOG_MAX_SPEED=-1 --config docker/buildkitd.toml && docker buildx use buildx_instance
and `docker buildx inspect buildx_instance` shows
Name: buildx_instance
Driver: docker-container
Nodes:
Name: buildx_instance0
Endpoint: unix:///var/run/docker.sock
Driver Options: env.BUILDKIT_STEP_LOG_MAX_SIZE="-1" env.BUILDKIT_STEP_LOG_MAX_SPEED="-1"
Status: running
Buildkit: v0.10.5
Platforms: linux/amd64, linux/amd64/v2, linux/amd64/v3, linux/amd64/v4, linux/386
and then building the image with this command
docker buildx build --output type=image,\"name=192.168.189.102:5000/image:test,192.168.189.102:5000/image:test_cache\",push=true -t 192.168.189.102:5000/image:test --file docker/Dockerfile --pull --build-arg platform=generic --cache-from type=registry,ref=192.168.189.102:5000/image:test_cache --cache-to type=registry,ref=192.168.189.102:5000/image:test_cache,mode=max docker
shows
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 17.51kB 0.0s
=> resolve image config for docker.io/docker/dockerfile:1.3 6.0s
=> CACHED docker-image://docker.io/docker/dockerfile:1.3@sha256:42399d4635eddd7a9b8a24be879d2f9a930d0ed040a61324cfdf59ef1357b3b2 0.0s
=> => resolve docker.io/docker/dockerfile:1.3@sha256:42399d4635eddd7a9b8a24be879d2f9a930d0ed040a61324cfdf59ef1357b3b2 0.0s
=> [internal] load metadata for docker.io/library/archlinux:base-devel-20220710.0.67642 0.3s
=> importing cache manifest from 192.168.189.102:5000/image:test_cache 0.0s
=> [internal] load build context 0.0s
=> => transferring context: 1.33kB
=========== SNIP (all layers cached) ===========
=> exporting to image 0.1s
=> => exporting layers 0.0s
=> => exporting manifest sha256:2f6d120f45bd4fd1b5123a0e039e036112dcbf72377352739f9b4c7ee97bdd5b 0.0s
=> => exporting config sha256:f0d3b939ab9782a6f877ca7ad7a2c9d2134ba9da036e340f40e7b8d58ffa26d1 0.0s
=> => pushing layers 0.1s
=> => pushing manifest for 192.168.189.102:5000/image:test@sha256:2f6d120f45bd4fd1b5123a0e039e036112dcbf72377352739f9b4c7ee97bdd5b 0.0s
=> exporting cache 12.7s
=> => preparing build cache for export
=========== SNIP (=> => writing layer sha256:{many shas}) ===========
=> => writing config sha256:922f75492f9fce1124446fd27e4fee2f93f05022001fa150614e9a8bb57dc47e 0.0s
=> => writing manifest sha256:b9341065e8df4dbe07b712101bea55ec3ab6bed200f07bf48db18b3e14a4c2c6
However, when I try to build the same image on a different machine on the same network (192.168.189.102 is accessible from this machine), I get
=> ERROR importing cache manifest from 192.168.189.102:5000/image:test_cache
and the logs for the buildx container shows
time="2022-10-24T22:41:46Z" level=debug msg=resolving host="192.168.189.102:5000"
time="2022-10-24T22:41:46Z" level=debug msg="do request" host="192.168.189.102:5000" request.header.accept="application/vnd.docker.distribution.manifest.v2+json, application/vnd.docker.distribution.manifest.list.v2+json, application/vnd.oci.image.manifest.v1+json, application/vnd.oci.image.index.v1+json, */*" request.header.user-agent=buildkit/v0.10 request.method=HEAD url="http://192.168.189.102:5000/v2/image/manifests/test_cache"
time="2022-10-24T22:41:46Z" level=debug msg="fetch response received" host="192.168.189.102:5000" response.header.content-length=100 response.header.content-type="application/json; charset=utf-8" response.header.date="Mon, 24 Oct 2022 22:41:46 GMT" response.header.docker-distribution-api-version=registry/2.0 response.header.x-content-type-options=nosniff response.status="404 Not Found" url="http://192.168.189.102:5000/v2/image/manifests/test_cache"
time="2022-10-24T22:41:46Z" level=info msg="trying next host - response was http.StatusNotFound" host="192.168.189.102:5000"
time="2022-10-24T22:41:46Z" level=debug msg=resolving host="192.168.189.102:5000"
time="2022-10-24T22:41:46Z" level=debug msg="do request" host="192.168.189.102:5000" request.header.accept="application/vnd.docker.distribution.manifest.v2+json, application/vnd.docker.distribution.manifest.list.v2+json, application/vnd.oci.image.manifest.v1+json, application/vnd.oci.image.index.v1+json, */*" request.header.user-agent=buildkit/v0.10 request.method=HEAD url="https://192.168.189.102:5000/v2/image/manifests/test_cache"
time="2022-10-24T22:41:46Z" level=info msg="trying next host" error="failed to do request: Head \"https://192.168.189.102:5000/v2/image/manifests/test_cache\": http: server gave HTTP response to HTTPS client" host="192.168.189.102:5000"
time="2022-10-24T22:41:46Z" level=debug msg="error while importing cache manifest from cmId=192.168.189.102:5000/image:test_cache: failed to do request: Head \"https://192.168.189.102:5000/v2/image/manifests/test_cache\": http: server gave HTTP response to HTTPS client"
So it appears that the buildkit config worked, but it can't find the manifest for `test_cache` even though the registry machine could find it? I also tried to `docker push 192.168.189.102:5000/image:test_cache` (in case it was built on the registry machine but not pushed to the registry) but it says
The push refers to repository [192.168.189.102:5000/image]
An image does not exist locally with the tag: 192.168.189.102:5000/image
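One way to see what the registry itself actually holds (independent of any local images) is to query the Distribution HTTP API directly; illustrative commands using the address and tags from this thread:

```sh
# list repositories known to the registry, then the tags of the image repository
curl http://192.168.189.102:5000/v2/_catalog
curl http://192.168.189.102:5000/v2/image/tags/list
```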
On another note, does docker/buildx look at `/etc/hosts` on the host machine? I have entries in there so I can use a name rather than the IP address, but whenever I use the name in the `docker buildx build` command line I get errors saying that a lookup on `1.1.1.1` for the name failed.
> I think the core issue in this issue is that `registry.insecure` isn't permitted on the `--cache-from`/`--cache-to` flags for the registry exporter
I'll take a look at adding this :+1:
> However, when I try to build the same image on a different machine on the same network (192.168.189.102 is accessible from this machine)
Hm, to me the error message you've shared looks like an HTTP/HTTPS mismatch. Is there any chance that the registry you're pointing to is serving both? Or behind a load-balancer or something that's doing terminated TLS?
> On another note, does docker/buildx look at `/etc/hosts` on the host machine?
Nope, networking controls are complex enough that it would probably not work out well :smile: `docker-container` does support a `network` parameter, so you can configure it like you would a normal docker network (though DNS settings aren't properly respected... see https://github.com/moby/buildkit/issues/3210)
Interestingly, `docker run` supports an `--add-host` flag for manually mapping hostnames to IP addresses. That would be a nice option to expose on the `docker-container` driver to allow passing through manually specified host mappings.
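As a rough sketch of what is possible today (the builder and network names below are made up), the `network` driver-opt attaches the BuildKit container to an existing docker network or to the host's network stack:

```sh
# attach the BuildKit container to a user-defined docker network
docker network create buildnet
docker buildx create --name netbuilder --driver docker-container \
  --driver-opt network=buildnet

# or run the BuildKit container on the host network
docker buildx create --name hostbuilder --driver docker-container \
  --driver-opt network=host
```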
> Hm, to me the error message you've shared looks like an HTTP/HTTPS mismatch. Is there any chance that the registry you're pointing to is serving both? Or behind a load-balancer or something that's doing terminated TLS?
Not that I'm aware of (there is nothing I specifically implemented), but it is part of corporate LAN so perhaps there is something there that is causing issues? Do you have any suggestions on how I would test for this?
I have spoken to my IT guys and I have been told that everything on this LAN is HTTP. When we set up the registry, all we did was run `docker run -d -p 5000:5000 --restart=always --name registry registry:2`.
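For example, probing the registry's `/v2/` endpoint from the machine where the build fails should make an HTTP-only setup obvious (illustrative commands, using the address from this thread):

```sh
# an HTTP-only registry should answer this with 200 and an empty JSON body
curl -v http://192.168.189.102:5000/v2/

# and this should fail during the TLS handshake if the registry really only speaks HTTP
curl -vk https://192.168.189.102:5000/v2/
```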
@jedevc can you provide any further insight into this? Am I using the registry to cache the image layers incorrectly? Or is there a bug in docker/buildx? Or is there a bug in my registry setup?
There are two distinct issues.
- Your buildx builder seems to be incorrectly configured: see https://github.com/docker/buildx/issues/1370#issuecomment-1288516840. Config in `/etc/docker/daemon.json` is not propagated through to docker-container builders, so it has to be added through the `buildkitd.toml` file. I think the issue you're encountering, where it works on one machine but not the other, could be caused by a configuration mismatch. Have you configured the `buildkitd.toml` file on each machine that does the build? Buildx builders are per-machine, so each one will need to be individually configured (a quick check is sketched below).
- BuildKit does not support the `registry.insecure` option on `--cache-to`, even though it is supported on the `--output` flag. This isn't a bug, but a feature-parity gap. I've opened a tracking issue in buildkit: https://github.com/moby/buildkit/issues/3266, which is where that fix will need to be made. This would let the original command in your first post work.
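For the first point, a quick per-machine check is to confirm that the file actually ended up inside that machine's BuildKit container (the container name follows the `buildx_buildkit_<node>` pattern, and the path below is an assumption about where buildx places the config):

```sh
# find the node container created by the docker-container driver
docker ps --filter name=buildx_buildkit

# print the config this node is actually running with
# (assumed path; adjust if your buildx version copies it elsewhere)
docker exec buildx_buildkit_buildx_instance0 cat /etc/buildkit/buildkitd.toml
```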
> There are two distinct issues.
> - Your buildx builder seems to be incorrectly configured: I think that possibly the issue you're encountering where it works on one machine but not the other could be caused by a configuration mismatch?
I revisited the configuration on both machines. I thought I had set up the `buildkitd.toml` on both machines, but I must have missed something. Re-creating the buildx instances on both machines now allows both machines to successfully import the cache. However, on the second machine only the first 12/86 layers are actually cached, despite the build context and Dockerfile being identical on both machines (same github branch on both machines with no file changes). Could something have gotten messed up in exporting/importing the cache?
The SHAs that it is downloading here seem to be the layers that the other machine cached into the registry (I checked a couple of them and they correspond to some of the `writing layer sha256:XXXXXXX` lines that were listed after `=> exporting cache`).
=> CACHED [stage-0 12/86] RUN cd /usr/local && ln -sf lib lib64 0.0s
=> [stage-0 13/86] COPY usr/local/bin/install-from-source.py /usr/local/bin/install-from-source.py 4.2s
=> => sha256:bd528d2577159e0ce51ca2884c48457bd71e590e230420edb755023dcf5b922a 3.76kB / 3.76kB 0.1s
=> => sha256:bca3f07763dd90d1235deec8bbdf633cd809fd9ba87d4ee4edcad9b9bf5dc3e6 152B / 152B 0.1s
=> => sha256:9b0177bccf71ffab69777a4270558eb835a3959a9f26837373c3a5e317b1b7d8 185B / 185B 0.1s
=> => sha256:e8e5f3fb4d2533074effc99bf1106171520843d0e6e054d905e1cfa990d04f8b 9.78kB / 9.78kB 0.1s
=> => sha256:8b677945581654e666c7eaae1de57c0268bdb4c2ff99d1899a7cc6dea029b910 233B / 233B 0.0s
=> => sha256:5d6a08ff0bf3050d3888223aa5eccdddab75862ba904f7a5f7a7a49f170c3ff3 224B / 224B 0.0s
=> => sha256:623eeefe45f44f9b29fa9e6d0790ed29181138088b21cdf4321236e2a5e52f55 380B / 380B 0.0s
=> => sha256:e082a62dfa80e775f651da7d1433e07373d7893c8aa579c83040c7619619e157 534B / 534B 0.0s
=> => sha256:d91ef4f5429d7c09946dc7888cb014b99a4f91006bcbc1382bcb6e5857830c55 281B / 281B 0.0s
=> => sha256:fc89bea3de1c6a1b356aa49a6b94eb3bdeda28205881f398ac4f2f7ffbd73299 31.46MB / 4.07GB 3.6s
=> => sha256:57cdcc5b6ba04f4b199919110b9e257754c1d633724d2ee94058b332273f5943 32.51MB / 114.13MB 3.6s
=> => sha256:bd7665b2f2f4f669917d5a26d25f50ed8f69da7d54f73b8214256595921b3af6 8.14kB / 8.14kB 0.0s
=> => sha256:319fb84e027c3db31df3d44d9c4baec2e73d77b853832ca909e879d91985ad0a 8.93MB / 8.93MB 0.8s
=> => sha256:b8df6b19916f34e335fbd9550123d1a4ad91942d28dc629c07fe37beb4f906ef 26.21MB / 223.70MB 3.5s
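One way to sanity-check the exported cache itself would be to inspect the cache manifest that ended up in the registry and compare the layer digests seen from each machine (sketch only, reusing the ref from this thread):

```sh
# dump the raw cache manifest pushed by --cache-to type=registry
docker buildx imagetools inspect --raw 192.168.189.102:5000/image:test_cache
```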
> However, on the second machine only the first 12/86 layers are actually cached despite the build context and Dockerfile being identical on both machines (same github branch on both machines with no file changes). Could something have gotten messed up in exporting/importing the cache?
@Bidski Can you post your Dockerfile please, or link to your repo if it's public? I would like to repro, but looking at `[stage-0 12/86]` it seems quite huge.
Also, just to confirm - you are using `mode=max` on the cache?
@crazy-max unfortunately I don't think I can make this public. You are correct in assuming how large it is; it is also quite computationally intensive to build some of the layers. This is one of the key reasons we are trying to get this layer cache in the registry working.
Our Dockerfile is basically either copying files from the build context into the image, or mounting scripts as part of the RUN command (`RUN --mount...`) and executing those scripts, or just running commands inside the image.
Does the UID/GID of the owner of the files in the build context have any impact on the cache validity? This is the only thing that I know of that would be different about the files on the two machines (apart from their last modified timestamp).
@jedevc I am using `mode=max`, see https://github.com/docker/buildx/issues/1370#issuecomment-1289737904
So for the most part this all seems to work now. I changed to using `--cache-to=type=inline,mode=max`. However, I still have a couple of systems where only the first few layers are cached (consistently the first 14/87 layers), whereas every other system I have tested this on will cache the first 80/87 layers (this is expected, as we change a build arg for layer 80).
The only thing that seems to be different between the systems is the docker version. This is the output of `docker version` on a system that works (caches 80/87 layers):
Client:
Version: 20.10.21
API version: 1.41
Go version: go1.19.2
Git commit: baeda1f82a
Built: Thu Oct 27 21:30:31 2022
OS/Arch: linux/amd64
Context: default
Experimental: true
Server:
Engine:
Version: 20.10.21
API version: 1.41 (minimum version 1.12)
Go version: go1.19.2
Git commit: 3056208812
Built: Thu Oct 27 21:29:34 2022
OS/Arch: linux/amd64
Experimental: true
containerd:
Version: v1.6.9
GitCommit: 1c90a442489720eec95342e1789ee8a5e1b9536f.m
runc:
Version: 1.1.4
GitCommit:
docker-init:
Version: 0.19.0
GitCommit: de40ad0
and this is from a system that doesn't work:
Client:
Version: 20.10.17
API version: 1.41
Go version: go1.17.11
Git commit: 100c701
Built: Mon Jun 6 23:02:46 2022
OS/Arch: linux/amd64
Context: default
Experimental: true
Server:
Engine:
Version: 20.10.17
API version: 1.41 (minimum version 1.12)
Go version: go1.17.11
Git commit: a89b842
Built: Mon Jun 6 23:00:51 2022
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: v1.6.7
GitCommit: 0197261a30bf81f1ee8e6a4dd2dea0ef95d67ccb
runc:
Version: 1.1.3
GitCommit: v1.1.3-0-g6724737
docker-init:
Version: 0.19.0
GitCommit: de40ad0
Is there anything that happened between these two versions that may be related to this issue?
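Since the docker-container driver runs builds inside its own BuildKit container, comparing the BuildKit version each builder actually uses may be more telling than the engine version; for example:

```sh
# the "Buildkit:" line shows the version running inside the builder container
docker buildx inspect buildx_instance

# version of the buildx CLI plugin itself
docker buildx version
```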
`--cache-to=type=inline` does not support `mode=max`.
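In other words, `mode=max` only takes effect with the registry cache exporter; the inline exporter embeds (min-mode) cache metadata in the image itself. A sketch of the two variants, reusing the refs from this thread:

```sh
# registry cache exporter: supports mode=max (also caches intermediate layers)
docker buildx build \
  --cache-to type=registry,ref=192.168.189.102:5000/image:test_cache,mode=max \
  --cache-from type=registry,ref=192.168.189.102:5000/image:test_cache \
  -t 192.168.189.102:5000/image:test --push .

# inline cache exporter: cache is embedded in the pushed image; mode=max is not supported
docker buildx build \
  --cache-to type=inline \
  --cache-from type=registry,ref=192.168.189.102:5000/image:test \
  -t 192.168.189.102:5000/image:test --push .
```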
I'm gonna close this issue in preference to the one on the buildkit repo - any remaining problems with caching etc. aren't related to the original insecure-registry problem. If you're still having issues, then a different issue/discussion would be the right place for it, or even a thread in #buildkit on our community slack :heart:
I have an insecure registry set up on my local network and I am trying to pull from that registry as part of building my image with buildx.
I have the following setup.
In `/etc/docker/daemon.json`:
In my buildx instance:
And finally, my buildx command line:
However, in the output of that command I see:
and `docker logs` says:
What am I missing here? Why does docker/buildx insist on treating my insecure registry as a secure registry?