docker / buildx

Docker CLI plugin for extended build capabilities with BuildKit
Apache License 2.0

Unable to pull from insecure registry #1370

Closed: Bidski closed this issue 1 year ago

Bidski commented 2 years ago

I have an insecure registry set up on my local network, and I am trying to pull from that registry as part of building my image with buildx.

I have the following setup.

In /etc/docker/daemon.json:

{
    "experimental": true,
    "insecure-registries": [ "192.168.189.102:5000" ]
}

My buildx instance:

$ docker buildx inspect buildx_instance
Name:   buildx_instance
Driver: docker-container

Nodes:
Name:           buildx_instance0
Endpoint:       unix:///var/run/docker.sock
Driver Options: network="host" env.BUILDKIT_STEP_LOG_MAX_SIZE="-1" env.BUILDKIT_STEP_LOG_MAX_SPEED="-1"
Status:         running
Flags:          --allow-insecure-entitlement security.insecure --debug
Buildkit:       v0.10.5
Platforms:      linux/amd64, linux/amd64/v2, linux/amd64/v3, linux/amd64/v4, linux/386

And finally, my buildx command line:

docker buildx build -t local_image:test --pull --output=type=docker --cache-from=type=registry,ref=192.168.189.102:5000/image:test,registry.insecure=true --allow security.insecure --progress plain docker

However, in the output of that command I see

#10 importing cache manifest from 192.168.189.102:5000/image:test
#10 ERROR: failed to do request: Head "https://192.168.189.102:5000/v2/image/manifests/test": http: server gave HTTP response to HTTPS client

and docker logs for the builder container shows

time="2022-10-24T03:35:02Z" level=debug msg=resolving host="192.168.189.102:5000"
time="2022-10-24T03:35:02Z" level=debug msg="do request" host="192.168.189.102:5000" request.header.accept="application/vnd.docker.distribution.manifest.v2+json, application/vnd.docker.distribution.manifest.list.v2+json, application/vnd.oci.image.manifest.v1+json, application/vnd.oci.image.index.v1+json, */*" request.header.user-agent=buildkit/v0.10 request.method=HEAD url="https://192.168.189.102:5000/v2/image/manifests/test"
time="2022-10-24T03:35:02Z" level=info msg="trying next host" error="failed to do request: Head \"https://192.168.189.102:5000/v2/image/manifests/test\": http: server gave HTTP response to HTTPS client" host="192.168.189.102:5000"
time="2022-10-24T03:35:02Z" level=debug msg="error while importing cache manifest from cmId=192.168.189.102:5000/image:test: failed to do request: Head \"https://192.168.189.102:5000/v2/image/manifests/test\": http: server gave HTTP response to HTTPS client"

What am I missing here? Why does docker/buildx insist on treating my insecure registry as a secure registry?

crazy-max commented 2 years ago

When using a docker-container builder you have to set the registry configuration for the BuildKit daemon: https://github.com/docker/buildx/blob/master/docs/guides/custom-registry-config.md

In your case the configuration will look like this:

[registry."192.168.189.102:5000"]
http = true
insecure = true

@tonistiigi @jedevc I wonder if we could read DockerAPI.Info(ctx).RegistryConfig.InsecureRegistryCIDRs and automatically set the registry config (if not already populated) in the container when creating a docker-container builder?
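
Until something like that exists in buildx, a rough manual version of the idea is to read the insecure-registries list from daemon.json once and emit matching [registry."..."] tables for a buildkitd.toml that is then passed to docker buildx create --config. This is a minimal sketch, not anything buildx does itself: the script name is hypothetical, it assumes every listed host:port is served over plain HTTP (as in this issue), and it skips CIDR entries on the assumption that they cannot be used as buildkitd.toml registry keys. Being a one-off copy, it has exactly the drift problem raised in the next comment: if daemon.json changes, the file has to be regenerated and the builder re-created.

```python
"""Generate buildkitd.toml [registry."..."] tables from daemon.json.

Hypothetical helper: assumes every host:port in "insecure-registries" is
served over plain HTTP, as in this issue.
"""
import json
import sys


def registry_sections(daemon_config: dict) -> str:
    """Return one [registry."host"] table per insecure registry entry."""
    lines = []
    for host in daemon_config.get("insecure-registries", []):
        if "/" in host:
            # CIDR ranges such as "10.0.0.0/8" are skipped (assumed not to be
            # valid buildkitd.toml registry keys).
            continue
        lines += [f'[registry."{host}"]', "  http = true", "  insecure = true", ""]
    return "\n".join(lines)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 gen_buildkitd.py /etc/docker/daemon.json > buildkitd.toml
    with open(sys.argv[1]) as f:
        sys.stdout.write(registry_sections(json.load(f)))
```

Run against the daemon.json from the original post, this produces the same table as the configuration above.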

jedevc commented 2 years ago

I think the core issue in this issue is that registry.insecure isn't permitted on the --cache-from/--cache-to flags for the registry exporter (see here).

So the buildx command line with --cache-from=type=registry,ref=192.168.189.102:5000/image:test,registry.insecure=true won't use the right config settings. Ideally, we should support the registry.insecure flag here as well (it's bitten me in the past too).
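
To see which attribute is affected, it helps to look at how such a flag value is split. The sketch below assumes that buildx reads these values as a single CSV record of key=value pairs (my understanding of its behaviour, not verified here); it splits the value and flags registry.insecure on a type=registry cache entry, which, per this comment, the registry cache backend in this BuildKit version does not honour. The UNSUPPORTED_ON_REGISTRY_CACHE set is hard-coded from this thread and may be out of date for newer releases.

```python
import csv


def parse_attrs(value: str) -> dict:
    """Split a "key=value,key=value" flag value into a dict.

    The value is read as one CSV record, so a double-quoted field may
    itself contain commas.
    """
    fields = next(csv.reader([value]), [])
    attrs = {}
    for field in fields:
        key, sep, val = field.partition("=")
        if not sep:
            raise ValueError(f"invalid attribute {field!r}, expected key=value")
        attrs[key.strip()] = val
    return attrs


# Taken from this thread (BuildKit v0.10.x): accepted on --output but not
# honoured by the registry cache backend. May differ in newer releases.
UNSUPPORTED_ON_REGISTRY_CACHE = {"registry.insecure"}


def ignored_cache_attrs(value: str) -> list:
    """Attributes of a --cache-from/--cache-to value that would be ignored."""
    attrs = parse_attrs(value)
    if attrs.get("type") != "registry":
        return []
    return sorted(key for key in attrs if key in UNSUPPORTED_ON_REGISTRY_CACHE)


if __name__ == "__main__":
    flag = "type=registry,ref=192.168.189.102:5000/image:test,registry.insecure=true"
    print(ignored_cache_attrs(flag))  # ['registry.insecure']
```

The CSV split is also why the --output value in a later command wraps name=...,... in escaped quotes: a quoted field keeps its inner comma, so parse_attrs('type=image,"name=a:1,b:2",push=true')["name"] is "a:1,b:2".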

I think using the DockerAPI to set the registry config automatically has the drawback that we can get out of sync with the docker daemon's config, e.g. if a registry is changed from insecure=true to insecure=false. I think that's a more persistent issue with other buildx options as well, though :thinking:

crazy-max commented 2 years ago

> we can get out of sync with the docker daemon's config

Yes indeed

Bidski commented 2 years ago

I think I am still missing something.

On the machine that hosts the registry I have this buildkitd.toml (192.168.189.102 is that machine's IP address, so it should be roughly equivalent to 127.0.1.1):

debug = true
insecure-entitlements = [ "network.host", "security.insecure" ]

[registry."192.168.189.102:5000"]
  http = true
  insecure = true

I then create a buildx instance with

docker buildx rm buildx_instance && docker buildx create --name buildx_instance --driver-opt env.BUILDKIT_STEP_LOG_MAX_SIZE=-1 --driver-opt env.BUILDKIT_STEP_LOG_MAX_SPEED=-1 --config docker/buildkitd.toml && docker buildx use buildx_instance

and docker buildx inspect buildx_instance shows

Name:   buildx_instance
Driver: docker-container

Nodes:
Name:           buildx_instance0
Endpoint:       unix:///var/run/docker.sock
Driver Options: env.BUILDKIT_STEP_LOG_MAX_SIZE="-1" env.BUILDKIT_STEP_LOG_MAX_SPEED="-1"
Status:         running
Buildkit:       v0.10.5
Platforms:      linux/amd64, linux/amd64/v2, linux/amd64/v3, linux/amd64/v4, linux/386

and then building the image with this command

docker buildx build --output type=image,\"name=192.168.189.102:5000/image:test,192.168.189.102:5000/image:test_cache\",push=true -t 192.168.189.102:5000/image:test --file docker/Dockerfile --pull --build-arg platform=generic --cache-from type=registry,ref=192.168.189.102:5000/image:test_cache --cache-to type=registry,ref=192.168.189.102:5000/image:test_cache,mode=max docker

shows

 => [internal] load .dockerignore  0.0s
 => => transferring context: 2B  0.0s
 => [internal] load build definition from Dockerfile  0.0s
 => => transferring dockerfile: 17.51kB  0.0s
 => resolve image config for docker.io/docker/dockerfile:1.3  6.0s
 => CACHED docker-image://docker.io/docker/dockerfile:1.3@sha256:42399d4635eddd7a9b8a24be879d2f9a930d0ed040a61324cfdf59ef1357b3b2  0.0s
 => => resolve docker.io/docker/dockerfile:1.3@sha256:42399d4635eddd7a9b8a24be879d2f9a930d0ed040a61324cfdf59ef1357b3b2  0.0s
 => [internal] load metadata for docker.io/library/archlinux:base-devel-20220710.0.67642  0.3s
 => importing cache manifest from 192.168.189.102:5000/image:test_cache  0.0s
 => [internal] load build context  0.0s
 => => transferring context: 1.33kB
=========== SNIP (all layers cached) ===========
 => exporting to image  0.1s
 => => exporting layers  0.0s
 => => exporting manifest sha256:2f6d120f45bd4fd1b5123a0e039e036112dcbf72377352739f9b4c7ee97bdd5b  0.0s
 => => exporting config sha256:f0d3b939ab9782a6f877ca7ad7a2c9d2134ba9da036e340f40e7b8d58ffa26d1  0.0s
 => => pushing layers  0.1s
 => => pushing manifest for 192.168.189.102:5000/image:test@sha256:2f6d120f45bd4fd1b5123a0e039e036112dcbf72377352739f9b4c7ee97bdd5b  0.0s
 => exporting cache  12.7s
 => => preparing build cache for export
=========== SNIP (=> => writing layer sha256:{many shas}) ===========
 => => writing config sha256:922f75492f9fce1124446fd27e4fee2f93f05022001fa150614e9a8bb57dc47e  0.0s
 => => writing manifest sha256:b9341065e8df4dbe07b712101bea55ec3ab6bed200f07bf48db18b3e14a4c2c6

However, when I try to build the same image on a different machine on the same network (192.168.189.102 is accessible from this machine), I get

 => ERROR importing cache manifest from 192.168.189.102:5000/image:test_cache

and the logs for the buildx container show

time="2022-10-24T22:41:46Z" level=debug msg=resolving host="192.168.189.102:5000"
time="2022-10-24T22:41:46Z" level=debug msg="do request" host="192.168.189.102:5000" request.header.accept="application/vnd.docker.distribution.manifest.v2+json, application/vnd.docker.distribution.manifest.list.v2+json, application/vnd.oci.image.manifest.v1+json, application/vnd.oci.image.index.v1+json, */*" request.header.user-agent=buildkit/v0.10 request.method=HEAD url="http://192.168.189.102:5000/v2/image/manifests/test_cache"
time="2022-10-24T22:41:46Z" level=debug msg="fetch response received" host="192.168.189.102:5000" response.header.content-length=100 response.header.content-type="application/json; charset=utf-8" response.header.date="Mon, 24 Oct 2022 22:41:46 GMT" response.header.docker-distribution-api-version=registry/2.0 response.header.x-content-type-options=nosniff response.status="404 Not Found" url="http://192.168.189.102:5000/v2/image/manifests/test_cache"
time="2022-10-24T22:41:46Z" level=info msg="trying next host - response was http.StatusNotFound" host="192.168.189.102:5000"
time="2022-10-24T22:41:46Z" level=debug msg=resolving host="192.168.189.102:5000"
time="2022-10-24T22:41:46Z" level=debug msg="do request" host="192.168.189.102:5000" request.header.accept="application/vnd.docker.distribution.manifest.v2+json, application/vnd.docker.distribution.manifest.list.v2+json, application/vnd.oci.image.manifest.v1+json, application/vnd.oci.image.index.v1+json, */*" request.header.user-agent=buildkit/v0.10 request.method=HEAD url="https://192.168.189.102:5000/v2/image/manifests/test_cache"
time="2022-10-24T22:41:46Z" level=info msg="trying next host" error="failed to do request: Head \"https://192.168.189.102:5000/v2/image/manifests/test_cache\": http: server gave HTTP response to HTTPS client" host="192.168.189.102:5000"
time="2022-10-24T22:41:46Z" level=debug msg="error while importing cache manifest from cmId=192.168.189.102:5000/image:test_cache: failed to do request: Head \"https://192.168.189.102:5000/v2/image/manifests/test_cache\": http: server gave HTTP response to HTTPS client"

So it appears that the BuildKit config worked (the first request now goes over HTTP), but it can't find the manifest for test_cache, even though the registry machine could find it? I also tried to docker push 192.168.189.102:5000/image:test_cache (in case it was built on the registry machine but never pushed to the registry), but it says

The push refers to repository [192.168.189.102:5000/image]
An image does not exist locally with the tag: 192.168.189.102:5000/image
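
Reading the log above, the first request (over plain HTTP) reached the registry and got 404 Not Found back, complete with a Docker-Distribution-Api-Version header; only the second, HTTPS, attempt produced the "server gave HTTP response to HTTPS client" message that ends up in the final error. The Python sketch below is a simplified model of that fallback, inferred from these log lines rather than taken from the resolver's source: a 404 moves on silently ("trying next host - response was http.StatusNotFound"), a transport error is remembered and also moves on, and the remembered error is what gets reported. Under that model, an HTTPS error is shown even though the meaningful answer was "manifest not found" over HTTP.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Attempt:
    """Outcome of one HEAD request to one endpoint."""
    url: str
    status: Optional[int] = None  # HTTP status, if a response arrived
    error: Optional[str] = None   # transport/TLS error, if it did not


def resolve(urls: List[str], request: Callable[[str], Attempt]) -> str:
    """Try each endpoint in order, like the 'trying next host' log lines.

    Simplified model: a 404 moves on silently, a transport error is
    remembered and also moves on, and if nothing succeeds the remembered
    error (not the 404) is what gets reported.
    """
    remembered: Optional[str] = None
    for url in urls:
        attempt = request(url)
        if attempt.error is not None:
            remembered = remembered or attempt.error  # keep the first error
            continue
        if attempt.status == 404:
            continue
        if attempt.status is not None and 200 <= attempt.status < 300:
            return url
        raise LookupError(f"unexpected status {attempt.status} from {url}")
    raise LookupError(remembered or "not found")


if __name__ == "__main__":
    # Replay of the two attempts in the log above.
    def replay(url: str) -> Attempt:
        if url.startswith("http://"):
            return Attempt(url, status=404)
        return Attempt(url, error="http: server gave HTTP response to HTTPS client")

    path = "192.168.189.102:5000/v2/image/manifests/test_cache"
    try:
        resolve([f"http://{path}", f"https://{path}"], replay)
    except LookupError as err:
        print("reported:", err)
```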

On another note, does docker/buildx look at /etc/hosts on the host machine? I have entries in there so I can use a name rather than the IP address, but whenever I use the name in the docker buildx build command line I get errors saying that a lookup on 1.1.1.1 for the name failed.

jedevc commented 2 years ago

> I think the core issue in this issue is that registry.insecure isn't permitted on the --cache-from/--cache-to flags for the registry exporter

I'll take a look at adding this :+1:

> However, when I try to build the same image on a different machine on the same network (192.168.189.102 is accessible from this machine)

Hm, to me the error message you've shared looks like an HTTP/HTTPS mismatch. Is there any chance that the registry you're pointing to is serving both? Or behind a load-balancer or something that's doing terminated TLS?

> On another note, does docker/buildx look at /etc/hosts on the host machine?

Nope, networking is complex enough that this would probably not work out well :smile: The docker-container driver does support a network parameter, so you can configure it like you would a normal Docker network (though DNS settings aren't properly respected; see https://github.com/moby/buildkit/issues/3210).

Interestingly, docker run supports an --add-host flag for manually mapping hostnames to IP addresses. That would be a nice option to expose on the docker-container driver, to allow manually mapping specific hosts.
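
Given that the builder container does not consult the host's /etc/hosts, one workaround (my suggestion, not a buildx feature) is to resolve the name on the host before calling docker buildx build and pass the IP form of each reference. The [registry."..."] table in buildkitd.toml then also has to use the IP, since it is keyed by the host string. A minimal Python sketch that reads /etc/hosts directly; the hostname registry.lan is hypothetical.

```python
import sys
from typing import Optional


def hosts_lookup(hosts_text: str, name: str) -> Optional[str]:
    """First address mapped to `name` in /etc/hosts-style text, else None."""
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        address, *names = line.split()
        if name in names:
            return address
    return None


def rewrite_ref(ref: str, hosts_text: str) -> str:
    """Replace the registry hostname in `ref` (host[:port]/path) with its IP."""
    host, slash, path = ref.partition("/")
    if not slash:
        return ref  # no registry component to rewrite
    name, colon, port = host.partition(":")
    address = hosts_lookup(hosts_text, name)
    if address is None:
        return ref
    return f"{address}{colon}{port}/{path}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 rewrite_ref.py registry.lan:5000/image:test_cache
    with open("/etc/hosts") as f:
        hosts = f.read()
    for ref in sys.argv[1:]:
        print(rewrite_ref(ref, hosts))
```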

Bidski commented 2 years ago

> Hm, to me the error message you've shared looks like an HTTP/HTTPS mismatch. Is there any chance that the registry you're pointing to is serving both? Or behind a load-balancer or something that's doing terminated TLS?

Not that I'm aware of (there is nothing I specifically implemented), but it is part of a corporate LAN, so perhaps there is something there that is causing issues? Do you have any suggestions on how I would test for this?
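
One way to test this from the machine that fails is to request the registry's /v2/ base endpoint over both schemes and see which one answers. A plain registry:2 container should return a response carrying a Docker-Distribution-Api-Version header over http (as the 404 in the earlier log did) and fail over https; a successful https answer would suggest something on the LAN is terminating TLS in front of it. This is a minimal standard-library Python sketch; the script name is illustrative, proxies are bypassed, and certificate checks are disabled on purpose so that a self-signed TLS endpoint would still show up as "speaks TLS".

```python
import http.client
import ssl
import sys
import urllib.error
import urllib.request


def probe(host: str, timeout: float = 3.0) -> dict:
    """Request /v2/ over plain HTTP and over HTTPS and summarise each result.

    A registry:2 container started without TLS should answer the HTTP probe
    with a Docker-Distribution-Api-Version header and fail the HTTPS probe.
    """
    ctx = ssl.create_default_context()
    ctx.check_hostname = False          # must be disabled before CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE     # accept self-signed certificates
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({}),  # talk to the host directly
        urllib.request.HTTPSHandler(context=ctx),
    )
    results = {}
    for scheme in ("http", "https"):
        url = f"{scheme}://{host}/v2/"
        try:
            with opener.open(url, timeout=timeout) as resp:
                status, headers = resp.status, resp.headers
        except urllib.error.HTTPError as err:  # e.g. 401 if auth is enabled
            status, headers = err.code, err.headers
        except (OSError, http.client.HTTPException) as err:
            results[scheme] = f"error: {err}"
            continue
        api = headers.get("Docker-Distribution-Api-Version", "no registry header")
        results[scheme] = f"{status} {api}"
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 probe_registry.py 192.168.189.102:5000
    for scheme, summary in probe(sys.argv[1]).items():
        print(f"{scheme:5} {summary}")
```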

Bidski commented 2 years ago

I have spoken to my IT guys and have been told that everything on this LAN is HTTP. When we set up the registry, all we did was run docker run -d -p 5000:5000 --restart=always --name registry registry:2

Bidski commented 2 years ago

@jedevc can you provide any further insight into this? Am I using the registry to cache the image layers incorrectly? Or is there a bug in docker/buildx, or in my registry setup?

jedevc commented 2 years ago

There are two distinct issues.

  1. Your buildx builder seems to be incorrectly configured: see https://github.com/docker/buildx/issues/1370#issuecomment-1288516840. Config in /etc/docker/daemon.json is not propagated through to docker-container builders, so it has to be added through the buildkitd.toml file.

    I think that possibly the issue you're encountering where it works on one machine but not the other could be caused by a configuration mismatch? Have you configured the buildkitd.toml file on each machine that does the build? Buildx builders are per-machine, so each one needs to be configured individually.

  2. BuildKit does not support the registry.insecure option on --cache-to, even though it is supported on the --output flag. This isn't a bug, but a feature parity issue. I've opened a tracking issue in buildkit: https://github.com/moby/buildkit/issues/3266, which is where that fix will need to be made. This would let the original command in your first post work.
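
Because builders are per-machine, a small pre-flight check run on each build host can catch a missing registry table before a build starts. The sketch below is hypothetical: it uses a naive regex rather than a real TOML parser, and a simplified version of Docker's rule for deciding which part of a reference is the registry host. It lists registries used by the given image/cache refs that have no [registry."..."] table in that machine's buildkitd.toml. Passing the check says nothing about whether the running builder actually loaded the file; later in this thread, re-creating the builders is what made the config take effect.

```python
import re
import sys
from typing import List, Set

# Matches table headers like [registry."192.168.189.102:5000"].
# Deliberately naive: this is a pattern match, not a TOML parser.
REGISTRY_TABLE = re.compile(r'^\s*\[registry\."([^"]+)"\]', re.MULTILINE)


def configured_registries(toml_text: str) -> Set[str]:
    """Hosts that have a [registry."..."] table in buildkitd.toml text."""
    return set(REGISTRY_TABLE.findall(toml_text))


def registry_host(ref: str) -> str:
    """Registry part of an image reference (simplified Docker rule).

    The first path component counts as a registry only if it contains
    '.' or ':' or is 'localhost'; otherwise Docker Hub is implied.
    """
    first, slash, _ = ref.partition("/")
    if slash and ("." in first or ":" in first or first == "localhost"):
        return first
    return "docker.io"


def missing_registry_config(refs: List[str], toml_text: str) -> List[str]:
    """Non-Docker-Hub registries used by `refs` that lack a config table."""
    used = {registry_host(ref) for ref in refs} - {"docker.io"}
    return sorted(used - configured_registries(toml_text))


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage: python3 check_buildkitd.py docker/buildkitd.toml REF [REF ...]
    with open(sys.argv[1]) as f:
        missing = missing_registry_config(sys.argv[2:], f.read())
    for host in missing:
        print(f'no [registry."{host}"] table for {host}')
    sys.exit(1 if missing else 0)
```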

Bidski commented 2 years ago

> There are two distinct issues.
>
>   1. Your buildx builder seems to be incorrectly configured: I think that possibly the issue you're encountering where it works on one machine but not the other could be caused by a configuration mismatch?

I revisited the configuration on both machines. I thought I had set up the buildkitd.toml on both machines, but I must have missed something. Re-creating the buildx instances on both machines now allows both machines to successfully import the cache. However, on the second machine only the first 12/86 layers are actually cached despite the build context and Dockerfile being identical on both machines (same github branch on both machines with no file changes). Could something have gotten messed up in exporting/importing the cache?

The shas that it is downloading here seem to be the layers that the other machine cached into the registry (I checked a couple of them, and they correspond to some of the writing layer sha256:XXXXXXX lines that were listed after => exporting cache):

 => CACHED [stage-0 12/86] RUN cd /usr/local && ln -sf lib lib64                                                                  0.0s
 => [stage-0 13/86] COPY usr/local/bin/install-from-source.py /usr/local/bin/install-from-source.py                               4.2s
 => => sha256:bd528d2577159e0ce51ca2884c48457bd71e590e230420edb755023dcf5b922a 3.76kB / 3.76kB                                    0.1s
 => => sha256:bca3f07763dd90d1235deec8bbdf633cd809fd9ba87d4ee4edcad9b9bf5dc3e6 152B / 152B                                        0.1s
 => => sha256:9b0177bccf71ffab69777a4270558eb835a3959a9f26837373c3a5e317b1b7d8 185B / 185B                                        0.1s
 => => sha256:e8e5f3fb4d2533074effc99bf1106171520843d0e6e054d905e1cfa990d04f8b 9.78kB / 9.78kB                                    0.1s
 => => sha256:8b677945581654e666c7eaae1de57c0268bdb4c2ff99d1899a7cc6dea029b910 233B / 233B                                        0.0s
 => => sha256:5d6a08ff0bf3050d3888223aa5eccdddab75862ba904f7a5f7a7a49f170c3ff3 224B / 224B                                        0.0s
 => => sha256:623eeefe45f44f9b29fa9e6d0790ed29181138088b21cdf4321236e2a5e52f55 380B / 380B                                        0.0s
 => => sha256:e082a62dfa80e775f651da7d1433e07373d7893c8aa579c83040c7619619e157 534B / 534B                                        0.0s
 => => sha256:d91ef4f5429d7c09946dc7888cb014b99a4f91006bcbc1382bcb6e5857830c55 281B / 281B                                        0.0s
 => => sha256:fc89bea3de1c6a1b356aa49a6b94eb3bdeda28205881f398ac4f2f7ffbd73299 31.46MB / 4.07GB                                   3.6s
 => => sha256:57cdcc5b6ba04f4b199919110b9e257754c1d633724d2ee94058b332273f5943 32.51MB / 114.13MB                                 3.6s
 => => sha256:bd7665b2f2f4f669917d5a26d25f50ed8f69da7d54f73b8214256595921b3af6 8.14kB / 8.14kB                                    0.0s
 => => sha256:319fb84e027c3db31df3d44d9c4baec2e73d77b853832ca909e879d91985ad0a 8.93MB / 8.93MB                                    0.8s
 => => sha256:b8df6b19916f34e335fbd9550123d1a4ad91942d28dc629c07fe37beb4f906ef 26.21MB / 223.70MB                                 3.5s

crazy-max commented 2 years ago

> However, on the second machine only the first 12/86 layers are actually cached despite the build context and Dockerfile being identical on both machines (same github branch on both machines with no file changes). Could something have gotten messed up in exporting/importing the cache?

@Bidski Can you post your Dockerfile, or link to your repo if it's public? I'd like to reproduce this, but judging by [stage-0 12/86] it seems quite large.

jedevc commented 2 years ago

Also, just to confirm - you are using mode=max on the cache?

Bidski commented 2 years ago

@crazy-max unfortunately I don't think I can make this public. You're right about how large it is, and some of the layers are also quite computationally intensive to build. This is one of the key reasons we are trying to get this registry layer cache working.

Our Dockerfile basically does one of three things: copies files from the build context into the image, mounts scripts as part of a RUN command (RUN --mount...) and executes them, or just runs commands inside the image.

Does the UID/GID of the owner of the files in the build context have any impact on the cache validity? This is the only thing that I know of that would be different about the files on the two machines (apart from their last modified timestamp).

@jedevc I am using mode=max, see https://github.com/docker/buildx/issues/1370#issuecomment-1289737904

Bidski commented 1 year ago

So for the most part this all seems to work now. I changed to using --cache-to=type=inline,mode=max. However, I still have a couple of systems where only the first few layers are cached (consistently the first 14/87), whereas every other system I have tested this on caches the first 80/87 layers (this is expected, as we change a build arg for layer 80).

The only thing that seems to differ between the systems is the Docker version. This is the output of docker version on a system that works (caches 80/87 layers):

Client:
 Version:           20.10.21
 API version:       1.41
 Go version:        go1.19.2
 Git commit:        baeda1f82a
 Built:             Thu Oct 27 21:30:31 2022
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server:
 Engine:
  Version:          20.10.21
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.19.2
  Git commit:       3056208812
  Built:            Thu Oct 27 21:29:34 2022
  OS/Arch:          linux/amd64
  Experimental:     true
 containerd:
  Version:          v1.6.9
  GitCommit:        1c90a442489720eec95342e1789ee8a5e1b9536f.m
 runc:
  Version:          1.1.4
  GitCommit:        
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

and this is from a system that doesn't work:

Client:
 Version:           20.10.17
 API version:       1.41
 Go version:        go1.17.11
 Git commit:        100c701
 Built:             Mon Jun 6 23:02:46 2022
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server:
 Engine:
  Version:          20.10.17
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.17.11
  Git commit:       a89b842
  Built:            Mon Jun 6 23:00:51 2022
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          v1.6.7
  GitCommit:        0197261a30bf81f1ee8e6a4dd2dea0ef95d67ccb
 runc:
  Version:          1.1.3
  GitCommit:        v1.1.3-0-g6724737
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

Is there anything that happened between these two versions that may be related to this issue?
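
Comparing the two dumps by eye is error-prone. One option (a sketch; the file names are hypothetical) is to save the output of docker version --format '{{json .}}' on each machine and diff the flattened JSON. On the two outputs above, such a diff would surface not only the client, engine, containerd and runc versions but also that the server's Experimental setting is true on the working system and false on the failing one. Whether any of these differences matters for the cache is not something this comparison can tell you.

```python
import json
import sys
from typing import Any, Dict, Tuple


def flatten(value: Any, prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts/lists into dotted keys, e.g. "Client.Version"."""
    if isinstance(value, dict):
        items = [(str(k), v) for k, v in value.items()]
    elif isinstance(value, list):
        items = [(str(i), v) for i, v in enumerate(value)]
    else:
        return {prefix: value}
    flat: Dict[str, Any] = {}
    for key, sub in items:
        flat.update(flatten(sub, f"{prefix}.{key}" if prefix else key))
    return flat


def diff_versions(a: dict, b: dict) -> Dict[str, Tuple[Any, Any]]:
    """Dotted keys whose values differ between two version dumps."""
    fa, fb = flatten(a), flatten(b)
    return {key: (fa.get(key), fb.get(key))
            for key in sorted(fa.keys() | fb.keys())
            if fa.get(key) != fb.get(key)}


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python3 diff_versions.py good.json bad.json
    with open(sys.argv[1]) as f1, open(sys.argv[2]) as f2:
        for key, (good, bad) in diff_versions(json.load(f1), json.load(f2)).items():
            print(f"{key}: {good!r} -> {bad!r}")
```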

jedevc commented 1 year ago

--cache-to=type=inline does not support mode=max.

I'm gonna close this issue in favor of the one on the buildkit repo; any remaining problems with caching etc. aren't related to the original insecure registry problem. If you're still having issues, a new issue/discussion would be the right place for them, or even a thread in #buildkit on our community Slack :heart: