docker / buildx

Docker CLI plugin for extended build capabilities with BuildKit
Apache License 2.0

setting DOCKER_HOST variable changes default builder #1834

Open nicks opened 1 year ago

nicks commented 1 year ago

Description

If I set DOCKER_HOST, it mysteriously overrides my default builder, even in other shells where DOCKER_HOST is not set.

Repro steps:

docker buildx create --name=buildx-multiarch --driver=docker-container
docker buildx use buildx-multiarch
DOCKER_HOST=unix:///home/nick/.docker/desktop/docker.sock docker buildx bake
docker buildx ls

Expected behaviour

After this sequence of commands, I would expect buildx-multiarch to still be the default builder.

Actual behaviour

My default builder has mysteriously changed to desktop-linux.

Buildx version

github.com/docker/buildx v0.10.5 86bdced7766639d56baa4c7c449a4f6468490f87

Docker info

Client: Docker Engine - Community
 Version:    24.0.0
 Context:    desktop-linux
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.10.5
    Path:     /home/nick/.docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.18.1
    Path:     /usr/lib/docker/cli-plugins/docker-compose
  dev: Docker Dev Environments (Docker Inc.)
    Version:  v0.1.0
    Path:     /usr/lib/docker/cli-plugins/docker-dev
  extension: Manages Docker extensions (Docker Inc.)
    Version:  v0.2.19
    Path:     /usr/lib/docker/cli-plugins/docker-extension
  init: Creates Docker-related starter files for your project (Docker Inc.)
    Version:  v0.1.0-beta.4
    Path:     /usr/lib/docker/cli-plugins/docker-init
  sbom: View the packaged-based Software Bill Of Materials (SBOM) for an image (Anchore Inc.)
    Version:  0.6.0
    Path:     /usr/lib/docker/cli-plugins/docker-sbom
  scan: Docker Scan (Docker Inc.)
    Version:  v0.26.0
    Path:     /usr/lib/docker/cli-plugins/docker-scan
  scout: Command line tool for Docker Scout (Docker Inc.)
    Version:  v0.12.0
    Path:     /usr/lib/docker/cli-plugins/docker-scout
WARNING: Plugin "/home/nick/.docker/cli-plugins/docker-buildx-backup" is not valid: plugin candidate "buildx-backup" did not match "^[a-z][a-z0-9]*$"
WARNING: Plugin "/usr/lib/docker/cli-plugins/docker-compose.14.backup" is not valid: plugin candidate "compose.14.backup" did not match "^[a-z][a-z0-9]*$"

Server:
 Containers: 14
  Running: 8
  Paused: 0
  Stopped: 6
 Images: 196
 Server Version: 24.0.1
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 3dce8eb055cbb6872793272b4f20ed16117344f8
 runc version: v1.1.7-0-g860f061
 init version: de40ad0
 Security Options:
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 5.15.49-linuxkit-pr
 Operating System: Docker Desktop
 OSType: linux
 Architecture: x86_64
 CPUs: 6
 Total Memory: 7.526GiB
 Name: docker-desktop
 ID: a92cef06-564f-4766-91bd-bc9e839af9fa
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 HTTP Proxy: http.docker.internal:3128
 HTTPS Proxy: http.docker.internal:3128
 No Proxy: hubproxy.docker.internal
 Experimental: false
 Insecure Registries:
  hubproxy.docker.internal:5555
  127.0.0.0/8
 Live Restore Enabled: false

Builders list

docker buildx ls
NAME/NODE                  DRIVER/ENDPOINT  STATUS   BUILDKIT PLATFORMS
buildx-docker-container    docker-container                   
  buildx-docker-container0 desktop-linux    inactive          
buildx-multiarch           docker-container                   
  buildx-multiarch0        desktop-linux    running  v0.11.6  linux/amd64, linux/amd64/v2, linux/amd64/v3, linux/arm64, linux/riscv64, linux/ppc64le, linux/s390x, linux/386, linux/mips64le, linux/mips64, linux/arm/v7, linux/arm/v6
default                    docker                             
  default                  default          running  v0.11.6  linux/amd64, linux/amd64/v2, linux/amd64/v3, linux/arm64, linux/riscv64, linux/ppc64, linux/ppc64le, linux/s390x, linux/386, linux/mips64le, linux/mips64, linux/arm/v7, linux/arm/v6
desktop-linux *            docker                             
  desktop-linux            desktop-linux    running  v0.11.6  linux/amd64, linux/amd64/v2, linux/amd64/v3, linux/arm64, linux/riscv64, linux/ppc64le, linux/s390x, linux/386, linux/mips64le, linux/mips64, linux/arm/v7, linux/arm/v6
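To check the selected builder from a script, one option is to parse the listing above. This is a minimal sketch that assumes the column layout shown here (buildx v0.10.5), where the selected builder has a standalone * in the second column; other buildx versions may format the marker differently. The function name current_builder is made up for illustration.

```shell
# Read "docker buildx ls" output on stdin and print the builder marked with "*".
# Assumes the v0.10.5 layout shown above: "<name> * <driver>".
current_builder() {
  awk '$2 == "*" { print $1 }'
}
```

Usage would be docker buildx ls | current_builder, run before and after the DOCKER_HOST invocation to see whether the selection changed.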

Configuration

It doesn't matter. The builder changes even if there is no bake file and the build fails with an error.

Build logs

No response

Additional info

This is boiled down from a larger repro case where a script was invoking buildx with DOCKER_HOST set, and I couldn't figure out why the buildx builder was changing.

tonistiigi commented 1 year ago

You are missing --global on buildx use if you want the choice to persist when you switch Docker context/endpoint: https://docs.docker.com/engine/reference/commandline/buildx_use/ . This is because a context switch should switch all Docker commands over to the configuration defined by the context. Otherwise it would break flows where the user sets a new context, because that would switch containers and images but not the builder.

Setting DOCKER_HOST is slightly different, though. I'm not sure whether saving non-global default instance values doesn't work at all with it, or whether the problem is a socket address mismatch against a context configured with the same socket. Another way to look at it is that use defines a default, but env variables like DOCKER_HOST or BUILDX_BUILDER are a way for the user to say that they want a specific behavior and not the default (at least if it is not global). If defining DOCKER_HOST doesn't actually change to a different endpoint, then I agree that it would be unexpected for the builder behavior to change (it's not clear whether this is the case at the moment).

The docker context ls behavior is also interesting when DOCKER_HOST is defined. It doesn't try to match a context by endpoint, so you are always on the "default" context, even if you point to some remote endpoint.
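The two options mentioned in this comment, use --global and the BUILDX_BUILDER environment variable, could be wrapped for scripts roughly as follows. This is a sketch rather than a confirmed fix for this issue; the function names are hypothetical, and the builder name passed in is whatever builder you created (buildx-multiarch in this report).

```shell
# Persist the builder choice across context/endpoint changes.
# $1 = builder name
use_builder_globally() {
  docker buildx use --global "$1"
}

# Run one bake with an explicit builder, without relying on the saved default.
# $1 = builder name; remaining arguments are passed to "docker buildx bake".
bake_with_builder() {
  builder=$1
  shift
  BUILDX_BUILDER=$builder docker buildx bake "$@"
}
```

The second helper sidesteps the saved selection entirely, which may be the safer choice for scripts that also set DOCKER_HOST.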

nicks commented 1 year ago

> When defining DOCKER_HOST doesn't actually change to a different endpoint then I agree that it would be unexpected for the builder behavior to change

As far as I can tell, it doesn't matter what your starting context is. I can reproduce the issue with both docker contexts:

❯ docker context use desktop-linux
desktop-linux
Current context is now "desktop-linux"

❯ docker context ls
NAME                TYPE                DESCRIPTION                               DOCKER ENDPOINT                                 KUBERNETES ENDPOINT   ORCHESTRATOR
default             moby                Current DOCKER_HOST based configuration   unix:///var/run/docker.sock                                           
desktop-linux *     moby                Docker Desktop                            unix:///home/nick/.docker/desktop/docker.sock                         

❯ docker buildx use buildx-multiarch

❯ DOCKER_HOST=unix:///home/nick/.docker/desktop/docker.sock docker buildx bake
[+] Building 0.0s (0/0)                                                                                                              docker:default
ERROR: failed to find target default

❯ docker buildx ls
NAME/NODE                  DRIVER/ENDPOINT          STATUS   BUILDKIT PLATFORMS
buildx-docker-container    docker-container                           
  buildx-docker-container0 desktop-linux            inactive          
buildx-multiarch           docker-container                           
  buildx-multiarch0        desktop-linux            stopped           
default                    docker                                     
  default                  default                  running  v0.11.6  linux/amd64, linux/amd64/v2, linux/amd64/v3, linux/arm64, linux/riscv64, linux/ppc64, linux/ppc64le, linux/s390x, linux/386, linux/mips64le, linux/mips64, linux/arm/v7, linux/arm/v6
desktop-linux *            docker                                     
  desktop-linux            desktop-linux            running  v0.11.6  linux/amd64, linux/amd64/v2, linux/amd64/v3, linux/arm64, linux/riscv64, linux/ppc64le, linux/s390x, linux/386, linux/mips64le, linux/mips64, linux/arm/v7, linux/arm/v6

❯ docker context use default
default
Current context is now "default"

❯ docker buildx use buildx-multiarch

❯ DOCKER_HOST=unix:///home/nick/.docker/desktop/docker.sock docker buildx bake
[+] Building 0.0s (0/0)                                                                                                              docker:default
ERROR: failed to find target default

❯ docker buildx ls
NAME/NODE                  DRIVER/ENDPOINT          STATUS   BUILDKIT PLATFORMS
buildx-docker-container    docker-container                           
  buildx-docker-container0 desktop-linux            inactive          
buildx-multiarch           docker-container                           
  buildx-multiarch0        desktop-linux            stopped
default *                  docker                                     
  default                  default                  running  v0.11.6  linux/amd64, linux/amd64/v2, linux/amd64/v3, linux/arm64, linux/riscv64, linux/ppc64, linux/ppc64le, linux/s390x, linux/386, linux/mips64le, linux/mips64, linux/arm/v7, linux/arm/v6
desktop-linux              docker                                     
  desktop-linux            desktop-linux            running  v0.11.6  linux/amd64, linux/amd64/v2, linux/amd64/v3, linux/arm64, linux/riscv64, linux/ppc64le, linux/s390x, linux/386, linux/mips64le, linux/mips64, linux/arm/v7, linux/arm/v6

nicks commented 1 year ago

Confirmed that it persists if I use --global

tonistiigi commented 1 year ago

Does this output make sense to you?

» docker context ls
NAME            DESCRIPTION                               DOCKER ENDPOINT                                      ERROR
default         Current DOCKER_HOST based configuration   unix:///var/run/docker.sock
desktop-linux   Docker Desktop                            unix:///Users/tonistiigi/.docker/run/docker.sock
moby24 *                                                  tcp://localhost:2375

» DOCKER_HOST=tcp://localhost:2375 docker context ls
NAME            DESCRIPTION                               DOCKER ENDPOINT                                      ERROR
default *       Current DOCKER_HOST based configuration   tcp://localhost:2375
desktop-linux   Docker Desktop                            unix:///Users/tonistiigi/.docker/run/docker.sock
moby24                                                    tcp://localhost:2375
Warning: DOCKER_HOST environment variable overrides the active context. To use a context, either set the global --context flag, or unset DOCKER_HOST environment variable.

Or should that also be based on socket address changing or not? I think this is the underlying case where the DOCKER_HOST behavior is coming from.

thaJeztah commented 1 year ago

Yes, DOCKER_HOST takes precedence over DOCKER_CONTEXT (both can change the host to connect to, but DOCKER_HOST already existed); see https://pkg.go.dev/github.com/docker/cli@v24.0.1+incompatible/cli/command#DockerCli.CurrentContext

I guess the confusing bit is how "context" and "current builder" interact.
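The resolution order described at that link can be modelled roughly as below. This is a simplified sketch based on my reading of the linked documentation, not the CLI's actual code: the function name and its two arguments are hypothetical, and the --host flag (which, as I understand it, is treated like DOCKER_HOST) is left out.

```shell
# Simplified model of how the docker CLI picks the current context name.
#   $1 = value of the --context flag (may be empty)
#   $2 = currentContext from ~/.docker/config.json (may be empty)
# DOCKER_HOST and DOCKER_CONTEXT are read from the shell environment.
resolve_context_name() {
  flag_context=$1
  config_context=$2
  if [ -n "$flag_context" ]; then
    echo "$flag_context"
  elif [ -n "${DOCKER_HOST:-}" ]; then
    # A non-empty DOCKER_HOST forces the "default" context.
    echo "default"
  elif [ -n "${DOCKER_CONTEXT:-}" ]; then
    echo "$DOCKER_CONTEXT"
  elif [ -n "$config_context" ]; then
    echo "$config_context"
  else
    echo "default"
  fi
}
```

This matches the context ls output earlier in the thread, where setting DOCKER_HOST moves the * from moby24 to default even though both point at the same address.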

thaJeztah commented 1 year ago

Erm, my comment was a bit mixed up, but the link describes the order (and fallbacks).

tonistiigi commented 1 year ago

> Yes, DOCKER_HOST takes precedence over DOCKER_CONTEXT

I think this is expected in all cases. The question is: if defining DOCKER_HOST doesn't actually change the target dockerd instance, should it cause different behavior in any of the commands (e.g., like it does in context ls)?

I think it is also expected that if DOCKER_HOST or DOCKER_CONTEXT is defined in env, then any docker command goes to the instance set by the env, and not to the default saved in a file by a previous context use or buildx use (with --global being a way to override this).

But again, if defining DOCKER_HOST or DOCKER_CONTEXT doesn't actually change the target, then it is more confusing and harder to understand why the behavior can change in some cases.

nicks commented 1 year ago

I poked around a little bit at the implementation. I think the bug is that sometimes buildx uses DOCKER_HOST as the key and sometimes it uses the context name?

❯ docker context use desktop-linux
desktop-linux
Current context is now "desktop-linux"

❯ docker buildx use buildx-multiarch

❯ docker buildx bake
[+] Building 0.0s (0/0)                                                            docker-container:buildx-multiarch
ERROR: failed to find target default

❯ cat ~/.docker/buildx/current 
{"Key":"desktop-linux","Name":"buildx-multiarch","Global":false}

❯ DOCKER_HOST=unix:///home/nick/.docker/desktop/docker.sock docker buildx bake
ERROR: Cannot connect to the Docker daemon at unix:///home/nick/.docker/desktop/docker.sock. Is the docker daemon running?

❯ cat ~/.docker/buildx/current 
{"Key":"unix:///home/nick/.docker/desktop/docker.sock","Name":"","Global":false}
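To watch the key change while reproducing this, a small helper can print the Key and Name fields of that file. This is a sketch that assumes the single-line JSON layout shown in the two cat outputs above; the function name is hypothetical, and it uses sed rather than a JSON parser to avoid extra dependencies.

```shell
# Print the Key and Name fields stored in a buildx "current" file.
# $1 = path to the file (defaults to ~/.docker/buildx/current, as used above).
# Assumes compact single-line JSON, as in the outputs shown in this comment.
buildx_current_fields() {
  file=${1:-"$HOME/.docker/buildx/current"}
  key=$(sed -n 's/.*"Key":"\([^"]*\)".*/\1/p' "$file")
  name=$(sed -n 's/.*"Name":"\([^"]*\)".*/\1/p' "$file")
  printf 'key=%s name=%s\n' "$key" "$name"
}
```

Running it after each command in the repro makes it easy to see when the key flips from a context name to a socket address.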

Have we considered making the buildx builder a "property" of the docker context? e.g., so switching the docker context to X, then switching it back would also switch back the original buildx builder?

[edited: accidentally copied the wrong repro steps, updated now]
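The "builder as a property of the context" idea proposed above could look roughly like this. It is purely illustrative: BUILDER_STORE, set_builder, and get_builder are hypothetical names, the directory is not a path buildx uses, and the fallback to a builder named after the context is an assumption based on the builder listings earlier in the thread.

```shell
# Hypothetical per-context builder store: one file per context name.
BUILDER_STORE=${BUILDER_STORE:-"${TMPDIR:-/tmp}/builder-by-context"}

# Remember a builder for a context. $1 = context name, $2 = builder name.
set_builder() {
  mkdir -p "$BUILDER_STORE"
  printf '%s\n' "$2" > "$BUILDER_STORE/$1"
}

# Look up the builder for a context. $1 = context name.
# Falls back to a builder named after the context when none was saved.
get_builder() {
  if [ -f "$BUILDER_STORE/$1" ]; then
    cat "$BUILDER_STORE/$1"
  else
    echo "$1"
  fi
}
```

With a layout like this, switching from desktop-linux to default and back would return the builder saved for desktop-linux, because each context keeps its own entry instead of sharing a single current file.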