docker / buildx

Docker CLI plugin for extended build capabilities with BuildKit
Apache License 2.0

[docker 20.10+SSH] Cannot build on remote Engine since 0.12.0 #2356

Open LaXiS96 opened 6 months ago

LaXiS96 commented 6 months ago

Description

I originally reported this issue in the compose repository: https://github.com/docker/compose/issues/11165

Building, whether via compose or directly with docker buildx build ., hangs for a while and then exits with an error.

I can successfully build images with 0.11.2, while every version since 0.12.0, including the latest one, fails.

The environment is:

The issue could be related to SSH authentication: even plain docker commands take a while to do anything, for no apparent reason, and I have been noticing this ever since I switched away from certificate authentication.

Expected behaviour

Running docker buildx build . builds the image.

Actual behaviour

docker buildx build . hangs for about 3 minutes (consistent timing every time) and returns this error:

[+] Building 0.0s (0/0) docker:default
ERROR: listing workers for Build: failed to list workers: Unavailable: connection error: desc = "transport: failed to write client preface: write |1: file already closed"
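
The "about 3 minutes" figure above can be measured rather than estimated. The following is a minimal sketch, not part of the original report: time_command is a hypothetical helper name, and the 600-second timeout is an assumed upper bound chosen only to sit above the reported hang.

```python
import subprocess
import time


def time_command(argv, timeout=600):
    """Run argv and return (elapsed_seconds, exit_code, last_stderr_line).

    exit_code is None if the command exceeded `timeout` seconds
    (600 is an assumption, picked to be longer than the reported hang).
    """
    start = time.monotonic()
    try:
        proc = subprocess.run(
            argv,
            capture_output=True,
            text=True,
            errors="replace",  # tolerate undecodable bytes in CLI output
            timeout=timeout,
        )
        code, stderr = proc.returncode, proc.stderr
    except subprocess.TimeoutExpired:
        code, stderr = None, "timed out"
    elapsed = time.monotonic() - start
    lines = stderr.strip().splitlines()
    return elapsed, code, (lines[-1] if lines else "")
```

Calling time_command(["docker", "version"]) and then time_command(["docker", "buildx", "build", "."]) against the same remote context would show whether the delay of plain docker commands and the buildx hang differ in a repeatable way.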

Buildx version

github.com/docker/buildx v0.13.1 788433953af10f2a698f5c07611dddce2e08c7a0
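
For anyone comparing installations, a version line like the one above can be checked against the range this report describes (0.11.2 works, 0.12.0 and later fail). This is only a sketch under that assumption: the boundary comes from this report alone, and parse_buildx_version and in_reported_range are hypothetical helper names.

```python
import re

# First version this report identifies as failing; 0.11.2 is reported to work.
FIRST_AFFECTED = (0, 12, 0)


def parse_buildx_version(line):
    """Extract (major, minor, patch) from a buildx version line."""
    match = re.search(r"\bv?(\d+)\.(\d+)\.(\d+)", line)
    if match is None:
        raise ValueError(f"no version found in {line!r}")
    return tuple(int(part) for part in match.groups())


def in_reported_range(line):
    """True if the version is at or above the first failing release."""
    return parse_buildx_version(line) >= FIRST_AFFECTED
```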

Docker info

Client:
 Version:    26.0.0
 Context:    default
 Debug Mode: true
 Plugins:
    Path:     C:\Users\me\.docker\cli-plugins\docker-buildx.exe
  compose: Docker Compose (Docker Inc.)
    Version:  v2.20.0
    Path:     C:\Users\me\.docker\cli-plugins\docker-compose.exe

Server:
 Containers: 6
  Running: 1
  Paused: 0
  Stopped: 5
 Images: 11
 Server Version: 20.10.14
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc io.containerd.runc.v2 io.containerd.runtime.v1.linux
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 9cd3357b7fd7218e4aec3eae239db1f68a5a6ec6
 runc version:
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 5.10.162-1.ph4
 Operating System: VMware Photon OS/Linux
 OSType: linux
 Architecture: x86_64
 CPUs: 6
 Total Memory: 7.773GiB
 Name: photon-9662f57d592b
 ID: 37P5:MBLZ:TGUE:HROE:PPZN:56OC:RT7J:ODLZ:B4PZ:URHC:3OGI:EQJJ
 Debug Mode: false
  File Descriptors: 33
  System Time: 2024-03-26T09:52:09.406820113Z
  EventsListeners: 0
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
 Product License: Community Engine

Builders list

NAME/NODE   DRIVER/ENDPOINT   STATUS    BUILDKIT   PLATFORMS
default*                      error

Cannot load builder default: Get "http://docker.example.com/_ping": context deadline exceeded

Configuration

Not relevant

Build logs

No response

Additional info

No response

LaXiS96 commented 3 months ago

Can anybody check this out? As you can see from the linked compose issue, this problem is also affecting people on Docker Desktop. We are stuck on compose 2.20.0 since all versions after that one vendor a buildx version which does not work at all on Windows.

LaXiS96 commented 2 weeks ago

Hello, is anybody there? What even is the point of opening issues if maintainers don't care about them? I'd be happy to help diagnose the issue and get it resolved...

Problem persists with v0.17.1 as vendored in compose v2.29.7 with the same exact behavior...

LaXiS96 commented 2 weeks ago

To add another data point: I created a new Debian 12 virtual machine and installed the latest Docker Engine v27.3.0. Then I updated the Windows Docker CLI to v27.3.0 and Compose to v2.29.7. The behavior is again exactly the same, so the issue is not limited to Docker Engine v20.10.