docker / buildx

Docker CLI plugin for extended build capabilities with BuildKit
Apache License 2.0

TLS timeout with multiarch builds #350

Closed carlonluca closed 3 years ago

carlonluca commented 4 years ago

When I try to do a multiarch build, I get a TLS timeout error when pulling or pushing. For instance:

[...]
 => => pushing layers                                                                                                                                                                                                                                                    10.3s
------
 > exporting to image:
------
failed to solve: rpc error: code = Unknown desc = failed to do request: Head https://registry-1.docker.io/v2/.../sha256:ea514d8f72c75534dbe7d5ceeda09f786eb5e1d9e35335dcd2f8b5ca4a62259a: net/http: TLS handshake timeout

for the command:

docker buildx build --push --platform linux/386,linux/arm/v7,linux/arm64/v8,linux/amd64,linux/ppc64le -t ... -t ... .

The same happens when pulling. My understanding is that Docker tries to pull or push many entities concurrently, and this is too much for my network connection. The timeout is very short, only a few seconds. If I build each arch separately, it succeeds after a few attempts. Is there any way to set a longer timeout or to set a max number of concurrent transfers? Thanks.
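The "build each arch separately, retry until it works" workaround described above can be scripted. This is only a sketch: the `retry` and `build_all` helpers and the image tag are hypothetical names, and the per-platform builds are assumed to do nothing more than warm the builder cache before the single combined push.

```shell
#!/bin/sh
# retry MAX CMD...: run CMD until it succeeds, at most MAX times.
retry() {
  max=$1; shift
  attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "${RETRY_DELAY:-5}"   # pause between attempts (seconds)
  done
}

# build_all IMAGE: build every platform on its own, then push all at once.
build_all() {
  image=$1   # hypothetical tag, e.g. example/app:latest
  platforms="linux/386 linux/arm/v7 linux/arm64/v8 linux/amd64 linux/ppc64le"
  for p in $platforms; do
    # No output is requested here; the result only stays in the build cache.
    retry 5 docker buildx build --platform "$p" . || return 1
  done
  retry 5 docker buildx build --push \
    --platform "$(echo "$platforms" | tr ' ' ',')" -t "$image" .
}
```

Calling `build_all example/app:latest` then replaces the single command. This does not raise the timeout or cap concurrency; it only automates the retries.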

tonistiigi commented 4 years ago

This is set in https://github.com/moby/buildkit/blob/master/util/resolver/resolver.go#L195. If you can, please test with a custom build (you can tell buildx create to use a custom BuildKit image), and if this solves the issue we can make it configurable.
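A minimal sketch of pointing buildx at a custom BuildKit image, assuming a patched image has already been built and pushed somewhere. The helper name, the default builder name, and the image tag in the usage line are hypothetical; the `image` option belongs to the `docker-container` driver.

```shell
#!/bin/sh
# create_custom_builder IMAGE [NAME]: create a builder that runs the given
# BuildKit image instead of the default one, and select it.
create_custom_builder() {
  image=$1
  name=${2:-custom-buildkit}   # hypothetical default builder name
  docker buildx create --name "$name" --driver docker-container \
    --driver-opt "image=$image" --use
}
```

Usage would look like `create_custom_builder example.com/me/buildkit:longer-timeout`, followed by `docker buildx inspect --bootstrap` to start the builder before running the failing build again.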

carlonluca commented 4 years ago

I'm sorry, I have no precise idea of how to do this. Can you provide some more details please?

paxswill commented 4 years ago

I was also running into this (building on Windows with WSL2), and was able to work around it by lowering the MTU as mentioned here. Setting mine to 1400 worked.

carlonluca commented 4 years ago

> I was also running into this (building on Windows with WSL2), and was able to work around it by lowering the MTU as mentioned here. Setting mine to 1400 worked.

Thanks, but that does not seem to work in my case.

crazy-max commented 3 years ago

Closing for housekeeping. Cannot reproduce but let us know if it still happens.

lastzero commented 2 years ago

Yes, not a single multi-arch build succeeded in the past 18 hours. TLS handshake errors in the ARMv7 image, always when downloading specific Go dependencies from GitHub. Running Ubuntu Server 20.04 with the latest updates.

Client: Docker Engine - Community
 Version:           20.10.11
 API version:       1.41
 Go version:        go1.16.9
 Git commit:        dea9396
 Built:             Thu Nov 18 00:37:06 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.11
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.16.9
  Git commit:       847da18
  Built:            Thu Nov 18 00:35:15 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.12
  GitCommit:        7b11cfaabd73bb80907dd23182b9347b4245eb5d
 runc:
  Version:          1.0.2
  GitCommit:        v1.0.2-0-g52b36a2
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

Errors:

internal/server/webdav.go:16:2: golang.org/x/net@v0.0.0-20211208012354-db4efeb81f4b: 
  Get "https://goproxy/golang.org/x/net/@v/v0.0.0-20211208012354-db4efeb81f4b.zip": 
  net/http: TLS handshake timeout
pkg/clusters/kmeans.go:9:2: gonum.org/v1/gonum@v0.9.3: 
  Get "https://goproxy/gonum.org/v1/gonum/@v/v0.9.3.zip": 
  net/http: TLS handshake timeout
internal/entity/photo_location.go:14:2: gopkg.in/photoprism/go-tz.v2@v2.1.1: 
  Get "https://goproxy/gopkg.in/photoprism/go-tz.v2/@v/v2.1.1.zip": 
  net/http: TLS handshake timeout 

Reducing the MTU of the bridge network to 1300 didn't help either:

[
    {
        "Name": "bridge",
        "Scope": "local",
        "Driver": "bridge",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "172.17.0.0/16",
                    "Gateway": "172.17.0.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {},
        "Options": {
            "com.docker.network.bridge.default_bridge": "true",
            "com.docker.network.bridge.enable_icc": "true",
            "com.docker.network.bridge.enable_ip_masquerade": "true",
            "com.docker.network.bridge.host_binding_ipv4": "0.0.0.0",
            "com.docker.network.bridge.name": "docker0",
            "com.docker.network.driver.mtu": "1300"
        },
        "Labels": {}
    }
]

davidnewhall commented 2 years ago

Happens every day. Only affects armv7. Using Docker Desktop v4.12.0 on macOS 12.4. I just clicked the button to "delete everything" (not factory reset), then triggered my build (which works sometimes), and got the error below. Removing armv7 from the list of multi-arch images removes the error. amd64 and arm64 build fine. I can also reproduce this by setting DOCKER_HOST to a remote Linux machine running Docker. It produces the same errors. The TLS error appears on different, seemingly random packages from run to run, but always during go mod download.

 => CANCELED [linux/arm64 builder 7/7] RUN CGO_ENABLED=1 make clean notifiarr.arm64.linux                                                                                                                                                                                                                                                      110.9s
------
 > [linux/arm/v7 builder 4/7] RUN go mod download:
#0 315.6 go: github.com/saracen/go7z-fixtures@v0.0.0-20190623165746-aa6b8fba1d2f: Get "https://storage.googleapis.com/proxy-golang-org-prod/56d40d1d0fafdd3b-github.com:saracen:go7z-fixtures-v0.0.0-20190623165746-aa6b8fba1d2f.zip?Expires=1663509975&GoogleAccessId=gcs-urlsigner-prod%40golang-modproxy.iam.gserviceaccount.com&Signature=bP0vrP3Ch%2Fz1K8Ht%2BN62G6qsJwlAhjU0JqjTfIUfNr2Wq83EbLjW%2FYuJ4y6YDARMCosP5mXyoS3TEEcsQRFkUHfk9VVEff6yMH2UHwoUs1CIDRQQWox1PWvTCckAwWS0o7ZqTF8YdbFMy06bm%2F6Lha9%2Baak%2FJYq8560ucAVEH2%2Fj5saeVjfdRNFqnZnpQ8Ze4zIYf2nQqfOhfqg68CHGrqCsstzIM%2FF4nwGbUIJCD4YCxMhqMWEoMJoM1FfJNgRqmccMqp1AL0WXfNUTVjhHQcZoBO0anF4ZMa591Y7IAzBq9L9Xd%2BhEt4QUhTSa%2Fu3h4R0aheUXWH2S5sYlR5M%2Bmw%3D%3D": net/http: TLS handshake timeout
------
Dockerfile:12
--------------------
  10 |     WORKDIR /src
  11 |     COPY go.mod go.sum .
  12 | >>> RUN go mod download
  13 |     COPY . .
  14 |
--------------------
ERROR: failed to solve: process "/bin/sh -c go mod download" did not complete successfully: exit code: 1
~/go/src/github.com/Notifiarr/notifiarr/init/docker

slayer commented 1 year ago

Same issue on arm64; setting the MTU to 1300 does not help.

❯ docker buildx build .
...
ERROR: failed to solve: nginx:1-alpine: failed to do request: Head "https://registry-1.docker.io/v2/library/nginx/manifests/1-alpine": dial tcp 34.194.164.123:443: i/o timeout
❯ docker ps
CONTAINER ID   IMAGE                           COMMAND                  CREATED         STATUS         PORTS                       NAMES
242d4c082262   moby/buildkit:buildx-stable-1   "buildkitd"              2 minutes ago   Up 2 minutes                               buildx_buildkit_practical_wozniak0
❯ docker exec -it 24 sh
/ # ifconfig
eth0      Link encap:Ethernet  HWaddr 02:42:AC:11:00:02
          inet addr:172.17.0.2  Bcast:172.17.255.255  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1300  Metric:1
          RX packets:18 errors:0 dropped:0 overruns:0 frame:0
          TX packets:4 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:1502 (1.4 KiB)  TX bytes:266 (266.0 B)

CreeksideAB commented 1 year ago

Any solution to this?

jobcespedes commented 1 year ago

Same error message here

expertonium commented 1 year ago

Hit this issue hard. The only fix was to emulate a technique seen in other Go projects, which is to build the Go binary outside the Docker image, using GOOS (and GOARCH) to target the platform. Wasn't too excited about factoring the installations out of the Dockerfile, but there didn't seem to be anything else to be done. Certainly not if this thread is any indication.

# linux go bin expected
GOOS=linux go build
# multi-arch build, tag, and push
DOCKER_BUILDKIT=1 docker buildx build --push --platform linux/arm/v7,linux/arm64/v8,linux/amd64 --file ./Dockerfile --tag path.dev/to-the-repo/name-of-image:latest .

In this example, the Go binary is built from amazing-cli.go. The Dockerfile looks like this:

FROM golang:1.21-alpine AS builder

WORKDIR /

COPY . .

FROM scratch

COPY --from=builder /amazing-cli /go/bin/amazing-cli

ENV PATH="/go/bin:$PATH"

CMD ["amazing-cli", "help" ]
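One caveat with the commands above: `GOOS` only selects the operating system, so a single `GOOS=linux go build` produces one binary for the host CPU architecture. For a multi-arch image, each platform needs its own binary, selected with `GOARCH` (plus `GOARM` for 32-bit ARM). A rough sketch of building one binary per platform on the host follows. The helper names and the `dist/` layout are hypothetical, and disabling CGO is an assumption about the project.

```shell
#!/bin/sh
# platform_env PLATFORM: print the Go cross-compilation variables for a
# Docker platform string such as linux/arm/v7.
platform_env() {
  os=${1%%/*}        # text before the first slash, e.g. linux
  rest=${1#*/}       # everything after it, e.g. arm/v7
  arch=${rest%%/*}   # e.g. arm
  variant=
  case $rest in */*) variant=${rest#*/} ;; esac
  out="GOOS=$os GOARCH=$arch"
  # Only 32-bit ARM needs the variant; Go takes it as GOARM=5, 6 or 7.
  if [ "$arch" = arm ] && [ -n "$variant" ]; then
    out="$out GOARM=${variant#v}"
  fi
  echo "$out"
}

# build_for PLATFORM: cross-compile into dist/PLATFORM/amazing-cli.
build_for() {
  vars=$(platform_env "$1") || return 1
  # CGO_ENABLED=0 avoids needing a C cross-toolchain (assumption: no cgo).
  # $vars is left unquoted on purpose so each VAR=value becomes its own word.
  env CGO_ENABLED=0 $vars go build -o "dist/$1/amazing-cli" .
}
```

Something like `for p in linux/arm/v7 linux/arm64/v8 linux/amd64; do build_for "$p"; done` would then run before the buildx command. The Dockerfile would also have to copy the binary matching each target, for example by using BuildKit's automatic `TARGETARCH` build argument in the `COPY` path; how that is wired up is left open here.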

aine-etke commented 11 months ago

Please reopen the issue, the problem continues to occur. Happens when building buscarron and pushing it behind a VPN to backblaze b2.

Kovah commented 9 months ago

I can confirm that this still happens. Running on Docker for Mac v4.27 with Docker v25.0.1, build 29cf629. Pushing to a local server in the same network produces constant i/o timeout errors:

ERROR: failed to solve: failed to push my-local-server/repo/image: 
failed to do request: Head "https://my-local-server/blobs/sha256:6b616eaebb387f5aa8190be582614ba110425dde41b8eda1afd89f70d52b1661": 
dial tcp 192.168.20.20:443: i/o timeout

Restarting the build process results in timeouts on different requests. It would be helpful to have a flag to increase the timeouts, to check whether this is some sort of lag or a constant network issue.
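Until such a flag exists, one way to tell lag from a hard failure is to time the handshake directly. A sketch using curl's `time_connect` and `time_appconnect` write-out variables follows; the helper names are hypothetical, and the 30-second limit is an arbitrary choice.

```shell
#!/bin/sh
# tls_delta CONNECT APPCONNECT: seconds between the end of the TCP connect
# and the end of the TLS handshake.
tls_delta() {
  awk -v c="$1" -v a="$2" 'BEGIN { printf "%.3f\n", a - c }'
}

# probe_tls URL: print the TCP connect time and TLS handshake time of one request.
probe_tls() {
  # Both values are seconds since the start of the transfer.
  times=$(curl -s -o /dev/null --max-time 30 \
    -w '%{time_connect} %{time_appconnect}' "$1")
  connect=${times% *}
  appconnect=${times#* }
  echo "tcp=$connect tls=$(tls_delta "$connect" "$appconnect")"
}
```

Running `probe_tls https://registry-1.docker.io/v2/` (or the local registry URL) a few times on the host and again from inside a container should show whether handshakes are merely slow or never complete.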

danog commented 3 weeks ago

+1 over here, changing the MTU (which for some reason isn't the correct one advertised by the router via RA and DHCP) does not help at all.

danog commented 3 weeks ago

To reproduce, simply limit the MTU of your connection (e.g. at the router) to a value smaller than 1500 (which is actually very common on PPPoE connections that do not support jumbo frames, or on VPNs).

danog commented 2 weeks ago

For future reference, the "fix" was to manually edit the daemon.json config file through Docker Desktop Settings => Docker Engine, adding an "mtu": yourMTU entry, where yourMTU is the correct MTU of your connection. If you do not know it beforehand, you can determine it by running the following command:

ping example.com -D -s $((1500 - 28))

Decrease 1500 gradually (1 by 1) until ping succeeds, then set the MTU to that value both in Docker and in the macOS system settings.
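The manual search can be automated. The sketch below assumes the macOS `ping`, where `-D` sets the Don't Fragment bit and `-t` is an overall timeout in seconds; on Linux the flags differ (`-M do` for Don't Fragment). The helper names and the 1200 floor are arbitrary choices.

```shell
#!/bin/sh
# payload_for MTU: ICMP payload size that exactly fills an IPv4 packet of
# MTU bytes (20-byte IPv4 header + 8-byte ICMP header = 28 bytes overhead).
payload_for() {
  echo $(( $1 - 28 ))
}

# find_mtu HOST: walk down from 1500 until an unfragmented ping succeeds,
# then print the MTU that worked.
find_mtu() {
  host=$1
  mtu=1500
  while [ "$mtu" -ge "${MIN_MTU:-1200}" ]; do
    if ping -c 1 -t 2 -D -s "$(payload_for "$mtu")" "$host" >/dev/null 2>&1; then
      echo "$mtu"
      return 0
    fi
    mtu=$((mtu - 1))
  done
  return 1
}
```

`find_mtu example.com` prints the largest MTU that got through, which is the value to use for the `"mtu"` entry. Each failed probe can take up to two seconds, so a low MTU takes a while to find.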

Unfortunately, both macOS and Docker Desktop completely ignore the MTU advertised by the router using DHCP and RAs.

However, this is also a Docker Desktop bug, because it should automatically configure the MTU based on the MTU of the current connection, configured in the settings.