docker / buildx

Docker CLI plugin for extended build capabilities with BuildKit
Apache License 2.0

Pushing large image fails with Gateway errors #1315

Open lehrig opened 2 years ago

lehrig commented 2 years ago

In my multi-arch build of a large image, pushing to the image registry fails. quay.io returns a 502 Bad Gateway error and docker.io returns a 504 Gateway Time-out error.

How can I resolve these errors?

Steps to reproduce

Environment configuration:

git clone https://github.com/lehrig/kubeflow-ppc64le-notebook-images
cd kubeflow-ppc64le-notebook-images

export ELYRA_VERSION=3.11.1
export PYTHON_VERSION=3.8
export TENSORFLOW_VERSION=2.8.1
export SUPPORT_GPU=true
export MINOR_RELEASE=0

export IMAGE=quay.io/ibm/kubeflow-notebook-image-ppc64le
export TAG=elyra${ELYRA_VERSION}-py${PYTHON_VERSION}-tf${TENSORFLOW_VERSION}-v${MINOR_RELEASE}
export TARGET=$IMAGE:$TAG
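As a quick sanity check, the full image reference can be reassembled from the values above. This sketch only repeats the assignments from this section (without `export`) and prints the result; nothing in it is specific to buildx.

```shell
# Same values as in the environment configuration above.
ELYRA_VERSION=3.11.1
PYTHON_VERSION=3.8
TENSORFLOW_VERSION=2.8.1
MINOR_RELEASE=0

IMAGE=quay.io/ibm/kubeflow-notebook-image-ppc64le
TAG=elyra${ELYRA_VERSION}-py${PYTHON_VERSION}-tf${TENSORFLOW_VERSION}-v${MINOR_RELEASE}
TARGET=${IMAGE}:${TAG}

echo "$TARGET"
# prints quay.io/ibm/kubeflow-notebook-image-ppc64le:elyra3.11.1-py3.8-tf2.8.1-v0
```

The tag part matches the one used for the docker.io push, so both registries receive the same tag.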

For building into the cache:

docker build \
  --build-arg NB_GID=0 \
  --build-arg ELYRA_VERSION=$ELYRA_VERSION \
  --build-arg PYTHON_VERSION=$PYTHON_VERSION \
  --build-arg TENSORFLOW_VERSION=$TENSORFLOW_VERSION \
  --build-arg SUPPORT_GPU=$SUPPORT_GPU \
  -t $TARGET \
  -f Dockerfile \
  --platform linux/amd64,linux/ppc64le \
  --cache-to=type=local,dest=cache,mode=max \
  .

For pushing to quay.io:

docker build \
  --build-arg NB_GID=0 \
  --build-arg ELYRA_VERSION=$ELYRA_VERSION \
  --build-arg PYTHON_VERSION=$PYTHON_VERSION \
  --build-arg TENSORFLOW_VERSION=$TENSORFLOW_VERSION \
  --build-arg SUPPORT_GPU=$SUPPORT_GPU \
  -t $TARGET \
  -f Dockerfile \
  --platform linux/amd64,linux/ppc64le \
  --push \
  --cache-from=type=local,src=cache \
  .

For pushing to docker.io:

docker build \
  --build-arg NB_GID=0 \
  --build-arg ELYRA_VERSION=$ELYRA_VERSION \
  --build-arg PYTHON_VERSION=$PYTHON_VERSION \
  --build-arg TENSORFLOW_VERSION=$TENSORFLOW_VERSION \
  --build-arg SUPPORT_GPU=$SUPPORT_GPU \
  -t docker.io/lehrig/kubeflow-notebook-image-ppc64le:elyra3.11.1-py3.8-tf2.8.1-v0 \
  -f Dockerfile \
  --platform linux/amd64,linux/ppc64le \
  --push \
  --cache-from=type=local,src=cache \
  .

Results

Case 1: pushing to quay.io

 => [linux/ppc64le 11/11] WORKDIR /home/jovyan  75.6s
 => => sha256:dced7e77e0c296412c7e8ccd6e64470b70c070b2a1d894e198369b9c575ff75b 3.82GB / 3.82GB  75.3s
 => exporting to image  143.5s
 => => # error: failed to copy: unexpected status: 502 Bad Gateway
 => => # retrying in 1s

Case 2: pushing to docker.io

 => [linux/ppc64le 11/11] WORKDIR /home/jovyan  65.4s
 => => sha256:dced7e77e0c296412c7e8ccd6e64470b70c070b2a1d894e198369b9c575ff75b 3.82GB / 3.82GB
 => exporting to image  78.1s
 => => # error: failed to copy: unexpected status: 504 Gateway Time-out
 => => # retrying in 1s

Environment

lehrig commented 2 years ago

I've tested the same build on a machine with a better upstream connection, and the push succeeded there.

Still, the question remains how to handle environments with a poor upstream connection. Is there any way to increase a timeout for the push?
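Until a timeout setting is identified, one possible stopgap is an outer retry loop around the push command. The sketch below is a generic shell helper, not a docker or buildx feature: the name `retry_with_backoff` and the `RETRY_BASE_DELAY` variable are made up for illustration. It does not change any timeout; it only re-runs the command with a growing pause. Since the layers are already in the local cache (`--cache-from=type=local,src=cache`), a re-run should mostly repeat the export step, although whether the registry reuses blobs uploaded in a failed attempt is an assumption that is not verified here.

```shell
# retry_with_backoff MAX_ATTEMPTS CMD [ARGS...]
# Hypothetical helper: runs CMD until it succeeds or MAX_ATTEMPTS is reached,
# doubling the wait between attempts.
retry_with_backoff() {
  local max_attempts=$1
  shift
  # Seconds to wait after the first failure; RETRY_BASE_DELAY is an
  # illustrative variable name, defaulting to 5.
  local delay=${RETRY_BASE_DELAY:-5}
  local attempt=1
  while true; do
    "$@" && return 0
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}
```

With the environment variables from the reproduction steps set, either push command can be wrapped by prefixing it with `retry_with_backoff 5`. The log shows that the exporter already retries after 1s on its own, so the only thing this adds is a longer pause between complete attempts.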