docker / build-push-action

GitHub Action to build and push Docker images with Buildx
https://github.com/marketplace/actions/build-and-push-docker-images
Apache License 2.0

Transient errors are not caught and retried, like HTTP 504 Gateway Timeout from a container registry #1028

Closed clemlesne closed 5 months ago

clemlesne commented 6 months ago

Description

When a container registry hangs for a few seconds, the build fails every time. This often happens with mcr.microsoft.com, for example.

Expected behaviour

When a transient error occurs, for example HTTP 504 Gateway Timeout, the container pull/push should be retried with exponential backoff at least 3 to 5 times before reporting an error.
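As a rough illustration of the behaviour requested here, a minimal sketch of retrying with exponential backoff follows. It is not how BuildKit or this action implements retries; `TransientError`, `retry_with_backoff`, and all parameter names are hypothetical and exist only for this example.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (e.g. HTTP 502/503/504)."""


def retry_with_backoff(operation, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call operation(); on TransientError, wait and retry with doubling delays.

    The delay before retry n (0-based) is base_delay * 2**n plus up to 10% jitter.
    The last TransientError is re-raised once max_attempts calls have failed.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # retries exhausted: surface the last error
            delay = base_delay * (2 ** attempt)
            # Small random jitter so parallel jobs do not retry in lockstep.
            sleep(delay + random.uniform(0, delay / 10))
```

With the defaults, a pull that fails twice with a 504 and then succeeds would wait roughly 1 s and then 2 s before returning, instead of failing the whole job.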

Actual behaviour

When a transient error occurs, for example HTTP 504 Gateway Timeout, the container pull/push fails immediately.

Repository URL

mcr.microsoft.com/dotnet/aspnet

Workflow run URL

https://github.com/clemlesne/azure-pipelines-agent/actions/runs/7286419763/job/19855113917

YAML workflow

- name: Build & push container
  uses: docker/build-push-action@v5.1.0
  with:
    build-args: |
      XX_VERSION=${{ env.XX_VERSION }}
    cache-from: type=gha
    cache-to: type=gha
    context: src/docker
    file: src/docker/Dockerfile-${{ matrix.os }}
    labels: ${{ steps.meta.outputs.labels }}
    platforms: ${{ matrix.arch }}
    provenance: true
    outputs: type=registry,oci-mediatypes=true,compression=estargz,compression-level=9,force-compression=true
    sbom: true
    tags: ${{ steps.meta.outputs.tags }}

Workflow logs

full.log

BuildKit logs

No response

Additional info

No response

crazy-max commented 6 months ago
--------------------
  18 |     # - Azure CLI system requirements (Python 3.8, plus C/Rust build tools for libs non pre-built on this platform)
  19 | >>> RUN rm -f /etc/apt/apt.conf.d/docker-clean \
  20 | >>>     && echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' > /etc/apt/apt.conf.d/keep-cache
  21 |     ARG JQ_VERSION
--------------------
ERROR: failed to solve: failed to compute cache key: failed to copy: httpReadSeeker: failed open: unexpected status code https://mcr.microsoft.com/v2/dotnet/aspnet/blobs/sha256:77497f1931c0c890a15dde3886db7cee5576eccf75528050937eaa3d726f1f68: 504 Gateway Timeout

We can't do much about it. There might be some issues with Microsoft Artifact Registry; I suggest contacting them.

clemlesne commented 6 months ago

I confirm these are transient (non-persistent) issues with Microsoft Container Registry.

The point is that temporary failures can happen with many services, and this action should at least retry a few times before exiting the task with an error code. There is no need to manually restart the whole GitHub Actions run if the issue is gone 5 seconds later.

crazy-max commented 6 months ago

There is already retry logic for pull/copy operations.

Can you update the Setup Docker Buildx step to enable debug logging: https://github.com/clemlesne/azure-pipelines-agent/actions/runs/7286419763/workflow#L309-L314

      - name: Setup Docker Buildx
        uses: docker/setup-buildx-action@v3.0.0
        with:
          version: v${{ env.BUILDX_VERSION }}
          buildkitd-flags: --debug
          driver-opts: |
            image=moby/buildkit:v${{ env.BUILDKIT_VERSION }}

More info: https://docs.docker.com/build/ci/github-actions/configure-builder/#buildkit-container-logs