docker / build-push-action

GitHub Action to build and push Docker images with Buildx
https://github.com/marketplace/actions/build-and-push-docker-images
Apache License 2.0

Build summary generation failing #1143

Closed leocencetti closed 1 week ago

leocencetti commented 1 week ago

Description

The generation of the build summary in the post-build job (added in https://github.com/docker/build-push-action/releases/tag/v6.0.0) fails.

Expected behaviour

Generation should succeed

Actual behaviour

The post-build job fails unexpectedly with the following error:

error: Unavailable: connection error: desc = "error reading server preface: http2: frame too large"

The error can be reproduced when rerunning the workflow

Repository URL

No response

Workflow run URL

No response

YAML workflow

name: Build toolchain

on:
  workflow_call:
    inputs:
      push:
        description: Push image to registry
        default: false
        type: boolean
      tag:
        description: Optional tag
        type: string

env:
  REGISTRY: ghcr.io

defaults:
  run:
    shell: bash

permissions:
  contents: read
  packages: write

jobs:
  build-toolchain:
    name: Build rootfs toolchain
    runs-on:
      - self-hosted
      - linux
      - ARM64
    container:
      image: ghcr.io/leocencetti/docker:latest
      options: --privileged
      credentials:
        username: ${{ github.actor }}
        password: ${{ secrets.GITHUB_TOKEN }}
      volumes:
        - /var/lib/docker:/var/lib/docker
        - /var/cache/github-runner:/tmp/cache/

    steps:
      - name: Check out the repo
        uses: actions/checkout@v4.1.6
        with:
          submodules: recursive
          token: ${{ secrets.ACCESS_TOKEN }}

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Login to Docker Hub
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Generate image tags
        id: image_meta
        uses: docker/metadata-action@v5.5.0
        with:
          images: my-image
          tags: |
            type=raw,value=rid-${{ github.run_id }}
            type=raw,value=${{ inputs.tag }},enable=${{ inputs.tag != '' }}
            type=raw,value=latest,enable=${{ github.event_name == 'release' }}

      - name: Build and push image
        uses: docker/build-push-action@v6.0.0
        with:
          push: ${{ inputs.push }}
          build-args: |
            GIT_REF=${{ github.ref }}
            GIT_SHA=${{ github.sha }}
          tags: ${{ steps.image_meta.outputs.tags }}
          labels: ${{ steps.image_meta.outputs.labels }}
          secrets: |
            image_password=${{ secrets.IMAGE_PASSWORD }}
          context: .
          file: Dockerfile
          target: payload
          cache-from: |
            type=local,src=/tmp/cache/.buildx-cache
            type=local,src=/tmp/cache/.buildx-cache-new
          cache-to: type=local,dest=/tmp/cache/.buildx-cache-new,mode=max
          load: ${{ !inputs.push }}

Workflow logs

Post job cleanup.
/usr/bin/docker exec  eef2ffbc1bcffc4394ac503367b038b661a13cfadddf8db48cd4c2b72d8ec728 sh -c "cat /etc/*release | grep ^ID"
Generating build summary
  exporting build record to /__w/_temp/docker-actions-toolkit-u0hgvJ/export
  /usr/bin/mkfifo /__w/_temp/docker-actions-toolkit-u0hgvJ/buildx-in-GfD6vA.fifo
  /usr/bin/mkfifo /__w/_temp/docker-actions-toolkit-u0hgvJ/buildx-out-55Fw6E.fifo
  docker buildx --builder builder-39f22689-5718-4eaf-b370-5e4d83eddf10 dial-stdio
  docker run --rm -i -v /github/home/.docker/buildx/refs:/buildx-refs -v /__w/_temp/docker-actions-toolkit-u0hgvJ/export:/out docker.io/dockereng/export-build:latest --ref-state-dir=/buildx-refs --node=builder-39f22689-5718-4eaf-b370-5e4d83eddf10/builder-39f22689-5718-4eaf-b370-5e4d83eddf100 --ref=e5749kkypteaysamhknl3lfgs --uid=0 --gid=0
  Unable to find image 'dockereng/export-build:latest' locally
  latest: Pulling from dockereng/export-build
  170e3bcedcd0: Pulling fs layer
  5b2524eeb8ff: Pulling fs layer
  5b2524eeb8ff: Download complete
  170e3bcedcd0: Verifying Checksum
  170e3bcedcd0: Download complete
  170e3bcedcd0: Pull complete
  5b2524eeb8ff: Pull complete
  Digest: sha256:3dfedea3148487c108965dede834f22e81528fc5b2f3989e4b8ecec2f8fe10ae
  Status: Downloaded newer image for dockereng/export-build:latest
  2024/06/19 09:21:22 error: Unavailable: connection error: desc = "error reading server preface: http2: frame too large"
  github.com/moby/buildkit/util/stack.Enable
    /go/pkg/mod/github.com/moby/buildkit@v0.13.1/util/stack/stack.go:77
  github.com/moby/buildkit/util/grpcerrors.FromGRPC
    /go/pkg/mod/github.com/moby/buildkit@v0.13.1/util/grpcerrors/grpcerrors.go:198
  github.com/moby/buildkit/util/grpcerrors.UnaryClientInterceptor
    /go/pkg/mod/github.com/moby/buildkit@v0.13.1/util/grpcerrors/intercept.go:41
  google.golang.org/grpc.(*ClientConn).Invoke
    /go/pkg/mod/google.golang.org/grpc@v1.59.0/call.go:35
  github.com/moby/buildkit/api/services/control.(*controlClient).ListWorkers
    /go/pkg/mod/github.com/moby/buildkit@v0.13.1/api/services/control/control.pb.go:2306
  github.com/moby/buildkit/client.(*Client).ListWorkers
    /go/pkg/mod/github.com/moby/buildkit@v0.13.1/client/workers.go:31
  main.run
    /src/main.go:103
  main.main
    /src/main.go:80
  runtime.main
    /usr/local/go/src/runtime/proc.go:267
  runtime.goexit
    /usr/local/go/src/runtime/asm_arm64.s:1197
  failed to list workers
  github.com/moby/buildkit/client.(*Client).ListWorkers
    /go/pkg/mod/github.com/moby/buildkit@v0.13.1/client/workers.go:33
  main.run
    /src/main.go:103
  main.main
    /src/main.go:80
  runtime.main
    /usr/local/go/src/runtime/proc.go:267
  runtime.goexit
    /usr/local/go/src/runtime/asm_arm64.s:1197
  failed to list workers
  main.run
    /src/main.go:105
  main.main
    /src/main.go:80
  runtime.main
    /usr/local/go/src/runtime/proc.go:267
  runtime.goexit
    /usr/local/go/src/runtime/asm_arm64.s:1197
  Warning: Process "docker run" exited with code 1
Removing temp folder /__w/_temp/docker-actions-toolkit-ifKwVX
Post cache
  State not set

BuildKit logs

No response

Additional info

No response

crazy-max commented 1 week ago

Thanks for reporting, looking at your workflow:

    runs-on:
      - self-hosted
      - linux
      - ARM64
    container:
      image: ghcr.io/leocencetti/docker:latest
      options: --privileged
      credentials:
        username: ${{ github.actor }}
        password: ${{ secrets.GITHUB_TOKEN }}
      volumes:
        - /var/lib/docker:/var/lib/docker
        - /var/cache/github-runner:/tmp/cache/

This does not look like a common setup :sweat_smile:

What is the ghcr.io/leocencetti/docker:latest image? Seems to be a private package, would you mind sharing it if possible?

Also not sure what kind of runner you're using looking at self-hosted, linux, ARM64 but seems like these are self-hosted runners. Can you share the full workflow logs to help us figure out what's going on? And also enable debug for BuildKit to get the container logs: https://docs.docker.com/build/ci/github-actions/configure-builder/#buildkit-container-logs.

leocencetti commented 1 week ago

What is the ghcr.io/leocencetti/docker:latest image? Seems to be a private package, would you mind sharing it if possible?

This is roughly equivalent to this Dockerfile; I am just using ubuntu:22.04 as the base image instead of alpine (with the required package manager adaptations).

Also not sure what kind of runner you're using looking at self-hosted, linux, ARM64 but seems like these are self-hosted runners

Yes, I am using a docker-in-docker (DIND) workflow on a self-hosted ARM64 runner (NVIDIA).

Can you share the full workflow logs to help us figure out what's going on?

Yes. I've collected the logs from relevant jobs in the workflow. I have omitted the docker build logs as they contain private info (and are probably unrelated). logs.zip

On a side note, the image I am building is on the larger side (some GB), and the full workflow logs are quite verbose (5k+ lines). I noticed that the logs are fetched by the action to produce the summary, so I am wondering if their size could be the issue. I don't seem to have this problem when building smaller (and less verbose) images using the same setup.

crazy-max commented 1 week ago

Yes. I've collected the logs from relevant jobs in the workflow. I have omitted the docker build logs as they contain private info (and are probably unrelated). logs.zip

Thanks! Looking at the logs it seems you're using an old version of buildx:

2024-06-19T11:49:15.0952010Z [command]/usr/local/bin/docker buildx version
2024-06-19T11:49:15.1712504Z github.com/docker/buildx v0.11.2 9872040b6626fb7d87ef7296fd5b832e8cc2ad17

That version doesn't support the dial-stdio command, which was introduced in Buildx 0.13.0: https://github.com/docker/buildx/releases/tag/v0.13.0

Can you make the following change in your workflow to use the latest stable release and see whether it fixes the issue on your side?

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
        with:
          version: latest
          buildkitd-flags: --debug

I will try to repro on my side with an older version.
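The version requirement above can be checked before the build step runs. The following is a minimal sketch, assuming a POSIX shell with awk; `version_ge` and `parse_buildx_version` are hypothetical helper names, not part of Buildx or the action, and the sample line is the `docker buildx version` output quoted from the logs.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when version $1 >= version $2,
# comparing major.minor.patch numerically.
version_ge() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    split(a, x, "."); split(b, y, ".")
    for (i = 1; i <= 3; i++) {
      if (x[i] + 0 > y[i] + 0) exit 0
      if (x[i] + 0 < y[i] + 0) exit 1
    }
    exit 0
  }'
}

# Hypothetical helper: pull "X.Y.Z" out of a line shaped like
#   github.com/docker/buildx v0.11.2 <commit>
parse_buildx_version() {
  printf '%s\n' "$1" | awk '{ sub(/^v/, "", $2); print $2 }'
}

# Sample line copied from the workflow logs in this thread.
line='github.com/docker/buildx v0.11.2 9872040b6626fb7d87ef7296fd5b832e8cc2ad17'
ver="$(parse_buildx_version "$line")"

if version_ge "$ver" 0.13.0; then
  echo "buildx $ver is at least 0.13.0 (dial-stdio available)"
else
  echo "buildx $ver predates 0.13.0 (no dial-stdio)"
fi
```

On a real runner, the sample line would be replaced with the output of `docker buildx version`, assuming it keeps the three-field shape shown in the logs.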

Edit: Was able to repro:

[screenshot]

crazy-max commented 1 week ago

Opened https://github.com/docker/build-push-action/pull/1145 to mitigate the issue. You can test with:

      - name: Build and push image
        uses: crazy-max/docker-build-push-action@summary-check

leocencetti commented 1 week ago

@crazy-max I tried this morning to run the CI workflow with the latest buildx (v0.15.1) and I still get a failure (not the same one though):

docker buildx --builder builder-3e2fdd69-2ba2-4478-b367-2501d8cef169 dial-stdio
  docker run --rm -i -v /github/home/.docker/buildx/refs:/buildx-refs -v /__w/_temp/docker-actions-toolkit-0j650g/export:/out docker.io/dockereng/export-build:latest --ref-state-dir=/buildx-refs --node=builder-3e2fdd69-2ba2-4478-b367-2501d8cef169/builder-3e2fdd69-2ba2-4478-b367-2501d8cef1690 --ref=b5ap7n4x5arkdbv617hvgbprg --uid=0 --gid=0
  2024/06/20 06:42:45 failed to fill local state: failed to stat local ref directory /buildx-refs/builder-3e2fdd69-2ba2-4478-b367-2501d8cef169/builder-3e2fdd69-2ba2-4478-b367-2501d8cef1690: stat /buildx-refs/builder-3e2fdd69-2ba2-4478-b367-2501d8cef169/builder-3e2fdd69-2ba2-4478-b367-2501d8cef1690/: no such file or directory
  Warning: Failed to export build record: /__w/_temp/docker-actions-toolkit-0j650g/export/rec.dockerbuild not found

Note, I am not using your latest fix yet, although I doubt it will help here (buildx version is fine)...

Full logs: logs.zip

crazy-max commented 1 week ago

and I still get a failure (not the same one though):

docker buildx --builder builder-3e2fdd69-2ba2-4478-b367-2501d8cef169 dial-stdio
  docker run --rm -i -v /github/home/.docker/buildx/refs:/buildx-refs -v /__w/_temp/docker-actions-toolkit-0j650g/export:/out docker.io/dockereng/export-build:latest --ref-state-dir=/buildx-refs --node=builder-3e2fdd69-2ba2-4478-b367-2501d8cef169/builder-3e2fdd69-2ba2-4478-b367-2501d8cef1690 --ref=b5ap7n4x5arkdbv617hvgbprg --uid=0 --gid=0
  2024/06/20 06:42:45 failed to fill local state: failed to stat local ref directory /buildx-refs/builder-3e2fdd69-2ba2-4478-b367-2501d8cef169/builder-3e2fdd69-2ba2-4478-b367-2501d8cef1690: stat /buildx-refs/builder-3e2fdd69-2ba2-4478-b367-2501d8cef169/builder-3e2fdd69-2ba2-4478-b367-2501d8cef1690/: no such file or directory
  Warning: Failed to export build record: /__w/_temp/docker-actions-toolkit-0j650g/export/rec.dockerbuild not found

The temp folder /__w/_temp looks odd compared to what we have on GitHub public runners (/home/runner/work/_temp): https://github.com/docker/build-push-action/actions/runs/9585155679/job/26430413150#step:6:7 but I don't think that's the issue. I wonder if volume mounts are just broken with your current setup when using your DinD image. Maybe we should rely on docker cp instead of volumes :thinking:. I also see that the local ref cannot be found with /github/home/.docker/buildx/refs:/buildx-refs.

Can you add these extra steps after the - name: Build and push image step and share the logs?

      - name: Check docker config
        run: |
          tree -punahig /github/home/.docker

      - name: Dump context
        uses: crazy-max/ghaction-dump-context@v2
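
The stat failure quoted above shows the exporter looking for a directory named after the `--node` value (builder name, then node name) under the mounted refs root. As a rough sketch, assuming a POSIX shell, the same lookup can be done by hand; `ref_state_dir` and `check_ref_state` are hypothetical helper names, and the demo runs against a throwaway temp directory rather than a real runner.

```shell
#!/bin/sh
# Hypothetical helper: print the directory the exporter tries to stat,
# given the refs root and the --node value ("<builder>/<node>") from the log.
ref_state_dir() {
  printf '%s/%s\n' "${1%/}" "$2"
}

# Hypothetical helper: report whether that directory exists.
check_ref_state() {
  dir="$(ref_state_dir "$1" "$2")"
  if [ -d "$dir" ]; then
    echo "found: $dir"
  else
    echo "missing: $dir"
    return 1
  fi
}

# Demo against a temporary tree (names here are made up for illustration).
tmp="$(mktemp -d)"
mkdir -p "$tmp/builder-a/builder-a0"
check_ref_state "$tmp" "builder-a/builder-a0"
check_ref_state "$tmp" "builder-b/builder-b0" || true
rm -rf "$tmp"
```

In the job container, the first argument would be /github/home/.docker/buildx/refs and the second the `--node` value printed in the `docker run` line of the log.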