docker / build-push-action

GitHub Action to build and push Docker images with Buildx
https://github.com/marketplace/actions/build-and-push-docker-images
Apache License 2.0

Multiplatform build slows drastically after the first platform #982

Closed K20shores closed 4 months ago

K20shores commented 9 months ago

Description

Creating a multiplatform build results in a very long build time, or in a build that never finishes. I have two examples.

In the micm project

  1. Building for one platform succeeds in 7 minutes
  2. Building for two platforms takes over an hour, so I canceled it
  3. Here's another run that I won't cancel. I expect it to time out.

In another project when I tried this a few months ago, the build timed out after six hours for multiple platforms.

Expected behaviour

The build doesn't time out for more than one platform.

Actual behaviour

The build does time out for more than one platform.

Repository URL

https://github.com/NCAR/micm

Workflow run URL

https://github.com/NCAR/micm/actions/runs/6433827511

YAML workflow

name: Create and publish a Docker image

on:
  push:
    branches: ['release', '292-add-a-docker-image-publish']
    tags:
      - '*'

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}

jobs:
  build-and-push-image:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write

    steps:
      - name: Checkout repository
        uses: actions/checkout@v3
        with:
          submodules: recursive

      - name: Login to Container Registry
        uses: docker/login-action@v2
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Extract metadata (tags, labels) for Docker
        id: meta
        uses: docker/metadata-action@98669ae865ea3cffbcbaa878cf57c20bbf1c6c38
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2

      - name: Build and push Docker image
        uses: docker/build-push-action@v5
        with:
          context: .
          platforms: linux/amd64,linux/arm64
          file: docker/Dockerfile.publish
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}

crazy-max commented 9 months ago

Same as https://github.com/docker/build-push-action/issues/977#issuecomment-1752100260 but looking at your Dockerfile: https://github.com/NCAR/micm/blob/292-add-a-docker-image-publish/docker/Dockerfile.publish

FROM fedora:37

RUN dnf -y update \
    && dnf -y install \
        cmake \
        gcc-c++ \
        gdb \
        git \
        make \
        zlib-devel \
        llvm-devel \
    && dnf clean all

# copy the MICM code
COPY . /micm/

# build the library and run the tests
RUN mkdir /build \
      && cd /build \
      && cmake \
        -D CMAKE_BUILD_TYPE=release \
        -D ENABLE_LLVM:BOOL=TRUE \
        -D ENABLE_JSON:BOOL=TRUE \
        ../micm \
      && make install -j 8

WORKDIR /build

You might be able to use cross-compilation with https://github.com/tonistiigi/xx/.

See for example https://github.com/crazy-max/docker-msmtpd/blob/21e387c379fe37fdb0249aaa42f95bf3fbc824fc/Dockerfile#L15-L27 or https://github.com/crazy-max/docker-7zip/blob/8f719b2ce3074818119cc19f2b16de5177bf0ad3/Dockerfile#L16-L27 or https://github.com/crazy-max/docker-qbittorrent/blob/2c6eaead6eb3dad5256ed54a097bbf0a87d28c71/Dockerfile#L26-L36
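As a rough illustration of the cross-compilation pattern those examples use, here is a minimal sketch. It rests on several assumptions that are not from this thread: it switches the base image to Debian (the xx helpers are documented mainly for Alpine and Debian/Ubuntu), it builds with clang instead of gcc, and it leaves out the micm-specific CMake options, because their dependencies (LLVM, zlib) would also have to be installed for the target architecture. Treat it as a starting point, not a tested Dockerfile for micm.

```dockerfile
# syntax=docker/dockerfile:1

# Helper scripts (xx-apt-get, xx-clang, ...) from tonistiigi/xx.
FROM --platform=$BUILDPLATFORM tonistiigi/xx AS xx

# The build stage always runs on the runner's native architecture,
# so the compiler itself is not emulated by QEMU.
FROM --platform=$BUILDPLATFORM debian:bookworm AS build
COPY --from=xx / /

# Tools that run on the build machine.
RUN apt-get update \
    && apt-get install -y clang lld cmake make git

# TARGETPLATFORM is provided by BuildKit; xx reads it to pick the target.
ARG TARGETPLATFORM

# Headers and libraries for the *target* architecture.
# Assumption: these Debian package names cover what the project needs.
RUN xx-apt-get install -y libc6-dev libstdc++-12-dev

COPY . /micm/

# xx-clang --print-cmake-defines emits the -DCMAKE_* flags for cross-compiling.
# Project-specific options (ENABLE_LLVM, ENABLE_JSON) are omitted in this sketch.
RUN mkdir /build \
    && cd /build \
    && cmake $(xx-clang --print-cmake-defines) \
        -D CMAKE_BUILD_TYPE=release \
        ../micm \
    && make install -j 8
```

Binaries cross-compiled for the target cannot be executed natively in this stage, so any test step would still need emulation or a separate stage.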

K20shores commented 9 months ago

@crazy-max thanks for this. I'll look into this in the near future.

polarathene commented 9 months ago

FWIW, you could leverage caching with RUN --mount=type=cache if the cmake build is time-consuming. This needs an additional action to export/import the cache mount, as it's separate from the build layer cache that docker/build-push-action manages (see this advice).

If your builds aren't running in GitHub Actions runners (e.g. a remote build), then you may also hit a problem that affects Docker / containerd releases with LimitNOFILE in the docker.service and containerd.service systemd configs. I mention this especially because you're using DNF, which is known to be excessively slow on affected environments.

Looking at your 7 min CI run, the dnf RUN took 1 minute and the cmake step 4.5 minutes. The GitHub runner isn't affected by the limits issue and I don't think QEMU emulation affects that concern any differently.

You should expect platforms relying on QEMU in the CI to be quite slow (hence the importance of leveraging caching, especially for cmake if you can't cross-compile).
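A minimal sketch of the cache-mount suggestion applied to the Dockerfile quoted earlier in this thread follows. The cache id `micm-build-<arch>` is a made-up name, and keying it on `TARGETARCH` is an assumption intended to keep amd64 and arm64 object files apart; it also assumes a Dockerfile frontend recent enough to expand build args in mount options.

```dockerfile
# syntax=docker/dockerfile:1
FROM fedora:37

RUN dnf -y update \
    && dnf -y install \
        cmake \
        gcc-c++ \
        gdb \
        git \
        make \
        zlib-devel \
        llvm-devel \
    && dnf clean all

# copy the MICM code
COPY . /micm/

# TARGETARCH is set by BuildKit; declaring it makes it usable in this stage.
ARG TARGETARCH

# Keep the CMake build directory in a BuildKit cache mount so that
# intermediate build files can survive between builds on the same builder.
# The id is hypothetical; one cache per architecture avoids mixing objects.
RUN --mount=type=cache,id=micm-build-${TARGETARCH},target=/build \
    cd /build \
    && cmake \
        -D CMAKE_BUILD_TYPE=release \
        -D ENABLE_LLVM:BOOL=TRUE \
        -D ENABLE_JSON:BOOL=TRUE \
        ../micm \
    && make install -j 8
```

Two caveats apply. The contents of a cache mount are not written into the image, so only what `make install` places outside `/build` ends up in the final layer, and the original `WORKDIR /build` is left out for that reason. Also, `make` decides what to rebuild from file timestamps, so whether a fresh CI checkout actually reuses the cached objects is not verified here. As noted above, on hosted runners the cache mount starts empty unless a separate step exports and restores it.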

K20shores commented 9 months ago

@polarathene thanks! I'm giving that a try. The action

K20shores commented 9 months ago

I suppose I did it wrong. It seems that no cache was used...