How to use matrix for multi-platform builds?

felipecrs commented 1 year ago

I know I can use the same runner to build all the platforms at once, but then my builds take 2 hours, versus 20 minutes when I split them across different runners.

I was able to achieve something similar with:

name: ci

on:
  push:
    branches:
      - "main"
  pull_request:
    branches:
      - "main"

jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        platform:
          - linux/amd64
          - linux/386
          - linux/arm/v6
          - linux/arm/v7
          - linux/arm64
    steps:
      - name: Checkout
        uses: actions/checkout@v3
      - name: Set up QEMU
        uses: docker/setup-qemu-action@v2
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2

      - name: Set cache name
        id: cache-name
        run: |
          echo 'cache-name=asterisk-cache-${{ matrix.platform }}' | sed 's:/:-:g' >> $GITHUB_OUTPUT

      - name: Build and push
        uses: docker/build-push-action@v4
        with:
          context: asterisk
          platforms: ${{ matrix.platform }}
          tags: asterisk
          cache-from: type=gha
          cache-to: type=local,dest=/tmp/asterisk-cache,mode=max

      - name: Upload cache
        uses: actions/upload-artifact@v3
        with:
          name: asterisk-cache-${{ steps.cache-name.outputs.cache-name }}
          path: /tmp/asterisk-cache
          if-no-files-found: error
          retention-days: 1

  push:
    runs-on: ubuntu-latest
    needs: build
    steps:
      - name: Checkout
        uses: actions/checkout@v3
      - name: Set up QEMU
        uses: docker/setup-qemu-action@v2
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2

      - name: Download cache
        uses: actions/download-artifact@v3
        with:
          path: /tmp/asterisk-cache

      - name: Get lowercase GitHub username
        id: repository_owner
        uses: ASzc/change-string-case-action@v5
        with:
          string: ${{ github.repository_owner }}

      - name: Docker meta
        id: meta
        uses: docker/metadata-action@v4
        with:
          images: |
            ghcr.io/${{ steps.repository_owner.outputs.lowercase }}/asterisk-hass-addon
          tags: |
            type=ref,event=branch
            type=ref,event=pr
            type=semver,pattern={{version}}

      - name: Login to GHCR
        if: github.event_name == 'push' || github.event_name == 'pull_request' && github.event.pull_request.head.repo.full_name == github.repository && github.actor != 'dependabot[bot]'
        uses: docker/login-action@v2
        with:
          registry: ghcr.io
          username: ${{ github.repository_owner }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Build and push
        uses: docker/build-push-action@v4
        with:
          context: asterisk
          platforms: |
            linux/amd64
            linux/386
            linux/arm/v6
            linux/arm/v7
            linux/arm64
          push: ${{ github.event_name == 'push' || github.event_name == 'pull_request' && github.event.pull_request.head.repo.full_name == github.repository && github.actor != 'dependabot[bot]' }}
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=local,src=/tmp/.buildx-cache
          cache-to: type=gha,mode=max

The problem is that it takes 10 minutes to upload the cache and then another 5 minutes to download it again.

Is there any way to work around this?

K-shir0 commented 1 year ago

@felipecrs

I was able to split the build into individual jobs using the method posted here:

Reference: https://github.com/docker/build-push-action/issues/671#issuecomment-1373055782

felipecrs commented 1 year ago

That's very interesting!

I wonder if it would be possible to have buildx push different platforms in separate calls instead of a single one.

Currently, if I push a single --platform, it seems to overwrite the previously pushed one.
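A toy model of why this happens (a simplification for illustration, not how a registry actually stores data): a tag points at exactly one manifest, so each single-platform push to the same tag replaces the previous one, and keeping several platforms under one tag needs a manifest list that references all of them. The names below are hypothetical.

```shell
#!/usr/bin/env bash
# Toy model: a registry as a map from tag to the single thing it points at.
declare -A REGISTRY=()

registry_push() {          # registry_push <tag> <manifest>
  REGISTRY["$1"]="$2"      # replaces whatever the tag pointed at before
}

# Two single-platform pushes to the same tag: the second one wins.
registry_push app:latest "manifest(linux/amd64)"
registry_push app:latest "manifest(linux/arm64)"
echo "${REGISTRY[app:latest]}"   # prints manifest(linux/arm64)

# Keeping both platforms means pointing the tag at one manifest list
# (index) that references every per-platform manifest.
registry_push app:latest "index[manifest(linux/amd64), manifest(linux/arm64)]"
echo "${REGISTRY[app:latest]}"
```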

crazy-max commented 1 year ago

https://github.com/docker/docs/pull/17180 has been merged. See https://docs.docker.com/build/ci/github-actions/multi-platform/#distribute-build-across-multiple-runners

felipecrs commented 1 year ago

@crazy-max that's really cool but... last time I tried to use upload-artifact for the job, it took 20-30 minutes only to upload and download it.

It was definitely a showstopper for me.

Do you believe it has improved? Maybe I should try it again.

crazy-max commented 1 year ago

> last time I tried to use upload-artifact for the job, it took 20-30 minutes only to upload and download it.

You were uploading the whole cache export, which can be quite expensive to compress, upload, download and decompress. In https://docs.docker.com/build/ci/github-actions/multi-platform/#distribute-build-across-multiple-runners we only upload the resulting image tarball.

If you want to use a cache in the first job, you should consider the gha cache with a separate scope for each platform. Let me know if you need help with this.
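As a rough illustration of "a separate scope for each platform", here is a bash sketch that prints the cache options for one matrix entry, in key=value form so a step could append them to $GITHUB_OUTPUT. It assumes the scope is simply the platform with slashes replaced by dashes (the same slug convention used for file names elsewhere in this thread); the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive a per-platform gha cache scope so that each
# matrix job reads and writes its own cache entry instead of sharing one.
gha_cache_opts() {
  local platform="$1"
  # Replace every "/" with "-" (e.g. linux/arm/v7 -> linux-arm-v7).
  local scope="${platform//\//-}"
  echo "cache-from=type=gha,scope=${scope}"
  echo "cache-to=type=gha,scope=${scope},mode=max"
}

for p in linux/amd64 linux/arm/v7; do
  gha_cache_opts "$p"
done
```

The values on the right-hand side are what would go into the cache-from and cache-to inputs of docker/build-push-action for that platform.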

felipecrs commented 1 year ago

I will do some testing. Thanks a lot!

felipecrs commented 1 year ago

This is how I'm adding GHA caching on top of the provided example:

- name: Prepare
  run: |
    mkdir -p /tmp/images
    platform=${{ matrix.platform }}
    platform=${platform//\//-}
    echo "TARFILE=${platform}.tar" >> $GITHUB_ENV
    echo "TAG=${{ env.TMP_LOCAL_IMAGE }}:${platform}" >> $GITHUB_ENV
    echo "SCOPE=${GITHUB_REF_NAME}-${platform}" >> $GITHUB_ENV
- name: Build
  uses: docker/build-push-action@v4
  with:
    context: .
    platforms: ${{ matrix.platform }}
    tags: ${{ env.TAG }}
    outputs: type=docker,dest=/tmp/images/${{ env.TARFILE }}
    cache-from: type=gha,scope=${{ env.SCOPE }}
    cache-to: type=gha,scope=${{ env.SCOPE }},mode=max

The test is running now.

However, I wonder how I can integrate the push phase of the example with docker/metadata-action. I suppose I could map the tags to -t flags for docker buildx imagetools create with a shell script, but I'm not sure what to do about the labels.
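A minimal sketch of the tag mapping in bash, assuming the tags arrive as a newline-separated list (the shape of metadata-action's tags output). The function, the TAG_FLAGS array and the example image names are hypothetical, and it covers tags only, not labels.

```shell
#!/usr/bin/env bash
# Turn a newline-separated tag list into "-t <tag>" arguments for
# `docker buildx imagetools create`. Using an array keeps each tag a
# single argument, with no reliance on word splitting.
tags_to_flags() {
  local tag
  TAG_FLAGS=()
  while IFS= read -r tag; do
    if [[ -z "$tag" ]]; then
      continue               # skip blank lines
    fi
    TAG_FLAGS+=(-t "$tag")
  done <<< "$1"
}

# Example input; in a workflow this would come from ${{ steps.meta.outputs.tags }}.
tags=$'ghcr.io/owner/app:main\nghcr.io/owner/app:pr-251'
tags_to_flags "$tags"
printf '%s\n' "${TAG_FLAGS[@]}"
# The real call would then look something like:
#   docker buildx imagetools create "${TAG_FLAGS[@]}" <source refs...>
```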

sando38 commented 1 year ago

Hi, great to see this approach. I have been using it as well for a couple of months now, and I can even adopt some of these commands in mine.

However, I have not yet found a way to add annotations like "annotations": { "org.opencontainers.image.description": "DESCRIPTION" } to the resulting multi-arch image. Is there a way to achieve this?

crazy-max commented 1 year ago

> However, I wonder how I can integrate the push phase of the example with docker/metadata-action. I suppose I could map the tags to -t flags for docker buildx imagetools create with a shell script, but I'm not sure what to do about the labels.

Following https://docs.docker.com/build/ci/github-actions/multi-platform/#distribute-build-across-multiple-runners, I wonder if we could instead reuse the build-push action with a temporary Dockerfile.

Not tested, but here is the idea:

  push:
    runs-on: ubuntu-latest
    needs:
      - build
    services:
      registry:
        image: registry:2
        ports:
          - 5000:5000
    steps:
      -
        name: Download images
        uses: actions/download-artifact@v3
        with:
          name: images
          path: /tmp/images
      -
        name: Load images
        run: |
          for image in /tmp/images/*.tar; do
            docker load -i $image
          done
      -
        name: Push images to local registry
        run: |
          docker push -a ${{ env.TMP_LOCAL_IMAGE }}
      -
        name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2
        with:
          driver-opts: network=host
      -
        name: Login to Docker Hub
        uses: docker/login-action@v2
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      -
        name: Temp Dockerfile
        run: |
          mkdir -p /tmp/dkfilectx
          echo "FROM ${{ env.TMP_LOCAL_IMAGE }}" > /tmp/dkfilectx/Dockerfile
      -
        name: Docker meta
        id: meta
        uses: docker/metadata-action@v4
        with:
          images: ${{ env.REGISTRY_IMAGE }}
      -
        name: Push
        uses: docker/build-push-action@v4
        with:
          context: /tmp/dkfilectx
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
      -
        name: Inspect image
        run: |
          docker buildx imagetools inspect ${{ env.REGISTRY_IMAGE }}:${{ steps.meta.outputs.version }}

felipecrs commented 1 year ago

It's failing with:

https://github.com/TECH7Fox/asterisk-hass-addons/actions/runs/4949753398/jobs/8852483607?pr=251#step:11:157

Error: buildx failed with: ERROR: failed to solve: localhost:5000/asterisk-hass-addon: localhost:5000/asterisk-hass-addon:latest: not found

I think it's because the "Push" step is missing the platforms. Trying that now.

felipecrs commented 1 year ago

Oh no, that's not it. It's because we tag the images with their platforms, but later we try to reference them as :latest.

I think I know how to fix it. Trying now.

felipecrs commented 1 year ago

After several attempts this is where I stopped:

Dockerfile:2
--------------------
   1 |     ARG TARGETPLATFORM
   2 | >>> FROM localhost:5000/asterisk-hass-addon/${TARGETPLATFORM}
   3 |     
--------------------
ERROR: failed to solve: failed to parse stage name "localhost:5000/asterisk-hass-addon/": invalid reference format
Error: buildx failed with: ERROR: failed to solve: failed to parse stage name "localhost:5000/asterisk-hass-addon/": invalid reference format

It does not make sense: it looks like TARGETPLATFORM is not being injected, even though the buildx docs say it should be.

My PR is https://github.com/TECH7Fox/asterisk-hass-addons/pull/251 in case you want to have a look.

felipecrs commented 1 year ago

Anyway, this is a LOT of complication for such a simple task.

> I wonder if it would be possible to have buildx push different platforms in separate calls instead of a single one.
>
> Currently, if I push a single --platform, it seems to overwrite the previously pushed one.

@crazy-max do you think it would be possible for buildx to support such a thing? I can open an issue there if you say so.

crazy-max commented 1 year ago

> It does not make sense: it looks like TARGETPLATFORM is not being injected, even though the buildx docs say it should be.

Can you try this?

  push:
    runs-on: ubuntu-latest
    needs:
      - build
    services:
      registry:
        image: registry:2
        ports:
          - 5000:5000
    steps:
      -
        name: Download images
        uses: actions/download-artifact@v3
        with:
          name: images
          path: /tmp/images
      -
        name: Load images
        run: |
          for image in /tmp/images/*.tar; do
            docker load -i $image
          done
      -
        name: Push images to local registry
        run: |
          docker push -a ${{ env.TMP_LOCAL_IMAGE }}
      -
        name: Create manifest list and push to local registry
        run: |
          docker buildx imagetools create -t ${{ env.TMP_LOCAL_IMAGE }}:latest \
            $(docker image ls --format '{{.Repository}}:{{.Tag}}' '${{ env.TMP_LOCAL_IMAGE }}' | tr '\n' ' ')
      -
        name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2
        with:
          driver-opts: network=host
      -
        name: Login to Docker Hub
        uses: docker/login-action@v2
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      -
        name: Temp Dockerfile
        run: |
          mkdir -p /tmp/dkfilectx
          echo "FROM ${{ env.TMP_LOCAL_IMAGE }}:latest" > /tmp/dkfilectx/Dockerfile
      -
        name: Docker meta
        id: meta
        uses: docker/metadata-action@v4
        with:
          images: ${{ env.REGISTRY_IMAGE }}
      -
        name: Push
        uses: docker/build-push-action@v4
        with:
          context: /tmp/dkfilectx
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          platforms: linux/amd64,linux/386,linux/arm/v6,linux/arm/v7,linux/arm64
          labels: ${{ steps.meta.outputs.labels }}
      -
        name: Inspect image
        run: |
          docker buildx imagetools inspect ${{ env.REGISTRY_IMAGE }}:${{ steps.meta.outputs.version }}

> Anyway, this is a LOT of complication for such a simple task.

Yes, as mentioned in https://github.com/docker/docs/pull/17180#issuecomment-1539812294, we could provide a composite action to make this easier to integrate into your workflow.

felipecrs commented 1 year ago

@crazy-max you have some references to LOCAL_IMAGE and TMP_LOCAL_IMAGE. Were they all supposed to be TMP_LOCAL_IMAGE like in the document (https://docs.docker.com/build/ci/github-actions/multi-platform/#distribute-build-across-multiple-runners)?

felipecrs commented 1 year ago

Never mind, I think the answer is no. I'm testing here.

crazy-max commented 1 year ago

> @crazy-max you have some references to LOCAL_IMAGE and TMP_LOCAL_IMAGE. Were they all supposed to be TMP_LOCAL_IMAGE like in the document (https://docs.docker.com/build/ci/github-actions/multi-platform/#distribute-build-across-multiple-runners)?

That's a typo, my bad; it should be TMP_LOCAL_IMAGE.

felipecrs commented 1 year ago

@crazy-max it worked! Thanks a lot!

Just for information:

1. Uploading the images takes 3+ minutes
2. Downloading them again takes another minute
3. Loading them takes <2 minutes
4. Pushing to the local registry takes ~3 minutes

Steps 1 and 2 can be cut to under a minute by switching from upload-artifact to actions/cache; here is an example:

https://github.com/TECH7Fox/asterisk-hass-addons/pull/236

Steps 3 and 4 could perhaps save about a minute in total by running things in parallel with GNU parallel.
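Plain bash background jobs can stand in for GNU parallel here. A minimal sketch, assuming each load or push can safely run concurrently; the helper names are hypothetical, and whether it actually saves time depends on where the bottleneck is.

```shell
#!/usr/bin/env bash
# Run one command per argument concurrently using bash background jobs
# (a stand-in for GNU parallel). Returns non-zero if any job failed.
run_parallel() {
  local cmd="$1"; shift
  local -a pids=()
  local arg pid status=0
  for arg in "$@"; do
    "$cmd" "$arg" &          # start each job in the background
    pids+=("$!")
  done
  for pid in "${pids[@]}"; do
    wait "$pid" || status=1  # collect every job's exit status
  done
  return "$status"
}

# In the push job this could be, for example:
#   load_one() { docker load -i "$1"; }    # hypothetical helper
#   run_parallel load_one /tmp/images/*.tar
```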

Another approach that could save time: instead of using a local registry in the job, push the temporary images to GHCR itself with a unique tag, and have a cleanup job delete the temporary images afterwards.
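A minimal sketch of one possible unique-tag scheme, assuming tags are derived from the default GITHUB_RUN_ID and GITHUB_RUN_ATTEMPT variables plus the platform. The image name, the tmp- prefix and the function name are hypothetical; a shared prefix per run would also give a cleanup job something simple to match on.

```shell
#!/usr/bin/env bash
# Hypothetical naming scheme for temporary per-platform tags pushed to GHCR.
# GITHUB_RUN_ID and GITHUB_RUN_ATTEMPT are default GitHub Actions variables;
# the fallbacks only exist so the sketch also runs outside of Actions.
temp_tag() {
  local image="$1" platform="$2"
  local run_id="${GITHUB_RUN_ID:-local}" attempt="${GITHUB_RUN_ATTEMPT:-1}"
  # Slashes are not allowed in tags, so linux/arm/v7 becomes linux-arm-v7.
  echo "${image}:tmp-${run_id}-${attempt}-${platform//\//-}"
}

temp_tag ghcr.io/owner/app linux/arm/v7
```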

But still, nothing would beat both the speed and simplicity of:

name: ci

on:
  push:
    branches:
      - "main"

jobs:
  build-and-push:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        platform:
          - linux/amd64
          - linux/386
          - linux/arm/v6
          - linux/arm/v7
          - linux/arm64
    steps:
      -
        name: Checkout
        uses: actions/checkout@v3
      -
        name: Set up QEMU
        uses: docker/setup-qemu-action@v2
      -
        name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2
      -
        name: Login to Docker Hub
        uses: docker/login-action@v2
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      -
        name: Build and push
        uses: docker/build-push-action@v4
        with:
          context: .
          platforms: ${{ matrix.platform }}
          push: true
          tags: user/app:latest

If buildx supported it.

felipecrs commented 1 year ago

> That's a typo, my bad; it should be TMP_LOCAL_IMAGE.

Yeah, I realized. No worries!

crazy-max commented 1 year ago

> Another approach that could save time: instead of using a local registry in the job, push the temporary images to GHCR itself with a unique tag, and have a cleanup job delete the temporary images afterwards.

I made some changes to our example if you want to try it: https://github.com/docker/docs/pull/17305 and see the preview at https://deploy-preview-17305--docsdocker.netlify.app/build/ci/github-actions/multi-platform/#distribute-build-across-multiple-runners
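The updated example pushes each platform by digest and merges the digests in a final job. A minimal bash sketch of that digest bookkeeping, assuming digests arrive in the sha256:<hex> form; the directory, image name, function names and the short placeholder digests are hypothetical.

```shell
#!/usr/bin/env bash
# Each build job records its digest as an empty file named after the hex part;
# the merge job turns the collected files back into image@sha256:<hex> refs.

save_digest() {            # save_digest <dir> <sha256:...>
  mkdir -p "$1"
  touch "$1/${2#sha256:}"  # strip the "sha256:" prefix for the file name
}

digest_refs() {            # digest_refs <dir> <image>
  local f
  for f in "$1"/*; do
    [[ -e "$f" ]] || continue      # empty dir: the glob did not match
    printf '%s@sha256:%s\n' "$2" "$(basename "$f")"
  done
}

dir="$(mktemp -d)"
save_digest "$dir" "sha256:aaaa"   # placeholders; real digests are 64 hex chars
save_digest "$dir" "sha256:bbbb"
digest_refs "$dir" ghcr.io/owner/app
# The merge job would then pass these refs to `docker buildx imagetools create`.
```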

felipecrs commented 1 year ago

I'll try for sure! I'll let you know soon.

felipecrs commented 1 year ago

I think it can be refined a little by moving metadata-action into its own job, as done here:

https://github.com/TECH7Fox/asterisk-hass-addons/pull/252

Another thing: since I push images even for pull requests (tagged in the pr-<number> format), I wonder whether using inline cache instead of GHA would consume fewer resources.

I just don't know how that would work with this push-by-digest approach. For example, if I enable inline cache and push by digest, how do I consume the cache afterwards? Would it be retained when running docker buildx imagetools create, meaning I could set cache-from to something like ${{ env.REGISTRY_IMAGE }}:pr-<number>?

felipecrs commented 1 year ago

@crazy-max the result is amazing. A full build (with cache) now takes less than a minute:

https://github.com/TECH7Fox/asterisk-hass-addons/actions/runs/4960678447/jobs/8876490603

Thanks a lot!

sando38 commented 1 year ago

> @crazy-max the result is amazing. A full build (with cache) now takes less than a minute:
>
> https://github.com/TECH7Fox/asterisk-hass-addons/actions/runs/4960678447/jobs/8876490603
>
> Thanks a lot!

I agree, this approach is great. After I implemented it, the test suites that run during the build phase pass successfully; when building all images on one runner, they tend to fail due to timeouts.

@crazy-max I still have a problem getting labels into the final manifest: https://github.com/sando38/eturnal/pkgs/container/eturnal/92866123?tag=edge

I configured the workflow pretty much as in your last link. My workflow file is here; the relevant part runs from line 446 to the end: https://github.com/sando38/eturnal/blob/18a056930c7a44ec008186f5576a897e8bc63e9f/.github/workflows/container-build-publish.yml#L446

Not sure if I'm missing something. Thanks in advance!

felipecrs commented 1 year ago

> Not sure if I'm missing something.

You are missing metadata-action in your build job. Double-check the example: metadata-action is run twice, once in the build job and again in the push job. In the build job, you also need to supply the labels input.

sando38 commented 1 year ago

> > Not sure if I'm missing something.
>
> You are missing metadata-action in your build job. Double-check the example: metadata-action is run twice, once in the build job and again in the push job. In the build job, you also need to supply the labels input.

Thanks for the quick reply. It is there:

Line 453
Line 543

felipecrs commented 1 year ago

It's missing the labels input.

sando38 commented 1 year ago

Oh, I thought they were detected automatically. I will double-check. Thanks for the hint.

sando38 commented 1 year ago

Thanks again, the per-platform images now have labels, but the "merged" manifest still does not; its package page shows "No description provided":

https://github.com/sando38/eturnal/pkgs/container/eturnal/92872612?tag=edge

I did include the labels. In the push job, I can also see that they are present in DOCKER_METADATA_OUTPUT_JSON (https://github.com/sando38/eturnal/actions/runs/4961796747/jobs/8879204058#step:10:23). Any further ideas? :)

felipecrs commented 1 year ago

Hm... it happened to me too:

https://github.com/TECH7Fox/asterisk-hass-addons/pkgs/container/asterisk-hass-addon/92864819?tag=main

docker/build-push-action issue #846: How to use matrix for multi-platform builds?