Closed clarkohw closed 5 months ago
+1. We also run both GH-hosted and self-hosted runners with many parallel workflow runs that use this action. A cache issue? Pinning to an older version that's been stable for us has worked around the issue:
- name: Set up Docker Buildx
  uses: docker/setup-buildx-action@v3
  with:
    platforms: linux/amd64
    version: v0.11.2
    buildkitd-flags: --debug
    driver-opts: image=moby/buildkit:v0.11.2
    cache-binary: false
Let us know if this shows up in older versions as well. There is nothing atm pointing to an issue with our release, and parallel workflow runs are out of our control as well.
I have 100 clean runs in a row in https://github.com/tonistiigi/gh-exec-format-error-debug/actions/runs/8606687477 based on another report. If you can point me to any differences that should be tried to reproduce this, lmk.
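For anyone who wants to try reproducing this themselves, a minimal sketch of a parallel stress workflow might look like the following. This is not the workflow used in the linked debug repository; the workflow name, job name, and the choice of 20 parallel jobs are assumptions made for illustration.

```yaml
# Hypothetical reproduction workflow (names and job count are illustrative).
name: buildx-boot-repro
on: workflow_dispatch

jobs:
  boot:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false   # let every job finish so sporadic failures can be counted
      matrix:
        attempt: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
    steps:
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
        with:
          buildkitd-flags: --debug   # same debug flag as in the pinned config above
      - name: List builders
        run: docker buildx ls        # confirms the builder actually booted
```

Triggering it several times in a row gives a rough failure rate to compare against the roughly 10% reported in the issue.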
For those experiencing this issue, I think @tonistiigi may have used the new updated runner build released yesterday, see https://github.com/actions/runner-images/releases.
Try again with the latest docker/setup-buildx-action@v3 version and see if you are still having the unexpected behaviour.
Last Friday, the error became very frequent in our CI merge queue, and by the end of the day it was causing almost all merge queue runs to be booted. Reading up on the error log message "Error: The process '/usr/bin/docker'" and other messages about the default network seemed to point to a version mismatch between buildkit and buildx. I tried running the action with just cache-binary: false to test it with the latest packages, hoping it was a cache issue, but the error still showed up.
@gete76 And you still see the issue?
@tonistiigi , I haven't tested that default setting today because I don't want to disrupt our CI. SLOs and what not. I'll have to find a way to test this without disruption.
@tonistiigi , I can tell you it did show up yesterday morning around 11AM EST, when I tested the latest (default) packages with no caching.
Thanks, I'll give this a try. It does appear that this is only happening on our GH hosted runners. Our internal ones build off of the summerwind action-runner image.
Atm this looks like a GitHub-side issue related to the 20240403.1.0 runner release, which now appears to have been deleted: https://github.com/actions/runner-images/blob/ubuntu20/20240403.1/images/ubuntu/Ubuntu2004-Readme.md (404).
This is the related issue, https://github.com/actions/runner-images/issues/9632, and a comment about the release being broken: https://github.com/actions/runner-images/issues/9654#issuecomment-2042746391
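To check whether a failing job ran on the affected runner image, one option is to log the image version at the start of the job. A minimal sketch, assuming the GitHub-hosted runner exports the ImageOS and ImageVersion environment variables (self-hosted runners may not set them, hence the fallbacks):

```yaml
# Hypothetical diagnostic step; place it before the Buildx setup step.
- name: Print runner image version
  # ImageOS / ImageVersion are assumed to be set by GitHub-hosted runners;
  # "unknown" is printed when they are missing (e.g. on self-hosted runners).
  run: echo "Runner image: ${ImageOS:-unknown} ${ImageVersion:-unknown}"
```

The printed version can then be compared with the release named above (20240403.1.0).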
Confirmed, this new runner image has resolved the issue.
Description
The docker/setup-buildx-action@v3 action sporadically fails on the "Booting builder" step. The sporadic nature of the issue seems similar to https://github.com/docker/setup-buildx-action/issues/283, but I am not using self-hosted runners and I am getting different error messages.
Expected behaviour
The action should install buildx.
Actual behaviour
Occasionally, maybe 10% of the time, the Booting builder step of the action fails.
Repository URL
No response
Workflow run URL
No response
YAML workflow
Workflow logs
but I also recently got this error message:
BuildKit logs
No response
Additional info
One potentially relevant factor is that we sometimes run many workflows at the same time (>20), so I was thinking it could be related to that?