docker / buildx

Docker CLI plugin for extended build capabilities with BuildKit
Apache License 2.0

The cache export step hangs #537

Open Xplouder opened 3 years ago

Xplouder commented 3 years ago

Hi,

First of all, sorry if this is a double post, but since the other reports I found are a bit old and have no recent activity, I decided to try to sum it all up here:

It looks like the "preparing build cache for the export" step hangs, which points to some kind of bug: [image] In my last attempts it sat there for more than 1h with no CPU usage, just stuck. This means that, apart from the inline cache type, the other cache types are unusable.

How to reproduce:

notes:

Samples

Here are the build commands that I used, with some values redacted:

local type:

    - docker buildx build
      --cache-from=type=local,src=docker_cache/
      --cache-to=type=local,dest=docker_cache/
      --ssh default=...
      --output type=image,name=registry.dev:foo,push=true
      --platform=linux/amd64,linux/arm,linux/arm64
      .

registry type:

    - docker buildx build
      --cache-from=type=registry,ref=registry.dev/cache
      --cache-to=type=registry,ref=registry.dev/cache
      --ssh default=...
      --output type=image,name=registry:foo,push=true
      --platform=linux/amd64,linux/arm,linux/arm64
      .
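
Until the hang itself is understood, a CI job can at least fail fast instead of blocking for an hour. The following is a minimal sketch, assuming GNU coreutils `timeout` is available on the runner. The helper name `run_with_limit` and the 30m limit are illustrative choices, not part of buildx.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of buildx): run a command under a time limit
# so that a hung cache export fails the job instead of blocking it.
# Assumes GNU coreutils `timeout` is installed.
run_with_limit() {
  local limit="$1"
  shift
  # Send TERM when the limit expires, then KILL if the process ignores it.
  timeout --signal=TERM --kill-after=10s "$limit" "$@"
  local status=$?
  # GNU timeout exits with 124 on timeout (137 if it had to send KILL).
  if [ "$status" -eq 124 ] || [ "$status" -eq 137 ]; then
    echo "command exceeded ${limit}; treating it as hung" >&2
  fi
  return "$status"
}

# Example call (the 30m limit is an arbitrary choice):
# run_with_limit 30m docker buildx build \
#   --cache-from=type=local,src=docker_cache/ \
#   --cache-to=type=local,dest=docker_cache/ \
#   .
```

This does not fix the export; it only turns a silent hang into a visible failure that can be retried or reported.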

Other reports that might be related:

tonistiigi commented 3 years ago

please post a runnable reproducer

umonaca commented 3 years ago

Same here. There seem to be a lot of similar issues here.
I have been stuck with output=type=local,dest=path as well.
I can reproduce the issue, but I don't know how to make a minimal reproducer. All I know is that it gets stuck at "copying file" after the image is built. BTW, it is a multi-platform build: the arm64 image is exported successfully, but armv7 always gets stuck in the copying-to-output stage, after the image is built successfully.

awakecoding commented 3 years ago

I have been trying to figure out for the entire day why a simple docker buildx filesystem export hangs in GitHub Actions, while it works just fine locally in WSL2. Maybe this is the same issue? https://twitter.com/awakecoding/status/1430252223771054084

tonistiigi commented 3 years ago

opened https://github.com/grpc/grpc-go/issues/4722

tonistiigi commented 3 years ago

If someone can make a reproducer using --cache-to that fails in a similar way to what @awakecoding saw with -o type=local, on a reproducible system, I could check whether it is the same problem. I still don't quite understand what the difference between local and tar output is if it breaks at the grpc level. It could be that local transfers files individually, while neither type=tar nor --cache-to does. @bendavies
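
One way to narrow this down is to run the same build once per output type and see which runs complete. A rough sketch, assuming a Dockerfile in the current directory; the destination paths, the function name `compare_exporters`, and the `DRY_RUN` switch are made up for illustration. By default it only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch: same build, one run per output type, to see which exporter hangs.
# Destination paths, function name and DRY_RUN are illustrative assumptions.
compare_exporters() {
  local dry_run="${DRY_RUN:-1}"
  local outputs=(
    "type=local,dest=out-local"   # result written as a directory of files
    "type=tar,dest=out.tar"       # result written as a single tarball
  )
  local out cmd
  for out in "${outputs[@]}"; do
    cmd=(docker buildx build --output "$out" .)
    if [ "$dry_run" = "1" ]; then
      # Dry run: print the command instead of executing it.
      echo "${cmd[*]}"
    else
      echo "running: ${cmd[*]}" >&2
      "${cmd[@]}" || echo "failed: $out" >&2
    fi
  done
}

compare_exporters
```

Setting `DRY_RUN=0` runs the builds for real. If only the type=local run stalls, that would be consistent with the per-file transfer theory above.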

hectorj-klaxoon commented 2 years ago

I have a similar issue. It doesn't happen all the time, and I'm not sure what triggers it so I can't give a reproducer. The build fully uses the --cache-from (all steps are marked as CACHED), which points to the same registry&image&tag as --cache-to, so I don't think anything actually needs to be pushed.

Deleting the --cache-to tag from the registry allows the next build to succeed.

Sorry, I don't have much more information.

worldspawn commented 2 years ago

I saw this immediately after adding mode=max. I'm caching to/from registry,

arikmaor commented 2 years ago

Any fix?

bbednarek commented 12 months ago

@tonistiigi I created a reproducible example in the https://github.com/bbednarek/multiple-docker-build repo, along with the workaround that we adopted (OCI layout).

I am basically building a Docker image in 2 steps, using 2 different Dockerfiles: Docker.builder and Docker. You can find 2 different workflow files which use 2 different ways to build the final Docker image (you can also run them manually and override the default target platform):

On top of that, you can find a Makefile which contains 3 self-descriptive jobs to run it locally:

tonistiigi commented 11 months ago

> I saw this immediately after adding mode=max. I'm caching to/from registry,

This issue is only about type=local. It was traced to https://github.com/docker/buildx/issues/537#issuecomment-908826867

jjhuff commented 9 months ago

@tonistiigi Unfortunately, grpc/grpc-go#4722 was closed as stale. At this point, our cache exports are so slow that we might as well not use Docker caching at all.