docker / buildx

Docker CLI plugin for extended build capabilities with BuildKit
Apache License 2.0

build never finishes in Docker 3.3.3 but works in Docker 3.2.2 #614

Open · romansky opened 3 years ago

romansky commented 3 years ago

Expected behavior

build finishes successfully

Actual behavior

build gets stuck in latest version

Information

Steps to reproduce the behavior

build_issue.tar.gz

  1. Use the latest version (3.3.3)
  2. Run DOCKER_BUILDKIT=1 docker build . or docker build .

Some strange things I observed: if the cp command in vendor.sh is commented out, the issue does not happen; if you uncomment line 16 of the Dockerfile and comment out line 17, the issue does not happen. No idea how to debug further.

With 3.2.2 there is no issue.

tonistiigi commented 3 years ago

(but same happens on a Linux box)

Please post a configuration where this build passes on Linux.

romansky commented 3 years ago

(but same happens on a Linux box)

Please post a configuration where this build passes on Linux.

What's “aa configuration”? There are multiple ways for this to pass: workarounds, or using Docker 3.2.2 instead of 3.3.3.

tonistiigi commented 3 years ago

"a configuration".

3.3.3 is not a version that any of the tools use on Linux.

romansky commented 3 years ago

@tonistiigi Ahh, gotcha

I can get the build to get stuck on Docker Engine 20.10.6/20.10.5/20.10.4, but can only get it to pass on my macOS machine. On macOS, the Docker Desktop to Engine version mapping is: 3.3.3 → 20.10.6 and 3.2.2 → 20.10.5.
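
For anyone mapping Desktop releases to Engine releases, the Engine version can be confirmed straight from the CLI; a minimal check:

docker version --format '{{.Server.Version}}'
# e.g. prints 20.10.6 under Docker Desktop 3.3.3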

tonistiigi commented 3 years ago

I can get it stuck as well (unless it is supposed to take 10+ min). It is stuck while running yarn processes that never return, so the build can't continue. I would therefore classify this as an application error, but if you have configurations that pass, we could of course look at finding the difference. It would be better if we understood what the command that doesn't return is doing, though.

tonistiigi commented 3 years ago

Even DOCKER_BUILDKIT=0 docker build seems to get stuck at the same command, so it does not look very likely that this is builder-related.
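
For reference, the three build paths can be compared side by side against the same context; a sketch, run from the directory containing the Dockerfile:

DOCKER_BUILDKIT=0 docker build .   # classic builder
DOCKER_BUILDKIT=1 docker build .   # BuildKit through docker build
docker buildx build .              # the buildx plugin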

romansky commented 3 years ago

@tonistiigi the working part does "replicate" on macOS.

How about this passing configuration, using this as the Dockerfile instead:

FROM node:16-alpine as builder
ENV WD /home/node/app
# install deps for webapp
WORKDIR $WD
COPY package.json ./
RUN yarn install

WORKDIR $WD
COPY src ./src
COPY *.json ./
COPY config-overrides.js ./
COPY public ./public
ENV NODE_ENV='production'
COPY vendor ./vendor
COPY vendor.sh ./
#RUN  yarn build
RUN IMC_WD=$WD . ./vendor.sh # setting env
RUN IMC_WD=$WD yarn build # now build works

CMD echo "donesky"

Notice how I split the environment-setting command and the build command into two different RUN lines.

What's the difference? Why does it work this way?

tonistiigi commented 3 years ago

What's the difference? Why does it work this way?

It's hard to say without knowing what the yarn build does and why it goes into a loop. I straced it (https://gist.github.com/tonistiigi/2be68cb98560025e5cf458468164d077) and it looks to be in a loop where it reads /proc/self/stat, waits on epoll, and sleeps. I have no idea why it is doing that.
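
A sketch of how such a trace can be taken, assuming strace is installed on the host; the container id and pid are placeholders to be filled in from the hung build:

docker ps                     # find the container running the stuck step
docker top <container-id>     # get the host PID of the hung yarn/node process
sudo strace -f -p <pid>       # attach and watch the syscall loop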

Note that splitting the command this way vs. using && is not identical behavior, because you are sourcing ./vendor.sh into your current shell context, which is obviously lost by the time the next layer runs. But I didn't find an env combination that would directly trigger the behavior change in yarn.
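
The layer semantics, in a minimal sketch (the script name just mirrors the repro above):

# one RUN, one shell: variables sourced from vendor.sh are still set for yarn
RUN . ./vendor.sh && yarn build

# two RUNs, two shells: each RUN starts a fresh shell in a new layer, so
# anything sourced (rather than written to disk) in the first RUN is gone here
RUN . ./vendor.sh
RUN yarn build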

Btw, you can just run these commands in a shell in a docker container, without any builder. It gets stuck the same way.
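
A sketch of that, assuming the build context is the current directory and the same node:16-alpine base as above:

docker run --rm -it -v "$PWD":/home/node/app -w /home/node/app node:16-alpine sh
# then, inside the container:
yarn install
. ./vendor.sh
yarn build    # hangs the same way, with no builder involved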

romansky commented 3 years ago

The env vars in vendor.sh should not be visible outside of the script, so I don't believe this is making any difference.

Another "work around" is to comment out the cp command inside the vendor.sh this will also "fix" the buid.. the Yarn build should not be specifically aware of this file..

My guess is that some Docker layer-copying optimization causes a disk-state consolidation process to get stuck waiting on, or resolving, some local storage state when && is used (as opposed to splitting the command), i.e. that this is on the Docker side.

tonistiigi commented 3 years ago

I don't understand your latest case. What is the result you are expecting, and what is the difference?

» docker buildx build .
[+] Building 2.4s (15/15) FINISHED
 => [internal] load build definition from Dockerfile                                                                                                                                                                         0.0s
 => => transferring dockerfile: 412B                                                                                                                                                                                         0.0s
 => [internal] load .dockerignore                                                                                                                                                                                            0.0s
 => => transferring context: 2B                                                                                                                                                                                              0.0s
 => [internal] load metadata for docker.io/library/node:16-alpine                                                                                                                                                            1.8s
 => [auth] library/node:pull token for registry-1.docker.io                                                                                                                                                                  0.0s
 => [ 1/10] FROM docker.io/library/node:16-alpine@sha256:3d9b25a9ab75b620434da48fa7f31181d5970d7ccd66bb590a4d4c54f0484423                                                                                                    0.3s
 => => resolve docker.io/library/node:16-alpine@sha256:3d9b25a9ab75b620434da48fa7f31181d5970d7ccd66bb590a4d4c54f0484423                                                                                                      0.0s
 => => sha256:03b7f3415b19c636a2d1ca01a0a8470b567db7b46859a531731abd8c10b9b771 280B / 280B                                                                                                                                   0.1s
 => => sha256:5dfd4491211c90c9d3f9172d41cfe8b57ac5cb3282b268c3d92aa7167518d9a5 1.05MB / 36.46MB                                                                                                                              0.3s
 => => sha256:ca4832019c53b7135209eb08c1734432ec3cf82dd875f0a93e9b6ee90d6159c7 0B / 2.41MB                                                                                                                                   0.3s
 => [internal] load build context                                                                                                                                                                                            0.4s
 => => transferring context: 14.56kB                                                                                                                                                                                         0.3s
 => CANCELED [ 2/10] WORKDIR /home/node/app                                                                                                                                                                                  0.0s
 => CACHED [ 3/10] COPY public ./public                                                                                                                                                                                      0.0s
 => CACHED [ 4/10] COPY vendor ./vendor                                                                                                                                                                                      0.0s
 => CACHED [ 5/10] COPY vendor.sh ./                                                                                                                                                                                         0.0s
 => CACHED [ 6/10] RUN IMC_WD=/home/node/app . ./vendor.sh                                                                                                                                                                   0.0s
 => CACHED [ 7/10] RUN echo /home/node/app                                                                                                                                                                                   0.0s
 => CACHED [ 8/10] RUN pwd                                                                                                                                                                                                   0.0s
 => CACHED [ 9/10] RUN ls /home/node/app                                                                                                                                                                                     0.0s
 => ERROR [10/10] COPY login_visual.png /home/node/app/public/img/login_visual.png                                                                                                                                           0.0s
------
 > [10/10] COPY login_visual.png /home/node/app/public/img/login_visual.png:
------
Dockerfile:15
--------------------
  13 |     RUN pwd
  14 |     RUN ls $WD
  15 | >>> COPY login_visual.png $WD/public/img/login_visual.png
  16 |
  17 |     CMD echo "donesky"
--------------------
error: failed to solve: rpc error: code = Unknown desc = failed to compute cache key: "/login_visual.png" not found: not found

romansky commented 3 years ago

@tonistiigi ah, my bad, please ignore my last comment here (I deleted it so as not to confuse other readers).

romainframe commented 2 years ago

Hi all,

I came upon this issue because my yarn build inside a docker build (with BuildKit on) never finished at step [build-stage 13/13] RUN yarn build.

It seems it failed because, in another terminal, a yarn dev (vue-cli-service serve) build was running.

I'm not totally sure which of Docker, yarn, or vue-cli-service is at fault, or even whether this is really related to this issue, but I hope it can help someone!
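
If you hit the same symptom, a quick sanity check is to look for competing dev servers or watchers before building; a sketch:

# the bracketed first letter keeps grep from matching its own process
ps aux | grep -E '[y]arn|[v]ue-cli-service'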

Versions: