docker / buildx

Docker CLI plugin for extended build capabilities with BuildKit
Apache License 2.0

docker build --no-cache uses cache anyway :'( #2387

Open gummyWalrus opened 1 year ago

gummyWalrus commented 1 year ago


I'm trying to build an image based on a custom image that is itself based on the Maven image. I'm using the Docker CLI on Fedora Linux 36.

Here is the dockerfile

FROM lanico/whanos-java:latest
COPY . .
WORKDIR /app/app
RUN mvn package
RUN ls -R /app
COPY /app/app/target/app.jar .
CMD ["java" , "-jar", "app.jar"]

The image referenced by this Dockerfile is built from this one:

FROM maven:3.8.5-openjdk-17

I successfully built and pushed the referenced one using

docker build --no-cache -t lanico/whanos-java:latest .
docker push lanico/whanos-java:latest

But then when I try to build the first one (the "referencer"), it uses the cache anyway:

docker build --no-cache --pull -t lanico/test-java:latest .

Output :

$ docker build --no-cache --pull -t lanico/test-java:latest .

[+] Building 1.6s (12/12) FINISHED                                                                                                                                                                    
 => [internal] load build definition from Dockerfile                                                                                                                                             0.0s
 => => transferring dockerfile: 203B                                                                                                                                                             0.0s
 => [internal] load .dockerignore                                                                                                                                                                0.1s
 => => transferring context: 2B                                                                                                                                                                  0.0s
 => [internal] load metadata for                                                                                                                             1.4s
 => [auth] lanico/whanos-java:pull token for                                                                                                                                0.0s
 => [internal] load build context                                                                                                                                                                0.1s
 => => transferring context: 3.81kB                                                                                                                                                              0.0s
 => [1/7] FROM                                                                       0.1s
 => => resolve                                                                       0.1s
 => CACHED [2/7] WORKDIR /app                                                                                                                                                                    0.0s
 => CACHED [3/7] COPY . .                                                                                                                                                                        0.0s
 => CACHED [4/7] WORKDIR /app/app                                                                                                                                                                0.0s
 => CACHED [5/7] RUN mvn package                                                                                                                                                                 0.0s
 => CACHED [6/7] RUN ls -R /app                                                                                                                                                                  0.0s
 => ERROR [7/7] COPY /app/app/target/app.jar .                                                                                                                                                   0.0s
 > [7/7] COPY /app/app/target/app.jar .:
   5 |     RUN mvn package
   6 |     RUN ls -R /app
   7 | >>> COPY /app/app/target/app.jar .
   8 |     CMD ["java" , "-jar", "app.jar"]
ERROR: failed to solve: failed to compute cache key: failed to calculate checksum of ref moby::zceqi6bcku5ggj2hte8gg96i0: "/app/app/target/app.jar": not found

Docker has cached the failing mvn package from a previous build and now, it is not building the target/app.jar :'(

It's not even running my ls command.

I've tried to

docker system prune
docker builder prune --all

still NOT working :(


  1. docker build --no-cache -t
  2. See that it uses cache :'(

Expected behavior

docker build --no-cache should not show me CACHED in the output, and should not use the cache. (How does it use the cache when I'm removing it prior to the build?)

docker version

Client: Docker Engine - Community
 Version:           23.0.1
 API version:       1.42
 Go version:        go1.19.5
 Git commit:        a5ee5b1
 Built:             Thu Feb  9 19:50:04 2023
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
  Version:          23.0.1
  API version:      1.42 (minimum version 1.12)
  Go version:       go1.19.5
  Git commit:       bc3805a
  Built:            Thu Feb  9 19:47:02 2023
  OS/Arch:          linux/amd64
  Experimental:     false
  Version:          1.6.14
  GitCommit:        9ba4b250366a5ddde94bb7c9d1def331423aa323
  Version:          1.1.4
  GitCommit:        v1.1.4-0-g5fd4c4d
  Version:          0.19.0
  GitCommit:        de40ad0

docker info

 Context:    default
 Debug Mode: false
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.10.2
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.16.0
    Path:     /usr/libexec/docker/cli-plugins/docker-compose
  scan: Docker Scan (Docker Inc.)
    Version:  v0.23.0
    Path:     /usr/libexec/docker/cli-plugins/docker-scan

 Containers: 1
  Running: 1
  Paused: 0
  Stopped: 0
 Images: 75
 Server Version: 23.0.1
 Storage Driver: btrfs
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 9ba4b250366a5ddde94bb7c9d1def331423aa323
 runc version: v1.1.4-0-g5fd4c4d
 init version: de40ad0
 Security Options:
   Profile: builtin
 Kernel Version: 6.1.11-100.fc36.x86_64
 Operating System: Fedora Linux 36 (Workstation Edition)
 OSType: linux
 Architecture: x86_64
 CPUs: 4
 Total Memory: 7.643GiB
 Name: fedora
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Username: lanico
 Experimental: false
 Insecure Registries:
 Live Restore Enabled: false

Additional Info

Linux kernel version : Linux fedora 6.1.11-100.fc36.x86_64

SampsonCrowley commented 1 year ago

@gummyWalrus what was the solution??

Praneethvvs commented 1 year ago

docker build --no-cache is still not working. Do we have a solution for this?

thaJeztah commented 1 year ago

@Praneethvvs do you have more details? I think some of this may be a presentation issue (there's a ticket somewhere with a longer discussion, but couldn't find it directly).

When using --no-cache, BuildKit will skip the cache for certain steps (such as RUN), but for other steps (such as COPY), it may still use the cache after re-verifying the cache. BuildKit validates the checksum of the files used, and if nothing changed, it will use the cache for those steps (as there would be no need to re-do the step).
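
To make the checksum idea above concrete, here is a toy shell sketch. It is not BuildKit's actual algorithm; context_key is a hypothetical helper, and the sketch assumes GNU coreutils (sha256sum) is available. It only shows that a key derived from file contents stays the same when nothing changed and differs once a file is edited.

```shell
#!/bin/sh
# Toy model of a content-based cache key (NOT BuildKit's real implementation).
set -eu

# Hypothetical helper: derive one key from the contents of every file in a directory.
context_key() {
    # Hash each regular file, sort the listing for a stable order,
    # then hash the listing itself to get a single key.
    (cd "$1" && find . -type f -exec sha256sum {} + | LC_ALL=C sort | sha256sum | cut -d' ' -f1)
}

ctx=$(mktemp -d)
echo 'hello' > "$ctx/app.txt"

key_first=$(context_key "$ctx")
key_again=$(context_key "$ctx")      # nothing changed, so the key is identical

echo 'hello, world' > "$ctx/app.txt"
key_changed=$(context_key "$ctx")    # content changed, so the key differs

echo "unchanged: $key_first / $key_again"
echo "changed:   $key_changed"

rm -rf "$ctx"
```

Under this model, a COPY whose inputs hash to the same key produces the same result whether it is redone or restored, which is the argument made above for why such steps may still be shown as CACHED.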

Praneethvvs commented 1 year ago

@thaJeztah I see some layers are still being cached and from your comment above I understand this was supposed to happen. This answers my question "BuildKit validates the checksum of the files used, and if nothing changed, it will use the cache for those steps". I was wondering why my COPY step is still running from cache even after using --no-cache and I actually had to prune all the cache. Thanks for the details.

SampsonCrowley commented 1 year ago

@Praneethvvs do you have more details? I think some of this may be a presentation issue (there's a ticket somewhere with a longer discussion, but couldn't find it directly).

When using --no-cache, BuildKit will skip the cache for certain steps (such as RUN), but for other steps (such as COPY), it may still use the cache after re-verifying the cache. BuildKit validates the checksum of the files used, and if nothing changed, it will use the cache for those steps (as there would be no need to re-do the step).

This seems like a disservice to the --no-cache option. If I am running with --no-cache, I want the entire build run without caching; the whole point is to rebuild from scratch. Can there at least be a "value" option for --no-cache, something like --no-cache all, to say "run ALL steps without cache, regardless of whether it seems pointless or not"?

SampsonCrowley commented 1 year ago

It removes the ability to use --no-cache to test a completely raw build, because some steps still use caching.

thaJeztah commented 1 year ago

The best place to request such a feature would be in the BuildKit repository, where that code is maintained.

I'm curious though; do you have a specific scenario where forcing to re-create the same files instead of restoring the same files from cache makes a difference? I understand the "presentation" can be somewhat confusing (there was some discussion at some point to change CACHED to e.g. CACHE VERIFIED to show that the cache was validated), but wondering if you have a specific example where it makes a difference for the image / build result.

DaveCole commented 1 year ago

I'm curious though; do you have a specific scenario where forcing to re-create the same files instead of restoring the same files from cache makes a difference?

I'll chime in here - I just started experiencing my builds failing on a project last week due to this caching. I'm not exactly sure what changed, but docker suddenly started aggressively caching files it didn't actually have on the docker image. Several COPY commands fail with the following message:

COPY ./ ./
ERROR: failed to calculate checksum of ref moby::3upz0pvfkv28cwdwmr2klalwz: "/": not found

This happens even though the above file was clearly in the repo and not ignored. Stranger still in this case: if you use wildcards to copy files, some files and folders started being skipped over in a seemingly random fashion, even though the COPY would report that it was successful (and CACHED). Attempting to re-run previously successful runs now fails as well.

So far I have yet to find a workaround; purging and --no-cache didn't help at all. This is on GitHub specifically; things work fine locally.

thaJeztah commented 1 year ago

This is on GitHub specifically; things work fine locally.

That looks more like either a bug, or an issue with the nodes; if you have more details on that, please open a ticket in with details; the issue may depend on what version of docker is installed (e.g., there have been issues with recent distro-packaged versions of docker on ubuntu)

DaveCole commented 1 year ago

@thaJeztah thanks for the heads-up, I posted the issue here:

Happy to provide any more info that I can.

mcfriend99 commented 1 year ago

Why was this closed?? This issue still exists.

thaJeztah commented 1 year ago

@mcfriend99 read the discussion before posting next time, please.

mcfriend99 commented 1 year ago

@thaJeztah Didn't realise. Right post wrong thread.

faq885 commented 10 months ago

Same here: --no-cache is not working, so the COPY command fails because the folder is not there (because the old copy is cached).

thaJeztah commented 10 months ago

@faq885 same answer;

That looks more like either a bug, or an issue with the nodes; if you have more details on that, please open a ticket in with details; the issue may depend on what version of docker is installed (e.g., there have been issues with recent distro-packaged versions of docker on ubuntu)

Sairav commented 10 months ago

+1. The build continues to use the cache for my RUN commands and then fails at the COPY command, which depends on a RUN command that generates the file to copy:

ERROR: failed to solve: failed to compute cache key: failed to calculate checksum of ref moby::g8j6icchxv81cf7fu1yodfl1p: "/promtool": not found

Sairav commented 10 months ago

Ahh, solved for me. The file that the failing COPY command was trying to copy was not present in the working directory.

Example:

COPY /src/some_file /container-dir/some-file

There was no file at /src/some_file. This somehow caused the docker build to use the cached results for the other instructions.
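
A COPY source is resolved against the build context, so /src/some_file means src/some_file inside the context directory, not a path on the host root. A minimal pre-flight check along those lines might look like the sketch below. copy_source_exists is a hypothetical helper, not a Docker command, and it handles only a single plain source path (no wildcards, no --from).

```shell
#!/bin/sh
# Sketch: check that a COPY source exists in the build context before building.
set -eu

# Hypothetical helper: succeeds if the COPY source path exists inside the context.
copy_source_exists() {
    context=$1
    src=${2#/}               # a leading "/" is relative to the context root
    [ -e "$context/$src" ]
}

# Throwaway context containing one file, used only for the demonstration.
ctx=$(mktemp -d)
mkdir -p "$ctx/src"
: > "$ctx/src/present_file"

if copy_source_exists "$ctx" /src/present_file; then present=yes; else present=no; fi
if copy_source_exists "$ctx" /src/some_file; then missing=no; else missing=yes; fi

echo "src/present_file found in context: $present"
echo "src/some_file missing from context: $missing"

rm -rf "$ctx"
```

Running a check like this before docker build would surface the missing file directly, instead of through the "failed to calculate checksum" error.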

kerwitz commented 10 months ago

For what it's worth, @Sairav seems to be correct, at least for me. --no-cache kept being ignored (judging by the CACHED labels in the output) until I commented out a broken COPY directive. This is... not very intuitive.

jgfoster commented 8 months ago

I'm finding that even RUN statements are showing as CACHED.

jfoster@JGF-MBP-14-2022 flutter % docker --version
Docker version 24.0.7, build afdd53b
jfoster@JGF-MBP-14-2022 flutter % docker build --no-cache -t theia-flutter .       
[+] Building 0.1s (17/18)                                                            docker:desktop-linux
 => [internal] load build definition from Dockerfile                                                 0.0s
 => => transferring dockerfile: 1.63kB                                                               0.0s
 => [internal] load .dockerignore                                                                    0.0s
 => => transferring context: 2B                                                                      0.0s
 => [internal] load metadata for                               0.0s
 => [internal] load build context                                                                    0.0s
 => => transferring context: 2B                                                                      0.0s
 => CACHED [ 1/14] FROM                                               0.0s
 => [ 2/14] RUN echo "Hello, world!" > /tmp/hello.txt                                                0.1s
 => CACHED [ 3/14] RUN apt update && apt install -y libglu1-mesa                                     0.0s
 => CACHED [ 4/14] WORKDIR /opt                                                                      0.0s
 => CACHED [ 5/14] RUN wget                                                  0.0s
 => CACHED [ 6/14] RUN wget  0.0s
 => CACHED [ 7/14] RUN tar xf flutter* && rm flutter*.tar.xz                                         0.0s
 => CACHED [ 8/14] RUN   ln -s /opt/flutter/bin/dart /usr/local/bin/dart &&   ln -s /opt/flutter/bi  0.0s
 => CACHED [ 9/14] RUN chmod -R o+w /opt/flutter                                                     0.0s
 => CACHED [10/14] RUN wget  0.0s
 => CACHED [11/14] RUN wget  0.0s
 => ERROR [12/14] ADD dart-code-3.80.0.vsix /opt/theia/plugins/                                      0.0s
 => ERROR [13/14] ADD flutter-3.80.0.vsix /opt/theia/plugins/                                        0.0s

Editing the line still shows cached!

thaJeztah commented 8 months ago

@jgfoster same answer;

please open a ticket in with details

Karreg commented 6 months ago

The issue is still there. The first related issue in buildkit has been renamed and is not addressing this issue anymore. The second related issue is not related to this issue either.

So, for now, the --no-cache flag is not working as intended, and the only way to work around it is to clear images and layers locally to really have a no-cache behavior.

thaJeztah commented 6 months ago

@Karreg same answer; commenting here won't help. If you have steps to reproduce and suspect there's a bug, please open a ticket in the BuildKit issue tracker instead.

Karreg commented 6 months ago

Of course. This message was more for the people who keep arriving here, since this is where you end up when searching for this issue. It gives them an update on how to work around the issue until it's fixed, so they won't have to come here afterwards.

surfingdoggo commented 5 months ago

@Karreg same answer; #4041 (comment). commenting here won't help. If you have steps to reproduce and suspect there's a bug, please open a ticket in the BuildKit issue tracker instead.

But commenting here does help. I'm one of the ones who was brought here by searching for the exact same problem.

docker image prune -a still didn't clear the cache on the layers I'm trying to force to build.

surfingdoggo commented 5 months ago

Quick update: add a command to the image to force it to rebuild past that point, something like

RUN ls -lah

Tobvl commented 4 months ago

I'm having the same issue, using --no-cache still builds "cached" or old versions of my docker project...

CS-cwhite commented 4 months ago

In my case it was COPY as well --- I accidentally said COPY when I should have said RUN cp. COPY goes from build context to container, and RUN cp goes from container into another place in the same container.

In the original post, I hypothesize that the problem will be fixed by this change:

-COPY /app/app/target/app.jar .
+RUN cp /app/app/target/app.jar .
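
For reference, this is the Dockerfile from the original post with only that one line changed. It is a sketch of the hypothesis, not a verified fix: it has not been built against the original project, and the build command is left as a comment because it needs Docker plus the project sources.

```shell
#!/bin/sh
# Sketch: write out the original post's Dockerfile with the hypothesized change applied.
set -eu

build_dir=$(mktemp -d)

cat > "$build_dir/Dockerfile" <<'EOF'
FROM lanico/whanos-java:latest
COPY . .
WORKDIR /app/app
RUN mvn package
RUN ls -R /app
# app.jar is produced inside the image by `mvn package`, so copy it with cp.
# COPY would look for it in the build context on the host instead.
RUN cp /app/app/target/app.jar .
CMD ["java", "-jar", "app.jar"]
EOF

cat "$build_dir/Dockerfile"

# To try it, from the project directory:
#   docker build --no-cache --pull -t lanico/test-java:latest -f "$build_dir/Dockerfile" .
```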

lasergoat commented 3 months ago

I'm just trying to do a RUN ls -la . or RUN echo $(ls -la .), but it keeps getting cached no matter what I do. Even when I rearrange the statements so that the command index changes, it is somehow still CACHED on the first run.

Anyone have a workaround for just doing a simple ls?

My full command:

docker build --no-cache --progress=plain --target ci -t exampleapp:targetLabel .

The CACHED output:

#6 [node 6/6] RUN echo "$(ls -la .)"

#7 [ci 1/5] RUN echo $(ls -la .)

Tested in Docker version 24.0.7, build afdd53b, and Docker version 26.1.1, build 4cf5afa.

thaJeztah commented 3 months ago

What were the steps before the ls -la .? Did the filesystem change?

EntranceJew commented 2 months ago

How did this even break and how has a regression been open for nearly a year?

I have a command:

FROM AS deps
RUN echo "$(ls -lia '/app')"
RUN echo "$(ls -lia ${PWD})"

It runs the first RUN once; the second RUN is then reported as cached, even though it's the first time it runs with that expression:

#19 [deps  2/18] RUN echo "$(ls -lia '/app')" && echo "$(date)"
#19 0.200 ls: /app: No such file or directory
#19 0.200 
#19 0.200 Wed Jul 17 14:40:40 UTC 2024
#19 DONE 0.2s
#8 [deps  5/18] RUN echo "$(ls -lia ${PWD})" && echo "$(date)"

#9 [deps  4/18] RUN echo "$(ls -lia '/app')" && echo "$(date)"

#10 [deps  3/18] RUN echo "$(ls -lia ${PWD})" && echo "$(date)"

Even if I modify the commands and add more junk to a RUN command, so that it runs for the first time, it is treated as already completed, and only the first RUN command in the series actually executes:

#8 [deps  3/18] RUN echo "$(ls -lia ${PWD})" && echo "$(date)" && echo "Clown"

#9 [deps  5/18] RUN echo "$(ls -lia ${PWD})" && echo "$(date)"

#10 [deps  4/18] RUN echo "$(ls -lia '/app')" && echo "$(date)"

#19 [deps  2/18] RUN echo "$(ls -lia '/app')" && echo "$(date)" && echo "Clown"
#19 0.239 ls: /app: No such file or directory
#19 0.239 
#19 0.240 Wed Jul 17 14:42:08 UTC 2024
#19 0.240 Clown
#19 DONE 0.3s

This defies reason, and is exactly the reason why people want no cache. For anyone else, if you're building like:

docker build --no-cache --progress=plain -t php php/ --target deps

Do not use --no-cache-filter with --progress=plain; it somehow unsets it. Also, --no-cache cannot be used with --no-cache-filter.

understanding what is happening at each layer is becoming increasingly difficult if I only get the same result the first or third time I run the command on a machine and not this weird intermediary state where the most benign of things can be considered cached.

Just give me none of the cache or layers you have. Wait, wait. I'm worried what you just heard was, "Give me a few cache or layers." What I said was, "Give me none of the cache or layers you have." Do you understand?

haozige90 commented 2 months ago

I accidentally solved the issue by putting the COPY command before WORKDIR.

Dockerfile example:

COPY . /var/www/html/
WORKDIR /var/www/html/

In this way, you don't need to add --no-cache.

It seems like WORKDIR influences the COPY cache policy.

patrick-camect commented 1 month ago

In my case it was COPY as well --- I accidentally said COPY when I should have said RUN cp. COPY goes from build context to container, and RUN cp goes from container into another place in the same container.

In the original post, I hypothesize that the problem will be fixed by this change:

-COPY /app/app/target/app.jar .
+RUN cp /app/app/target/app.jar .

This turned out to be my problem as well. Not sure why the cache behavior is happening but this solved it for me.

Vv-vfx commented 1 month ago

I tried all the tips listed here, but nothing helps. New code goes to GitHub, and old code goes to production in Docker. --no-cache doesn't help either. Is it really that hard to make a command that disables cache usage?

shaktiks commented 1 week ago

Hello, any update on this issue?

gabrielschulhof commented 5 days ago

@thaJeztah here's a scenario where --no-cache absolutely must ignore the availability of a cached layer:

RUN yarn --pure-lockfile --force

This installs dependencies for a Node.js application. Some of those dependencies might be optional, yet relied upon. Being optional, they may fail to build, for example because header files downloaded from a URL are intermittently unavailable. If so, the layer gets recorded with a failed build of the optional dependency. If the URL becomes healthy again, the build would succeed and the layer would get updated if docker heeded the --no-cache option; because it doesn't, the failed build remains what's cached and what gets built into subsequent images.