joeauty opened 4 months ago
Perhaps what would help me here is a more fundamental understanding of how caching works. How can the cache algorithm know whether a cached layer is valid for a Dockerfile command before that command is run, given that it cannot know the command's output in advance?
I'm wondering if the issue here has something to do with apt, pip, or gem package lists or the like.
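From what I can tell (this is a sketch of my understanding, not Kaniko's actual code), the cache key is computed from a command's inputs rather than its output: roughly, a digest of the instruction text combined with the cache key of the preceding layer state. Something like:

# illustrative only -- the key depends on the instruction text and the
# state before it, never on what apt/pip/gem would actually download
parent_key="<previous layer's cache key>"
printf '%s|%s' "$parent_key" 'RUN apt-get update -qq && apt-get install ...' | sha256sum

If that is right, apt, pip, or gem package lists should not matter at all; only the instruction text and the layers before it do.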
Over the past few days, I've also encountered this cache problem in multi-stage builds. It worked fine for months before and I haven't changed anything.
I'm also getting this, and I can see the cache hash it's looking for -does- exist, but it's not pulled:
...
INFO[2024-08-12T02:44:10Z] Checking for cached layer registry.gitlab.xxxx.xxxx/xxxx/cache:68130d05ac234eaae199bd7052a9898bb4df73c3517b830fb2e98923e488fcc3...
INFO[2024-08-12T02:44:10Z] No cached layer found for cmd RUN apt-get update -qq && apt-get install --no-install-recommends -y build-essential curl git libpq-dev libvips pkg-config unzip
...
68130d05ac234eaae199bd7052a9898bb4df73c3517b830fb2e98923e488fcc3 exists, but isn't used:
The tag keeps changing even though there are no changes. When I run the docker build command with the same Dockerfile, the cache layers work as expected.
INFO[0017] Executing 0 build triggers
INFO[0018] Building stage 'base' [idx: '1', base-idx: '0']
INFO[0018] Checking for cached layer xxxxxxxxxxxxxxxxxxxxxxxxxxxx:39478ea256ca812a762b7e6c93725c317e9f646dd50a3d105f91bc87cc690958...
INFO[0018] No cached layer found for cmd RUN xxxxxxxxxxxxxxxxxxxxxxxxxxxx
INFO[0022] Checking for cached layer xxxxxxxxxxxxxxxxxxxxxxxxxxxx:afcca8762ebd17a73cf848ea14994828b0a71fa6bf103135a70b3e1844ebdb2d...
INFO[0022] No cached layer found for xxxxxxxxxxxxxxxxxxxxxxxx
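For reference, this is roughly how we invoke the executor in that job (registry and path names here are placeholders, not our real ones):

/kaniko/executor \
  --context dir:///workspace \
  --dockerfile Dockerfile \
  --destination registry.example.com/group/app:latest \
  --cache=true \
  --cache-repo registry.example.com/group/cache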
Seems related to #3254
Our tests revealed that using WORKDIR on a multi-stage build causes this issue, especially with RUN instructions for apt-get update/install commands like:
...
WORKDIR /app
FROM base as build
RUN apt-get update -qq && \
    apt-get install --no-install-recommends -y build-essential curl git libpq-dev node-gyp pkg-config python-is-python3
...
A workaround that worked for us was to either remove the WORKDIR directive or duplicate it across all stages. After that, the RUN instruction started using the cache correctly; previously a different hash was being generated even when nothing had changed.
This was definitely not an issue before.
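For anyone hitting the same thing, a minimal sketch of the duplication workaround (the image, stage, and package names are just examples, not our real Dockerfile):

FROM ruby:3.3-slim as base
WORKDIR /app

FROM base as build
# repeat the directive instead of inheriting it from the base stage
WORKDIR /app
RUN apt-get update -qq && \
    apt-get install --no-install-recommends -y build-essential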
Unfortunately this did not fix my issue, unless there is some problem with using a variable as the WORKDIR argument?
# ---- BASE IMAGE ----
FROM ruby:3.3.4-slim-bullseye as base-image
ENV INSTALL_PATH /data/go
ENV GETTEXT_LOCALES_PATH $INSTALL_PATH/config/gettext_locales
ENV GETTEXT_CLIENT_LOCALES_PATH $INSTALL_PATH/client/locales
WORKDIR $INSTALL_PATH
RUN apt-get update && apt-get install -y libicu-dev libpq-dev python3-pip python-dev build-essential --no-install-recommends && apt-get clean \
&& rm -rf /var/lib/apt/lists/* \
&& pip install --upgrade setuptools pip \
&& pip install awscli \
&& pip cache purge \
&& gem update --system 3.5.13 \
&& gem install bundler:2.5.13
# ---- BUILD DEPENDENCIES ----
FROM base-image as build-dependencies
ENV INSTALL_PATH /data/go
ENV NODE_MAJOR 20
WORKDIR $INSTALL_PATH
SHELL ["/bin/bash", "-lc"]
RUN apt-get update && apt-get install -y curl gnupg ca-certificates --no-install-recommends && apt-get clean
We ended up moving the WORKDIR directive after any RUN directive wherever possible and it resolved it for us.
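Roughly like this (an illustrative layout, not our actual file):

FROM ruby:3.3-slim as base
# run the package installation first, while the filesystem is still
# identical from build to build...
RUN apt-get update && apt-get install -y build-essential --no-install-recommends && apt-get clean
# ...and only then switch the working directory
WORKDIR /data/go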
Unfortunately, that does not work for me. Of course, I'm stating the obvious, but being able to drop in Kaniko without touching the Dockerfile at all would be ideal.
Make sure that the directory exists before calling WORKDIR.
If the directory does not exist, kaniko is kind enough to create it for you, but not kind enough to also put that layer into the cache (come to think of it, I should probably open a bug ticket for that). This means a new layer is emitted every time you pass the WORKDIR instruction. Inside the same build it's not immediately obvious, as you will get a 100% cache hit rate; however, all the layers are new, so the push will be slower and you will pull a completely new image thereafter. In multi-stage builds, or builds that run on top of other images created with kaniko, this causes huge problems, as the cache then gets invalidated for them.
The workaround is simple enough:
# create the directory in an explicit, cacheable RUN layer first,
# so the WORKDIR instruction below no longer emits a new layer
RUN mkdir -p $INSTALL_PATH
WORKDIR $INSTALL_PATH
Actual behavior
Apt install commands are not being cached when running Kaniko as a Kubernetes job, making image builds very slow.

Expected behavior
Cache layer hit/being found since, as far as I can tell, there should be no filesystem differences between builds.

To Reproduce
Steps to reproduce the behavior:
Additional Information
The apt-get update && apt-get install -y curl gnupg ca-certificates --no-install-recommends && apt-get clean command is not being loaded from cache; all other commands up to this point are loaded from cache. The logs show:
No cached layer found for cmd RUN apt-get update && apt-get install -y curl gnupg ca-certificates --no-install-recommends && apt-get clean
This same Dockerfile built in Docker makes use of the cache, making build times much faster.
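One way to see where the two builds diverge (assuming both images are pushed somewhere you can pull and inspect them; the tags below are hypothetical) is to compare their layer digests and find the first layer that differs:

# hypothetical tags for a docker-built and a kaniko-built image
docker image inspect --format '{{json .RootFS.Layers}}' app:built-with-docker
docker image inspect --format '{{json .RootFS.Layers}}' app:built-with-kaniko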
Triage Notes for the Maintainers
--cache flag