GoogleContainerTools / kaniko

Build Container Images In Kubernetes
Apache License 2.0
Caching with multi-stage builds and multiple RUNs #1468

Open stanislaw55 opened 3 years ago

stanislaw55 commented 3 years ago

Hi, first of all, I would like to thank you for all your work. Kaniko is a very useful tool and I love using it.

I wanted to ask about caching with regard to multi-stage builds that have multiple RUNs in each stage. I am using kaniko version 1.2.0 (debug) inside GitLab CI.

I have realised that caching only works with the following Dockerfile:

FROM python:3.8-alpine AS build-deps

RUN apk add --no-cache gcc musl-dev yaml-dev

FROM build-deps AS build

RUN pip install --prefix=/install PyYAML

FROM python:3.8-alpine 

COPY --from=build /install /usr/local

In short, each stage has only one RUN directive and none of them uses any special shell tricks.

When I try to use either multiple RUN commands or shell tricks (even something like \ to split the installation of packages across multiple lines, or && to chain commands together), no cache is used. It seems that cache layers are produced only after the whole stage, not after each RUN directive. Kaniko tries to find a cached layer for a given RUN value, which is different because there are multiple RUNs or because the shell tricks make it differ from what was retrieved from the image.

The question is: is this the intended behaviour?

Sorry for the chaotic description; I tried to describe it as simply as possible. I do not know whether this is intended behaviour or a bug, which is why I did not use the bug report template for this issue.

tejal29 commented 3 years ago

@stanislaw55,

> Kaniko tries to find a cached layer for a given RUN value, which is different because there are multiple RUNs or because the shell tricks make it differ from what was retrieved from the image.

Do you expect that, for a multi-command RUN such as `RUN pip install --prefix=/install PyYAML && echo "text" > somefile`, the results of each command, i.e. `pip install --prefix=/install PyYAML` and `echo "text" > somefile`, are cached individually?

Kaniko does not implement caching like that; it computes the cache for the whole command `RUN pip install --prefix=/install PyYAML && echo "text" > somefile`.

Does that help?
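To illustrate what "computing the cache for the whole command" could mean, here is a minimal sketch in Python. It assumes the cache key is a hash of a parent key plus the full text of the directive; the function name `layer_key` and the hashing scheme are hypothetical and are not taken from kaniko's source.

```python
import hashlib


def layer_key(parent_key: str, directive: str) -> str:
    """Hypothetical cache key: hash of the parent key plus the full directive text."""
    return hashlib.sha256(f"{parent_key}\n{directive}".encode()).hexdigest()


# Stands in for the digest of the base image (assumption for illustration).
base = "python:3.8-alpine"

whole = 'RUN pip install --prefix=/install PyYAML && echo "text" > somefile'
first_half = "RUN pip install --prefix=/install PyYAML"

key_whole = layer_key(base, whole)
key_first_half = layer_key(base, first_half)

# The same directive text always maps to the same key, so a rebuild can hit the cache.
print(key_whole == layer_key(base, whole))

# The whole RUN gets one key; the part before && has no cache entry of its own
# and would produce a different key if written as a separate directive.
print(key_whole == key_first_half)
```

In this model a RUN that chains commands with && is a single cache entry, which matches the description above; whether kaniko derives its keys this way is not confirmed here.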

stanislaw55 commented 3 years ago

Hi @tejal29

What I meant by 'multi run command' is having more than one RUN directive in a single stage. Sorry if I wasn't clear.

And yes, I realised that Kaniko caching works with shell tricks like && but does not work with multiple RUN directives in one stage. My bad for not testing thoroughly before.

When I ran Kaniko with the debug level of verbosity, I saw that for the Kaniko cache to work, the RUN directive has to be the last one in the stage. If some other directive, like USER, follows the RUN, the cache stops working, as in the following Dockerfile snippet:

FROM python:3.8-alpine

RUN apk add --no-cache gcc g++ musl-dev libffi-dev openssl-dev yaml-dev && pip install --upgrade pip setuptools wheel cython && adduser -D builder

USER builder

In this case the cache won't work, but if you delete the USER directive, caching works just fine.

The same happens with multiple RUNs, for example:

FROM python:3.8-alpine

RUN apk add --no-cache gcc g++ musl-dev libffi-dev openssl-dev yaml-dev
RUN pip install --upgrade pip setuptools wheel cython
RUN adduser -D builder

Again, the cache won't work.

The only example I have been able to get working with the cache is this:

FROM python:3.8-alpine AS user

RUN apk add --no-cache gcc g++ musl-dev libffi-dev openssl-dev yaml-dev \
 && pip install --upgrade pip setuptools wheel cython \
 && adduser -D builder

I do not know whether that is intended, but it differs from how the docker build cache works, and it took me a while to figure it out.
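For comparison, the per-directive behaviour I expected can be modelled roughly as follows. This is a simplified sketch, assuming each directive's key is a hash of the previous key and the directive text; `chain_keys` and `cache_hits` are hypothetical helpers and do not reflect the actual implementation of docker build or Kaniko.

```python
import hashlib
from typing import List, Set


def chain_keys(base: str, directives: List[str]) -> List[str]:
    """Return one hypothetical cache key per directive, each chained to the previous one."""
    keys = []
    parent = base
    for directive in directives:
        parent = hashlib.sha256(f"{parent}\n{directive}".encode()).hexdigest()
        keys.append(parent)
    return keys


def cache_hits(cached: Set[str], base: str, directives: List[str]) -> List[bool]:
    """For each directive, report whether its chained key is already in the cache."""
    return [key in cached for key in chain_keys(base, directives)]


# Stands in for the digest of the base image (assumption for illustration).
BASE = "python:3.8-alpine"

first_build = [
    "RUN apk add --no-cache gcc g++ musl-dev libffi-dev openssl-dev yaml-dev",
    "RUN pip install --upgrade pip setuptools wheel cython",
    "RUN adduser -D builder",
]
# Second build: only the last directive changes.
second_build = first_build[:2] + ["RUN adduser -D developer"]

cached = set(chain_keys(BASE, first_build))
hits = cache_hits(cached, BASE, second_build)
print(hits)  # [True, True, False]
```

With per-directive keys, the first two RUNs are reused and only the changed one is rebuilt, which is what I expected to see with three separate RUN directives in one stage.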