Single-snapshot breaks caching

AndreKR commented 3 years ago

This might be difficult to fix, but I wanted to open an issue anyway to discuss possible solutions or workarounds.

When --single-snapshot is given, only a single snapshot is made at the end of each stage. I don't know how the cache key for this final snapshot is calculated.

In a subsequent build, cache keys are calculated for each RUN line, but since those lines have never been snapshotted or uploaded to the cache, they always produce a cache miss.

There is a demonstration of the issue here: Command line, Dockerfile, Build log

Note how the cache key for the stage is 3789525c5a2203fceeee7a568b0a3d88a9a928cf7ddb2aa1111d47223e8939c0, but the cache key for the first RUN line is 73354b53a39213254fd5c366b87f7a6d67413c1ec7eca6223551fbf541b7541f, which never gets cached.

| Description | Yes/No |
| --- | --- |
| Please check if this is a new feature you are proposing | - [X] (somewhat) |
| Please check if the build works in docker but not in kaniko | (not applicable, as `docker build` has no functionality equivalent to `--single-snapshot`) |
| Please check if this error is seen when you use the `--cache` flag | - [X] |
| Please check if your dockerfile is a multistage dockerfile | - [ ] |

The full solution would probably entail calculating a cache key for the whole stage beforehand. I don't know if that is possible, but it might be: cache keys only depend on the contents of COPY files and the text of RUN lines, right? Both are known at the start of the build.
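To illustrate why a whole-stage key could be computed up front, here is a minimal Python model, assuming (as the comment suggests) that a key depends only on the base image digest, the instruction text and, for COPY, the copied file's contents. Each key is chained on the previous one, so the last key covers the whole stage. The names `instruction_keys`, `stage_key` and `context_files` are hypothetical; this is not kaniko's actual key derivation, which may mix in further inputs such as build args.

```python
import hashlib


def instruction_keys(base_digest, instructions, context_files):
    """Return one chained cache key per instruction (hypothetical model).

    base_digest   -- digest of the FROM image
    instructions  -- list of (kind, arg) tuples, e.g. ("RUN", "touch file.txt")
    context_files -- dict mapping a COPY source path to its bytes; a stand-in
                     for reading the build context
    """
    keys = []
    prev = base_digest
    for kind, arg in instructions:
        h = hashlib.sha256()
        h.update(prev.encode())             # chain on the previous key
        h.update(f"{kind} {arg}".encode())  # the instruction text itself
        if kind == "COPY":
            src = arg.split()[0]
            # COPY additionally depends on the contents of the copied file
            h.update(hashlib.sha256(context_files[src]).digest())
        prev = h.hexdigest()
        keys.append(prev)
    return keys


def stage_key(base_digest, instructions, context_files):
    # The last chained key depends on every instruction in the stage, and all
    # of its inputs are known before any RUN line executes.
    keys = instruction_keys(base_digest, instructions, context_files)
    return keys[-1] if keys else base_digest
```

Because `stage_key` never executes anything, a build could look it up before running the stage and skip the stage entirely on a hit.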

A bit of background info:

The reason I personally use --single-snapshot is to get simpler output images, not performance. This opens up a possible workaround: provide an option (like --snapshot-for-cache) that, when given in addition to --single-snapshot, still snapshots the individual layers and uploads them to the cache, but also makes one final snapshot for the target image.
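The workaround can be modelled in a few lines of Python. This is a sketch of the behaviour proposed in the comment, not existing kaniko code: `--snapshot-for-cache` is only the flag name suggested here, and `build_stage`, `run` and the dict-based `cache` are hypothetical stand-ins. Each step still produces its own layer for the cache, while the output image gets one squashed layer.

```python
def build_stage(instructions, keys, run, cache, snapshot_for_cache=True):
    """Model of the proposed --snapshot-for-cache option (hypothetical).

    instructions -- RUN lines of one stage, in Dockerfile order
    keys         -- one precomputed cache key per instruction
    run          -- callable(instr, fs) -> dict of files the step writes (its layer)
    cache        -- dict standing in for the remote layer cache
    Returns the single squashed layer used for the output image.
    Deletions (whiteouts) are ignored to keep the model small.
    """
    fs = {}
    for instr, key in zip(instructions, keys):
        if key in cache:
            layer = cache[key]            # cache hit: reuse the stored layer
        else:
            layer = run(instr, fs)        # cache miss: execute the step
            if snapshot_for_cache:
                cache[key] = dict(layer)  # per-step snapshot, for the cache only
        fs.update(layer)
    return dict(fs)                       # one snapshot for the target image
```

With the option enabled, a second build finds every per-step key in the cache and runs nothing; without it, the cache stays empty and every step reruns, which is the behaviour this issue reports.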

The reason I personally want to use --cache is that I have multistage builds with different targets that all share earlier stages. In other words, my Dockerfile looks like this:

```dockerfile
FROM ... AS intermediate1
...

FROM .. AS target1
COPY --from=intermediate1

FROM .. AS target2
COPY --from=intermediate1
```

and then I run `executor --target target1; executor --target target2`, which results in intermediate1 being built twice. Therefore I'd like to have a cache for the whole intermediate1 stage without having to create a named image in the registry.
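What a whole-stage cache would buy in this setup can be sketched as a small Python model, assuming a cache repo shared by the two separate executor runs and a precomputed key for intermediate1. `StageCache`, `build_target` and `build_intermediate1` are hypothetical names for illustration only; the point is that the second target finds the stage under its key instead of rebuilding it.

```python
class StageCache:
    """Stand-in for a cache repo shared by separate executor runs."""

    def __init__(self):
        self.entries = {}
        self.builds = 0  # how many times a stage actually had to be built

    def get_or_build(self, stage_key, build):
        if stage_key not in self.entries:
            self.builds += 1
            self.entries[stage_key] = build()
        return self.entries[stage_key]


def build_intermediate1():
    # Placeholder for the real work done by the shared stage.
    return {"/out/result": "built"}


def build_target(cache, target, intermediate_key):
    # COPY --from=intermediate1: reuse the cached stage result if present.
    files = cache.get_or_build(intermediate_key, build_intermediate1)
    return {"target": target, **files}


cache = StageCache()
build_target(cache, "target1", "key-of-intermediate1")
build_target(cache, "target2", "key-of-intermediate1")
print(cache.builds)  # prints 1
```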

internalsystemerror commented 2 years ago

Thanks for reporting, we're experiencing the same issue.

lntzr commented 1 year ago

I'm experiencing a similar problem with the combined use of caching and single-snapshot. As described in this issue, kaniko builds executed with --single-snapshot seem to perform cache lookups for each RUN but do not push to the cache, which leads to unnecessary rebuilds. The problem gets worse when those lookups actually result in a hit (e.g. because a kaniko build of the same Dockerfile was previously executed without --single-snapshot and created the cache entries): this leads to failing builds, or potentially even wrong output.

Steps to reproduce:

Dockerfile:

```dockerfile
FROM busybox:1.35.0 as base
RUN touch file.txt
RUN touch somethingelse.txt

FROM base
RUN cat file.txt
```

kaniko image: gcr.io/kaniko-project/executor:v1.9.1-debug

kaniko args:

- run 1 (fill the cache): `--cache --skip-unused-stages --target base`
- run 2 (retrieve the cached layers built by run 1): `--cache --single-snapshot`

observed behavior: run 2 fails with error

```
...
INFO[0003] No cached layer found for cmd RUN cat file.txt
INFO[0003] Unpacking rootfs as cmd RUN cat file.txt requires it.
INFO[0003] Initializing snapshotter ...
INFO[0003] Taking snapshot of full filesystem...
INFO[0003] RUN cat file.txt
INFO[0003] Cmd: /bin/sh
INFO[0003] Args: [-c cat file.txt]
INFO[0003] Running: [/bin/sh -c cat file.txt]
cat: can't open 'file.txt': No such file or directory
error building image: error building stage: failed to execute command: waiting for process to exit: exit status 1
```

It seems like kaniko only retrieves and uses the cached layer for the last RUN instruction and ignores the RUN instructions before it (the resulting image contains somethingelse.txt, but file.txt is missing) if the cache was filled without using single-snapshot.
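The difference between the observed and the expected result can be reproduced with a small Python model of the `base` stage above. This only mirrors the behaviour described in the comment; it is not kaniko's restore logic. For simplicity the cache is keyed by instruction text rather than by real chained cache keys, and `restore_reported` / `restore_expected` are hypothetical names.

```python
# Layers written by run 1 (without --single-snapshot), one per RUN line.
cached_layers = {
    "RUN touch file.txt": {"file.txt": ""},
    "RUN touch somethingelse.txt": {"somethingelse.txt": ""},
}
base_stage = ["RUN touch file.txt", "RUN touch somethingelse.txt"]


def restore_reported(instructions, cache):
    # Reported behaviour: only the layer of the last RUN is applied.
    fs = {}
    fs.update(cache[instructions[-1]])
    return fs


def restore_expected(instructions, cache):
    # Expected behaviour: every cached layer is applied in Dockerfile order.
    fs = {}
    for instr in instructions:
        fs.update(cache[instr])
    return fs
```

`restore_reported(base_stage, cached_layers)` yields only `somethingelse.txt`, so the later `RUN cat file.txt` fails exactly as in the log, while `restore_expected` yields both files.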

leosunmo commented 5 months ago

This is still a problem for us, and I would love for someone with more experience with the code to either take a quick look and suggest an approach, or at least rate the feasibility of this feature. Super happy to help with a pull request!

AndreKR commented 5 months ago

I guess the first step would be to decide what exactly we want to implement. I mentioned a couple of different ideas which would fix different use cases.

GoogleContainerTools / kaniko, issue #1703