GoogleContainerTools / kaniko

Build Container Images In Kubernetes
Apache License 2.0

Lost layers related to caching and symlinks #1576

Open hstenzel opened 3 years ago

hstenzel commented 3 years ago

Actual behavior

Sometimes, when caching is enabled in multistage Dockerfiles, subsequent layers lack changes made by previous layers.

Expected behavior

A new FROM previousstage contains all changes made by previousstage.

To Reproduce

I don't have a complete reproduction, but I do have CI logs with and without caching, showing the expected behavior (without caching) and the erroneous behavior (with caching enabled). Both runs used the same kaniko version and the same git SHA of our project; the only difference is the cache arguments.

Additional Information

kaniko-job-fail-cache.txt kaniko-job-success-nocache.txt

The complete success and failure flows can be seen in these two logs. The logs also capture all of the information requested below, preserved in situ. They are easier to read with less -r, because they contain color codes and carriage returns for GitLab sections.

The general flow is:

FROM alpine as base
# Add some stuff and make sure it's working.
# What specifically tends to fail is the switch between BusyBox-based and coreutils-based tools, and the failure is quite intermittent.
# sort is one tool we're sensitive to: in the failure mode we lose the GNU version and are left with the BusyBox version.
# This is, of course, symlink-heavy.
FROM base as test
# run the bats tests
FROM golang as testgoget
# make sure we can go get from private projects
FROM base as release
# Normally this is a no-op, but we add validation that shows when layers from base are not preserved.
# On success, sort is sort (GNU coreutils) 8.32; on failure, it is the BusyBox sort.

We initially thought this issue was related to Alpine, so when we first saw it we opened https://gitlab.alpinelinux.org/alpine/aports/-/issues/12155 . However, I was never able to reproduce the builds that failed. The old issue might be an interesting reference.

Triage Notes for the Maintainers

Description: Yes/No
Please check if this is a new feature you are proposing: [ ]
Please check if the build works in docker but not in kaniko: [ ]
Please check if this error is seen when you use the --cache flag: [x]
Please check if your dockerfile is a multistage dockerfile: [x]
hstenzel commented 3 years ago

Possibly related to #1552, #1547, #1540, and #1533.