GoogleContainerTools / kaniko

Build Container Images In Kubernetes
Apache License 2.0

False-positive cache-hit on COPY --from #870

Closed haizaar closed 4 years ago

haizaar commented 4 years ago

Actual behavior

I have a two-stage Docker image that first builds Sphinx docs and then copies them into a second nginx image:

FROM python:3.7.5-alpine3.10 AS builder
RUN apk add --no-cache build-base libxml2-dev libxslt-dev graphviz
COPY requirements.txt /
RUN pip install --upgrade -r /requirements.txt
WORKDIR /build
COPY . ./
RUN make html
########################
FROM nginx:1.17.5-alpine AS runner
COPY --from=builder /build/_build/html /usr/share/nginx/html

I build it on Google Cloud Build using the gcr.io/kaniko-project/executor:latest builder.

The problem is the last COPY instruction. Even though the generated HTML files differ from the previously built revision, kaniko still uses the cached version:

Step #0: build succeeded.
Step #0:
Step #0: The HTML pages are in _build/html.
Step #0:
Step #0: Build finished. The HTML pages are in _build/html.
Step #0: INFO[0062] Taking snapshot of full filesystem...
Step #0: INFO[0064] Pushing layer gcr.io/xxx/docs-server/cache:cd602b7af9c972414b3241eb3a192a606755174cf665f9806155078915975932 to cache now
Step #0: INFO[0068] Saving file /build/_build/html for later use.
Step #0: INFO[0068] Deleting filesystem...
Step #0: INFO[0068] Downloading base image nginx:1.17.5-alpine
Step #0: INFO[0069] Error while retrieving image from cache: getting file info: stat /cache/sha256:b4c0378c841cd76f0b75bc63454bfc6fe194a5220d4eab0d75963bccdbc327ff: no such file or directory
Step #0: INFO[0069] Downloading base image nginx:1.17.5-alpine
Step #0: INFO[0069] Checking for cached layer gcr.io/xxx/docs-server/cache:5de22bf6a7763f10cd0ffa0592b8f51c876286a1d0f50564cf2f21bfd6262dc4...
Step #0: INFO[0069] Using caching version of cmd: COPY --from=builder /build/_build/html /usr/share/nginx/html
Step #0: INFO[0069] Skipping unpacking as no commands require it.
Step #0: INFO[0069] Taking snapshot of full filesystem...
Step #0: INFO[0070] COPY --from=builder /build/_build/html /usr/share/nginx/html
Step #0: INFO[0070] Found cached layer, extracting to filesystem
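
A rough way to spot this symptom in a saved build log is to search for the cache-hit message shown above (`Using caching version of cmd: COPY --from=...`). The sketch below assumes the log has been saved to a file. The function name `find_copy_from_cache_hits` is hypothetical, and the matched text is taken from the log output quoted in this issue, not from any documented kaniko log format.

```sh
# Hypothetical helper: list COPY --from instructions that kaniko served from cache.
# Assumes log lines shaped like the ones quoted in this issue, e.g.
#   INFO[0069] Using caching version of cmd: COPY --from=builder /src /dst
find_copy_from_cache_hits() {
  log_file="$1"
  # -n plus the trailing p prints only matching lines, with everything
  # before the instruction text stripped off.
  sed -n 's/.*Using caching version of cmd: \(COPY --from=.*\)$/\1/p' "$log_file"
}
```

For example, `find_copy_from_cache_hits build.log` would print `COPY --from=builder /build/_build/html /usr/share/nginx/html` for the log above. Any output after a build that changed stage 0 is a candidate false-positive cache hit.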

Expected behavior

kaniko should detect that the files in the stage 0 image have changed and discard the cache for the COPY --from instruction in stage 1.
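
To make the expected behavior concrete: a cache key for a `COPY --from` step is only safe if it depends on the *contents* of the files copied out of the earlier stage, not just on the instruction text. The following is a hypothetical illustration of that idea and is not kaniko's actual cache-key code. `copy_from_cache_key` is an invented name, and the sketch assumes GNU coreutils `sha256sum` is available.

```sh
# Hypothetical illustration (not kaniko's implementation): derive a cache key for
# a COPY --from step from the instruction text AND the bytes of every file being
# copied, so that changed build output yields a different key.
copy_from_cache_key() {
  instruction="$1"   # e.g. "COPY --from=builder /build/_build/html /usr/share/nginx/html"
  src_dir="$2"       # directory copied out of the earlier stage
  {
    printf '%s\n' "$instruction"
    # Hash each file, sorted by path so the key does not depend on traversal order.
    (cd "$src_dir" && find . -type f -print | LC_ALL=C sort | while IFS= read -r f; do
      printf '%s  %s\n' "$(sha256sum < "$f" | cut -d' ' -f1)" "$f"
    done)
  } | sha256sum | cut -d' ' -f1
}
```

With this scheme, rebuilding the docs so that any file under the copied directory changes produces a new key, even though the COPY instruction text is identical. That is the cache miss the report expects.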

To Reproduce

Steps to reproduce the behavior:

  1. Run:

    mkdir /tmp/bug; cd /tmp/bug
    python3.7 -mvenv /tmp/bug-env && source /tmp/bug-env/bin/activate
    mkdir docs
    echo 'Sphinx==2.2.1' >> docs/requirements.txt
    pip install -r docs/requirements.txt
    cd docs
    echo no | sphinx-quickstart -p foo -a foo -v 0.0.1 -r 0.0.1 -l en

    Then add the above Dockerfile under docs/.

  2. Now add it all to git, commit and trigger Cloud Build using the following step:

    steps:
    - name: gcr.io/kaniko-project/executor:latest
      args:
        - --cache=true
        - --cache-ttl=336h  # 2 weeks
        - --context=/workspace/docs
        - --dockerfile=/workspace/docs/Dockerfile
        - --destination=gcr.io/$PROJECT_ID/docs-server:$BRANCH_NAME-$COMMIT_SHA

  3. Change docs/index.rst, commit, trigger a new build, and see that the resulting image still has the old version of the docs.

Additional Information

stroncium commented 4 years ago

I experience this problem too.

Current workaround for me is using gcr.io/kaniko-project/executor:v0.13.0.

mcfedr commented 4 years ago

Just found this after hitting the exact same issue using kaniko on Cloud Build. Will try the suggested workaround/rollback.

cvgw commented 4 years ago

This appears to be the same underlying problem as #845 and #589.

mcfedr commented 4 years ago

@cvgw Any movement on getting that fix merged?

cvgw commented 4 years ago

@mcfedr hoping to get merged this week

kzap commented 4 years ago

I can also confirm this does not happen in 0.13.0, though I'm not sure why. Hopefully the fixes get merged and released soon.

This is how I recreated the issue locally using a test Dockerfile and AWS ECR as a cache repo:

Testing

We can replicate this bug on our machine using Docker

- [ ] Create our test Dockerfile
```sh
mkdir -p ./tmp/
cat <<'EOF' > ./tmp/Dockerfile
FROM busybox:latest as builder
COPY ./date.txt /tmp/date.txt

FROM busybox:latest
COPY --from=builder /tmp/date.txt /tmp/test.txt
EOF
```


- [ ] Run Kaniko locally using docker
```sh
# run kaniko debug image with some env vars
docker run \
  --tty \
  --interactive \
  --rm \
  --volume "$(pwd)/tmp:/app" \
  --volume "$HOME/.aws":/root/.aws \
  --env "AWS_DEFAULT_REGION=us-east-1" \
  --env "ECR_ENDPOINT=${ecr_endpoint}" \
  --env "ECR_REPO=${ecr_repo}" \
  --entrypoint "" \
  "${docker_image}" \
  sh
```

```sh
echo "docker run -it --rm ${ECR_ENDPOINT}/${ECR_REPO}:${ecr_tag} cat /tmp/test.txt > image1.txt"
```


- [ ] Create 2nd Image w/ cache
```sh
# Create 2nd Image
echo $(date) > /app/date.txt
ecr_tag="test-$(date +%s)"
/kaniko/executor \
  --context "/app" \
  --dockerfile "/app/Dockerfile" \
  --destination "${ECR_ENDPOINT}/${ECR_REPO}:${ecr_tag}" \
  --cache-repo "${ECR_ENDPOINT}/${ECR_REPO}/cache" \
  --cache=true \
  --cleanup

echo "docker run -it --rm ${ECR_ENDPOINT}/${ECR_REPO}:${ecr_tag} cat /tmp/test.txt > image2.txt"

echo "docker run -it --rm ${ECR_ENDPOINT}/${ECR_REPO}:${ecr_tag} cat /tmp/test.txt > image3.txt"
```


- [ ] Run the 3 docker commands which output 3 text files and cat them
```sh
$ cat image{1..3}.txt
Tue Dec 10 14:30:10 UTC 2019
Tue Dec 10 14:30:10 UTC 2019
Tue Dec 10 15:09:24 UTC 2019
```
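
The manual `cat` comparison can be scripted. The sketch below uses a hypothetical function, `check_stale_outputs`. It assumes the `imageN.txt` files came from the `docker run` commands printed above and that `date.txt` was rewritten before every build. Under that assumption, consecutive outputs should always differ, and identical neighbours suggest the `COPY --from` layer was reused from cache.

```sh
# Hypothetical helper: compare consecutive imageN.txt outputs and flag any build
# whose /tmp/test.txt is byte-identical to the previous build's. Given that
# date.txt changed before every build, a match points to a stale cache hit.
check_stale_outputs() {
  status=0
  prev=""
  for f in "$@"; do
    if [ -n "$prev" ] && cmp -s "$prev" "$f"; then
      echo "stale: $f matches $prev"
      status=1
    fi
    prev="$f"
  done
  return "$status"
}
```

Usage: `check_stale_outputs image1.txt image2.txt image3.txt`. With the three timestamps shown above, it would be expected to flag `image2.txt` and return a non-zero status.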

How to fix:

Let's try the build again, this time using the gcr.io/kaniko-project/executor:debug-v0.13.0 image.

```sh
# run kaniko debug image with some env vars
docker run \
  --tty \
  --interactive \
  --rm \
  --volume "$(pwd)/tmp:/app" \
  --volume "$HOME/.aws":/root/.aws \
  --env "AWS_DEFAULT_REGION=us-east-1" \
  --env "ECR_ENDPOINT=${ecr_endpoint}" \
  --env "ECR_REPO=${ecr_repo}" \
  --entrypoint "" \
  "${docker_image}" \
  sh
```


- [ ] Create ECR config for Kaniko
```sh
# create ECR config file for kaniko
mkdir -p /kaniko/.docker/
cat <<EOF > /kaniko/.docker/config.json
{
  "credHelpers": {
    "${ECR_ENDPOINT}": "ecr-login"
  }
}
EOF

echo "docker run -it --rm ${ECR_ENDPOINT}/${ECR_REPO}:${ecr_tag} cat /tmp/test.txt > image4.txt"
```


- [ ] Create 5th Docker Image w/ Cache and 0.13.0
```sh
echo $(date) > /app/date.txt
ecr_tag="test-$(date +%s)"
/kaniko/executor \
  --context "/app" \
  --dockerfile "/app/Dockerfile" \
  --destination "${ECR_ENDPOINT}/${ECR_REPO}:${ecr_tag}" \
  --cache-repo "${ECR_ENDPOINT}/${ECR_REPO}/cache" \
  --cache=true \
  --cleanup

echo "docker run -it --rm ${ECR_ENDPOINT}/${ECR_REPO}:${ecr_tag} cat /tmp/test.txt > image5.txt"
```

cvgw commented 4 years ago

Once https://github.com/GoogleContainerTools/kaniko/issues/899 is fixed, this issue should be resolved.

cvgw commented 4 years ago

This should be fixed as of master@https://github.com/GoogleContainerTools/kaniko/commit/a675098b452d020bf678063a3ac2d3be84b0e545

Some assistance verifying that it is fixed would be appreciated. Thank you

mcfedr commented 4 years ago

> This should be fixed as of master@a675098
>
> Some assistance verifying that it is fixed would be appreciated. Thank you

Is there a prebuilt image from master that I can test with? It looks like gcr.io/kaniko-project/executor:latest still points at the last release.

fredr commented 4 years ago

@mcfedr I think you can use the git hash, like gcr.io/kaniko-project/executor:a675098b452d020bf678063a3ac2d3be84b0e545
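
Following that suggestion, a small helper can turn a full commit SHA into an executor image reference and reject abbreviated hashes such as `a675098`. It relies on an assumption drawn from the comment above rather than from documented kaniko release policy: that commit-tagged executor images use the full 40-character SHA as the tag. `executor_image_for_commit` is a hypothetical name.

```sh
# Hypothetical helper: build a gcr.io/kaniko-project/executor reference tagged with
# a full git commit SHA. Assumes (per the comment above) that such tags use the
# full 40-character lowercase hex SHA, not an abbreviated one.
executor_image_for_commit() {
  sha="$1"
  case "$sha" in
    *[!0-9a-f]*|"") echo "not a lowercase hex SHA: $sha" >&2; return 1 ;;
  esac
  if [ "${#sha}" -ne 40 ]; then
    echo "expected a full 40-character SHA, got ${#sha} characters" >&2
    return 1
  fi
  printf 'gcr.io/kaniko-project/executor:%s\n' "$sha"
}
```

For example, `executor_image_for_commit a675098b452d020bf678063a3ac2d3be84b0e545` prints the same reference suggested above, while `executor_image_for_commit a675098` fails instead of producing a tag that may not exist.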

edbighead commented 4 years ago

@mcfedr you can check image versions in GCR https://console.cloud.google.com/gcr/images/kaniko-project/GLOBAL/executor?gcrImageListsize=30

edbighead commented 4 years ago

> This should be fixed as of master@a675098
>
> Some assistance verifying that it is fixed would be appreciated. Thank you

@cvgw the fix is working on the latest version for me.

Dockerfile

FROM maven:3.6.2-jdk-11-slim AS build  
WORKDIR /usr/src/app  

COPY pom.xml .
RUN mvn dependency:go-offline

COPY src ./src  
RUN mvn clean package

FROM openjdk:11.0.4-jre-slim
WORKDIR /home/user

COPY --from=build /usr/src/app/target/app-0.0.1.jar /home/user/app.jar

RUN chmod +x /home/user/app.jar

ENTRYPOINT ["java", "-jar", "app.jar"]
EXPOSE 8080

kaniko output after running multiple builds with changed source code:

INFO[0390] No cached layer found for cmd COPY --from=build /usr/src/app/target/app-0.0.1.jar /home/user/app.jar

toxsick commented 4 years ago

@cvgw works for me too. Thanks!

cvgw commented 4 years ago

Thanks everyone for the help solving and verifying this issue. Closing, but feel free to reopen if it isn't 100% fixed.

stroncium commented 4 years ago

Any info on when a release version (0.15.0?) will land? (I can test a commit-based build, but won't be able to use a non-release version in production.)

cvgw commented 4 years ago

@stroncium plan is to release v0.15.0 this week

jamil-s commented 3 years ago

Hey folks, we're seeing the same issue on the most recent debug build. In a multi-stage build, the first stage generates a new image, yet the second stage still resolves to the cached version. Not sure if there's been a regression here.

DXCanas commented 3 years ago

Also seeing this in latest

KasnocknDave commented 3 years ago

Also seeing this in 1.6; examples are described in https://github.com/GoogleContainerTools/kaniko/issues/1348 as well.