GoogleContainerTools / kaniko

Build Container Images In Kubernetes
Apache License 2.0

Multi-stage builds silently crashing #2249

Closed alucryd closed 1 year ago

alucryd commented 2 years ago

**Actual behavior**
The kaniko build silently crashes after taking the full filesystem snapshot with no useful error. Works fine with dind. Disabling the Kaniko cache doesn't help.

**Expected behavior**
The build should complete with no issue.

**To Reproduce**
Steps to reproduce the behavior:

  1. Run your GitLab Runner in a GKE Autopilot cluster
  2. Run your GitLab CI with Kaniko instead of dind

**Additional Information**

FROM gcr.io/distroless/nodejs:16
COPY --from=builder /app /app
WORKDIR /app
EXPOSE 8080
CMD ["--experimental-modules", "--experimental-json-modules", "src/server.js"]

 - Build Context
Multistage build, first stage copies the Express JS app and installs dependencies, second stage reuses the app directory to produce a distroless image.
 - Kaniko Image (fully qualified with digest)
 `gcr.io/kaniko-project/executor:a8498c762f34aabc62966c69169b79a04e04a4d5-debug`, v1.9.0-debug

 **Triage Notes for the Maintainers**

CI log:

Executing "step_script" stage of the job script 01:09
$ mkdir -p /kaniko/.docker
$ echo "{\"auths\":{\"${CI_REGISTRY}\":{\"auth\":\"$(echo -n ${CI_REGISTRY_USER}:${CI_REGISTRY_PASSWORD} | base64)\"}}}" > /kaniko/.docker/config.json
$ /kaniko/executor --context ${CI_PROJECT_DIR} --dockerfile ${CI_PROJECT_DIR}/Dockerfile --destination ${CI_REGISTRY_IMAGE}:${TAG} --destination ${CI_REGISTRY_IMAGE}:${LATEST_TAG}
INFO[0000] Resolved base name node:16 to builder
INFO[0000] Retrieving image manifest node:16
INFO[0000] Retrieving image node:16 from registry index.docker.io
INFO[0001] Retrieving image manifest gcr.io/distroless/nodejs:16
INFO[0001] Retrieving image gcr.io/distroless/nodejs:16 from registry gcr.io
INFO[0002] Built cross stage deps: map[0:[/app]]
INFO[0002] Retrieving image manifest node:16
INFO[0002] Returning cached image manifest
INFO[0002] Executing 0 build triggers
INFO[0002] Building stage 'node:16' [idx: '0', base-idx: '-1']
INFO[0002] Unpacking rootfs as cmd COPY . /app requires it.
INFO[0045] COPY . /app
INFO[0052] Taking snapshot of files...
INFO[0061] WORKDIR /app
INFO[0061] Cmd: workdir
INFO[0061] Changed working directory to /app
INFO[0061] No files changed in this command, skipping snapshotting.
INFO[0061] RUN yarn install --frozen-lockfile --production
INFO[0061] Initializing snapshotter ...
INFO[0061] Taking snapshot of full filesystem...
Cleaning up project directory and file based variables 00:00
ERROR: Job failed: pod "runner-yrykheow-project-61-concurrent-0gkkzz" status is "Failed"
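
Editor's sketch: when a job dies without an error, it can help to pull the last step kaniko logged before the pod failed. The snippet below is a minimal, hypothetical helper (the names `parse_entries` and `last_step` are not part of kaniko or GitLab). It assumes only the `LEVEL[seconds] message` line shape visible in the log above, and it also copes with several entries run together on one line.

```python
import re
from typing import List, Optional, Tuple

# Assumption: entries look like "INFO[0061] message", as in the CI log above.
ENTRY = re.compile(r"[A-Z]{4}\[(\d+)\] ")


def parse_entries(log_text: str) -> List[Tuple[int, str]]:
    """Return (elapsed_seconds, message) for every kaniko-style log entry."""
    matches = list(ENTRY.finditer(log_text))
    entries = []
    for i, m in enumerate(matches):
        # The message runs until the next entry marker or the end of the line.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(log_text)
        message = log_text[m.end():end].split("\n")[0].strip()
        entries.append((int(m.group(1)), message))
    return entries


def last_step(log_text: str) -> Optional[Tuple[int, str]]:
    """Return the last logged entry, or None if the text has no entries."""
    entries = parse_entries(log_text)
    return entries[-1] if entries else None


if __name__ == "__main__":
    sample = (
        "INFO[0061] Initializing snapshotter ...\n"
        "INFO[0061] Taking snapshot of full filesystem...\n"
        "Cleaning up project directory and file based variables 00:00\n"
    )
    print(last_step(sample))
```

For the log in this report, the last entry is the full filesystem snapshot at 61 seconds, which matches the description that the job fails around that step.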



 | **Description** | **Yes/No** |
 |----------------|---------------|
 | Please check if this is a new feature you are proposing     | <ul><li>- [ ] </li></ul>|
 | Please check if the build works in docker but not in kaniko | <ul><li>- [x] </li></ul>| 
 | Please check if this error is seen when you use the `--cache` flag | <ul><li>- [x] </li></ul>|
 | Please check if your dockerfile is a multistage dockerfile | <ul><li>- [x] </li></ul>| 

alucryd commented 2 years ago

Doubling the memory request to 4Gi didn't help, so it doesn't appear to be OOM-killed. I also tried 1.8.1 and 1.7.0, with the same result.
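
Editor's sketch: instead of inferring from memory requests, the failed pod's status can show whether a container was OOM-killed or the pod was evicted. The helper below is a hypothetical illustration, not something run in this thread. It assumes the pod was dumped with `kubectl get pod <name> -o json` and reads the standard Pod status fields (`status.phase`, `status.reason`, and `containerStatuses[].state.terminated`). The function names and the sample data are made up for the example.

```python
import json


def pod_failure_summary(pod: dict) -> dict:
    """Collect the pod-level failure reason and per-container termination info."""
    status = pod.get("status") or {}
    containers = {}
    for cs in status.get("containerStatuses", []):
        # A restarted container keeps its previous exit under lastState.
        for key in ("state", "lastState"):
            terminated = (cs.get(key) or {}).get("terminated")
            if terminated:
                containers[cs["name"]] = (
                    terminated.get("reason"),
                    terminated.get("exitCode"),
                )
                break
    return {
        "phase": status.get("phase"),
        "reason": status.get("reason"),  # e.g. "Evicted" for evicted pods
        "containers": containers,
    }


def looks_oom_killed(pod: dict) -> bool:
    """True if any container reports the OOMKilled termination reason."""
    summary = pod_failure_summary(pod)
    return any(reason == "OOMKilled" for reason, _ in summary["containers"].values())


if __name__ == "__main__":
    # Hypothetical pod JSON, trimmed to the fields the helper reads.
    sample = json.loads("""
    {"status": {"phase": "Failed",
                "containerStatuses": [
                  {"name": "build",
                   "state": {"terminated": {"reason": "OOMKilled", "exitCode": 137}}}
                ]}}
    """)
    print(pod_failure_summary(sample))
    print(looks_oom_killed(sample))
```

If the summary shows a pod-level reason such as `Evicted` rather than a container-level `OOMKilled`, raising the memory request alone would not be expected to change the outcome.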

alucryd commented 2 years ago

FYI, going back to single-stage works, so this is an issue with multi-stage.

alucryd commented 2 years ago

I have another multi-stage Dockerfile that is crashing. Unfortunately, that one can't easily be converted to single-stage.

FROM node:16 as builder

COPY . /app
WORKDIR /app
RUN yarn install --frozen-lockfile

ARG VITE_HIDE_INTERNAL
ARG VITE_HIDE_TRY_IT
ENV VITE_HIDE_INTERNAL=$VITE_HIDE_INTERNAL
ENV VITE_HIDE_TRY_IT=$VITE_HIDE_TRY_IT

RUN yarn build

FROM flashspys/nginx-static
COPY --from=builder /app/build /static
EXPOSE 80

JeromeJu commented 1 year ago

Looks like with the latest kaniko @HEAD (v1.17.0), I am not seeing the same error for the repo:

jeromeju@jju:~/kaniko$ ./run_in_docker.sh /dockerfile /usr/local/google/home/jeromeju/kaniko gcr.io/jju-dev/test:latest
INFO[0000] Resolved base name node:16 to builder        
INFO[0000] Using dockerignore file: /workspace/.dockerignore 
INFO[0000] Retrieving image manifest node:16            
INFO[0000] Retrieving image node:16 from registry index.docker.io 
INFO[0000] Retrieving image manifest gcr.io/distroless/nodejs:16 
INFO[0000] Retrieving image gcr.io/distroless/nodejs:16 from registry gcr.io 
INFO[0001] Built cross stage deps: map[0:[/app]]        
INFO[0001] Retrieving image manifest node:16            
INFO[0001] Returning cached image manifest              
INFO[0001] Executing 0 build triggers                   
INFO[0001] Building stage 'node:16' [idx: '0', base-idx: '-1'] 
INFO[0001] Unpacking rootfs as cmd COPY . /app requires it. 
INFO[0022] COPY . /app                                  
INFO[0027] Taking snapshot of files...                  
INFO[0030] WORKDIR /app                                 
INFO[0030] Cmd: workdir                                 
INFO[0030] Changed working directory to /app            
INFO[0030] No files changed in this command, skipping snapshotting. 
INFO[0030] RUN yarn install --frozen-lockfile --production 
INFO[0030] Initializing snapshotter ...                 
INFO[0030] Taking snapshot of full filesystem...        
INFO[0036] Cmd: /bin/sh                                 
INFO[0036] Args: [-c yarn install --frozen-lockfile --production] 
INFO[0036] Running: [/bin/sh -c yarn install --frozen-lockfile --production] 
yarn install v1.22.19
info No lockfile found.
[1/4] Resolving packages...
[2/4] Fetching packages...
[3/4] Linking dependencies...
[4/4] Building fresh packages...
Done in 0.06s.
INFO[0037] Taking snapshot of full filesystem...        
INFO[0039] Saving file app for later use                
INFO[0041] Deleting filesystem...                       
INFO[0043] Retrieving image manifest gcr.io/distroless/nodejs:16 
INFO[0043] Returning cached image manifest              
INFO[0043] Executing 0 build triggers                   
INFO[0043] Building stage 'gcr.io/distroless/nodejs:16' [idx: '1', base-idx: '-1'] 
INFO[0043] Unpacking rootfs as cmd COPY --from=builder /app /app requires it. 
INFO[0045] COPY --from=builder /app /app                
INFO[0047] Taking snapshot of files...                  
INFO[0050] WORKDIR /app                                 
INFO[0050] Cmd: workdir                                 
INFO[0050] Changed working directory to /app            
INFO[0050] No files changed in this command, skipping snapshotting. 
INFO[0050] EXPOSE 8080                                  
INFO[0050] Cmd: EXPOSE                                  
INFO[0050] Adding exposed port: 8080/tcp                
INFO[0050] CMD ["--experimental-modules", "--experimental-json-modules", "src/server.js"] 
INFO[0050] Pushing image to gcr.io/jju-dev/test:latest  
INFO[0053] Pushed gcr.io/jju-dev/test@sha256:d9b6d976408fa96a357f0dcb96856c544649cf3d31fac7bf3baf579b43c4175e 

Would you mind letting us know whether the issue still persists? If not, we might close this.

alucryd commented 1 year ago

Apologies, I completely forgot to get back to you. It started working fine some time ago, so this issue can definitely be closed. Thanks for the update!