GoogleContainerTools / kaniko


Executor deletes the context directory in multi-stage builds resulting in loss of command pathing #1572

Open · rcollette opened 3 years ago

rcollette commented 3 years ago

**Actual behavior**
The context directory is deleted when targeting the second stage of a multi-stage build, where that second stage depends on a prior stage.

**Expected behavior**
The context directory should not be deleted.

**To Reproduce**
Steps to reproduce the behavior:

  1. Start an interactive session of the debug executor with: docker run -it --entrypoint sh gcr.io/kaniko-project/executor:debug-v1.3.0
  2. Run the following commands to create a builds directory and Dockerfile
    mkdir builds
    cd builds
    vi Dockerfile

    Insert the following into the Dockerfile and save

FROM node:14.15-alpine3.12 AS base
LABEL type="build"

FROM base as build-and-test
LABEL type="build-and-test"
  3. Run
    /kaniko/executor --context /builds --no-push --target build-and-test

Observe the following:

INFO[0000] Resolved base name node:14.15-alpine3.12 to base 
INFO[0000] Resolved base name base to build-and-test    
INFO[0000] Retrieving image manifest node:14.15-alpine3.12 
INFO[0000] Retrieving image node:14.15-alpine3.12       
INFO[0000] Retrieving image manifest node:14.15-alpine3.12 
INFO[0000] Retrieving image node:14.15-alpine3.12       
INFO[0001] Built cross stage deps: map[]                
INFO[0001] Retrieving image manifest node:14.15-alpine3.12 
INFO[0001] Retrieving image node:14.15-alpine3.12       
INFO[0002] Retrieving image manifest node:14.15-alpine3.12 
INFO[0002] Retrieving image node:14.15-alpine3.12       
INFO[0002] Executing 0 build triggers                   
INFO[0002] Skipping unpacking as no commands require it. 
INFO[0002] LABEL type="build"                           
INFO[0002] Applying label type=build                    
INFO[0002] Storing source image from stage 0 at path /kaniko/stages/0 
INFO[0006] Deleting filesystem...                       
INFO[0006] Base image from previous stage 0 found, using saved tar at path /kaniko/stages/0 
INFO[0006] Executing 0 build triggers                   
INFO[0006] Skipping unpacking as no commands require it. 
INFO[0006] LABEL type="build-and-test"                  
INFO[0006] Applying label type=build-and-test           
INFO[0006] Skipping push to container registry due to --no-push flag 
sh: getcwd: No such file or directory
(unknown) #

The current working directory has been deleted.

Commands run from the (now deleted) current working directory no longer function:

(unknown) # ls
sh: getcwd: No such file or directory
(unknown) # /busybox/ls
sh: getcwd: No such file or directory

When running with the GitLab Runner Operator for OpenShift, this behavior causes a loss of command pathing and of the ability to locate the busybox directory itself, even after first changing directories to root (outside the current context).

.gitlab-ci.yaml

stages:
  - build

# TEMPLATES
.runner_tags_template: &runners
  tags:
    - pdx
    - dind

.except_master_and_prodfix_template: &except_master_and_prodfix
  except:
    - /^prodfix\/.*$/
    - master
    - tags

# BUILD
build:
  stage: build
  <<: *runners
  <<: *except_master_and_prodfix
  image:
    name: gcr.io/kaniko-project/executor:debug-v1.3.0
    entrypoint: ["sh"]
  script:
    - cd /
    - pwd
    - /kaniko/executor
      --context $CI_PROJECT_DIR
      --no-push
      --target build-and-test
    - cd /
    - pwd
    - cd /busybox
    - ls

GitLab build log (note that the "directory not found" message appears in the log before the command that caused it):

Running with gitlab-runner 12.9.0 (4c96e5ad)
  on pdx-gitlab-3-runner-d7f85cf7f-stxzl _TrpUuzy
section_start:1612649568:prepare_executor
Preparing the "kubernetes" executor
Using Kubernetes namespace: gitlab-runners
Using Kubernetes executor with image gcr.io/kaniko-project/executor:debug-v1.3.0 ...
section_end:1612649568:prepare_executor
section_start:1612649568:prepare_script
Preparing environment
Waiting for pod gitlab-runners/runner-trpuuzy-project-20222355-concurrent-0dhzst to be running, status is Pending
Waiting for pod gitlab-runners/runner-trpuuzy-project-20222355-concurrent-0dhzst to be running, status is Pending
Running on runner-trpuuzy-project-20222355-concurrent-0dhzst via pdx-gitlab-3-runner-d7f85cf7f-stxzl...
section_end:1612649574:prepare_script
section_start:1612649574:get_sources
Getting source from Git repository
Fetching changes with git depth set to 50...
Initialized empty Git repository in /builds/preciselydata/cloud/pdx/product_definition_tool/.git/
Created fresh repository.
From https://gitlab.com/preciselydata/cloud/pdx/product_definition_tool
 * [new ref]         7d969d80cff9d699e51b126e0359937c2cb5a526 -> refs/pipelines/252507426
 * [new branch]      testBusyBox -> origin/testBusyBox
 * [new tag]         1.0.1468.f3c0870e -> 1.0.1468.f3c0870e
 * [new tag]         1.0.1469.7169ceb0 -> 1.0.1469.7169ceb0
 * [new tag]         1.0.1470.111f6d88 -> 1.0.1470.111f6d88
 * [new tag]         1.0.1471.98614d1c -> 1.0.1471.98614d1c
 * [new tag]         1.0.1473.e1ce3ccd -> 1.0.1473.e1ce3ccd
 * [new tag]         1.0.1474.1087dd94 -> 1.0.1474.1087dd94
 * [new tag]         1.0.1475.3c04842a -> 1.0.1475.3c04842a
Checking out 7d969d80 as testBusyBox...

Skipping Git submodules setup
section_end:1612649580:get_sources
section_start:1612649580:restore_cache
Restoring cache
section_end:1612649580:restore_cache
section_start:1612649580:download_artifacts
Downloading artifacts
section_end:1612649580:download_artifacts
section_start:1612649580:build_script
Running before_script and script
$ cd /
$ pwd
/
$ /kaniko/executor --context $CI_PROJECT_DIR --no-push --target build-and-test
INFO[0000] Resolved base name node:14.15-alpine3.12 to base 
INFO[0000] Resolved base name base to build-and-test    
INFO[0000] Using dockerignore file: /builds/preciselydata/cloud/pdx/product_definition_tool/.dockerignore 
INFO[0000] Retrieving image manifest node:14.15-alpine3.12 
INFO[0000] Retrieving image node:14.15-alpine3.12       
E0206 22:12:25.176806      18 aws_credentials.go:77] while getting AWS credentials NoCredentialProviders: no valid providers in chain. Deprecated.
    For verbose messaging see aws.Config.CredentialsChainVerboseErrors
INFO[0006] Retrieving image manifest node:14.15-alpine3.12 
INFO[0006] Retrieving image node:14.15-alpine3.12       
INFO[0007] Built cross stage deps: map[]                
INFO[0007] Retrieving image manifest node:14.15-alpine3.12 
INFO[0007] Retrieving image node:14.15-alpine3.12       
INFO[0007] Retrieving image manifest node:14.15-alpine3.12 
INFO[0007] Retrieving image node:14.15-alpine3.12       
INFO[0007] Executing 0 build triggers                   
INFO[0007] Skipping unpacking as no commands require it. 
INFO[0007] LABEL type="build"                           
INFO[0007] Applying label type=build                    
INFO[0007] Storing source image from stage 0 at path /kaniko/stages/0 
INFO[0008] Deleting filesystem...                       
INFO[0008] Base image from previous stage 0 found, using saved tar at path /kaniko/stages/0 
INFO[0008] Executing 0 build triggers                   
INFO[0009] Skipping unpacking as no commands require it. 
INFO[0009] LABEL type="build-and-test"                  
INFO[0009] Applying label type=build-and-test           
INFO[0009] Skipping push to container registry due to --no-push flag 
/busybox/sh: cd: line 145: can't cd to /busybox: No such file or directory
$ cd /
$ pwd
/
$ cd /busybox
section_end:1612649590:build_script
section_start:1612649590:after_script
Running after_script
time="2021-02-06T22:12:28Z" level=error msg="exec failed: container_linux.go:349: starting container process caused \"exec: \\\"sh\\\": executable file not found in $PATH\""
exec failed: container_linux.go:349: starting container process caused "exec: \"sh\": executable file not found in $PATH"
section_end:1612649590:after_script
section_start:1612649590:upload_artifacts_on_failure
Uploading artifacts for failed job
section_end:1612649590:upload_artifacts_on_failure
ERROR: Job failed: command terminated with exit code 2


With commands like ls and cp no longer functioning due to the missing busybox directory, build artifacts cannot be copied back to the context directory path, which is where they must be located for GitLab to store them with its artifacts functionality.
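
For context, GitLab only collects job artifacts from paths relative to `$CI_PROJECT_DIR`, so a declaration along these lines (a minimal sketch with an assumed path) cannot pick up files that exist only at the filesystem root:

```yaml
artifacts:
  paths:
    # Artifact paths are resolved relative to $CI_PROJECT_DIR, never to /
    - reports/coverage
```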

**Additional Information**

 - Dockerfile
    FROM base as build-and-test
    LABEL type="build-and-test"


 - Build Context
    Nothing local required.
 - Kaniko Image (fully qualified with digest)
    gcr.io/kaniko-project/executor@sha256:473d6dfb011c69f32192e668d86a47c0235791e7e857c870ad70c5e86ec07e8c

 **Triage Notes for the Maintainers**

| **Description** | **Yes/No** |
|----------------|------------|
| Please check if this is a new feature you are proposing | No |
| Please check if the build works in docker but not in kaniko | Yes |
| Please check if this error is seen when you use `--cache` flag | Yes |
| Please check if your dockerfile is a multistage dockerfile | Yes |

qalinn commented 3 years ago

@rcollette Did you find any workaround for this problem?

rcollette commented 3 years ago

@qalinn - I have not found a workaround.

rcollette commented 1 year ago

@qalinn The workaround I have used looks like:

GitLab build job:

# BUILD
build:
  stage: build
  interruptible: true
  extends: .kubernetes_runners
  <<: *except_master_and_prodfix
  variables:
    AWS_ACCESS_KEY_ID: $DEV_AWS_ACCESS_KEY_ID
    AWS_SECRET_ACCESS_KEY: $DEV_AWS_SECRET_ACCESS_KEY
  image:
    name: gcr.io/kaniko-project/executor:$KANIKO_EXECUTOR_VERSION
    entrypoint: [ "sh" ]
  script:
    # We cannot git merge from master here because busybox used in Kaniko does not have git nor does it have
    # a package installer.
    # The docker command is not available in the Kaniko image, so we have to create the .docker/config.json file manually.
    - echo "$DOCKER_AUTH_CONFIG" > /kaniko/.docker/config.json
    # This builds an image but does not push it to the registry
    - /kaniko/executor
      --context $CI_PROJECT_DIR
      --no-push
      --skip-unused-stages=true
      --cache=true
      --cache-repo=${CI_REGISTRY_IMAGE}/cache
      --log-timestamp=true
      --log-format=text
      --target build-and-test
    # Copy artifacts from the root where the Dockerfile copied them, to the project dir because artifacts
    # can only be captured from there.
    - cp -R /reports $CI_PROJECT_DIR
  artifacts:
    when: always
    expire_in: 30 days
    expose_as: Code coverage report
    paths:
      - reports/coverage/index.html
      - reports/coverage
    reports:
      coverage_report: 
        coverage_format: cobertura
        path: reports/coverage/Cobertura.xml
      junit:
        - reports/unit-tests/*test-result.xml
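
For reference, the `$DOCKER_AUTH_CONFIG` value written to `/kaniko/.docker/config.json` above is expected to be a standard Docker `config.json`; a minimal sketch (the registry host and credentials are placeholders, not values from this project) looks like:

```json
{
  "auths": {
    "registry.gitlab.com": {
      "auth": "<base64 of username:password>"
    }
  }
}
```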

Dockerfile - the build-and-test stage moves the reports folder to the root, which is the key to preserving it: kaniko appears to wipe the pod filesystem only between stages, so files moved to / during the final target stage are still present when the executor returns.

#With restore completed, now copy over the entire application.
FROM restore as build-and-test
LABEL type="build"
ARG SOLUTION_NAME="Precisely.Pdx.Api"
ARG VERSION="0.0.0"
WORKDIR /app_build/$SOLUTION_NAME
COPY $SOLUTION_NAME .
# We still want to capture coverage reports even if there was a coverage threshold or test error
# Any artifacts we want to capture have to be moved to root.
# The kaniko working directory is removed when kaniko finishes working.
RUN ./coverage.sh || flag=1 ; \
    mv reports / ; \
    exit $flag
RUN dotnet publish $SOLUTION_NAME.Web --output /dist --configuration Release --no-restore /p:Version=$VERSION
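
Putting the job script and the Dockerfile together, the report flow looks roughly like this (a sketch; the exact layout that coverage.sh writes is an assumption):

```sh
# Inside the build-and-test stage, RUN executes on the kaniko pod's filesystem:
#   ./coverage.sh        # writes ./reports/... under the WORKDIR
#   mv reports /         # /reports survives once the executor finishes
# Back in the GitLab job script, after /kaniko/executor has returned:
cp -R /reports "$CI_PROJECT_DIR"     # reports/ now sits under the project dir
# GitLab then collects artifacts relative to $CI_PROJECT_DIR,
# e.g. reports/coverage/index.html and reports/unit-tests/*test-result.xml
```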

This solution works for us because, although the files are copied to the root of the GitLab runner instance (which might be unnerving), the runner is an ephemeral Kubernetes pod, so it will not conflict with other build jobs.

It would be a little more straightforward if the build context were preserved, perhaps in a known/controlled location and perhaps behind a CLI option, so as not to break existing behavior.

pasfl commented 4 months ago

Is there any news on this? I ran into it with a three-stage build (common setup, build, final image) where I need to copy the context in the second stage, so the above workaround does not work for me.
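
For illustration, the kind of three-stage layout described here looks roughly like the sketch below (stage names, base image, and build commands are assumptions); the COPY in the second stage reads from the build context, which is exactly the directory this issue reports as deleted between stages:

```dockerfile
# Stage 1: common setup
FROM node:14.15-alpine3.12 AS base
RUN apk add --no-cache bash

# Stage 2: build - needs files from the build context
FROM base AS build
WORKDIR /app
COPY . .                  # fails if the context directory has been removed
RUN npm ci && npm run build

# Stage 3: final image - only copies artifacts from the build stage
FROM base AS final
COPY --from=build /app/dist /app
CMD ["node", "/app/index.js"]
```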