bobcatfish opened this issue 4 years ago
@bobcatfish I don't think using --whitelist=/var/run is the right approach here. You would not want your secrets to end up in the image.
I tried building your image like this to inspect what the FS looks like.
docker run -it --entrypoint /busybox/sh -v /Users/tejaldesai/workspace/recreate:/workspace -v /Users/tejaldesai/workspace/keys/tejal-test.json:/var/run/secrets/SECRET.json:ro gcr.io/kaniko-project/executor:debug-v0.23.0
/ # /kaniko/executor --context=dir://workspace --no-push
...
OK: 12729 distinct packages available
(1/2) Upgrading alpine-baselayout (3.2.0-r6 -> 3.2.0-r7)
Executing alpine-baselayout-3.2.0-r7.pre-upgrade
rm: can't remove '/var/run/secrets/SECRET.json': Resource busy
ERROR: alpine-baselayout-3.2.0-r7: failed to rename var/.apk.f752bb51c942c7b3b4e0cf24875e21be9cdcd4595d8db384 to var/run.
Executing alpine-baselayout-3.2.0-r7.post-upgrade
(2/2) Upgrading ca-certificates-bundle (20191127-r2 -> 20191127-r3)
Executing busybox-1.31.1-r16.trigger
Executing ca-certificates-20191127-r3.trigger
1 error; 27 MiB in 25 packages
error building image: error building stage: failed to execute command: waiting for process to exit: exit status 1
It failed as expected. The files in the /var/.apk.* dir are the secret files:
/ # ls -al /var/.apk.f752bb51c942c7b3b4e0cf24875e21be9cdcd4595d8db384/
total 12
drwxr-xr-x 3 root root 4096 Jun 5 02:18 .
drwxr-xr-x 1 root root 4096 Jun 5 02:08 ..
drwxr-xr-x 2 root root 4096 Jun 5 02:07 secrets
/ #
Another question I had: from the pre_upgrade script, it is not clear where the rename happens.
I removed the read-only secret mounted at /var/run and the build works fine.
docker run -it --entrypoint /busybox/sh -v /Users/tejaldesai/workspace/recreate:/workspace gcr.io/kaniko-project/executor:debug-v0.23.0
/ # /kaniko/executor --context dir://workspace --no-push
INFO[0000] Retrieving image manifest alpine:3.12
INFO[0001] Retrieving image manifest alpine:3.12
INFO[0002] Built cross stage deps: map[]
INFO[0002] Retrieving image manifest alpine:3.12
INFO[0004] Retrieving image manifest alpine:3.12
INFO[0005] Executing 0 build triggers
INFO[0005] Unpacking rootfs as cmd RUN apk add --update git openssh-client && apk update && apk upgrade requires it.
INFO[0005] RUN apk add --update git openssh-client && apk update && apk upgrade
INFO[0005] Taking snapshot of full filesystem...
INFO[0005] Resolving 491 paths
INFO[0005] cmd: /bin/sh
INFO[0005] args: [-c apk add --update git openssh-client && apk update && apk upgrade]
INFO[0005] Running: [/bin/sh -c apk add --update git openssh-client && apk update && apk upgrade]
fetch http://dl-cdn.alpinelinux.org/alpine/v3.12/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.12/community/x86_64/APKINDEX.tar.gz
(1/11) Installing ca-certificates (20191127-r3)
(2/11) Installing nghttp2-libs (1.41.0-r0)
(3/11) Installing libcurl (7.69.1-r0)
(4/11) Installing expat (2.2.9-r1)
(5/11) Installing pcre2 (10.35-r0)
(6/11) Installing git (2.26.2-r0)
(7/11) Installing openssh-keygen (8.3_p1-r0)
(8/11) Installing ncurses-terminfo-base (6.2_p20200523-r0)
(9/11) Installing ncurses-libs (6.2_p20200523-r0)
(10/11) Installing libedit (20191231.3.1-r0)
(11/11) Installing openssh-client (8.3_p1-r0)
Executing busybox-1.31.1-r16.trigger
Executing ca-certificates-20191127-r3.trigger
OK: 27 MiB in 25 packages
fetch http://dl-cdn.alpinelinux.org/alpine/v3.12/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.12/community/x86_64/APKINDEX.tar.gz
v3.12.0-45-g0e4d4e3558 [http://dl-cdn.alpinelinux.org/alpine/v3.12/main]
v3.12.0-46-g02e8db0c3e [http://dl-cdn.alpinelinux.org/alpine/v3.12/community]
OK: 12729 distinct packages available
(1/2) Upgrading alpine-baselayout (3.2.0-r6 -> 3.2.0-r7)
Executing alpine-baselayout-3.2.0-r7.pre-upgrade
Executing alpine-baselayout-3.2.0-r7.post-upgrade
(2/2) Upgrading ca-certificates-bundle (20191127-r2 -> 20191127-r3)
Executing busybox-1.31.1-r16.trigger
Executing ca-certificates-20191127-r3.trigger
OK: 27 MiB in 25 packages
INFO[0008] Taking snapshot of full filesystem...
INFO[0008] Resolving 1211 paths
INFO[0009] RUN ls -al /var/run
INFO[0009] cmd: /bin/sh
INFO[0009] args: [-c ls -al /var/run]
INFO[0009] Running: [/bin/sh -c ls -al /var/run]
lrwxrwxrwx 1 root root 4 Jun 5 02:23 /var/run -> /run
INFO[0009] Taking snapshot of full filesystem...
INFO[0009] Resolving 1211 paths
INFO[0009] No files were changed, appending empty layer to config. No layer added to image.
INFO[0009] Skipping push to container registry due to --no-push flag
/ #
Can the read-only secret be mounted in another dir? Another option is to look into the apk upgrade --no-commit-hooks flag. However, I'm not sure if that would have any side effects. I will keep looking for something better.
Can the read-only secret be mounted in another dir?
This wouldn't work on Kubernetes, which mounts the service account secret automatically under /var/run, right?
I tried with --no-scripts or --no-commit-hooks but it doesn't help either.
So one ugly hack would be to install your packages in a different root and then copy that over / in a final scratch image.
I made it work from this minimal Dockerfile directly inside a Kubernetes container:
FROM alpine:3.12 AS SRC
RUN set -x; \
# Actually make the installation in a different root dir
mkdir -p /proot/etc; \
\
apk -p /proot add --initdb && \
\
cp -r /etc/apk /proot/etc; \
\
apk -p /proot update && \
apk -p /proot fix && \
apk -p /proot add curl ca-certificates tzdata zip unzip openssl && \
\
<whatever needs to be done> \
\
# Clean up
rm -rf /proot/dev; \
rm -rf /proot/sys; \
rm -rf /proot/proc; \
unlink /proot/var/run; \
rm -rf /proot/var/cache/apk/*
FROM scratch
COPY --from=SRC /proot/ /
RUN <all the commands you'd have run after your pkg install>
Indeed this is only a basic workaround, as it will come back to bite you anytime you need to install more packages in an image dependent on this one.... If you don't have many specific needs, at least it builds an alpine:3.12 :D
Another hack would be to not fail when apk upgrade fails due to the error "ERROR: alpine-baselayout-3.2.0-r7: failed to rename var/.apk....". Say you create an upgrade script apk-upgrade.sh:
#!/bin/bash
# Note: bash is not in the base alpine image; install it first (apk add bash) or port this to POSIX sh.
# Capture combined stdout/stderr so we can inspect the error text.
ERR=$( (apk add --update git openssh-client && apk update && apk upgrade alpine-baselayout) 2>&1 )
EXIT_CODE=$?
echo "$ERR"
# If apk succeeded, just propagate success.
if [ "$EXIT_CODE" -eq 0 ]; then
  exit 0
fi
PERMISSIBLE_ERR="ERROR: alpine-baselayout-3.2.0-r7: failed to rename"
if [[ "$ERR" == *"$PERMISSIBLE_ERR"* ]]; then
  # Swallow the known baselayout rename error
  exit 0
fi
# Probably some other error
exit 1
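In the Dockerfile you would then run the script instead of the raw apk commands. A minimal sketch, assuming the script is shipped in the build context (paths and names are illustrative):
COPY apk-upgrade.sh /usr/local/bin/apk-upgrade.sh
RUN apk add --no-cache bash && chmod +x /usr/local/bin/apk-upgrade.sh && /usr/local/bin/apk-upgrade.sh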
Wouldn't it be easier for kaniko to extract images and run commands in a separate root context? How difficult would it be to implement?
After looking at the output of --no-scripts, it looks like the error is actually happening when upgrading alpine-baselayout-3.2.0-r7:
(1/2) Upgrading alpine-baselayout (3.2.0-r6 -> 3.2.0-r7)
ERROR: alpine-baselayout-3.2.0-r7: failed to rename var/.apk.f752bb51c942c7b3b4e0cf24875e21be9cdcd4595d8db384 to var/run.
(2/2) Upgrading ca-certificates-bundle (20191127-r2 -> 20191127-r3)
However, it is still installed:
/ # /sbin/apk list | grep alpine-baselayout
alpine-baselayout-3.2.0-r7 x86_64 {alpine-baselayout} (GPL-2.0-only) [installed]
Wouldn't it be easier for kaniko to extract images and run commands in a separate root context? How difficult would it be to implement?
@olivier-mauras That would involve some major design changes. The way kaniko executes a RUN command is that it calls exec.Command(...).Start() directly.
I am not sure how to run the command in a separate root context.
Do you mean we map "/" to "/tmp_run_XXX" ?
Do you mean we map "/" to "/tmp_run_XXX" ?
Yeah, like using chroot so that you don't have any mix-ups... There are problems doing simple things like COPY --from=another / /, because kaniko works on its own root and then tries to copy /dev, /sys, /proc and the like.
Would that work? https://golang.org/pkg/syscall/#Chroot
EDIT: https://github.com/GoogleContainerTools/kaniko/blob/master/pkg/commands/run.go#L210 Am I understanding correctly that RootDir could be changed, but there's just no option to do so?
If we choose this approach for every command, i.e. map "/" to another directory in "/tmp", then I see 2 issues: the RUN command uses binaries installed in paths relative to "/", so how would that work? Another approach would be to map "/" to "/tmp/kanikoRootXXX" at the beginning of the build (which is probably what you are suggesting in the edit).
I think that could work, but we would need to do something like this for all the metadata commands like "ENV" and "WORKDIR". Also, for all the base images, we would need to map their ImageConfig.Env paths to be relative to this new chroot.
I don't think it's infeasible or ugly. It could be a little hard to wire up. I would not mind pursuing this direction.
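To make the idea concrete, a rough shell-level sketch of the separate-root approach being discussed (paths and steps are hypothetical, not kaniko's actual implementation):
# Unpack the base image filesystem into a throwaway root instead of "/"
# (kaniko currently unpacks straight into "/").
ROOT=/tmp/kanikoRootXXX
mkdir -p "$ROOT"
# ... extract base image layers into $ROOT ...
# Execute each RUN command inside that root, so mounts under the real
# /var/run (e.g. Kubernetes service account secrets) never collide with the build:
chroot "$ROOT" /bin/sh -c 'apk add --update git openssh-client && apk update && apk upgrade'
# Snapshot $ROOT rather than "/" when producing layers.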
Another approach would be to map "/" to "/tmp/kanikoRootXXX" at the beginning of the build (which is probably what you are suggesting in the edit).
Exactly
I don't think it's infeasible or ugly. It could be a little hard to wire up. I would not mind pursuing this direction.
This would probably solve quite a bunch of COPY issues at once.
The only caveat is that currently I am the only one working actively on this project, at 20% capacity. I won't be able to get this in soon. I can definitely help design/review this.
I'm running into this too. Upgrading from a Ruby image that uses 3.10 to 3.12 and I'm hitting this in my Gitlab CI. Unsure what the best path forward is there.
One particularly quick fix is: apk upgrade --no-cache --ignore alpine-baselayout. Though be warned, apk explicitly says that partial upgrades aren't supported (but at least you can test).
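In a Dockerfile that would look roughly like this (a sketch; the extra packages are just an example):
FROM alpine:3.12
# Skip alpine-baselayout, whose pre-upgrade script trips over the
# read-only service account mount under /var/run.
RUN apk add --update git openssh-client && \
    apk update && \
    apk upgrade --no-cache --ignore alpine-baselayout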
@bobcatfish can you share the workaround that is working for you? We are running into the same issues using Kaniko in our Tekton pipelines.
Hey @jpower432 - our workaround is just to pin to alpine 3.11: https://github.com/tektoncd/pipeline/pull/2757. This works for us because we didn't have any particular need to use 3.12, but it won't work for you if you actually need 3.12 :O
Same here.
I also keep running into other issues with kaniko when any operations (like COPY) go against prior targets that are symlinked: the original permissions always get removed. This does not happen with docker build. (v0.24)
Seeing this issue as well, using kaniko in a gitlab runner. I suppose the solution is to pin all alpine builds we have at 3.11?
@mattsurge yes.
@tejal29, do we have any timeline to fix this issue for the latest alpine builds? We are kind of blocked on using kaniko to build alpine images.
We can't ignore alpine-baselayout (i.e. use apk upgrade --no-cache --ignore alpine-baselayout), since it has core package updates.
Is this problem solved now?
Another solution is to not mount the service account token automatically: https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/#use-the-default-service-account-to-access-the-api-server Probably you don't need the token.
GitLab has a feature request to add this as an option: https://gitlab.com/gitlab-org/gitlab-runner/-/issues/4786
And without the mounted token there is no /var/run/secrets/kubernetes.io/serviceaccount directory, and therefore no problem.
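For a pod you control directly, that is a one-line change in the spec. A minimal sketch (pod name and args are illustrative), assuming the build does not need to talk to the Kubernetes API:
apiVersion: v1
kind: Pod
metadata:
  name: kaniko-build   # hypothetical name
spec:
  automountServiceAccountToken: false   # no token mount, so nothing lands under /var/run/secrets
  containers:
    - name: kaniko
      image: gcr.io/kaniko-project/executor:debug
      args: ["--context=dir://workspace", "--no-push"]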
Also asking about the timeline for a fix. Any updates?
I'm also affected by this issue while updating core packages of an alpine image 😞 . Is there any update on this or any known workarounds?
Same issue here.. it happens when building an image from a new version of miniconda.
Dockerfile:
FROM continuumio/miniconda3:4.10.3-alpine
RUN apk update && apk upgrade
Log:
Continuing the upgrade transaction with new apk-tools:
(1/4) Upgrading busybox (1.33.1-r2 -> 1.33.1-r3)
Executing busybox-1.33.1-r3.post-upgrade
(2/4) Upgrading alpine-baselayout (3.2.0-r15 -> 3.2.0-r16)
Executing alpine-baselayout-3.2.0-r16.pre-upgrade
rm: can't remove '/var/run/secrets/kubernetes.io/serviceaccount/..data': Read-only file system
rm: can't remove '/var/run/secrets/kubernetes.io/serviceaccount/token': Read-only file system
rm: can't remove '/var/run/secrets/kubernetes.io/serviceaccount/namespace': Read-only file system
rm: can't remove '/var/run/secrets/kubernetes.io/serviceaccount/ca.crt': Read-only file system
rm: can't remove '/var/run/secrets/kubernetes.io/serviceaccount/..2021_08_18_08_58_09.380050180/token': Read-only file system
rm: can't remove '/var/run/secrets/kubernetes.io/serviceaccount/..2021_08_18_08_58_09.380050180/namespace': Read-only file system
rm: can't remove '/var/run/secrets/kubernetes.io/serviceaccount/..2021_08_18_08_58_09.380050180/ca.crt': Read-only file system
ERROR: alpine-baselayout-3.2.0-r16: failed to rename var/.apk.f752bb51c942c7b3b4e0cf24875e21be9cdcd4595d8db384 to var/run.
This is also causing a problem building a custom image based on nginx alpine image and building it with kaniko.
In the nginx alpine image, nginx.conf uses the pid file location /var/run/nginx.pid. If I build a custom image off nginx alpine but want it to run as a non-root user, I need to create an empty file /var/run/nginx.pid in the image and set the ownership of this file to the non-root user.
This works fine when building with docker:
COPY --chown=nginx:nginx nginx.pid /var/run/nginx.pid
However, it doesn't work when using kaniko because this mount issue causes any file I put in /var/run to be deleted.
The workaround is to change the pid path in nginx.conf:
RUN sed -i 's,/var/run/nginx.pid,/tmp/nginx.pid,' /etc/nginx/nginx.conf
COPY --chown=nginx:nginx nginx.pid /tmp/nginx.pid
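Put together, the non-root image might look roughly like this (a sketch; the nginx user/group are the ones shipped in the official image, and nginx.pid is an empty file in the build context):
FROM nginx:stable-alpine
# Move the pid file out of /var/run so kaniko's handling of that path doesn't matter
RUN sed -i 's,/var/run/nginx.pid,/tmp/nginx.pid,' /etc/nginx/nginx.conf
COPY --chown=nginx:nginx nginx.pid /tmp/nginx.pid
USER nginx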
We've just hit this issue with alpine:latest, which is currently the same as alpine 3, 3.16, and 3.16.0:
(2/2) Upgrading alpine-baselayout (3.2.0-r20 -> 3.2.0-r21)
Executing alpine-baselayout-3.2.0-r21.pre-upgrade
rm: can't remove '/var/run/secrets/eks.amazonaws.com/serviceaccount/..data': Read-only file system
I believe there is an issue with the pre-upgrade script for the alpine-baselayout package: https://git.alpinelinux.org/aports/tree/main/alpine-baselayout/alpine-baselayout.pre-upgrade#n18
The script is erroneously detecting that /var/run is a directory when it is already a symlink. I have filed an issue with the alpine project: https://gitlab.alpinelinux.org/alpine/aports/-/issues/13917
Update:
I have filed a merge request to fix the pre upgrade script: https://gitlab.alpinelinux.org/alpine/aports/-/merge_requests/35151
Update 2: The merge request I filed has been merged, but it did not solve the issue. It was just a coincidence that the alpine check appeared to be erroneously detecting /var/run as a directory when it should have been a symlink. It appears that kaniko is replacing /var/run during the build, and at that point it actually is a directory.
INFO[0001] RUN ls -l /var
INFO[0001] cmd: /bin/sh
INFO[0001] args: [-c ls -l /var]
INFO[0001] Running: [/bin/sh -c ls -l /var]
total 0
drwxr-xr-x 4 root root 29 Jun 15 00:23 cache
dr-xr-xr-x 2 root root 6 Jun 15 00:23 empty
drwxr-xr-x 5 root root 43 Jun 15 00:23 lib
drwxr-xr-x 2 root root 6 Jun 15 00:23 local
drwxr-xr-x 3 root root 20 Jun 15 00:23 lock
drwxr-xr-x 2 root root 6 Jun 15 00:23 log
drwxr-xr-x 2 root root 6 Jun 15 00:23 mail
drwxr-xr-x 2 root root 6 Jun 15 00:23 opt
drwxr-xr-x 3 root root 21 Jun 15 00:23 run <<<< this is a directory, even though it should be a symlink based on the alpine container version.
drwxr-xr-x 3 root root 30 Jun 15 00:23 spool
drwxrwxrwt 2 root root 6 Jun 15 00:23 tmp
Update 3: I have found a very dumb workaround... I just mv /var in my gitlab-ci.yml file like this:
build-containers:
  stage: releases
  image:
    name: gcr.io/kaniko-project/executor:debug
    entrypoint: [""]
  script:
    - mv /var /var-orig
    - /kaniko/executor build commands
+1 for the same issue. But the strange part is that it fails only on a GitLab runner. Just tried to run it locally and I can't reproduce it. Very strange. My current Dockerfile:
FROM node:14-alpine as build-stage
ARG REACT_APP_CONTROLLER_API_URL
ARG REACT_APP_ENVIRONMENT
USER node
WORKDIR /home/node
# almost always cached
COPY --chown=node:node package.json package-lock.json ./
RUN npm ci
# pretty much never cached
COPY --chown=node:node ./ ./
RUN npm run build
FROM nginx:stable-alpine
RUN apk --no-cache upgrade
COPY --from=build-stage /home/node/build /usr/share/nginx/html
COPY default.conf /etc/nginx/conf.d/default.conf
CMD [ "nginx","-g","daemon off;" ]
EXPOSE 80
gitlab-ci.yml (its parts are imported from another file, so I just paste the relevant part for the job):
.is-sandbox: &is-sandbox
  if: '$CI_COMMIT_BRANCH == "master"'

.build:
  stage: build
  image:
    name: gcr.io/kaniko-project/executor:debug
    entrypoint: ['']
  before_script:
    - mkdir -p /kaniko/.docker
    - echo -e "{\"auths\":{\"$CI_REGISTRY\":{\"auth\":\"$(echo -n ${CI_REGISTRY_USER}:${CI_REGISTRY_PASSWORD} | base64)\"}}}" > /kaniko/.docker/config.json
  script:
    - /kaniko/executor --context ${KANIKO_CONTEXT:-$CI_PROJECT_DIR} --dockerfile ${KANIKO_DOCKERFILE:-$CI_PROJECT_DIR/Dockerfile} --destination $CI_REGISTRY_IMAGE:${CONTAINER_TAG:-$CI_COMMIT_TAG} ${KANIKO_EXTRAARGS:-}

build_sandbox:
  extends: .build
  variables:
    CONTAINER_TAG: sandbox-$CI_COMMIT_SHORT_SHA
    KANIKO_EXTRAARGS: '--build-arg REACT_APP_CONTROLLER_API_URL=https://api-sandbox.domain.com --build-arg REACT_APP_ENVIRONMENT=sandbox'
  rules:
    - <<: *is-sandbox
For now I'm working around this just by removing RUN apk --no-cache upgrade, but I'd like to get a proper fix or workaround. Any ideas are welcome!
+1 for the same issue. But the strange part is that it fails only on GitLab runner. Just tried to run it locally and I can't reproduce it. Very strange.
If you are running kaniko in Kubernetes there, but locally in just bare docker, it is because Kubernetes mounts some service account secrets under /var/run: https://kubernetes.io/docs/reference/access-authn-authz/service-accounts-admin/
I was trying to run it on plain Ubuntu server, not on kubernetes
+1, upgrading in a GitLab pipeline using kaniko only works with apk upgrade --ignore alpine-baselayout.
It seems like there is a setting for this? https://github.com/GoogleContainerTools/kaniko#flag---ignore-var-run
@tobiasmcnulty --ignore-var-run=false doesn't appear to work either:
/kaniko/executor --ignore-var-run=false --snapshotMode=redo --single-snapshot
error building image: error building stage: failed to get filesystem from image: error removing var/run to make way for new symlink: unlinkat /var/run/secrets/kubernetes.io/serviceaccount/..data: read-only file system
Boo. Okay. Thanks for testing it. Just saw this in the docs looking for something else and didn't see it mentioned elsewhere on this issue.
Note that if it did work, and you are building on K8s, you'd probably end up with secrets in your image.
Hey folks, any news about this error? I'm getting this when I try to run apk update:
Log:
-------........-------
INFO[0058] Args: [-c apk update]
INFO[0058] Running: [/bin/sh -c apk update]
fetch https://dl-cdn.alpinelinux.org/alpine/v3.16/main/aarch64/APKINDEX.tar.gz
ERROR: https://dl-cdn.alpinelinux.org/alpine/v3.16/main: temporary error (try again later)
WARNING: Ignoring https://dl-cdn.alpinelinux.org/alpine/v3.16/main: No such file or directory
fetch https://dl-cdn.alpinelinux.org/alpine/v3.16/community/aarch64/APKINDEX.tar.gz
ERROR: https://dl-cdn.alpinelinux.org/alpine/v3.16/community: temporary error (try again later)
WARNING: Ignoring https://dl-cdn.alpinelinux.org/alpine/v3.16/community: No such file or directory
2 errors; 14 distinct packages available
error building image: error building stage: failed to execute command: waiting for process to exit: exit status 2
Bump :)
Bump. I'm trying something else, where I need to delete the content of /var/* and it's failing due to this.
Bump
It seems like there is a setting for this? https://github.com/GoogleContainerTools/kaniko#flag---ignore-var-run
Thanks bro, it worked for me.
Actual behavior
Tekton is using Kaniko to build a Docker image from alpine and recently the builds started failing.
TL;DR
The alpine:3.12 image has /var/run aliased to /run (it is a symlink). When running kaniko in a Kubernetes pod with service accounts, the service account secrets often seem to end up mounted under /var/run.
Kaniko is ignoring the contents and state of /var/run in the base image (alpine:3.12) but unfortunately some details of alpine seem to depend on /var/run being a symlink to /run, and so not preserving that is causing upgrading alpine packages to fail.
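For reference, you can see the symlink in a stock alpine container (this matches the RUN ls -al /var/run output in the kaniko log above):
docker run --rm alpine:3.12 ls -ld /var/run
# lrwxrwxrwx    1 root     root     4 ...  /var/run -> /run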
Details
We discovered this in https://github.com/tektoncd/pipeline/issues/2738.
It seems the problem is caused by recent versions of alpine-baselayout in alpine 3.12. When we build from alpine 3.12 and upgrade all alpine packages, the alpine-baselayout upgrade fails (as shown in the log at the top of this issue).
Expected behavior
Kaniko should detect that /var/run is a symlink in the base image and preserve that. (I think! I'm not sure if it's that simple.)
To Reproduce
Using this dockerfile and mounting a file into /var/run, I can build with docker but not with Kaniko.
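The Dockerfile is essentially the one exercised in the kaniko log earlier in this thread (reconstructed from the RUN commands visible there):
FROM alpine:3.12
RUN apk add --update git openssh-client && apk update && apk upgrade
RUN ls -al /var/run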
Trying to build with kaniko:
The error above about not being able to remove the file seems to come from https://git.alpinelinux.org/aports/tree/main/alpine-baselayout/alpine-baselayout.pre-upgrade which works just fine if /var/run is a symlink to /run, which I discovered by trying to do the same thing by using the alpine image directly without kaniko:
That works just fine!
I tried not whitelisting /var/run and that didn't work either:
Finally, using docker to build the image (from the pipelines repo checkout) worked just fine:
Additional Information
Kaniko Image (fully qualified with digest): gcr.io/kaniko-project/executor:v0.17.1 @ 970d32fa1eb2 (also reproduced with v0.23.0)