jskov-jyskebank-dk closed this issue 3 years ago
If you are pulling an image with multiple UIDs, then you need to run the container with multiple UIDs. Usually you can do this as root. If you are running in OpenShift without being root, this can cause issues. You could set up a user namespace within the container, but it would still require CAP_SETUID and CAP_SETGID in order to start the user namespace within the container.
Yes, I am pulling the fedora image.
I do not mind CAP_SETUID/GID as much on their own. But I think running under the 'privileged' SCC would be a deal breaker. I will give it a shot though, just to confirm that I can get it running.
Do I understand the last sentence correctly? Is it possible to set up an additional user namespace inside the container, even though OpenShift only appears to provide it a uidmap of size 1?
Ta!
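For context, the size of the mapping a container was given can be checked directly; the sketch below just reads the kernel's uid_map file (a single mapped range of size 1 is what a default OpenShift pod typically sees):

```shell
# Print the UID mapping of the current user namespace.
# Each line is: <container-uid> <host-uid> <range-size>.
# A single line whose range size is 1 means only one UID is usable,
# so image layers owned by other UIDs cannot be chown'd into place.
cat /proc/self/uid_map
```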
You should be able to get this to work. This is the way the buildah and podman images are configured in quay.io/buildah/stable and quay.io/podman/stable
https://github.com/containers/libpod/blob/master/contrib/podmanimage/stable/Dockerfile
The idea is to set up a user within the container and then launch the container with that user. This is still not fully working, and I hope to get back to it once we ship podman 2.0.
Just a progress update:
I have tried using the podman:stable image as base for my OpenShift container, and it does bring me past the user id problem.
So I can now pull images. Thanks!
But running anything in those pulled images still fails (due to missing fuse module, it seems).
I will explore further on Monday.
Waiting for the rain to end, so I collected a little more info.
Running an image fails with:
$ podman run -it docker.io/library/alpine /bin/sh -c "echo 'hello world!'"
ERRO[0000] error unmounting /home/.local/share/containers/storage/overlay/a8e3377a8f75d187a906823f1d8da6bfe5d37771b2d0a4354444f86f722a854c/merged: invalid argument
Error: error mounting storage for container c967d9189c3ca165788ca68d069cafd3a3f60fd95eb86c6726c6ef3215a20918: error creating overlay mount to /home/.local/share/containers/storage/overlay/a8e3377a8f75d187a906823f1d8da6bfe5d37771b2d0a4354444f86f722a854c/merged: using mount program /usr/bin/fuse-overlayfs: fuse: device not found, try 'modprobe fuse' first
fuse-overlayfs: cannot mount: No such file or directory
: exit status 1
The kernel is 4.18.0-147.8.1.el8_1.x86_64 (which should be new enough).
But according to https://developers.redhat.com/blog/2019/08/14/best-practices-for-running-buildah-in-a-container/ the container needs to be provided with access to /dev/fuse.
And that matches the "setup" arguments in the link you provided: podman run --device /dev/fuse ...
Presumably this is a deal breaker in context of OpenShift?
You need to add /dev/fuse to the container. You can do this with CRI-O via crio.conf for every container, which is what I am recommending we do by default. I am no Kubernetes expert, but I believe there is now a way to add devices via Kubernetes, which should also work.
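For illustration, CRI-O can expose a device to every container it starts via its `additional_devices` option; a hedged crio.conf sketch (path and permission string are assumptions to verify against your CRI-O version):

```toml
# /etc/crio/crio.conf (fragment) -- illustrative sketch only.
[crio.runtime]
# Make /dev/fuse available in every container CRI-O starts.
# Entry format: host_path:container_path:permissions
additional_devices = [
    "/dev/fuse:/dev/fuse:rwm",
]
```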
BTW, great that you are working on this. I have plans to try this out, but am tied up releasing podman 2.0.
I got /dev/fuse added (via hostPath mapping in the Pod), but access to it fails:
$ podman run -it --device /dev/fuse:rw docker.io/library/alpine /bin/sh -c "echo 'hello'"
ERRO[0000] error unmounting /home/.local/share/containers/storage/overlay/06cdc160e71b46ce840709b7567a2bf377c96b51e40c139c437597a012bdef46/merged: invalid argument
Error: error mounting storage for container 9e3da9c3ae96a85d4315bdb09e93891578eff603dbe0a19a7345326298262c5f: error creating overlay mount to /home/.local/share/containers/storage/overlay/06cdc160e71b46ce840709b7567a2bf377c96b51e40c139c43759
7a012bdef46/merged: using mount program /usr/bin/fuse-overlayfs: fuse: failed to open /dev/fuse: Operation not permitted
fuse-overlayfs: cannot mount: Operation not permitted
: exit status 1
(full debug output at end of https://github.com/jskovjyskebankdk/openshift-podman)
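The hostPath mapping mentioned above might look roughly like this in the Pod spec (a sketch with made-up names; note that a hostPath volume alone does not add the device to the container's device cgroup, which is consistent with the "Operation not permitted" failure):

```yaml
# Hypothetical Pod fragment mapping /dev/fuse in from the node.
spec:
  containers:
  - name: podman
    image: quay.io/podman/stable
    volumeMounts:
    - name: dev-fuse
      mountPath: /dev/fuse
  volumes:
  - name: dev-fuse
    hostPath:
      path: /dev/fuse
```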
Various issues on libpod suggest this may be an SELinux issue, and our OpenShift installation runs with SELinux enabled (as it should). SELinux is not something we are likely to be able to tweak (unless it can be done from an SCC or something).
Is there maybe something else that could be the cause of this problem?
Thanks!
Adding to this, if I run the container privileged, it actually does work.
sh-5.0$ podman run -it --device /dev/fuse:rw docker.io/library/alpine /bin/sh -c "echo 'hello'"
hello
But that just whets the appetite for running it without high privileges.
Any suggestions appreciated!
Run podman as root, not rootless.
BTW, this works fine for me:
$ podman run -it --device /dev/fuse:rw docker.io/library/alpine /bin/sh -c "echo 'hello'"
hello
When running rootless do you have a /dev/fuse on your host?
$ ls /dev/fuse -l
crw-rw-rw-. 1 root root 10, 229 Jun 18 15:50 /dev/fuse
Hm, run as root. I assume you mean something like the OpenShift anyuid SCC?
If I do that, it fails on just the info command - because podman no longer uses the rootless configuration entries?
sh-5.0# id
uid=0(root) gid=0(root) groups=0(root)
sh-5.0# podman --log-level debug info
DEBU[0000] Found deprecated file /usr/share/containers/libpod.conf, please remove. Use /etc/containers/containers.conf to override defaults.
DEBU[0000] Reading configuration file "/usr/share/containers/libpod.conf"
DEBU[0000] Ignoring lipod.conf EventsLogger setting "journald". Use containers.conf if you want to change this setting and remove libpod.conf files.
DEBU[0000] Reading configuration file "/usr/share/containers/containers.conf"
DEBU[0000] Merged system config "/usr/share/containers/containers.conf": &{{[] [] container-default [] host [CAP_AUDIT_WRITE CAP_CHOWN CAP_DAC_OVERRIDE CAP_FOWNER CAP_FSETID CAP_KILL CAP_MKNOD CAP_NET_BIND_SERVICE CAP_NET_RAW CAP_SETFCAP CAP_SETGID CAP_SETPCAP CAP_SETUID CAP_SYS_CHROOT] [] [nproc=1048576:1048576] [] [] [] false [PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin] false false false private k8s-file -1 bridge false 2048 private /usr/share/containers/seccomp.json 65536k private host 65536} {false systemd [PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin] [/usr/libexec/podman/conmon /usr/local/libexec/podman/conmon /usr/local/lib/podman/conmon /usr/bin/conmon /usr/sbin/conmon /usr/local/bin/conmon /usr/local/sbin/conmon /run/current-system/sw/bin/conmon] ctrl-p,ctrl-q true /var/run/libpod/events/events.log file [/usr/share/containers/oci/hooks.d] docker:// /pause k8s.gcr.io/pause:3.2 /usr/libexec/podman/catatonit shm false 2048 crun map[crun:[/usr/bin/crun /usr/sbin/crun /usr/local/bin/crun /usr/local/sbin/crun /sbin/crun /bin/crun /run/current-system/sw/bin/crun] kata:[/usr/bin/kata-runtime /usr/sbin/kata-runtime /usr/local/bin/kata-runtime /usr/local/sbin/kata-runtime /sbin/kata-runtime /bin/kata-runtime /usr/bin/kata-qemu /usr/bin/kata-fc] kata-fc:[/usr/bin/kata-fc] kata-qemu:[/usr/bin/kata-qemu] kata-runtime:[/usr/bin/kata-runtime] runc:[/usr/bin/runc /usr/sbin/runc /usr/local/bin/runc /usr/local/sbin/runc /sbin/runc /bin/runc /usr/lib/cri-o-runc/sbin/runc /run/current-system/sw/bin/runc]] missing [] [crun runc] [crun] {false false false true true true} false 3 /var/lib/containers/storage/libpod 10 /var/run/libpod /var/lib/containers/storage/volumes} {[/usr/libexec/cni /usr/lib/cni /usr/local/lib/cni /opt/cni/bin] podman /etc/cni/net.d/}}
DEBU[0000] Reading configuration file "/etc/containers/containers.conf"
DEBU[0000] Merged system config "/etc/containers/containers.conf": &{{[] [] container-default [] host [CAP_AUDIT_WRITE CAP_CHOWN CAP_DAC_OVERRIDE CAP_FOWNER CAP_FSETID CAP_KILL CAP_MKNOD CAP_NET_BIND_SERVICE CAP_NET_RAW CAP_SETFCAP CAP_SETGID CAP_SETPCAP CAP_SETUID CAP_SYS_CHROOT] [] [nproc=1048576:1048576] [] [] [] false [PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin] false false false host k8s-file -1 host false 2048 private /usr/share/containers/seccomp.json 65536k host host 65536} {false cgroupfs [PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin] [/usr/libexec/podman/conmon /usr/local/libexec/podman/conmon /usr/local/lib/podman/conmon /usr/bin/conmon /usr/sbin/conmon /usr/local/bin/conmon /usr/local/sbin/conmon /run/current-system/sw/bin/conmon] ctrl-p,ctrl-q true /var/run/libpod/events/events.log file [/usr/share/containers/oci/hooks.d] docker:// /pause k8s.gcr.io/pause:3.2 /usr/libexec/podman/catatonit shm false 2048 crun map[crun:[/usr/bin/crun /usr/sbin/crun /usr/local/bin/crun /usr/local/sbin/crun /sbin/crun /bin/crun /run/current-system/sw/bin/crun] kata:[/usr/bin/kata-runtime /usr/sbin/kata-runtime /usr/local/bin/kata-runtime /usr/local/sbin/kata-runtime /sbin/kata-runtime /bin/kata-runtime /usr/bin/kata-qemu /usr/bin/kata-fc] kata-fc:[/usr/bin/kata-fc] kata-qemu:[/usr/bin/kata-qemu] kata-runtime:[/usr/bin/kata-runtime] runc:[/usr/bin/runc /usr/sbin/runc /usr/local/bin/runc /usr/local/sbin/runc /sbin/runc /bin/runc /usr/lib/cri-o-runc/sbin/runc /run/current-system/sw/bin/runc]] missing [] [crun runc] [crun] {false false false true true true} false 3 /var/lib/containers/storage/libpod 10 /var/run/libpod /var/lib/containers/storage/volumes} {[/usr/libexec/cni /usr/lib/cni /usr/local/lib/cni /opt/cni/bin] podman /etc/cni/net.d/}}
DEBU[0000] Using conmon: "/usr/bin/conmon"
DEBU[0000] Initializing boltdb state at /var/lib/containers/storage/libpod/bolt_state.db
DEBU[0000] Using graph driver overlay
DEBU[0000] Using graph root /var/lib/containers/storage
DEBU[0000] Using run root /var/run/containers/storage
DEBU[0000] Using static dir /var/lib/containers/storage/libpod
DEBU[0000] Using tmp dir /var/run/libpod
DEBU[0000] Using volume path /var/lib/containers/storage/volumes
DEBU[0000] Set libpod namespace to ""
DEBU[0000] [graphdriver] trying provided driver "overlay"
DEBU[0000] overlay: imagestore=/var/lib/shared
DEBU[0000] overlay: mount_program=/usr/bin/fuse-overlayfs
ERRO[0000] could not get runtime: mount /var/lib/containers/storage/overlay:/var/lib/containers/storage/overlay, flags: 0x1000: operation not permitted
sh-5.0# ls -l /var/lib/containers/storage/overlay
total 0
drwx------. 2 root root 6 Jun 23 06:50 l
I see in https://github.com/containers/buildah/issues/867 that you suggest fixing this by mapping in some other drive at /var/lib/containers.
But when running as rootless, this was not a problem, and it was using the same drive (at another path, /home/.local/share/containers/storage).
The mount map looks like this:
$ df
Filesystem 1K-blocks Used Available Use% Mounted on
overlay 125277164 60691652 64585512 49% /
tmpfs 65536 0 65536 0% /dev
tmpfs 16468192 0 16468192 0% /sys/fs/cgroup
shm 65536 0 65536 0% /dev/shm
tmpfs 16468192 9660 16458532 1% /etc/hostname
devtmpfs 16430336 0 16430336 0% /dev/fuse
/dev/mapper/coreos-luks-root-nocrypt 125277164 60691652 64585512 49% /etc/hosts
tmpfs 16468192 24 16468168 1% /run/secrets/kubernetes.io/serviceaccount
tmpfs 16468192 0 16468192 0% /proc/acpi
tmpfs 16468192 0 16468192 0% /proc/scsi
tmpfs 16468192 0 16468192 0% /sys/firmware
We only have NFS-based PVCs on the platform, and I have found earlier that overlay fails on NFS. Workaround then was to use a /tmp folder from the running image (similar to using /home/.local I guess).
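The /tmp workaround can be sketched in storage.conf by relocating the storage roots off the NFS mount (paths below are illustrative assumptions, not the author's actual configuration):

```toml
# Hypothetical storage.conf fragment: keep container storage on
# tmpfs/local disk instead of the NFS-backed home directory, since
# overlay mounts fail on NFS paths.
[storage]
driver = "overlay"
graphroot = "/tmp/containers/storage"
runroot = "/tmp/containers/run"
```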
Oh, the warning in the output shows it reads another config file when running as root. I deleted the file, but the only apparent difference is that the warning is removed.
I have tried mounting a folder at /dev/xx/storage (tmpfs) into the /var/lib/containers/storage path, and it makes no apparent difference.
I am out of ideas for tweaking stuff now :(
Your question about /dev/fuse as rootless:
sh-5.0$ ls -lZ /dev/fuse
crw-rw-rw-. 1 root root system_u:object_r:fuse_device_t:s0 10, 229 Jun 22 11:11 /dev/fuse
Same as when I run as root.
Regarding missing UIDs and GIDs in the user namespace: you can set ignore_chown_errors=true in storage.conf (see https://github.com/containers/storage/blob/master/docs/containers-storage.conf.5.md#storage-options-for-overlay-table). This will squash all UIDs and GIDs to the ones available in the user namespace.
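Per the linked containers-storage.conf documentation, the option lives in the overlay options table; a minimal fragment might look like:

```toml
# storage.conf fragment enabling ignore_chown_errors: chown failures
# for UIDs/GIDs missing from the user namespace are ignored, and all
# file ownership is squashed to the available IDs.
[storage.options.overlay]
ignore_chown_errors = "true"
```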
Regarding fuse: although we need to find a way to address the issue, a quick workaround could be using the VFS storage driver. That will have some considerable performance impacts though.
I will give VFS a shot.
It would be nice to have secure image building on OpenShift. Performance is not a super critical parameter right now (we are still bringing up the platform). Of course, I do not know how slow it is yet :)
The alternative we had discussed was building images on a dedicated box. But we would obviously prefer to keep all workloads on the OpenShift platform.
Thanks!
It also fails with VFS. The error is something I can only find in https://github.com/containers/libpod/issues/4079, which is ironically worked around by switching to fuse :)
$ id
uid=1000590000(builder) gid=0(root) groups=0(root),1000590000
$ podman --storage-driver=vfs version
Version: 1.9.1
RemoteAPI Version: 1
Go Version: go1.14.2
OS/Arch: linux/amd64
$ podman --storage-driver=vfs --log-level debug run -it docker.io/library/alpine /bin/sh -c "echo 'hello'"
...
WARN[0000] Error initializing configured OCI runtime kata: no valid executable found for OCI runtime kata: invalid argument
DEBU[0000] parsed reference into "[vfs@/home/.local/share/containers/storage+/tmp/run-1000590000/containers]docker.io/library/alpine:latest"
DEBU[0000] parsed reference into "[vfs@/home/.local/share/containers/storage+/tmp/run-1000590000/containers]@a24bb4013296f61e89ba57005a7b3e52274d8edd3ae2077d04395f806b63d83e"
DEBU[0000] [graphdriver] trying provided driver "vfs"
DEBU[0000] exporting opaque data as blob "sha256:a24bb4013296f61e89ba57005a7b3e52274d8edd3ae2077d04395f806b63d83e"
DEBU[0000] Using host netmode
DEBU[0000] Loading seccomp profile from "/usr/share/containers/seccomp.json"
DEBU[0000] created OCI spec and options for new container
DEBU[0000] Allocated lock 1 for container fea970dd476588d3b2fa34673674edcc00916da44662caa4f750307a201c1911
DEBU[0000] parsed reference into "[vfs@/home/.local/share/containers/storage+/tmp/run-1000590000/containers]@a24bb4013296f61e89ba57005a7b3e52274d8edd3ae2077d04395f806b63d83e"
DEBU[0000] exporting opaque data as blob "sha256:a24bb4013296f61e89ba57005a7b3e52274d8edd3ae2077d04395f806b63d83e"
DEBU[0000] created container "fea970dd476588d3b2fa34673674edcc00916da44662caa4f750307a201c1911"
DEBU[0000] container "fea970dd476588d3b2fa34673674edcc00916da44662caa4f750307a201c1911" has work directory "/home/.local/share/containers/storage/vfs-containers/fea970dd476588d3b2fa34673674edcc00916da44662caa4f750307a201c1911/userdata"
DEBU[0000] container "fea970dd476588d3b2fa34673674edcc00916da44662caa4f750307a201c1911" has run directory "/tmp/run-1000590000/containers/vfs-containers/fea970dd476588d3b2fa34673674edcc00916da44662caa4f750307a201c1911/userdata"
DEBU[0000] New container created "fea970dd476588d3b2fa34673674edcc00916da44662caa4f750307a201c1911"
DEBU[0000] container "fea970dd476588d3b2fa34673674edcc00916da44662caa4f750307a201c1911" has CgroupParent "/libpod_parent/libpod-fea970dd476588d3b2fa34673674edcc00916da44662caa4f750307a201c1911"
DEBU[0000] Handling terminal attach
DEBU[0000] mounted container "fea970dd476588d3b2fa34673674edcc00916da44662caa4f750307a201c1911" at "/home/.local/share/containers/storage/vfs/dir/3d651c0bc695dbdbac73b64a34431110b4a0eb2f465bd42330744a5a534c35b8"
DEBU[0000] Created root filesystem for container fea970dd476588d3b2fa34673674edcc00916da44662caa4f750307a201c1911 at /home/.local/share/containers/storage/vfs/dir/3d651c0bc695dbdbac73b64a34431110b4a0eb2f465bd42330744a5a534c35b8
DEBU[0000] /etc/system-fips does not exist on host, not mounting FIPS mode secret
DEBU[0000] reading hooks from /usr/share/containers/oci/hooks.d
DEBU[0000] Created OCI spec for container fea970dd476588d3b2fa34673674edcc00916da44662caa4f750307a201c1911 at /home/.local/share/containers/storage/vfs-containers/fea970dd476588d3b2fa34673674edcc00916da44662caa4f750307a201c1911/userdata/config.json
DEBU[0000] /usr/bin/conmon messages will be logged to syslog
DEBU[0000] running conmon: /usr/bin/conmon args="[--api-version 1 -c fea970dd476588d3b2fa34673674edcc00916da44662caa4f750307a201c1911 -u fea970dd476588d3b2fa34673674edcc00916da44662caa4f750307a201c1911 -r /usr/bin/crun -b /home/.local/share/containers/storage/vfs-containers/fea970dd476588d3b2fa34673674edcc00916da44662caa4f750307a201c1911/userdata -p /tmp/run-1000590000/containers/vfs-containers/fea970dd476588d3b2fa34673674edcc00916da44662caa4f750307a201c1911/userdata/pidfile -l k8s-file:/home/.local/share/containers/storage/vfs-containers/fea970dd476588d3b2fa34673674edcc00916da44662caa4f750307a201c1911/userdata/ctr.log --exit-dir /tmp/run-1000590000/libpod/tmp/exits --socket-dir-path /tmp/run-1000590000/libpod/tmp/socket --log-level debug --syslog -t --conmon-pidfile /tmp/run-1000590000/containers/vfs-containers/fea970dd476588d3b2fa34673674edcc00916da44662caa4f750307a201c1911/userdata/conmon.pid --exit-command /usr/bin/podman --exit-command-arg --root --exit-command-arg /home/.local/share/containers/storage --exit-command-arg --runroot --exit-command-arg /tmp/run-1000590000/containers --exit-command-arg --log-level --exit-command-arg debug --exit-command-arg --cgroup-manager --exit-command-arg cgroupfs --exit-command-arg --tmpdir --exit-command-arg /tmp/run-1000590000/libpod/tmp --exit-command-arg --runtime --exit-command-arg crun --exit-command-arg --storage-driver --exit-command-arg vfs --exit-command-arg --events-backend --exit-command-arg file --exit-command-arg container --exit-command-arg cleanup --exit-command-arg fea970dd476588d3b2fa34673674edcc00916da44662caa4f750307a201c1911]"
WARN[0000] Failed to add conmon to cgroupfs sandbox cgroup: error creating cgroup for cpu: mkdir /sys/fs/cgroup/cpu/libpod_parent: read-only file system
DEBU[0000] Received: -1
DEBU[0000] Cleaning up container fea970dd476588d3b2fa34673674edcc00916da44662caa4f750307a201c1911
DEBU[0000] Network is already cleaned up, skipping...
DEBU[0000] unmounted container "fea970dd476588d3b2fa34673674edcc00916da44662caa4f750307a201c1911"
DEBU[0000] ExitCode msg: "mount `proc` to '/home/.local/share/containers/storage/vfs/dir/3d651c0bc695dbdbac73b64a34431110b4a0eb2f465bd42330744a5a534c35b8/proc': permission denied: oci runtime permission denied error"
ERRO[0000] mount `proc` to '/home/.local/share/containers/storage/vfs/dir/3d651c0bc695dbdbac73b64a34431110b4a0eb2f465bd42330744a5a534c35b8/proc': Permission denied: OCI runtime permission denied error
(full log in https://github.com/jskovjyskebankdk/openshift-podman/blob/master/README.md)
@jskovjyskebankdk how did you manage to make /dev/fuse work inside of the container? I am getting:
sh-5.0# ls -lah /dev/fuse
crw-rw-rw-. 1 root root 10, 229 Jun 13 08:45 /dev/fuse
sh-5.0# id
uid=0(root) gid=0(root) groups=0(root)
sh-5.0# cat /dev/fuse
cat: /dev/fuse: Operation not permitted
and:
+ buildah bud --build-arg TAG=7.7 -t test:latest .
STEP 1: FROM registry/rhel7:7.7
Getting image source signatures
Copying blob sha256:32be9843afa050552a66345576a59497ba7c81c272aa895d67e6e349841714da
Copying blob sha256:1f1202c893ce2775c72b2a3f42ac33b25231d16ca978244bb0c6d1453dc1f39e
Copying config sha256:6682529ce3faf028687cef4fc6ffb30f51a1eb805b3709d31cb92a54caeb3daf
Writing manifest to image destination
Storing signatures
level=error msg="error unmounting /var/lib/containers/storage/overlay/20eb3e378dd5186b133651b1b23a60d3b3eab611ac1283294a342c1c5a905e42/merged: invalid argument"
error mounting new container: error mounting build container "69f52236781b491f49e742644a8af63e237cad149f32e532ac2530b14720f429": error creating overlay mount to /var/lib/containers/storage/overlay/20eb3e378dd5186b133651b1b23a60d3b3eab611ac1283294a342c1c5a905e42/merged: using mount program /usr/bin/fuse-overlayfs: fuse: failed to open /dev/fuse: Operation not permitted
fuse-overlayfs: cannot mount: Operation not permitted
@danielkucera I am not sure cat /dev/fuse is valid. I get that error on a workstation as root.
I only saw it working/failing in the context of podman execution.
And podman was only happy when I ran in a container with privileged: true
So I think you see the same problem as I do; it appears /dev/fuse cannot be used in a non-privileged container.
(or that is my theory - hopefully Daniel has something to add)
OK, I am trying to run a buildah container within a privileged container as non-root, and I am failing. I am not sure this is possible if you start inside of a user namespace.
$ podman run --privileged --device=/dev/fuse -ti quay.io/buildah/testing sh
#
At this point I modified the /etc/subuid and /etc/subgid inside of the container to use different UID mappings, since my host account was only able to use 65k UIDs.
# cat > /etc/subuid << _EOF
build:10000:2000
_EOF
# cat > /etc/subgid << _EOF
build:10000:2000
_EOF
Now I will switch to the build user and attempt to pull an image
# su - build
$ buildah from alpine
Getting image source signatures
Copying blob df20fa9351a1 done
Copying config a24bb40132 done
Writing manifest to image destination
Storing signatures
alpine-working-container
So far so good; this means fuse-overlayfs was used to pull into a user namespace. Now I will attempt to run a container on it, using --isolation=chroot:
$ buildah run alpine-working-container ls /
2020-06-23T13:03:08.000474825Z: executable file not found in $PATH: No such file or directory
error running container: error creating container for [ls /]: : exit status 1
I can enter the user namespace and make sure everything is set, then mount the image, and attempt to mount the proc file system.
$ buildah unshare
# buildah mount alpine-working-container
/home/build/.local/share/containers/storage/overlay/ea45d5325b23dcff9349d334600e347521bb9ab196981534f2490e2a905575a5/merged
# mount -t proc none /home/build/.local/share/containers/storage/overlay/ea45d5325b23dcff9349d334600e347521bb9ab196981534f2490e2a905575a5/merged/proc
mount: /home/build/.local/share/containers/storage/overlay/ea45d5325b23dcff9349d334600e347521bb9ab196981534f2490e2a905575a5/merged/proc: permission denied.
I do not know what is causing permission denied here. Basically I am blocked from mounting a proc file system from inside of the user namespace.
@rhvgoyal Any ideas? @giuseppe ?
I am unable to even open the file when not privileged.
With privileged: true:
sh-5.0# exec 3<> /dev/fuse
sh-5.0# ls -lah /proc/self/fd
total 0
dr-x------. 2 root root 0 Jun 23 13:27 .
dr-xr-xr-x. 9 root root 0 Jun 23 13:27 ..
lrwx------. 1 root root 64 Jun 23 13:27 0 -> /dev/pts/1
lrwx------. 1 root root 64 Jun 23 13:27 1 -> /dev/pts/1
lrwx------. 1 root root 64 Jun 23 13:27 2 -> /dev/pts/1
lrwx------. 1 root root 64 Jun 23 13:27 3 -> /dev/fuse
lr-x------. 1 root root 64 Jun 23 13:27 4 -> /proc/809/fd
With privileged: false:
sh-5.0# exec 3<> /dev/fuse
sh: /dev/fuse: Operation not permitted
@jskovjyskebankdk I was trying to find out how builds triggered by a BuildConfig are executed, and it turns out that they run a pod with the serviceaccount builder and this security context, guess what....
securityContext:
  privileged: true
So I presume there is no way to avoid this when even the OpenShift native build mechanism runs as privileged....
We are in the middle of discussing this right now. We see two choices: one involves a container running with CAP_SETUID, CAP_SETGID, and CAP_SYS_CHROOT, and we can get that to run buildah inside of a user namespace as a non-root user.
The second choice is to get OpenShift/Kubernetes/CRI-O to launch the builder container inside of a user namespace, which would be fully locked down from a UID point of view, but might have issues dealing with volumes and secrets, since Kubernetes would have to set any content it creates for the container to match the "root" of the container.
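The first choice could be sketched as a custom OpenShift SCC granting just those capabilities (a hedged sketch, not a vetted policy; the name and strategy values are assumptions, and SCCs list capabilities without the CAP_ prefix):

```yaml
# Hypothetical SecurityContextConstraints for a user-namespace build container.
apiVersion: security.openshift.io/v1
kind: SecurityContextConstraints
metadata:
  name: userns-build   # made-up name
allowPrivilegedContainer: false
allowedCapabilities:
- SETUID
- SETGID
- SYS_CHROOT
runAsUser:
  type: RunAsAny
seLinuxContext:
  type: MustRunAs
```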
@rhvgoyal Any ideas? @giuseppe ?
I think you'll first need to create a new pid namespace
I am happy with the CAP_SYS_CHROOT route (easiest for me to work with, and I will need volumes).
But I fail to make anything work as suggested. I probably need a little more specific guidance :)
What I have done is run the podman-stable image (installing buildah) with SCC anyuid, and it shows CAP_SYS_CHROOT:
sh-5.0# echo podman:10000:65536 > /etc/subuid
sh-5.0# echo podman:10000:65536 > /etc/subgid
sh-5.0# su - podman
[podman@podman-8-c5qp7 ~]$ buildah version
Version: 1.14.9
Go Version: go1.14.2
Image Spec: 1.0.1-dev
Runtime Spec: 1.0.1-dev
CNI Spec: 0.4.0
libcni Version:
image Version: 5.4.3
Git Commit:
Built: Thu Jan 1 00:00:00 1970
OS/Arch: linux/amd64
[podman@podman-8-rmtgx ~]$ capsh --print
Current: = cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot+i
Bounding set =cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot
Ambient set =
Securebits: 00/0x0/1'b0
secure-noroot: no (unlocked)
secure-no-suid-fixup: no (unlocked)
secure-keep-caps: no (unlocked)
secure-no-ambient-raise: no (unlocked)
uid=1000(podman)
gid=1000(podman)
groups=1000(podman)
[podman@podman-8-c5qp7 ~]$ buildah --log-level debug run --isolation chroot alpine-working-container ls /
DEBU running [buildah-in-a-user-namespace --log-level debug run --isolation chroot alpine-working-container ls /] with environment [SHELL=/bin/bash HISTCONTROL=ignoredups HISTSIZE=1000 HOSTNAME= PWD=/home/podman LOGNAME=podman HOME=/home/pod
man LANG=C.UTF-8 LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=01;37;41:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31
:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.zst=01;31:*.tzst=01;31:*.bz2=01;31:*.bz
=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.wim=01;31:*.swm=01;31:*.dwm=0
1;31:*.esd=01;31:*.jpg=01;35:*.jpeg=01;35:*.mjpg=01;35:*.mjpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.
pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.webp=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:
*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=01;36:*.au=01;36:*.flac=01;36:*.m4a=01;36:*.mid=01;36:*.midi=01;36:*.mka=01;36:*.mp3=01;36:*.
mpc=01;36:*.ogg=01;36:*.ra=01;36:*.wav=01;36:*.oga=01;36:*.opus=01;36:*.spx=01;36:*.xspf=01;36: BUILDAH_ISOLATION=chroot TERM=xterm USER=podman SHLVL=1 PATH=/home/podman/.local/bin:/home/podman/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/us
r/sbin MAIL=/var/spool/mail/podman _=/usr/bin/buildah TMPDIR=/var/tmp _CONTAINERS_USERNS_CONFIGURED=1], UID map [{ContainerID:0 HostID:1000 Size:1} {ContainerID:1 HostID:10000 Size:65536}], and GID map [{ContainerID:0 HostID:1000 Size:1} {Co
ntainerID:1 HostID:10000 Size:65536}]
DEBU [graphdriver] trying provided driver "overlay"
DEBU overlay: mount_program=/usr/bin/fuse-overlayfs
DEBU backingFs=overlayfs, projectQuotaSupported=false, useNativeDiff=false, usingMetacopy=false
DEBU using "/var/tmp/buildah994730804" to hold bundle data
DEBU Resources: &buildah.CommonBuildOptions{AddHost:[]string{}, CgroupParent:"", CPUPeriod:0x0, CPUQuota:0, CPUShares:0x0, CPUSetCPUs:"", CPUSetMems:"", HTTPProxy:true, Memory:0, DNSSearch:[]string{}, DNSServers:[]string{}, DNSOptions:[]stri
ng{}, MemorySwap:0, LabelOpts:[]string(nil), SeccompProfilePath:"/usr/share/containers/seccomp.json", ApparmorProfile:"", ShmSize:"65536k", Ulimit:[]string{"nproc=1048576:1048576"}, Volumes:[]string{}}
DEBU overlay: mount_data=lowerdir=/home/podman/.local/share/containers/storage/overlay/l/VUBMQEZB7D4VJWLROCODAIR24F,upperdir=/home/podman/.local/share/containers/storage/overlay/1ca1feda8f3a76261656185490fb0faeb6a192fa8a04ac9a4e12ef0082e0ec2
8/diff,workdir=/home/podman/.local/share/containers/storage/overlay/1ca1feda8f3a76261656185490fb0faeb6a192fa8a04ac9a4e12ef0082e0ec28/work
ERRO error unmounting /home/podman/.local/share/containers/storage/overlay/1ca1feda8f3a76261656185490fb0faeb6a192fa8a04ac9a4e12ef0082e0ec28/merged: invalid argument
DEBU error running [ls /] in container "alpine-working-container": error mounting container "8563a43b0e4254fa3003b5fafc79c0f6371ca7bc89f3ffb8d61bcb314d80d05b": error mounting build container "8563a43b0e4254fa3003b5fafc79c0f6371ca7bc89f3ffb8d
61bcb314d80d05b": error creating overlay mount to /home/podman/.local/share/containers/storage/overlay/1ca1feda8f3a76261656185490fb0faeb6a192fa8a04ac9a4e12ef0082e0ec28/merged: using mount program /usr/bin/fuse-overlayfs: fuse: failed to open
/dev/fuse: Operation not permitted
fuse-overlayfs: cannot mount: Operation not permitted
: exit status 1
error mounting container "8563a43b0e4254fa3003b5fafc79c0f6371ca7bc89f3ffb8d61bcb314d80d05b": error mounting build container "8563a43b0e4254fa3003b5fafc79c0f6371ca7bc89f3ffb8d61bcb314d80d05b": error creating overlay mount to /home/podman/.loc
al/share/containers/storage/overlay/1ca1feda8f3a76261656185490fb0faeb6a192fa8a04ac9a4e12ef0082e0ec28/merged: using mount program /usr/bin/fuse-overlayfs: fuse: failed to open /dev/fuse: Operation not permitted
fuse-overlayfs: cannot mount: Operation not permitted
: exit status 1
ERRO exit status 1
[podman@podman-8-c5qp7 ~]$ ls -lZ /dev/fuse
crw-rw-rw-. 1 root root system_u:object_r:fuse_device_t:s0 10, 229 Jun 22 11:03 /dev/fuse
Seems to be the same problem as with podman, so I am probably missing some (hopefully not too obvious) magic.
This is the current minimal configuration working for me:
securityContext:
  privileged: false
  runAsUser: 0
command:
buildah --storage-driver vfs bud --isolation chroot -t test:latest .
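Putting that together, a complete hypothetical Pod manifest for a rootful-but-unprivileged build might look like the sketch below (image, name, and working directory are assumptions):

```yaml
# Hypothetical Pod running a VFS/chroot buildah build without privileged mode.
apiVersion: v1
kind: Pod
metadata:
  name: buildah-build      # made-up name
spec:
  restartPolicy: Never
  containers:
  - name: build
    image: quay.io/buildah/stable
    workingDir: /build     # assumes the build context was placed here
    securityContext:
      privileged: false
      runAsUser: 0
    command:
    - buildah
    - --storage-driver=vfs
    - bud
    - --isolation=chroot
    - -t
    - test:latest
    - .
```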
Yes! Obviously I should not have used fuse but tried VFS again.
It does indeed work!
Thank you all!
I will make a PR with an OpenShift-specific howto for how to set it up.
We believe that you should be able to specify the /dev/fuse device in Kubernetes, but currently the kubelet is only passing in block devices. @nalind is looking into a fix.
I have written this: https://github.com/jskovjyskebankdk/buildah/blob/rootlessBudOpenShift/docs/tutorials/05-openshift-rootless-bud.md
@ashokponkumar would you have a look?
@rhatdan I have written a tutorial for the buildah project since that is what works right now. When/if it gets to work with podman, I will be happy to provide a similar variant for podman. Does that suit you, or would you prefer it somewhere else/in another form?
No that sounds good.
I created https://github.com/containers/buildah/pull/2453
We believe that you should be able to specify the /dev/fuse device in Kubernetes, but currently the kubelet is only passing in block devices. @nalind is looking into a fix.
Do you have an issue for this so that I can track it? I would like to follow this to the end, if possible. So the project can have a simple tutorial that simpletons like myself can follow :)
Thanks!
In a pod spec, one possibility would be to mount a HostPathCharDev volume to mount the device from the node, but those volume devices don't get added to the container's device cgroup. The runtime will add all devices to the container's device cgroup if the container is privileged, but that's what we're trying to avoid requiring here.
https://github.com/kubernetes/kubernetes/pull/79925 attempts to modify the kubelet to add devices to the device cgroup for non-privileged containers. I've not yet personally verified that we can't also get the desired results using one of the other options that were suggested there.
OK, thanks @nalind
A friendly reminder that this issue had no activity for 30 days.
I can make builds now (using Buildah).
But we only have NFS-based storage which makes it really hard to get performant builds (because VFS cannot use NFS).
So I am still very much hoping that there will be some way in the future to make use of /dev/fuse.
@jskovjyskebankdk did you try https://github.com/flavio/fuse-device-plugin ?
@elgalu I was not aware of that project, no.
And I do not think I will test it - I am a little concerned about its age, and it not being mentioned as a device plugin in the official documentation.
But thanks for the reference.
By the way; we now use Podman to make builds in non-privileged containers.
Still using NFS backend and VFS, so performance is not great.
But it does work (same instructions as for Buildah, https://github.com/containers/buildah/blob/master/docs/tutorials/05-openshift-rootless-bud.md)
So I will close this issue.
/kind bug
Description
Podman is not able to pull images when running in an OpenShift container.
There are elements seen in many other reported issues. I hope to get some help in tracking down what I am missing, so it can be added to the podman documentation proper (I will be happy to help with this).
Steps to reproduce the issue:
Full description of the steps taken - with bog standard images and very little setup - in this repository: https://github.com/jskovjyskebankdk/openshift-podman
Describe the results you received: Podman fails image pull with "there might not be enough IDs available in the namespace".
Describe the results you expected: The image pull should complete :)
Additional information you deem important (e.g. issue happens only occasionally):
This may be a side effect of how OpenShift configures containers, compared to running a container locally in docker/podman. It must be similar to whatever people face on cloud hosting, but I have not been able to find anything documenting how to get things running.
Output of podman version:
Output of podman info --debug:
Package info (e.g. output of rpm -q podman or apt list podman):
Additional environment details (AWS, VirtualBox, physical, etc.):
On-premise OpenShift 4.3.21