containers / crun

OCI runtime error: crun: mount `proc` to `/proc`: Too many levels of symbolic links #1121

Open satmandu opened 1 year ago

satmandu commented 1 year ago

(Continuation of discussion from https://github.com/containers/crun/issues/1115 )

Yes, that seems like a different error and one I've never seen before.

Do you see the same error if you run something like $ sudo unshare -muinp -f --mount-proc=/proc echo hello?

Could you strace the command to see where it fails?

$ sudo strace -Z -f -s 1000 podman run -tid --net host busybox sh

Originally posted by @giuseppe in https://github.com/containers/crun/issues/1115#issuecomment-1381481362

chronos@localhost /usr/local/lib/crew/packages (master *%|SPARSE=)$ sudo unshare -muinp -f --mount-proc=/proc echo hello
hello
$ sudo podman run -tid --net host busybox sh
Your kernel does not support pids limit capabilities or the cgroup is not mounted. PIDs limit discarded.
Error: OCI runtime error: crun: mount `proc` to `/proc`: Too many levels of symbolic links
$ sudo strace -Z -f -s 1000 --output=/usr/local/tmp/crun_strace.output.txt podman run -tid --net host busybox sh
Your kernel does not support pids limit capabilities or the cgroup is not mounted. PIDs limit discarded.
Error: OCI runtime error: crun: mount `proc` to `/proc`: Too many levels of symbolic links

crun_strace.output.txt

satmandu commented 1 year ago

This may or may not be connected:

[65114.616267] Chromium OS LSM: sb_mount Mount path with symlinks prohibited obj="/usr/local/var/lib/containers/storage/overlay/e7726bdeb7684bd44367217c4f7bfdd460b6aa1073177f0e436463d75fbe752d/merged/proc" pid=12351 cmdline="/usr/local/bin/crun --log-format=json --log /usr/local/var/run/containers/storage/overlay-containers/c0c39e780a0c377a3b4b3644b5608b84c7e7c1043ac40ad37ef96bbd15b94584/userdata/oci-log create --bundle /usr/local/var/lib/containers/storage/overlay-containers/c0c39e780a0c377a3b4b3644b5608b84c7e7c1043ac40ad37ef96bbd15b94584/userdata --pid-file /usr/local/var/run/containers/storage/overlay-containers/c0c39e780a0c377a3b4b3644b5608b84c7e7c1043ac40ad37ef96bbd15b94584/userdata/pidfile --console-socket /tmp/conmon-term.1BHUY1 c0c39e780a0c377a3b4b3644b5608b84c7e7c1043ac40ad37ef96bbd15b94584"
[65114.616275] Chromium OS LSM: sb_mount dev=proc type=proc flags=0xe
[65155.910960] Chromium OS LSM: sb_mount Mount path with symlinks prohibited obj="/usr/local/var/lib/containers/storage/overlay/5485c8288679cf94346ead0204c6d19cc869249d59608904f9610afa4540cee9/merged/proc" pid=12563 cmdline="/usr/local/bin/crun --log-format=json --log /usr/local/var/run/containers/storage/overlay-containers/d1875cd31dbeed3816f4f37069d1f707ffce0fccbb97fec52dbaa636fbc5d1b4/userdata/oci-log create --bundle /usr/local/var/lib/containers/storage/overlay-containers/d1875cd31dbeed3816f4f37069d1f707ffce0fccbb97fec52dbaa636fbc5d1b4/userdata --pid-file /usr/local/var/run/containers/storage/overlay-containers/d1875cd31dbeed3816f4f37069d1f707ffce0fccbb97fec52dbaa636fbc5d1b4/userdata/pidfile --console-socket /tmp/conmon-term.IEX3Y1 d1875cd31dbeed3816f4f37069d1f707ffce0fccbb97fec52dbaa636fbc5d1b4"
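
To see which mount targets the LSM is rejecting, the denied paths can be pulled out of the kernel log. This is only a minimal sketch: the helper name `lsm_denied_paths` is made up for illustration, and the pattern assumes the message format shown in the log lines above.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log lines on stdin and print the path from
# the obj="..." field of every "Mount path with symlinks prohibited" message.
lsm_denied_paths() {
    sed -n 's/.*Chromium OS LSM: sb_mount Mount path with symlinks prohibited obj="\([^"]*\)".*/\1/p'
}

# Typical use on the affected machine (reading the kernel log needs root):
#   sudo dmesg | lsm_denied_paths | sort -u
```

Lines that do not match the pattern (for example the `dev=proc type=proc` line) are silently skipped.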

s1gnate-sync commented 1 year ago

Hi @satmandu,

I've found the issue, and it is completely unrelated to crun. It happens because the ChromeOS kernel carries additional patches. This particular security measure is controlled by the CONFIG_SECURITY_CHROMIUMOS option.

More options and explanations are here: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/HEAD/security/chromiumos/Kconfig

So I assume that fixing this issue is just a matter of building a custom kernel with relaxed security settings.
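
A quick way to check whether this LSM is active on a running system is to read the list of enabled LSMs that the kernel exposes through securityfs. This is a sketch under assumptions: the LSM name `chromiumos` is my guess at how the module registers itself, and `lsm_in_list` is a hypothetical helper, not part of any tool.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the LSM named $1 appears in the
# comma-separated list $2.
lsm_in_list() {
    case ",$2," in
        *",$1,"*) return 0 ;;
        *) return 1 ;;
    esac
}

# /sys/kernel/security/lsm lists the active LSMs when securityfs is mounted.
lsm_file=/sys/kernel/security/lsm
if [ -r "$lsm_file" ]; then
    # "chromiumos" is an assumed name; compare against what the file shows.
    if lsm_in_list chromiumos "$(cat "$lsm_file")"; then
        echo "chromiumos LSM is active"
    else
        echo "chromiumos LSM is not active"
    fi
fi
```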

satmandu commented 1 year ago

Hmm... Thanks for this. Is there a way to work around it for runc purposes without rebuilding the kernel?

config SECURITY_CHROMIUMOS
        bool "Chromium OS Security Module"
        depends on SECURITY
        depends on X86_64 || ARM64
        help
          The purpose of the Chromium OS security module is to reduce attacking
          surface by preventing access to general purpose access modes not
          required by Chromium OS. Currently: the mount operation is
          restricted by requiring a mount point path without symbolic links,
          and loading modules is limited to only the root filesystem. This
          LSM is stacked ahead of any primary "full" LSM.

s1gnate-sync commented 1 year ago

I think so... since run_oci (the only runtime bundled with ChromeOS) actually works. I think a possible fix is to use a bind mount to the rootfs instead of a symlink.
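
To find out which part of a rejected path is the symlink, each component can be tested in turn. A minimal sketch, assuming a POSIX shell; `first_symlink_component` is a hypothetical helper, and the bind-mount commands in the trailing comment are an untested illustration of the idea rather than a verified fix.

```shell
#!/bin/sh
# Hypothetical helper: print the first component of the absolute path $1
# that is a symlink; return 1 if no component is a symlink.
first_symlink_component() {
    path=$1
    current=
    old_ifs=$IFS
    IFS=/
    set -f                # no globbing while the path is split on "/"
    set -- $path
    set +f
    IFS=$old_ifs
    for part in "$@"; do
        [ -n "$part" ] || continue
        current="$current/$part"
        if [ -L "$current" ]; then
            printf '%s\n' "$current"
            return 0
        fi
    done
    return 1
}

# Example: first_symlink_component /usr/local/var/lib/containers/storage
#
# Illustration only (as root): swap a symlinked directory for a bind mount.
#   target=$(readlink -f /path/to/link)
#   rm /path/to/link && mkdir /path/to/link
#   mount --bind "$target" /path/to/link
```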

s1gnate-sync commented 1 year ago

But something tells me it goes a bit beyond crun/runc, and it would require root privileges to do a bind mount.

satmandu commented 1 year ago

Root privileges aren't a problem with ChromeOS dev mode. Replacing the kernel is much harder for most people...

Just creating a crosvm stub for using stuff like docker so one doesn't have to open crostini to do nested kvm would also be nice...

s1gnate-sync commented 1 year ago

But that's exactly what I did :) Alpine with docker/podman is running under just crosvm:

        crosvm run ${EXTRA_ARGS:-} \
                --mem $MEM \
                --cpus $CPUS \
                --cid "$CID" \
                --socket "$VM_CONTROL_SOCKET" \
                --net tap-name=$NETWORK_NAME \
                --block "rootfs.img,ro,o_direct=true,root,sparse=false" \
                --block "cros-tools.img,ro,o_direct=true,sparse=false" \
                --block "$STATE_DISK,o_direct=true,sparse=false" \
                kernel.img

And here is how I build the rootfs for the VM:

build_rootfs() {
        local tmp_root="$1"
        test -n "${tmp_root:?}"

        mkdir -p "${tmp_root:?}/etc/apk"
        for repo in main community; do
                echo "http://dl-cdn.alpinelinux.org/alpine/latest-stable/$repo"
        done > "${tmp_root:?}/etc/apk/repositories"

        $APK add --root "${tmp_root:?}" --quiet --allow-untrusted --update-cache --initdb \
                alpine-baselayout-data openrc alpine-keys coreutils findutils grep \
                udev shadow sudo diffutils docker docker-compose caddy openssh \
                git micro curl bash umoci buildah skopeo crun runc podman slirp4netns \
                fuse-overlayfs iputils

        passwd="$(cat "${tmp_root:?}/etc/passwd" | grep -v root)"
        echo -e "root:x:0:0:root,,,:/:/bin/nologin\n$passwd\n$USER:x:1000:1000:user,,,:/var/home:/bin/bash" > "${tmp_root:?}/etc/passwd"

        shadow="$(cat "${tmp_root:?}/etc/shadow" | grep -v root)"
        echo -e "root:!::0:::::\n$shadow\n$USER:::0:::::" > "${tmp_root:?}/etc/shadow"

        echo "$USER:!:1000:" >> "${tmp_root:?}/etc/group"
        sed -i -E  "s/^((docker|ping):(x|\!):[^:]+:)([^,]*)(,?)(.*)$/\1$USER\5\4/" "${tmp_root:?}/etc/group"

        local user_id_start=2000000
        local id_length=65535
        echo "$USER:$user_id_start:$id_length" > "${tmp_root:?}/etc/subuid"
        echo "root:1000000:$id_length" >> "${tmp_root:?}/etc/subuid"
        cp "${tmp_root:?}/etc/subuid" "${tmp_root:?}/etc/subgid"

        echo "$USER ALL=(ALL:ALL) NOPASSWD: ALL" > "${tmp_root:?}/etc/sudoers"

        rm -f "${tmp_root:?}/etc/resolv.conf"
        ln -s /var/.data/etc/resolv.conf "${tmp_root:?}/etc/resolv.conf"

        echo 'net.ipv4.ip_forward = 1' > "${tmp_root:?}/etc/sysctl.conf"
        echo "net.ipv4.ping_group_range=0 $((user_id_start + id_length))" >> "${tmp_root:?}/etc/sysctl.conf"

        echo -e "auto lo\niface lo inet loopback" > "${tmp_root:?}/etc/network/interfaces"

        gen_fstab > "${tmp_root:?}/etc/fstab"

        gen_inittab > "${tmp_root:?}/etc/inittab"

        for name in devfs dmesg udev udev-settle udev-trigger; do
                ln -s "/etc/init.d/$name" "${tmp_root:?}/etc/runlevels/sysinit/$name"
        done

        for name in bootmisc hostname loadkmap networking swap sysctl syslog urandom cgroups; do
                ln -s "/etc/init.d/$name" "${tmp_root:?}/etc/runlevels/boot/$name"
        done

        gen_vshd_service > "${tmp_root:?}/etc/init.d/vshd"
        gen_runstate_service > "${tmp_root:?}/etc/init.d/runstate"
        gen_netstart_script >  "${tmp_root:?}/etc/init.d/netstart"
        chmod a+x "${tmp_root:?}/etc/init.d/runstate" "${tmp_root:?}/etc/init.d/vshd" "${tmp_root:?}/etc/init.d/netstart"
        ln -s /etc/init.d/runstate "${tmp_root:?}/etc/runlevels/default/runstate"

        echo "" > "${tmp_root:?}/etc/motd"

        for dir in home media mnt srv usr/local etc/apk usr/share/apk lib/apk root opt; do
                rm -fr "${tmp_root:?}/$dir"
        done

        for dir in tmp sys dev proc run var lib/modules etc/ssh; do
                rm -fr "${tmp_root:?}/$dir"
                mkdir -p "${tmp_root:?}/$dir"
        done

        mkdir -p "${tmp_root:?}/opt/google/cros-containers"

        gen_sshd_config > "${tmp_root:?}/etc/ssh/sshd_config"
        cp "$DIR/ssh_host_"* "${tmp_root:?}/etc/ssh"
        chown 0:0 -R "${tmp_root:?}/etc/ssh"
        chmod 600 -R "${tmp_root:?}/etc/ssh"
}

s1gnate-sync commented 1 year ago

I've just added podman, and it doesn't require KVM for sure. Actually, I'm not even sure Docker requires it, as its containers work perfectly fine on just Linux namespaces.

Here is an updated version without autostart of docker, plus a basic podman setup: https://gist.github.com/s1gnate-sync/2b17ffb4cfc21a764f784370c61c4fb2

I'll try to dig into the nested KVM thing.

s1gnate-sync commented 1 year ago

Yeah, Docker Engine works without nested KVM; only Docker Desktop requires it.

supechicken commented 1 month ago

This may or may not be connected:

[65114.616267] Chromium OS LSM: sb_mount Mount path with symlinks prohibited obj="/usr/local/var/lib/containers/storage/overlay/e7726bdeb7684bd44367217c4f7bfdd460b6aa1073177f0e436463d75fbe752d/merged/proc" pid=12351 cmdline="/usr/local/bin/crun --log-format=json --log /usr/local/var/run/containers/storage/overlay-containers/c0c39e780a0c377a3b4b3644b5608b84c7e7c1043ac40ad37ef96bbd15b94584/userdata/oci-log create --bundle /usr/local/var/lib/containers/storage/overlay-containers/c0c39e780a0c377a3b4b3644b5608b84c7e7c1043ac40ad37ef96bbd15b94584/userdata --pid-file /usr/local/var/run/containers/storage/overlay-containers/c0c39e780a0c377a3b4b3644b5608b84c7e7c1043ac40ad37ef96bbd15b94584/userdata/pidfile --console-socket /tmp/conmon-term.1BHUY1 c0c39e780a0c377a3b4b3644b5608b84c7e7c1043ac40ad37ef96bbd15b94584"
[65114.616275] Chromium OS LSM: sb_mount dev=proc type=proc flags=0xe
[65155.910960] Chromium OS LSM: sb_mount Mount path with symlinks prohibited obj="/usr/local/var/lib/containers/storage/overlay/5485c8288679cf94346ead0204c6d19cc869249d59608904f9610afa4540cee9/merged/proc" pid=12563 cmdline="/usr/local/bin/crun --log-format=json --log /usr/local/var/run/containers/storage/overlay-containers/d1875cd31dbeed3816f4f37069d1f707ffce0fccbb97fec52dbaa636fbc5d1b4/userdata/oci-log create --bundle /usr/local/var/lib/containers/storage/overlay-containers/d1875cd31dbeed3816f4f37069d1f707ffce0fccbb97fec52dbaa636fbc5d1b4/userdata --pid-file /usr/local/var/run/containers/storage/overlay-containers/d1875cd31dbeed3816f4f37069d1f707ffce0fccbb97fec52dbaa636fbc5d1b4/userdata/pidfile --console-socket /tmp/conmon-term.IEX3Y1 d1875cd31dbeed3816f4f37069d1f707ffce0fccbb97fec52dbaa636fbc5d1b4"

Not sure if this is helpful, but I found a workaround for this :)

It turns out that the Chromium OS LSM can be disabled via kernel parameters without rebuilding the kernel (just like Landlock and SELinux).

All you need to do is use /usr/share/vboot/bin/make_dev_ssd.sh to append lsm=selinux,landlock to the kernel boot parameters.
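
This works because the kernel's lsm= parameter gives an explicit list of LSMs to initialize, so an LSM left out of the list is not enabled. The edit to the command line itself might look like the sketch below. `append_lsm_param` is a hypothetical helper, and the make_dev_ssd.sh flags in the comments (`--save_config`, `--set_config`, `--partitions`) and the `.2` file suffix are written from memory, so check the script's own help output before relying on them.

```shell
#!/bin/sh
# Hypothetical helper: print the kernel command line in $1 with
# "lsm=selinux,landlock" appended, unless an lsm= parameter is already there.
append_lsm_param() {
    case " $1 " in
        *" lsm="*) printf '%s\n' "$1" ;;
        *) printf '%s\n' "$1 lsm=selinux,landlock" ;;
    esac
}

# Assumed workflow on the device, as root (flags and file naming unverified):
#   /usr/share/vboot/bin/make_dev_ssd.sh --save_config /tmp/kcfg --partitions 2
#   new=$(append_lsm_param "$(cat /tmp/kcfg.2)")
#   printf '%s\n' "$new" > /tmp/kcfg.2
#   /usr/share/vboot/bin/make_dev_ssd.sh --set_config /tmp/kcfg --partitions 2
#   reboot
```

Leaving an existing lsm= parameter untouched keeps the helper safe to run twice; if one is already present, it needs to be edited by hand instead.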