containers / crun

A fast and lightweight fully featured OCI runtime and C library for running containers
GNU General Public License v2.0

crun exec throws "writing file `[...]/cgroup.procs`: Permission denied" #376

Closed geverartsdev closed 4 years ago

geverartsdev commented 4 years ago

I am trying to launch a process in a container running in a user+mount namespace (created with rootlesskit), but when I do I get this error, even though I appear to have write permission on the file:

Terminal 1

$ rootlesskit bash
# echo $$ > /tmp/pid
# crun --systemd-cgroup run test
/ #

Terminal 2

$ nsenter -t $(cat /tmp/pid) --user --mount
# crun --systemd-cgroup exec -t test sh
2020-05-29T13:37:05.000581591Z: writing file `/sys/fs/cgroup//user.slice/user-1000.slice/user@1000.service/user.slice/contingious-test.scope/cgroup.procs`: Permission denied
# crun exec -t test sh
2020-05-29T13:37:16.000059652Z: writing file `/sys/fs/cgroup//user.slice/user-1000.slice/user@1000.service/user.slice/contingious-test.scope/cgroup.procs`: Permission denied
# ls -l /sys/fs/cgroup//user.slice/user-1000.slice/user@1000.service/user.slice/contingious-test.scope/cgroup.procs
-rw-r--r-- 1 root root 0 May 29 13:35 /sys/fs/cgroup//user.slice/user-1000.slice/user@1000.service/user.slice/contingious-test.scope/cgroup.procs

Any idea of what is going on?

Here is the content of the config.json used to create the container ```json { "ociVersion": "1.0.1-dev", "process": { "terminal": true, "user": { "uid": 0, "gid": 0 }, "args": [ "/bin/sh" ], "env": [ "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin", "TERM=xterm", "HOME=/root" ], "cwd": "/", "capabilities": { "bounding": [ "CAP_AUDIT_WRITE", "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_FOWNER", "CAP_FSETID", "CAP_KILL", "CAP_MKNOD", "CAP_NET_BIND_SERVICE", "CAP_NET_RAW", "CAP_SETFCAP", "CAP_SETGID", "CAP_SETPCAP", "CAP_SETUID", "CAP_SYS_CHROOT" ], "effective": [ "CAP_AUDIT_WRITE", "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_FOWNER", "CAP_FSETID", "CAP_KILL", "CAP_MKNOD", "CAP_NET_BIND_SERVICE", "CAP_NET_RAW", "CAP_SETFCAP", "CAP_SETGID", "CAP_SETPCAP", "CAP_SETUID", "CAP_SYS_CHROOT" ], "inheritable": [ "CAP_AUDIT_WRITE", "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_FOWNER", "CAP_FSETID", "CAP_KILL", "CAP_MKNOD", "CAP_NET_BIND_SERVICE", "CAP_NET_RAW", "CAP_SETFCAP", "CAP_SETGID", "CAP_SETPCAP", "CAP_SETUID", "CAP_SYS_CHROOT" ], "permitted": [ "CAP_AUDIT_WRITE", "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_FOWNER", "CAP_FSETID", "CAP_KILL", "CAP_MKNOD", "CAP_NET_BIND_SERVICE", "CAP_NET_RAW", "CAP_SETFCAP", "CAP_SETGID", "CAP_SETPCAP", "CAP_SETUID", "CAP_SYS_CHROOT" ], "ambient": [ "CAP_AUDIT_WRITE", "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_FOWNER", "CAP_FSETID", "CAP_KILL", "CAP_MKNOD", "CAP_NET_BIND_SERVICE", "CAP_NET_RAW", "CAP_SETFCAP", "CAP_SETGID", "CAP_SETPCAP", "CAP_SETUID", "CAP_SYS_CHROOT" ] }, "rlimits": [ { "type": "RLIMIT_NOFILE", "hard": 1024, "soft": 1024 }, { "type": "RLIMIT_NPROC", "hard": 1024, "soft": 1024 } ], "oomScoreAdj": 0, "noNewPrivileges": true }, "root": { "path": "/tmp/contingious-test/test/rootfs", "readonly": false }, "hostname": "test", "mounts": [ { "destination": "/mnt", "type": "bind", "source": "/home/guillaume/.local/share/contingious/storage/images/edvgui/alpine-hello-world/ro-rootfs", "options": [ "bind", "private", "ro" ] }, { 
"destination": "/proc", "type": "proc", "source": "proc", "options": [ "nosuid", "noexec", "nodev" ] }, { "destination": "/dev", "type": "tmpfs", "source": "tmpfs", "options": [ "nosuid", "strictatime", "mode=755", "size=65536k" ] }, { "destination": "/sys", "type": "sysfs", "source": "sysfs", "options": [ "nosuid", "noexec", "nodev", "ro" ] }, { "destination": "/dev/pts", "type": "devpts", "source": "devpts", "options": [ "nosuid", "noexec", "newinstance", "ptmxmode=0666", "mode=0620", "gid=5" ] }, { "destination": "/dev/mqueue", "type": "mqueue", "source": "mqueue", "options": [ "nosuid", "noexec", "nodev" ] }, { "destination": "/sys/fs/cgroup", "type": "cgroup", "source": "cgroup", "options": [ "rprivate", "nosuid", "noexec", "nodev", "relatime", "ro" ] } ], "linux": { "resources": { "devices": [ { "allow": false, "access": "rwm" } ], "memory": { "limit": 1073741824, "disableOOMKiller": false }, "cpu": { "quota": 100000, "period": 100000 }, "pids": { "limit": 4096 } }, "cgroupsPath": "user.slice:contingious:test", "namespaces": [ { "type": "pid" }, { "type": "network" }, { "type": "ipc" }, { "type": "uts" }, { "type": "mount" }, { "type": "cgroup" } ], "seccomp": { "defaultAction": "SCMP_ACT_ERRNO", "architectures": [ "SCMP_ARCH_X86_64", "SCMP_ARCH_X86", "SCMP_ARCH_X32" ], "syscalls": [ { "names": [ "_llseek" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "_newselect" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "accept" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "accept4" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "access" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "adjtimex" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "alarm" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "bind" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "brk" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "capget" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "capset" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "chdir" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "chmod" ], 
"action": "SCMP_ACT_ALLOW" }, { "names": [ "chown" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "chown32" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "clock_getres" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "clock_gettime" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "clock_nanosleep" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "close" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "connect" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "copy_file_range" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "creat" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "dup" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "dup2" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "dup3" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "epoll_create" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "epoll_create1" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "epoll_ctl" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "epoll_ctl_old" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "epoll_pwait" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "epoll_wait" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "epoll_wait_old" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "eventfd" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "eventfd2" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "execve" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "execveat" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "exit" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "exit_group" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "faccessat" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "fadvise64" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "fadvise64_64" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "fallocate" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "fanotify_mark" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "fchdir" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "fchmod" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "fchmodat" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "fchown" ], "action": "SCMP_ACT_ALLOW" }, { 
"names": [ "fchown32" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "fchownat" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "fcntl" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "fcntl64" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "fdatasync" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "fgetxattr" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "flistxattr" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "flock" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "fork" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "fremovexattr" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "fsetxattr" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "fstat" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "fstat64" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "fstatat64" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "fstatfs" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "fstatfs64" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "fsync" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "ftruncate" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "ftruncate64" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "futex" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "futimesat" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "get_robust_list" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "get_thread_area" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "getcpu" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "getcwd" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "getdents" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "getdents64" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "getegid" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "getegid32" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "geteuid" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "geteuid32" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "getgid" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "getgid32" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "getgroups" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "getgroups32" ], "action": "SCMP_ACT_ALLOW" 
}, { "names": [ "getitimer" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "getpeername" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "getpgid" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "getpgrp" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "getpid" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "getppid" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "getpriority" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "getrandom" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "getresgid" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "getresgid32" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "getresuid" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "getresuid32" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "getrlimit" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "getrusage" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "getsid" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "getsockname" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "getsockopt" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "gettid" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "gettimeofday" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "getuid" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "getuid32" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "getxattr" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "inotify_add_watch" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "inotify_init" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "inotify_init1" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "inotify_rm_watch" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "io_cancel" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "io_destroy" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "io_getevents" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "io_setup" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "io_submit" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "ioctl" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "ioprio_get" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "ioprio_set" ], "action": "SCMP_ACT_ALLOW" }, { 
"names": [ "ipc" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "kill" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "lchown" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "lchown32" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "lgetxattr" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "link" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "linkat" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "listen" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "listxattr" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "llistxattr" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "lremovexattr" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "lseek" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "lsetxattr" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "lstat" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "lstat64" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "madvise" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "memfd_create" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "mincore" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "mkdir" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "mkdirat" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "mknod" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "mknodat" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "mlock" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "mlock2" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "mlockall" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "mmap" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "mmap2" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "mount" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "mprotect" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "mq_getsetattr" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "mq_notify" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "mq_open" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "mq_timedreceive" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "mq_timedsend" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "mq_unlink" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "mremap" ], 
"action": "SCMP_ACT_ALLOW" }, { "names": [ "msgctl" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "msgget" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "msgrcv" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "msgsnd" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "msync" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "munlock" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "munlockall" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "munmap" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "name_to_handle_at" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "nanosleep" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "newfstatat" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "open" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "openat" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "pause" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "pipe" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "pipe2" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "poll" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "ppoll" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "prctl" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "pread64" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "preadv" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "preadv2" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "prlimit64" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "pselect6" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "pwrite64" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "pwritev" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "pwritev2" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "read" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "readahead" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "readlink" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "readlinkat" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "readv" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "reboot" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "recv" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "recvfrom" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ 
"recvmmsg" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "recvmsg" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "remap_file_pages" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "removexattr" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "rename" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "renameat" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "renameat2" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "restart_syscall" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "rmdir" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "rt_sigaction" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "rt_sigpending" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "rt_sigprocmask" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "rt_sigqueueinfo" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "rt_sigreturn" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "rt_sigsuspend" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "rt_sigtimedwait" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "rt_tgsigqueueinfo" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "sched_get_priority_max" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "sched_get_priority_min" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "sched_getaffinity" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "sched_getattr" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "sched_getparam" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "sched_getscheduler" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "sched_rr_get_interval" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "sched_setaffinity" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "sched_setattr" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "sched_setparam" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "sched_setscheduler" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "sched_yield" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "seccomp" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "select" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "semctl" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "semget" ], 
"action": "SCMP_ACT_ALLOW" }, { "names": [ "semop" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "semtimedop" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "send" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "sendfile" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "sendfile64" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "sendmmsg" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "sendmsg" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "sendto" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "set_robust_list" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "set_thread_area" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "set_tid_address" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "setfsgid" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "setfsgid32" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "setfsuid" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "setfsuid32" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "setgid" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "setgid32" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "setgroups" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "setgroups32" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "setitimer" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "setpgid" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "setpriority" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "setregid" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "setregid32" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "setresgid" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "setresgid32" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "setresuid" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "setresuid32" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "setreuid" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "setreuid32" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "setrlimit" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "setsid" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "setsockopt" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "setuid" ], "action": "SCMP_ACT_ALLOW" 
}, { "names": [ "setuid32" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "setxattr" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "shmat" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "shmctl" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "shmdt" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "shmget" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "shutdown" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "sigaltstack" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "signalfd" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "signalfd4" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "sigreturn" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "socket" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "socketcall" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "socketpair" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "splice" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "stat" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "stat64" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "statfs" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "statfs64" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "statx" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "symlink" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "symlinkat" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "sync" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "sync_file_range" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "syncfs" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "sysinfo" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "syslog" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "tee" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "tgkill" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "time" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "timer_create" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "timer_delete" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "timer_getoverrun" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "timer_gettime" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "timer_settime" ], "action": "SCMP_ACT_ALLOW" }, { 
"names": [ "timerfd_create" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "timerfd_gettime" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "timerfd_settime" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "times" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "tkill" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "truncate" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "truncate64" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "ugetrlimit" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "umask" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "umount" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "umount2" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "uname" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "unlink" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "unlinkat" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "unshare" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "utime" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "utimensat" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "utimes" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "vfork" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "vmsplice" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "wait4" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "waitid" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "waitpid" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "write" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "writev" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "personality" ], "action": "SCMP_ACT_ALLOW", "args": [ { "index": 0, "value": 0, "op": "SCMP_CMP_EQ" } ] }, { "names": [ "personality" ], "action": "SCMP_ACT_ALLOW", "args": [ { "index": 0, "value": 8, "op": "SCMP_CMP_EQ" } ] }, { "names": [ "personality" ], "action": "SCMP_ACT_ALLOW", "args": [ { "index": 0, "value": 131072, "op": "SCMP_CMP_EQ" } ] }, { "names": [ "personality" ], "action": "SCMP_ACT_ALLOW", "args": [ { "index": 0, "value": 131080, "op": "SCMP_CMP_EQ" } ] }, { "names": [ "personality" ], "action": "SCMP_ACT_ALLOW", "args": [ { "index": 0, "value": 
4294967295, "op": "SCMP_CMP_EQ" } ] }, { "names": [ "arch_prctl" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "modify_ldt" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "clone" ], "action": "SCMP_ACT_ALLOW", "args": [ { "index": 0, "value": 2080505856, "op": "SCMP_CMP_MASKED_EQ" } ] }, { "names": [ "chroot" ], "action": "SCMP_ACT_ALLOW" } ] }, "maskedPaths": [ "/proc/acpi", "/proc/kcore", "/proc/keys", "/proc/latency_stats", "/proc/timer_list", "/proc/timer_stats", "/proc/sched_debug", "/proc/scsi", "/sys/firmware", "/sys/fs/selinux" ], "readonlyPaths": [ "/proc/asound", "/proc/bus", "/proc/fs", "/proc/irq", "/proc/sys", "/proc/sysrq-trigger" ] } } ```
geverartsdev commented 4 years ago

Doing the same thing with Podman (rootless) goes without a hitch, so I guess that I am the one doing something wrong here, but I am having a hard time figuring out what...

Is all of this documented somewhere that I didn't find?

giuseppe commented 4 years ago

looks like you are trying to use the systemd driver with rootless on cgroup v1. That is not supported.

Does it work without --systemd-cgroup?

geverartsdev commented 4 years ago

I am on cgroup v2, and limiting the CPU and memory actually works for the running container. The trouble arises when trying to add a process to this container (which is attached to a control group).

Doing the same thing without any resource management works just fine (unless I add --systemd-cgroup or --cgroup-manager=systemd).

giuseppe commented 4 years ago

what crun version are you using?

While trying your reproducer, I've found some regressions caused by recent changes: https://github.com/containers/crun/pull/378

geverartsdev commented 4 years ago

> what crun version are you using?

I wish I could tell... It came with podman, from the Kubic repositories, I think.

guillaume@ubuntu-20:~$ crun --version
crun version UNKNOWN
commit: 
spec: 1.0.0
+SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +YAJL

I will try with the latest version as soon as I can.

geverartsdev commented 4 years ago

I just built it from source, but the problem persists:

# crun --version
crun version 0.13.144-36b3
commit: 36b3e595618588893ce653d9e840bf8bf058e358
spec: 1.0.0
+SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +YAJL

Should I pass something to crun when calling the exec command? What is the process file (--process and process.json) for?

giuseppe commented 4 years ago

> Should I pass something to crun when calling the exec command? What is the process file (--process and process.json) for?

That should not affect how the cgroups are used. Could you show me the output of cat /proc/self/mountinfo after you join the rootlesskit namespaces?

geverartsdev commented 4 years ago
guillaume@ubuntu-20:~$ nsenter -t $(cat /tmp/pid) --user --mount
root@ubuntu-20:/# cat /proc/self/mountinfo 
374 373 8:2 / / rw,relatime - ext4 /dev/sda2 rw
375 374 0:6 / /dev rw,nosuid,noexec,relatime - devtmpfs udev rw,size=974124k,nr_inodes=243531,mode=755
376 375 0:22 / /dev/pts rw,nosuid,noexec,relatime - devpts devpts rw,gid=5,mode=620,ptmxmode=000
377 375 0:25 / /dev/shm rw,nosuid,nodev - tmpfs tmpfs rw
378 375 0:31 / /dev/hugepages rw,relatime - hugetlbfs hugetlbfs rw,pagesize=2M
379 375 0:19 / /dev/mqueue rw,nosuid,nodev,noexec,relatime - mqueue mqueue rw
380 374 0:23 / /run rw,nosuid,nodev,noexec,relatime - tmpfs tmpfs rw,size=203552k,mode=755
381 380 0:26 / /run/lock rw,nosuid,nodev,noexec,relatime - tmpfs tmpfs rw,size=5120k
382 380 0:23 /snapd/ns /run/snapd/ns rw,nosuid,nodev,noexec,relatime - tmpfs tmpfs rw,size=203552k,mode=755
383 380 0:35 / /run/user/1000 rw,nosuid,nodev,relatime - tmpfs tmpfs rw,size=203548k,mode=700,uid=1000,gid=1000
384 374 0:21 / /sys rw,nosuid,nodev,noexec,relatime - sysfs sysfs rw
385 384 0:7 / /sys/kernel/security rw,nosuid,nodev,noexec,relatime - securityfs securityfs rw
386 384 0:27 / /sys/fs/cgroup rw,nosuid,nodev,noexec,relatime - cgroup2 cgroup2 rw,nsdelegate
387 384 0:28 / /sys/fs/pstore rw,nosuid,nodev,noexec,relatime - pstore pstore rw
388 384 0:29 / /sys/fs/bpf rw,nosuid,nodev,noexec,relatime - bpf none rw,mode=700
389 384 0:8 / /sys/kernel/debug rw,nosuid,nodev,noexec,relatime - debugfs debugfs rw
390 384 0:12 / /sys/kernel/tracing rw,nosuid,nodev,noexec,relatime - tracefs tracefs rw
391 384 0:32 / /sys/fs/fuse/connections rw,nosuid,nodev,noexec,relatime - fusectl fusectl rw
392 384 0:20 / /sys/kernel/config rw,nosuid,nodev,noexec,relatime - configfs configfs rw
393 374 0:5 / /proc rw,nosuid,nodev,noexec,relatime - proc proc rw
394 393 0:30 / /proc/sys/fs/binfmt_misc rw,relatime - autofs systemd-1 rw,fd=28,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=15505
395 374 7:1 / /snap/core18/1754 ro,nodev,relatime - squashfs /dev/loop1 ro
396 374 7:2 / /snap/lxd/15161 ro,nodev,relatime - squashfs /dev/loop2 ro
397 374 7:0 / /snap/core18/1705 ro,nodev,relatime - squashfs /dev/loop0 ro
398 374 7:3 / /snap/lxd/15223 ro,nodev,relatime - squashfs /dev/loop3 ro
399 374 7:4 / /snap/snapd/7264 ro,nodev,relatime - squashfs /dev/loop4 ro
geverartsdev commented 4 years ago

Strangely, if I use podman unshare instead of rootlesskit + nsenter, everything works as expected...

Looking at the output of cat /proc/self/mountinfo after joining the podman unshare namespace, I noticed that its mount propagation is slave, while rootlesskit's default (which I was using) is private.

I then tried again with slave propagation. The output of /proc/self/mountinfo now matches that of podman unshare, but I still get the same error...

giuseppe commented 4 years ago

the mount table seems fine.

I am still not able to reproduce on Fedora 32. I am using rootlesskit (748ea095d9b18f9ea9e8a3487a2e43dce534ca8c) and tried both crun 0.13 and current master.

geverartsdev commented 4 years ago

I just built rootlesskit from source, at the same commit as yours, and I still have the same problem... (I am on Ubuntu 20.04.) Looking at the strace of the command, openat succeeds but write fails:

openat(AT_FDCWD, "/sys/fs/cgroup//user.slice/user-1000.slice/user@1000.service/user.slice/contingious-test.scope/container/cgroup.procs", O_WRONLY|O_CREAT, 0700) = 7
write(7, "4223", 4)                     = -1 EACCES (Permission denied)

I wrote a simple test program, and apparently the error is not specific to crun:

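#include <stdio.h>
#include <unistd.h>

/* Write our own PID to the file named by argv[1], e.g. a cgroup.procs file. */
int main(int argc, char *argv[]) {
        if (argc < 2)
                return 2; /* usage: prog FILE */
        FILE *file = fopen(argv[1], "w");
        if (file == NULL) {
                perror("fopen");
                return 1;
        }
        fprintf(file, "%d", (int) getpid());
        /* stdio buffers the data, so a failed write() only surfaces here. */
        if (fclose(file) != 0) {
                perror("fclose");
                return 1;
        }
        return 0;
}

openat(AT_FDCWD, "/sys/fs/cgroup//user.slice/user-1000.slice/user@1000.service/user.slice/contingious-test.scope/container/cgroup.procs", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3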
getpid()                                = 4703
fstat(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
write(3, "4703", 4)                     = -1 EACCES (Permission denied)

Quite desperate about this error, I went back through the cgroup v2 documentation and found this:

> The writer must have write access to the “cgroup.procs” file of the common ancestor of the source and destination cgroups.

I guess that my error comes from this. I checked whether I could write to the cgroup.procs file with real root permissions on the host, and it works. I imagine, then, that the processes running inside rootlesskit's namespace are too far from the cgroup into which the process should be moved: the common ancestor sits so high in the tree (probably requiring a write to /sys/fs/cgroup/cgroup.procs) that I have no write access to it. That would start to explain why it works on Fedora and not on Ubuntu (my guess being that Ubuntu's current cgroup v2 support is weaker). Following this idea, I ran the following commands:

guillaume@ubuntu-20:~$ nsenter -t $(cat /tmp/pid) --user --mount
root@ubuntu-20:/# cat /proc/$$/cgroup 
0::/user.slice/user-1000.slice/session-10.scope
root@ubuntu-20:/# ls -l /sys/fs/cgroup/user.slice/user-1000.slice/cgroup.procs 
-rw-r--r-- 1 nobody nogroup 0 May 31 12:48 /sys/fs/cgroup/user.slice/user-1000.slice/cgroup.procs

Which seems to confirm my theory... Now I really don't know what to do with it. Why isn't the user slice writable by the current user? I guess it is not supposed to be this way; could you confirm whether this is also the case on your machine, @giuseppe?

Or did I just go too far looking for an explanation, and I am actually saying completely insane things?
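
For illustration, the common-ancestor rule quoted above can be sketched in a few lines of C. This is only a sketch under stated assumptions: the function name `cgroup_common_ancestor` is hypothetical, it is not part of crun or the kernel, and it assumes both arguments are absolute cgroup paths as printed in /proc/<pid>/cgroup (without the leading `0::`).

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy the deepest common ancestor of two absolute
 * cgroup paths into out. Only whole path components count, so "/a/bc"
 * and "/a/bd" share "/a", not "/a/b".
 * Returns 0 on success, -1 if out is too small.
 */
int cgroup_common_ancestor(const char *src, const char *dst,
                           char *out, size_t out_len)
{
    size_t i = 0;
    size_t last = 0; /* length of the longest shared prefix ending on a component boundary */

    for (;;) {
        int src_end = (src[i] == '\0' || src[i] == '/');
        int dst_end = (dst[i] == '\0' || dst[i] == '/');

        if (src_end && dst_end)
            last = i; /* both paths finish a component here */
        if (src[i] == '\0' || src[i] != dst[i])
            break;
        i++;
    }

    if (last == 0) { /* nothing shared below the root */
        if (out_len < 2)
            return -1;
        strcpy(out, "/");
        return 0;
    }
    if (last + 1 > out_len)
        return -1;
    memcpy(out, src, last);
    out[last] = '\0';
    return 0;
}
```

Called with the session scope from the output above as the source and the container scope from the error message as the destination, this yields /user.slice/user-1000.slice, which is exactly the directory whose cgroup.procs is shown as owned by nobody.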

giuseppe commented 4 years ago

> Which seems to confirm my theory... Now I really don't know what to do with it. Why isn't the user slice writable by the current user? I guess it is not supposed to be this way; could you confirm whether this is also the case on your machine, @giuseppe?
> Or did I just go too far looking for an explanation, and I am actually saying completely insane things?

one reason is that the process is not in a cgroup owned by the current user.

What is the difference in cat /proc/self/cgroup if you try from podman unshare and from nsenter namespaces? If you don't own the current cgroup, then you can't move a process out of it. You may need to create the cgroup with systemd-run --user --scope ...
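
As a rough way to automate that comparison, here is a minimal C sketch that extracts the cgroup v2 path from one line of /proc/self/cgroup and checks whether it lies under the systemd user manager's subtree. Both function names are hypothetical, and the `user@<uid>.service` test is only a heuristic assumption about how systemd lays out delegated cgroups; the authoritative check is still the ownership of the cgroup directory and its cgroup.procs file.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: return the cgroup v2 path from one line of
 * /proc/<pid>/cgroup, or NULL if the line is not a unified-hierarchy
 * entry. Such entries have the form "0::<path>".
 */
const char *cgroup_v2_path(const char *line)
{
    if (strncmp(line, "0::", 3) != 0)
        return NULL;
    return line + 3;
}

/*
 * Heuristic (an assumption, not a kernel rule): report whether path
 * contains the component "user@<uid>.service", i.e. whether it sits in
 * the subtree that systemd normally delegates to that user.
 */
int in_user_service_subtree(const char *path, unsigned int uid)
{
    char needle[64];
    const char *hit;
    size_t n;

    snprintf(needle, sizeof needle, "/user@%u.service", uid);
    n = strlen(needle);

    for (hit = strstr(path, needle); hit != NULL; hit = strstr(hit + 1, needle)) {
        /* accept the match only if it ends on a component boundary */
        if (hit[n] == '/' || hit[n] == '\0' || hit[n] == '\n')
            return 1;
    }
    return 0;
}
```

A path such as /user.slice/user-1000.slice/session-10.scope fails this check for uid 1000, which is the situation where moving the process into a scope created by `systemd-run --user --scope` is worth trying.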

geverartsdev commented 4 years ago

> What is the difference in cat /proc/self/cgroup if you try from podman unshare and from nsenter namespaces?

As expected, podman unshare puts us in a scope under user@1000.service, while a plain nsenter does not.

> If you don't own the current cgroup, then you can't move a process out of it. You may need to create the cgroup with systemd-run --user --scope ...

And this indeed creates the scope right under the user service, as we want... Here is the output of the different commands to illustrate:

guillaume@ubuntu-20:~$ podman unshare
root@ubuntu-20:~# cat /proc/self/cgroup 
0::/user.slice/user-1000.slice/user@1000.service/user.slice/podman-6696.scope
root@ubuntu-20:~# exit
guillaume@ubuntu-20:~$ nsenter -t 1563 --user --mount
root@ubuntu-20:/# cat /proc/self/cgroup 
0::/user.slice/user-1000.slice/session-21.scope
root@ubuntu-20:/# logout
guillaume@ubuntu-20:~$ systemd-run --user --scope --unit my-scope nsenter -t 1563 --user --mount
Running scope as unit: my-scope.scope
root@ubuntu-20:/# cat /proc/self/cgroup 
0::/user.slice/user-1000.slice/user@1000.service/my-scope.scope

With all of this I managed to make it work with rootlesskit as well! Once again, thanks a lot for your help!

(Am I supposed to close the issue once I am done? Or do you do it yourself?)

giuseppe commented 4 years ago

I can close the issue. Thanks for helping me debug it.

I should probably add a check for this situation and give a more helpful error message.
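
Purely as an illustration of what such a message could look like, here is a small C sketch. It is not crun's actual code: the function name `format_cgroup_procs_error` and the hint wording are hypothetical, and it assumes the caller already has the errno value from the failed write.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not crun's actual code): format the error for a
 * failed write to a cgroup.procs file, adding a hint when the failure
 * is EACCES. Returns the snprintf() result.
 */
int format_cgroup_procs_error(char *buf, size_t len, const char *path, int err)
{
    if (err == EACCES)
        return snprintf(buf, len,
                        "writing file `%s`: %s (the calling process may not be "
                        "in a cgroup owned by the current user; consider "
                        "starting it with `systemd-run --user --scope`)",
                        path, strerror(err));

    return snprintf(buf, len, "writing file `%s`: %s", path, strerror(err));
}
```

The hint is attached only to EACCES so that other failures, such as a missing cgroup, keep their plain message.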