Closed smac89 closed 3 years ago
Doesn't look like Seccomp. Our default profile lives at https://github.com/containers/common/blob/master/pkg/seccomp/seccomp.json#L80 and you can see that close_range is in the list of allowed calls.
@mheon Do you have any other explanation for this behavior?
The reason I brought up seccomp is that, like I said, using --security-opt seccomp=unconfined allows the container to run just fine. So why does this flag work if the problem has nothing to do with seccomp?
I've used strace in the real container: once with the flag and once without. With the flag, the strace log shows that the close_range syscall succeeds:
574 close_range(0, -1, CLOSE_RANGE_CLOEXEC <unfinished ...>
557 <... poll resumed>) = 0 (Timeout)
559 futex(0x55b1c61a09b0, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
574 <... close_range resumed>) = 0
Without the flag, we get the following:
571 close_range(0, -1, CLOSE_RANGE_CLOEXEC) = -1 EPERM (Operation not permitted)
571 +++ exited with 127 +++
(The number beside each syscall is the process ID.)
Can you verify what profile is in use in the container you're running in? The default Podman profile does allow the syscall, so I have to assume your system may not be using the default.
The default profile should live at /usr/share/containers/seccomp.json. However, if an alternative is present at /etc/containers/seccomp.json, we will use that one instead.
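As a rough illustration of the lookup order described above, the following sketch picks whichever of the two paths would win. It only models the rule stated in this comment; the function name is hypothetical, and any other way of selecting a profile (for example a command-line option) is deliberately ignored.

```python
from pathlib import Path

# The two locations named in the comment above.
DEFAULT_PROFILE = Path("/usr/share/containers/seccomp.json")
OVERRIDE_PROFILE = Path("/etc/containers/seccomp.json")


def effective_seccomp_profile(default=DEFAULT_PROFILE, override=OVERRIDE_PROFILE):
    """Return the profile path that wins under the stated rule, or None.

    Hypothetical helper: it mirrors only the rule "use /etc if present,
    otherwise /usr/share" and is not Podman's actual lookup code.
    """
    if override.is_file():
        return override
    if default.is_file():
        return default
    return None


print(effective_seccomp_profile())
```

Running it on the affected host shows which file to inspect first when checking for a stale or customized profile.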
> Can you verify what profile is in use in the container you're running in? The default Podman profile does allow the syscall, so I have to assume your system may not be using the default
Please how do I do this?
I did:
podman create <image_hash>
podman inspect <container_name>
``` [ { "Id": "03803ce5d0f2421e3e4a0778c9262d834f91ad3df96ff962147cb767667d4478", "Created": "2021-04-25T17:29:52.519720505-06:00", "Path": "/app/walk", "Args": [ "/app/walk" ], "State": { "OciVersion": "1.0.2-dev", "Status": "configured", "Running": false, "Paused": false, "Restarting": false, "OOMKilled": false, "Dead": false, "Pid": 0, "ExitCode": 0, "Error": "", "StartedAt": "0001-01-01T00:00:00Z", "FinishedAt": "0001-01-01T00:00:00Z", "Healthcheck": { "Status": "", "FailingStreak": 0, "Log": null } }, "Image": "b8adaf3fbdcf539038216f0e061b638003ac8708cc6933177dd9f8dba0c4cd4e", "ImageName": "b8adaf3fbdc", "Rootfs": "", "Pod": "", "ResolvConfPath": "", "HostnamePath": "", "HostsPath": "", "StaticDir": "/home/chigozirim/.local/share/containers/storage/overlay-containers/03803ce5d0f2421e3e4a0778c9262d834f91ad3df96ff962147cb767667d4478/userdata", "OCIRuntime": "crun", "ConmonPidFile": "/run/user/1000/containers/overlay-containers/03803ce5d0f2421e3e4a0778c9262d834f91ad3df96ff962147cb767667d4478/userdata/conmon.pid", "Name": "priceless_meninsky", "RestartCount": 0, "Driver": "overlay", "MountLabel": "", "ProcessLabel": "", "AppArmorProfile": "", "EffectiveCaps": [ "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_FOWNER", "CAP_FSETID", "CAP_KILL", "CAP_NET_BIND_SERVICE", "CAP_SETFCAP", "CAP_SETGID", "CAP_SETPCAP", "CAP_SETUID", "CAP_SYS_CHROOT" ], "BoundingCaps": [ "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_FOWNER", "CAP_FSETID", "CAP_KILL", "CAP_NET_BIND_SERVICE", "CAP_SETFCAP", "CAP_SETGID", "CAP_SETPCAP", "CAP_SETUID", "CAP_SYS_CHROOT" ], "ExecIDs": [], "GraphDriver": { "Name": "overlay", "Data": { "LowerDir": "/home/chigozirim/.local/share/containers/storage/overlay/899938f8a7d4f906eda9dda6f1a413cd792177f6cb2af01d18fd215eab659cd5/diff:/home/chigozirim/.local/share/containers/storage/overlay/30d61bb737bb9be7178afce441d0ca5098909a59001a0301d3b50544e659ace1/diff", "UpperDir": 
"/home/chigozirim/.local/share/containers/storage/overlay/dbc15944f329eec9343405100a0d3095cffd6b0ed5885f365cdfbb7e327817fc/diff", "WorkDir": "/home/chigozirim/.local/share/containers/storage/overlay/dbc15944f329eec9343405100a0d3095cffd6b0ed5885f365cdfbb7e327817fc/work" } }, "Mounts": [], "Dependencies": [], "NetworkSettings": { "EndpointID": "", "Gateway": "", "IPAddress": "", "IPPrefixLen": 0, "IPv6Gateway": "", "GlobalIPv6Address": "", "GlobalIPv6PrefixLen": 0, "MacAddress": "", "Bridge": "", "SandboxID": "", "HairpinMode": false, "LinkLocalIPv6Address": "", "LinkLocalIPv6PrefixLen": 0, "Ports": {}, "SandboxKey": "" }, "ExitCommand": [ "/usr/bin/podman", "--root", "/home/chigozirim/.local/share/containers/storage", "--runroot", "/run/user/1000/containers", "--log-level", "warning", "--cgroup-manager", "systemd", "--tmpdir", "/run/user/1000/libpod/tmp", "--runtime", "crun", "--storage-driver", "overlay", "--storage-opt", "overlay.mount_program=/usr/bin/fuse-overlayfs", "--events-backend", "journald", "container", "cleanup", "03803ce5d0f2421e3e4a0778c9262d834f91ad3df96ff962147cb767667d4478" ], "Namespace": "", "IsInfra": false, "Config": { "Hostname": "03803ce5d0f2", "Domainname": "", "User": "", "AttachStdin": false, "AttachStdout": false, "AttachStderr": false, "Tty": false, "OpenStdin": false, "StdinOnce": false, "Env": [ "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin", "TERM=xterm", "container=podman" ], "Cmd": null, "Image": "b8adaf3fbdc", "Volumes": null, "WorkingDir": "/", "Entrypoint": "/app/walk", "OnBuild": null, "Labels": { "io.buildah.version": "1.20.1" }, "Annotations": { "io.kubernetes.cri-o.TTY": "false", "io.podman.annotations.autoremove": "FALSE", "io.podman.annotations.init": "FALSE", "io.podman.annotations.privileged": "FALSE", "io.podman.annotations.publish-all": "FALSE" }, "StopSignal": 15, "CreateCommand": [ "podman", "create", "b8adaf3fbdc" ], "Umask": "0022" }, "HostConfig": { "Binds": [], "CgroupManager": "systemd", 
"CgroupMode": "private", "ContainerIDFile": "", "LogConfig": { "Type": "k8s-file", "Config": null, "Path": "/home/chigozirim/.local/share/containers/storage/overlay-containers/03803ce5d0f2421e3e4a0778c9262d834f91ad3df96ff962147cb767667d4478/userdata/ctr.log", "Tag": "", "Size": "0B" }, "NetworkMode": "slirp4netns", "PortBindings": {}, "RestartPolicy": { "Name": "", "MaximumRetryCount": 0 }, "AutoRemove": false, "VolumeDriver": "", "VolumesFrom": null, "CapAdd": [], "CapDrop": [ "CAP_AUDIT_WRITE", "CAP_MKNOD", "CAP_NET_RAW" ], "Dns": [], "DnsOptions": [], "DnsSearch": [], "ExtraHosts": [], "GroupAdd": [], "IpcMode": "private", "Cgroup": "", "Cgroups": "default", "Links": null, "OomScoreAdj": 0, "PidMode": "private", "Privileged": false, "PublishAllPorts": false, "ReadonlyRootfs": false, "SecurityOpt": [], "Tmpfs": {}, "UTSMode": "private", "UsernsMode": "", "ShmSize": 65536000, "Runtime": "oci", "ConsoleSize": [ 0, 0 ], "Isolation": "", "CpuShares": 0, "Memory": 0, "NanoCpus": 0, "CgroupParent": "user.slice", "BlkioWeight": 0, "BlkioWeightDevice": null, "BlkioDeviceReadBps": null, "BlkioDeviceWriteBps": null, "BlkioDeviceReadIOps": null, "BlkioDeviceWriteIOps": null, "CpuPeriod": 0, "CpuQuota": 0, "CpuRealtimePeriod": 0, "CpuRealtimeRuntime": 0, "CpusetCpus": "", "CpusetMems": "", "Devices": [], "DiskQuota": 0, "KernelMemory": 0, "MemoryReservation": 0, "MemorySwap": 0, "MemorySwappiness": 0, "OomKillDisable": false, "PidsLimit": 2048, "Ulimits": [], "CpuCount": 0, "CpuPercent": 0, "IOMaximumIOps": 0, "IOMaximumBandwidth": 0, "CgroupConf": null } } ] ```
I've also checked the installed profile (both /usr/share/containers/seccomp.json and /etc/containers/seccomp.json are the same), and here it is:
``` { "defaultAction": "SCMP_ACT_ERRNO", "archMap": [ { "architecture": "SCMP_ARCH_X86_64", "subArchitectures": [ "SCMP_ARCH_X86", "SCMP_ARCH_X32" ] }, { "architecture": "SCMP_ARCH_AARCH64", "subArchitectures": [ "SCMP_ARCH_ARM" ] }, { "architecture": "SCMP_ARCH_MIPS64", "subArchitectures": [ "SCMP_ARCH_MIPS", "SCMP_ARCH_MIPS64N32" ] }, { "architecture": "SCMP_ARCH_MIPS64N32", "subArchitectures": [ "SCMP_ARCH_MIPS", "SCMP_ARCH_MIPS64" ] }, { "architecture": "SCMP_ARCH_MIPSEL64", "subArchitectures": [ "SCMP_ARCH_MIPSEL", "SCMP_ARCH_MIPSEL64N32" ] }, { "architecture": "SCMP_ARCH_MIPSEL64N32", "subArchitectures": [ "SCMP_ARCH_MIPSEL", "SCMP_ARCH_MIPSEL64" ] }, { "architecture": "SCMP_ARCH_S390X", "subArchitectures": [ "SCMP_ARCH_S390" ] } ], "syscalls": [ { "names": [ "_llseek", "_newselect", "accept", "accept4", "access", "adjtimex", "alarm", "bind", "brk", "capget", "capset", "chdir", "chmod", "chown", "chown32", "clock_adjtime", "clock_adjtime64", "clock_getres", "clock_getres_time64", "clock_gettime", "clock_gettime64", "clock_nanosleep", "clock_nanosleep_time64", "clone", "close", "close_range", "connect", "copy_file_range", "creat", "dup", "dup2", "dup3", "epoll_create", "epoll_create1", "epoll_ctl", "epoll_ctl_old", "epoll_pwait", "epoll_pwait2", "epoll_wait", "epoll_wait_old", "eventfd", "eventfd2", "execve", "execveat", "exit", "exit_group", "faccessat", "faccessat2", "fadvise64", "fadvise64_64", "fallocate", "fanotify_mark", "fchdir", "fchmod", "fchmodat", "fchown", "fchown32", "fchownat", "fcntl", "fcntl64", "fdatasync", "fgetxattr", "flistxattr", "flock", "fork", "fremovexattr", "fsconfig", "fsetxattr", "fsmount", "fsopen", "fspick", "fstat", "fstat64", "fstatat64", "fstatfs", "fstatfs64", "fsync", "ftruncate", "ftruncate64", "futex", "futimesat", "get_robust_list", "get_thread_area", "getcpu", "getcwd", "getdents", "getdents64", "getegid", "getegid32", "geteuid", "geteuid32", "getgid", "getgid32", "getgroups", "getgroups32", "getitimer", "getpeername", 
"getpgid", "getpgrp", "getpid", "getppid", "getpriority", "getrandom", "getresgid", "getresgid32", "getresuid", "getresuid32", "getrlimit", "getrusage", "getsid", "getsockname", "getsockopt", "gettid", "gettimeofday", "getuid", "getuid32", "getxattr", "inotify_add_watch", "inotify_init", "inotify_init1", "inotify_rm_watch", "io_cancel", "io_destroy", "io_getevents", "io_setup", "io_submit", "ioctl", "ioprio_get", "ioprio_set", "ipc", "keyctl", "kill", "lchown", "lchown32", "lgetxattr", "link", "linkat", "listen", "listxattr", "llistxattr", "lremovexattr", "lseek", "lsetxattr", "lstat", "lstat64", "madvise", "memfd_create", "mincore", "mkdir", "mkdirat", "mknod", "mknodat", "mlock", "mlock2", "mlockall", "mmap", "mmap2", "mount", "move_mount", "mprotect", "mq_getsetattr", "mq_notify", "mq_open", "mq_timedreceive", "mq_timedsend", "mq_unlink", "mremap", "msgctl", "msgget", "msgrcv", "msgsnd", "msync", "munlock", "munlockall", "munmap", "name_to_handle_at", "nanosleep", "newfstatat", "open", "openat", "openat2", "open_tree", "pause", "pidfd_getfd", "pidfd_open", "pidfd_send_signal", "pipe", "pipe2", "pivot_root", "poll", "ppoll", "ppoll_time64", "prctl", "pread64", "preadv", "preadv2", "prlimit64", "pselect6", "pselect6_time64", "pwrite64", "pwritev", "pwritev2", "read", "readahead", "readlink", "readlinkat", "readv", "reboot", "recv", "recvfrom", "recvmmsg", "recvmsg", "remap_file_pages", "removexattr", "rename", "renameat", "renameat2", "restart_syscall", "rmdir", "rt_sigaction", "rt_sigpending", "rt_sigprocmask", "rt_sigqueueinfo", "rt_sigreturn", "rt_sigsuspend", "rt_sigtimedwait", "rt_tgsigqueueinfo", "sched_get_priority_max", "sched_get_priority_min", "sched_getaffinity", "sched_getattr", "sched_getparam", "sched_getscheduler", "sched_rr_get_interval", "sched_setaffinity", "sched_setattr", "sched_setparam", "sched_setscheduler", "sched_yield", "seccomp", "select", "semctl", "semget", "semop", "semtimedop", "send", "sendfile", "sendfile64", "sendmmsg", "sendmsg", 
"sendto", "setns", "set_robust_list", "set_thread_area", "set_tid_address", "setfsgid", "setfsgid32", "setfsuid", "setfsuid32", "setgid", "setgid32", "setgroups", "setgroups32", "setitimer", "setpgid", "setpriority", "setregid", "setregid32", "setresgid", "setresgid32", "setresuid", "setresuid32", "setreuid", "setreuid32", "setrlimit", "setsid", "setsockopt", "setuid", "setuid32", "setxattr", "shmat", "shmctl", "shmdt", "shmget", "shutdown", "sigaltstack", "signalfd", "signalfd4", "sigreturn", "socketcall", "socketpair", "splice", "stat", "stat64", "statfs", "statfs64", "statx", "symlink", "symlinkat", "sync", "sync_file_range", "syncfs", "sysinfo", "syslog", "tee", "tgkill", "time", "timer_create", "timer_delete", "timer_getoverrun", "timer_gettime", "timer_gettime64", "timer_settime", "timerfd_create", "timerfd_gettime", "timerfd_gettime64", "timerfd_settime", "timerfd_settime64", "times", "tkill", "truncate", "truncate64", "ugetrlimit", "umask", "umount", "umount2", "uname", "unlink", "unlinkat", "unshare", "utime", "utimensat", "utimensat_time64", "utimes", "vfork", "wait4", "waitid", "waitpid", "write", "writev" ], "action": "SCMP_ACT_ALLOW", "args": [], "comment": "", "includes": {}, "excludes": {} }, { "names": [ "personality" ], "action": "SCMP_ACT_ALLOW", "args": [ { "index": 0, "value": 0, "valueTwo": 0, "op": "SCMP_CMP_EQ" } ], "comment": "", "includes": {}, "excludes": {} }, { "names": [ "personality" ], "action": "SCMP_ACT_ALLOW", "args": [ { "index": 0, "value": 8, "valueTwo": 0, "op": "SCMP_CMP_EQ" } ], "comment": "", "includes": {}, "excludes": {} }, { "names": [ "personality" ], "action": "SCMP_ACT_ALLOW", "args": [ { "index": 0, "value": 131072, "valueTwo": 0, "op": "SCMP_CMP_EQ" } ], "comment": "", "includes": {}, "excludes": {} }, { "names": [ "personality" ], "action": "SCMP_ACT_ALLOW", "args": [ { "index": 0, "value": 131080, "valueTwo": 0, "op": "SCMP_CMP_EQ" } ], "comment": "", "includes": {}, "excludes": {} }, { "names": [ "personality" ], 
"action": "SCMP_ACT_ALLOW", "args": [ { "index": 0, "value": 4294967295, "valueTwo": 0, "op": "SCMP_CMP_EQ" } ], "comment": "", "includes": {}, "excludes": {} }, { "names": [ "sync_file_range2" ], "action": "SCMP_ACT_ALLOW", "args": [], "comment": "", "includes": { "arches": [ "ppc64le" ] }, "excludes": {} }, { "names": [ "arm_fadvise64_64", "arm_sync_file_range", "sync_file_range2", "breakpoint", "cacheflush", "set_tls" ], "action": "SCMP_ACT_ALLOW", "args": [], "comment": "", "includes": { "arches": [ "arm", "arm64" ] }, "excludes": {} }, { "names": [ "arch_prctl" ], "action": "SCMP_ACT_ALLOW", "args": [], "comment": "", "includes": { "arches": [ "amd64", "x32" ] }, "excludes": {} }, { "names": [ "modify_ldt" ], "action": "SCMP_ACT_ALLOW", "args": [], "comment": "", "includes": { "arches": [ "amd64", "x32", "x86" ] }, "excludes": {} }, { "names": [ "s390_pci_mmio_read", "s390_pci_mmio_write", "s390_runtime_instr" ], "action": "SCMP_ACT_ALLOW", "args": [], "comment": "", "includes": { "arches": [ "s390", "s390x" ] }, "excludes": {} }, { "names": [ "open_by_handle_at" ], "action": "SCMP_ACT_ALLOW", "args": [], "comment": "", "includes": { "caps": [ "CAP_DAC_READ_SEARCH" ] }, "excludes": {} }, { "names": [ "bpf", "fanotify_init", "lookup_dcookie", "perf_event_open", "quotactl", "setdomainname", "sethostname", "setns" ], "action": "SCMP_ACT_ALLOW", "args": [], "comment": "", "includes": { "caps": [ "CAP_SYS_ADMIN" ] }, "excludes": {} }, { "names": [ "chroot" ], "action": "SCMP_ACT_ALLOW", "args": [], "comment": "", "includes": { "caps": [ "CAP_SYS_CHROOT" ] }, "excludes": {} }, { "names": [ "delete_module", "init_module", "finit_module", "query_module" ], "action": "SCMP_ACT_ALLOW", "args": [], "comment": "", "includes": { "caps": [ "CAP_SYS_MODULE" ] }, "excludes": {} }, { "names": [ "get_mempolicy", "mbind", "set_mempolicy" ], "action": "SCMP_ACT_ALLOW", "args": [], "comment": "", "includes": { "caps": [ "CAP_SYS_NICE" ] }, "excludes": {} }, { "names": [ "acct" ], 
"action": "SCMP_ACT_ALLOW", "args": [], "comment": "", "includes": { "caps": [ "CAP_SYS_PACCT" ] }, "excludes": {} }, { "names": [ "kcmp", "process_madvise", "process_vm_readv", "process_vm_writev", "ptrace" ], "action": "SCMP_ACT_ALLOW", "args": [], "comment": "", "includes": { "caps": [ "CAP_SYS_PTRACE" ] }, "excludes": {} }, { "names": [ "iopl", "ioperm" ], "action": "SCMP_ACT_ALLOW", "args": [], "comment": "", "includes": { "caps": [ "CAP_SYS_RAWIO" ] }, "excludes": {} }, { "names": [ "settimeofday", "stime", "clock_settime", "clock_settime64" ], "action": "SCMP_ACT_ALLOW", "args": [], "comment": "", "includes": { "caps": [ "CAP_SYS_TIME" ] }, "excludes": {} }, { "names": [ "vhangup" ], "action": "SCMP_ACT_ALLOW", "args": [], "comment": "", "includes": { "caps": [ "CAP_SYS_TTY_CONFIG" ] }, "excludes": {} }, { "names": [ "socket" ], "action": "SCMP_ACT_ERRNO", "args": [ { "index": 0, "value": 16, "valueTwo": 0, "op": "SCMP_CMP_EQ" }, { "index": 2, "value": 9, "valueTwo": 0, "op": "SCMP_CMP_EQ" } ], "comment": "", "includes": {}, "excludes": { "caps": [ "CAP_AUDIT_WRITE" ] }, "errnoRet": 22 }, { "names": [ "socket" ], "action": "SCMP_ACT_ALLOW", "args": [ { "index": 2, "value": 9, "valueTwo": 0, "op": "SCMP_CMP_NE" } ], "comment": "", "includes": {}, "excludes": { "caps": [ "CAP_AUDIT_WRITE" ] } }, { "names": [ "socket" ], "action": "SCMP_ACT_ALLOW", "args": [ { "index": 0, "value": 16, "valueTwo": 0, "op": "SCMP_CMP_NE" } ], "comment": "", "includes": {}, "excludes": { "caps": [ "CAP_AUDIT_WRITE" ] } }, { "names": [ "socket" ], "action": "SCMP_ACT_ALLOW", "args": [ { "index": 2, "value": 9, "valueTwo": 0, "op": "SCMP_CMP_NE" } ], "comment": "", "includes": {}, "excludes": { "caps": [ "CAP_AUDIT_WRITE" ] } }, { "names": [ "socket" ], "action": "SCMP_ACT_ALLOW", "args": null, "comment": "", "includes": { "caps": [ "CAP_AUDIT_WRITE" ] }, "excludes": {} } ] } ```
Your Seccomp profile does include close_range in the list of allowed calls, so Podman and Libseccomp should not be generating profiles that block it. It's not conditional in any way, either: it is allowed without any checks.
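The "allowed without any checks" claim can be verified mechanically. The sketch below is a hypothetical helper, not part of Podman or libseccomp. It treats a rule as unconditional when its action is SCMP_ACT_ALLOW and its args, includes, and excludes are all empty, matching the field names in the profile posted above. The sample profile is a cut-down stand-in for the real file.

```python
def unconditional_allow(profile, syscall):
    """Return True if some rule allows `syscall` with no conditions attached."""
    for rule in profile.get("syscalls", []):
        if syscall not in rule.get("names", []):
            continue
        if rule.get("action") != "SCMP_ACT_ALLOW":
            continue
        # "args" may be [] or null; "includes"/"excludes" may be {}.
        # Any non-empty value makes the rule conditional.
        if rule.get("args") or rule.get("includes") or rule.get("excludes"):
            continue
        return True
    return False


# Cut-down stand-in for the profile shown above (assumption: same schema).
SAMPLE_PROFILE = {
    "defaultAction": "SCMP_ACT_ERRNO",
    "syscalls": [
        {"names": ["close", "close_range"], "action": "SCMP_ACT_ALLOW",
         "args": [], "includes": {}, "excludes": {}},
        {"names": ["chroot"], "action": "SCMP_ACT_ALLOW",
         "args": [], "includes": {"caps": ["CAP_SYS_CHROOT"]}, "excludes": {}},
    ],
}

print(unconditional_allow(SAMPLE_PROFILE, "close_range"))  # True
print(unconditional_allow(SAMPLE_PROFILE, "chroot"))       # False
```

To check a real profile, load it with json.load and pass the resulting dict in place of SAMPLE_PROFILE.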
You should see the denied seccomp call in /var/log/audit/audit.log
ausearch -m seccomp -i
> You should see the denied seccomp call in /var/log/audit/audit.log
> ausearch -m seccomp -i
@rhatdan
I do:
----
type=SECCOMP msg=audit(2021-04-27 14:04:24.740:425) : auid=chigozirim uid=unknown(10099) gid=unknown(10099) ses=2 subj==unconfined pid=190649 comm=xfce4-terminal exe=/usr/bin/xfce4-terminal sig=SIG0 arch=x86_64 syscall=close_range compat=0 ip=0x7f76336d8a9d code=errno
Like I said, this only happens inside the container. On my host machine, the problem never occurs.
Something is going wrong then: some kind of mismatch between what the OCI runtime understands as close_range and what the kernel does. You see close_range in /usr/share/containers/seccomp.json, correct?
I just wrote a quick patch to podman info to show what seccomp.json file the tool is using.
> You see close_range in /usr/share/containers/seccomp.json correct?
Indeed I do
➜ grep -C4 'close_range' /usr/share/containers/seccomp.json
"clock_nanosleep",
"clock_nanosleep_time64",
"clone",
"close",
"close_range",
"connect",
"copy_file_range",
"creat",
"dup",
Are you using runc or crun?
@giuseppe ideas?
I am using crun. I can switch back to runc and test it. Let me do that.
The same issue with runc.
Also, when I switch to runc, the error is not detected by auditd (i.e. I don't see it in the logs), but when I strace the command, I see that it still ends at close_range:
[pid 706] close_range(0, -1, CLOSE_RANGE_CLOEXEC) = -1 EPERM (Operation not permitted)
> @giuseppe ideas?
close_range is used by crun.
This is again the same issue with EPERM vs ENOSYS that we already faced a few months ago.
I think it is time we switch to using ENOSYS by default; the only issue AFAIK is that runc doesn't support it yet (https://github.com/opencontainers/runtime-spec/pull/1087).
CC @kolyshkin
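Why the choice between EPERM and ENOSYS matters can be shown with a small simulation. The sketch assumes the common pattern where a caller falls back to an older method only when the kernel reports ENOSYS ("syscall not implemented") and treats any other errno as a real failure. Whether the affected application follows exactly this pattern is an assumption; the function names and the fake syscall are hypothetical, and nothing here touches real file descriptors.

```python
import errno
import os


def close_from(lowfd, try_close_range, slow_fallback):
    """Try the fast path; fall back only when the syscall is reported missing."""
    try:
        try_close_range(lowfd)
        return "close_range"
    except OSError as exc:
        if exc.errno == errno.ENOSYS:
            # The kernel (or a filter) says the syscall does not exist: use the old path.
            slow_fallback(lowfd)
            return "fallback"
        # EPERM and everything else looks like a genuine error to the caller.
        raise


def denied_with(code):
    """Build a fake close_range that fails with the given errno (simulation only)."""
    def fake_close_range(lowfd):
        raise OSError(code, os.strerror(code))
    return fake_close_range


fallback_calls = []
print(close_from(3, denied_with(errno.ENOSYS), fallback_calls.append))  # fallback
try:
    close_from(3, denied_with(errno.EPERM), fallback_calls.append)
except OSError as exc:
    print("hard failure, errno =", errno.errorcode[exc.errno])
```

With a filter that answers ENOSYS for syscalls it does not recognize, the first branch is taken and the program keeps working; with EPERM the caller has no reason to try the fallback, which is consistent with the strace output earlier in the thread.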
Seeing the same issue on F33; starting the container with --security-opt=seccomp=unconfined solves it.
$ grep -C4 'close_range' /usr/share/containers/seccomp.json
"clock_nanosleep",
"clock_nanosleep_time64",
"clone",
"close",
"close_range",
"connect",
"copy_file_range",
"creat",
"dup",
$ rpm -qf /usr/share/containers/seccomp.json
containers-common-1-10.fc33.noarch
$ rpm -q podman runc crun
podman-3.1.0-3.fc33.x86_64
runc-1.0.0-377.rc93.fc33.x86_64
crun-0.19.1-2.fc33.x86_64
#Edit: seccomp audit message:
audit[1112353]: SECCOMP auid=1000 uid=1000 gid=1000 ses=3 subj=system_u:system_r:container_init_t:s0:c344,c914 pid=1112353 comm="xfce4-terminal" exe="/usr/bin/xfce4-terminal" sig=0 arch=c000003e syscall=436 compat=0 ip=0x7f6184e4f15d code=0x50000
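The raw audit record above uses numeric fields where the interpreted ausearch -i output earlier in the thread shows names. A minimal decoding sketch follows. The lookup tables contain only the three values needed for this record (AUDIT_ARCH_X86_64, syscall 436 on x86_64, and the SECCOMP_RET_ERRNO action); the function names are hypothetical and the tables are far from complete.

```python
import shlex

# Only the values needed for the record above; not complete tables.
AUDIT_ARCHES = {0xC000003E: "x86_64"}
X86_64_SYSCALLS = {436: "close_range"}
SECCOMP_ACTIONS = {0x00050000: "SECCOMP_RET_ERRNO"}

LINE = (
    'audit[1112353]: SECCOMP auid=1000 uid=1000 gid=1000 ses=3 '
    'subj=system_u:system_r:container_init_t:s0:c344,c914 pid=1112353 '
    'comm="xfce4-terminal" exe="/usr/bin/xfce4-terminal" sig=0 '
    'arch=c000003e syscall=436 compat=0 ip=0x7f6184e4f15d code=0x50000'
)


def parse_audit_fields(line):
    """Split a raw audit record into a dict of key=value fields."""
    fields = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def describe(line):
    """Return (arch, syscall, action) names, falling back to the raw values."""
    fields = parse_audit_fields(line)
    arch = int(fields["arch"], 16)
    number = int(fields["syscall"])
    code = int(fields["code"], 16)
    return (
        AUDIT_ARCHES.get(arch, hex(arch)),
        X86_64_SYSCALLS.get(number, str(number)),
        SECCOMP_ACTIONS.get(code, hex(code)),
    )


print(describe(LINE))  # ('x86_64', 'close_range', 'SECCOMP_RET_ERRNO')
```

Decoded this way, the record says the filter answered close_range with an errno action rather than allowing it, which matches the EPERM seen in strace.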
@smac89 love your bug report, so easy to reproduce!
we also need an updated libseccomp that knows about close_range, and apparently it is not present even upstream at the moment
@giuseppe Did you open a PR with libseccomp to add this?
I think at this point it is easier to fix it for good in our default seccomp profile now that runc rc95 is out and with the feature we need. Also libseccomp uses some scripts to read all the syscalls from the kernel sources, so it is not necessary to update it manually
OK, what are our next steps then? Do we need a new PR to Podman? Containers-common?
PR opened here: https://github.com/containers/common/pull/573
this is fixed in c/common
I'm still facing this same issue with containers/common-0.40.1:
$ pacman -Ss containers-common
community/containers-common 0.40.1-2 [installed]
Configuration files and manpages for containers
$ podman run --rm -it a83749b0c3fdecb23737bcbc591262cbd8fc91f517b5d61106273d1965658320 /app/walk.c
/app/walk.c opened as FD 3
/proc/self/fd/0 ==> /dev/pts/0
/proc/self/fd/1 ==> /dev/pts/0
/proc/self/fd/2 ==> /dev/pts/0
/proc/self/fd/3 ==> /app/walk.c
/proc/self/fd/4 ==> /proc/1/fd
========= About to call close_range() =======
close_range: Operation not permitted
In my case it seems that a stale config file was probably to blame. Removing and reinstalling the files in /etc/containers fixed this for me. Sorry for the noise.
EDIT: the problem is actually still here.
I just triple checked, and I'm now in a very weird situation:
I'm now hitting another error (Error: capset: Operation not permitted: OCI permission denied), but that is in a podman-in-podman situation, and easy to work around for now with --cap-drop all.
Support for close_range has only recently been added to libseccomp:
https://github.com/seccomp/libseccomp/commit/ac849e7960547d418009a783da654d5917dbfe2d
~~in the dunfell yocto build on even podman 3.4.2 with https://github.com/seccomp/libseccomp/commit/ac849e7960547d418009a783da654d5917dbfe2d I still observe the same defect.
Error: OCI runtime error: invalid seccomp syscall 'close_range'
~~
Never mind, updating "crun" from 0.10 to 0.19 fixed the issue.
Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)
/kind bug
Description
I have an application which uses the close_range syscall running inside a container. When I run the container and the application makes that syscall, I get an error saying "Permission denied". At first I thought this was a problem with the application, but after some investigating, I am starting to think this may be a podman issue related to how it handles seccomp profiles.
Steps to reproduce the issue:
walk.c
```c
#define _GNU_SOURCE
#include
```
Copy the above script to /tmp on your host machine
Using
buildah
:7bd46f9814bb
with the id of the built image)
Describe the results you received:
The result will look something like:
Describe the results you expected:
Now repeat this same process on your host Linux machine (assuming you are running at least kernel version 5.9)
The program should run successfully with an output similar to:
This is what I expected inside the container
Additional information you deem important (e.g. issue happens only occasionally):
If you run the image with the option --security-opt seccomp=unconfined, everything works fine.
Does that mean podman is simply blocking the close_range syscall? Where does podman's default seccomp.json file live? I was under the impression that they use the default one from docker, which whitelists the close_range syscall.
Output of podman version:
Output of podman info --debug:
Package info (e.g. output of rpm -q podman or apt list podman):
Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide?
Yes
Additional environment details (AWS, VirtualBox, physical, etc.):