containers / podman

Podman: A tool for managing OCI containers and pods.
https://podman.io
Apache License 2.0

Document exact seccomp profile to use when running Podman inside a k8s container (CI container build use case) #9226

Closed philnalwalker closed 3 years ago

philnalwalker commented 3 years ago

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind feature

Description

We have Podman working with overlay to build containers on EKS Jenkins; however, this only works because we grant the SYS_ADMIN capability. We want to stop granting SYS_ADMIN and use a seccomp profile instead.

After a bit of Googling and reading similar threads in GitHub issues, we tried @rhatdan's suggestion of using the seccomp.json from the latest Fedora containers-common package. However, we encounter the following error:

+ podman build --isolation=chroot --events-backend=file -f Dockerfile .
Error: mount /var/lib/containers/storage/overlay:/var/lib/containers/storage/overlay, flags: 0x1000: operation not permitted

We did some negative testing to make sure it was actually using the profile. If we give it a bad profile name the pod will not come up, so it is using this profile albeit unsuccessfully.

We also read that Podman needs unshare2, but we do not see this syscall in the seccomp profile.

Regardless, it would be good to document the exact seccomp profile JSON needed to use Podman in a Kubernetes container (CI container build use case.) Perhaps we can make a PR with the instructions once we get this working?

Also, if we do not add --isolation=chroot to the Podman command line, we get the following error:

time="2021-02-03T21:01:12Z" level=error msg="container_linux.go:370: starting container process caused: process_linux.go:326: applying cgroup configuration for process caused: mkdir /sys/fs/cgroup/cpuset/buildah-buildah755764111: read-only file system"

Is --isolation=chroot required when building a container using Podman in a Kubernetes container?

Steps to reproduce the issue:

  1. Put the following profile.json in /var/lib/kubelet/seccomp (same as latest Fedora containers-common seccomp.json):
{
    "defaultAction": "SCMP_ACT_ERRNO",
    "archMap": [
        {
            "architecture": "SCMP_ARCH_X86_64",
            "subArchitectures": [
                "SCMP_ARCH_X86",
                "SCMP_ARCH_X32"
            ]
        },
        {
            "architecture": "SCMP_ARCH_AARCH64",
            "subArchitectures": [
                "SCMP_ARCH_ARM"
            ]
        },
        {
            "architecture": "SCMP_ARCH_MIPS64",
            "subArchitectures": [
                "SCMP_ARCH_MIPS",
                "SCMP_ARCH_MIPS64N32"
            ]
        },
        {
            "architecture": "SCMP_ARCH_MIPS64N32",
            "subArchitectures": [
                "SCMP_ARCH_MIPS",
                "SCMP_ARCH_MIPS64"
            ]
        },
        {
            "architecture": "SCMP_ARCH_MIPSEL64",
            "subArchitectures": [
                "SCMP_ARCH_MIPSEL",
                "SCMP_ARCH_MIPSEL64N32"
            ]
        },
        {
            "architecture": "SCMP_ARCH_MIPSEL64N32",
            "subArchitectures": [
                "SCMP_ARCH_MIPSEL",
                "SCMP_ARCH_MIPSEL64"
            ]
        },
        {
            "architecture": "SCMP_ARCH_S390X",
            "subArchitectures": [
                "SCMP_ARCH_S390"
            ]
        }
    ],
    "syscalls": [
        {
            "names": [
                "_llseek",
                "_newselect",
                "accept",
                "accept4",
                "access",
                "adjtimex",
                "alarm",
                "bind",
                "brk",
                "capget",
                "capset",
                "chdir",
                "chmod",
                "chown",
                "chown32",
                "clock_adjtime",
                "clock_adjtime64",
                "clock_getres",
                "clock_getres_time64",
                "clock_gettime",
                "clock_gettime64",
                "clock_nanosleep",
                "clock_nanosleep_time64",
                "clone",
                "close",
                "connect",
                "copy_file_range",
                "creat",
                "dup",
                "dup2",
                "dup3",
                "epoll_create",
                "epoll_create1",
                "epoll_ctl",
                "epoll_ctl_old",
                "epoll_pwait",
                "epoll_wait",
                "epoll_wait_old",
                "eventfd",
                "eventfd2",
                "execve",
                "execveat",
                "exit",
                "exit_group",
                "faccessat",
                "faccessat2",
                "fadvise64",
                "fadvise64_64",
                "fallocate",
                "fanotify_mark",
                "fchdir",
                "fchmod",
                "fchmodat",
                "fchown",
                "fchown32",
                "fchownat",
                "fcntl",
                "fcntl64",
                "fdatasync",
                "fgetxattr",
                "flistxattr",
                "flock",
                "fork",
                "fremovexattr",
                "fsetxattr",
                "fstat",
                "fstat64",
                "fstatat64",
                "fstatfs",
                "fstatfs64",
                "fsync",
                "ftruncate",
                "ftruncate64",
                "futex",
                "futimesat",
                "get_robust_list",
                "get_thread_area",
                "getcpu",
                "getcwd",
                "getdents",
                "getdents64",
                "getegid",
                "getegid32",
                "geteuid",
                "geteuid32",
                "getgid",
                "getgid32",
                "getgroups",
                "getgroups32",
                "getitimer",
                "getpeername",
                "getpgid",
                "getpgrp",
                "getpid",
                "getppid",
                "getpriority",
                "getrandom",
                "getresgid",
                "getresgid32",
                "getresuid",
                "getresuid32",
                "getrlimit",
                "getrusage",
                "getsid",
                "getsockname",
                "getsockopt",
                "gettid",
                "gettimeofday",
                "getuid",
                "getuid32",
                "getxattr",
                "inotify_add_watch",
                "inotify_init",
                "inotify_init1",
                "inotify_rm_watch",
                "io_cancel",
                "io_destroy",
                "io_getevents",
                "io_setup",
                "io_submit",
                "ioctl",
                "ioprio_get",
                "ioprio_set",
                "ipc",
                "keyctl",
                "kill",
                "lchown",
                "lchown32",
                "lgetxattr",
                "link",
                "linkat",
                "listen",
                "listxattr",
                "llistxattr",
                "lremovexattr",
                "lseek",
                "lsetxattr",
                "lstat",
                "lstat64",
                "madvise",
                "memfd_create",
                "mincore",
                "mkdir",
                "mkdirat",
                "mknod",
                "mknodat",
                "mlock",
                "mlock2",
                "mlockall",
                "mmap",
                "mmap2",
                "mount",
                "mprotect",
                "mq_getsetattr",
                "mq_notify",
                "mq_open",
                "mq_timedreceive",
                "mq_timedsend",
                "mq_unlink",
                "mremap",
                "msgctl",
                "msgget",
                "msgrcv",
                "msgsnd",
                "msync",
                "munlock",
                "munlockall",
                "munmap",
                "name_to_handle_at",
                "nanosleep",
                "newfstatat",
                "open",
                "openat",
                "openat2",
                "pause",
                "pidfd_getfd",
                "pipe",
                "pipe2",
                "pivot_root",
                "poll",
                "ppoll",
                "ppoll_time64",
                "prctl",
                "pread64",
                "preadv",
                "preadv2",
                "prlimit64",
                "pselect6",
                "pselect6_time64",
                "pwrite64",
                "pwritev",
                "pwritev2",
                "read",
                "readahead",
                "readlink",
                "readlinkat",
                "readv",
                "reboot",
                "recv",
                "recvfrom",
                "recvmmsg",
                "recvmsg",
                "remap_file_pages",
                "removexattr",
                "rename",
                "renameat",
                "renameat2",
                "restart_syscall",
                "rmdir",
                "rt_sigaction",
                "rt_sigpending",
                "rt_sigprocmask",
                "rt_sigqueueinfo",
                "rt_sigreturn",
                "rt_sigsuspend",
                "rt_sigtimedwait",
                "rt_tgsigqueueinfo",
                "sched_get_priority_max",
                "sched_get_priority_min",
                "sched_getaffinity",
                "sched_getattr",
                "sched_getparam",
                "sched_getscheduler",
                "sched_rr_get_interval",
                "sched_setaffinity",
                "sched_setattr",
                "sched_setparam",
                "sched_setscheduler",
                "sched_yield",
                "seccomp",
                "select",
                "semctl",
                "semget",
                "semop",
                "semtimedop",
                "send",
                "sendfile",
                "sendfile64",
                "sendmmsg",
                "sendmsg",
                "sendto",
                "set_robust_list",
                "set_thread_area",
                "set_tid_address",
                "setfsgid",
                "setfsgid32",
                "setfsuid",
                "setfsuid32",
                "setgid",
                "setgid32",
                "setgroups",
                "setgroups32",
                "setitimer",
                "setpgid",
                "setpriority",
                "setregid",
                "setregid32",
                "setresgid",
                "setresgid32",
                "setresuid",
                "setresuid32",
                "setreuid",
                "setreuid32",
                "setrlimit",
                "setsid",
                "setsockopt",
                "setuid",
                "setuid32",
                "setxattr",
                "shmat",
                "shmctl",
                "shmdt",
                "shmget",
                "shutdown",
                "sigaltstack",
                "signalfd",
                "signalfd4",
                "sigreturn",
                "socket",
                "socketcall",
                "socketpair",
                "splice",
                "stat",
                "stat64",
                "statfs",
                "statfs64",
                "statx",
                "symlink",
                "symlinkat",
                "sync",
                "sync_file_range",
                "syncfs",
                "sysinfo",
                "syslog",
                "tee",
                "tgkill",
                "time",
                "timer_create",
                "timer_delete",
                "timer_getoverrun",
                "timer_gettime",
                "timer_gettime64",
                "timer_settime",
                "timerfd_create",
                "timerfd_gettime",
                "timerfd_gettime64",
                "timerfd_settime",
                "timerfd_settime64",
                "times",
                "tkill",
                "truncate",
                "truncate64",
                "ugetrlimit",
                "umask",
                "umount",
                "umount2",
                "uname",
                "unlink",
                "unlinkat",
                "unshare",
                "utime",
                "utimensat",
                "utimensat_time64",
                "utimes",
                "vfork",
                "vmsplice",
                "wait4",
                "waitid",
                "waitpid",
                "write",
                "writev"
            ],
            "action": "SCMP_ACT_ALLOW",
            "args": [],
            "comment": "",
            "includes": {},
            "excludes": {}
        },
        {
            "names": [
                "personality"
            ],
            "action": "SCMP_ACT_ALLOW",
            "args": [
                {
                    "index": 0,
                    "value": 0,
                    "valueTwo": 0,
                    "op": "SCMP_CMP_EQ"
                }
            ],
            "comment": "",
            "includes": {},
            "excludes": {}
        },
        {
            "names": [
                "personality"
            ],
            "action": "SCMP_ACT_ALLOW",
            "args": [
                {
                    "index": 0,
                    "value": 8,
                    "valueTwo": 0,
                    "op": "SCMP_CMP_EQ"
                }
            ],
            "comment": "",
            "includes": {},
            "excludes": {}
        },
        {
            "names": [
                "personality"
            ],
            "action": "SCMP_ACT_ALLOW",
            "args": [
                {
                    "index": 0,
                    "value": 131072,
                    "valueTwo": 0,
                    "op": "SCMP_CMP_EQ"
                }
            ],
            "comment": "",
            "includes": {},
            "excludes": {}
        },
        {
            "names": [
                "personality"
            ],
            "action": "SCMP_ACT_ALLOW",
            "args": [
                {
                    "index": 0,
                    "value": 131080,
                    "valueTwo": 0,
                    "op": "SCMP_CMP_EQ"
                }
            ],
            "comment": "",
            "includes": {},
            "excludes": {}
        },
        {
            "names": [
                "personality"
            ],
            "action": "SCMP_ACT_ALLOW",
            "args": [
                {
                    "index": 0,
                    "value": 4294967295,
                    "valueTwo": 0,
                    "op": "SCMP_CMP_EQ"
                }
            ],
            "comment": "",
            "includes": {},
            "excludes": {}
        },
        {
            "names": [
                "sync_file_range2"
            ],
            "action": "SCMP_ACT_ALLOW",
            "args": [],
            "comment": "",
            "includes": {
                "arches": [
                    "ppc64le"
                ]
            },
            "excludes": {}
        },
        {
            "names": [
                "arm_fadvise64_64",
                "arm_sync_file_range",
                "sync_file_range2",
                "breakpoint",
                "cacheflush",
                "set_tls"
            ],
            "action": "SCMP_ACT_ALLOW",
            "args": [],
            "comment": "",
            "includes": {
                "arches": [
                    "arm",
                    "arm64"
                ]
            },
            "excludes": {}
        },
        {
            "names": [
                "arch_prctl"
            ],
            "action": "SCMP_ACT_ALLOW",
            "args": [],
            "comment": "",
            "includes": {
                "arches": [
                    "amd64",
                    "x32"
                ]
            },
            "excludes": {}
        },
        {
            "names": [
                "modify_ldt"
            ],
            "action": "SCMP_ACT_ALLOW",
            "args": [],
            "comment": "",
            "includes": {
                "arches": [
                    "amd64",
                    "x32",
                    "x86"
                ]
            },
            "excludes": {}
        },
        {
            "names": [
                "s390_pci_mmio_read",
                "s390_pci_mmio_write",
                "s390_runtime_instr"
            ],
            "action": "SCMP_ACT_ALLOW",
            "args": [],
            "comment": "",
            "includes": {
                "arches": [
                    "s390",
                    "s390x"
                ]
            },
            "excludes": {}
        },
        {
            "names": [
                "open_by_handle_at"
            ],
            "action": "SCMP_ACT_ALLOW",
            "args": [],
            "comment": "",
            "includes": {
                "caps": [
                    "CAP_DAC_READ_SEARCH"
                ]
            },
            "excludes": {}
        },
        {
            "names": [
                "bpf",
                "clone",
                "fanotify_init",
                "lookup_dcookie",
                "mount",
                "name_to_handle_at",
                "perf_event_open",
                "quotactl",
                "setdomainname",
                "sethostname",
                "setns",
                "umount",
                "umount2",
                "unshare"
            ],
            "action": "SCMP_ACT_ALLOW",
            "args": [],
            "comment": "",
            "includes": {
                "caps": [
                    "CAP_SYS_ADMIN"
                ]
            },
            "excludes": {}
        },
        {
            "names": [
                "clone"
            ],
            "action": "SCMP_ACT_ALLOW",
            "args": [
                {
                    "index": 0,
                    "value": 2080505856,
                    "valueTwo": 0,
                    "op": "SCMP_CMP_MASKED_EQ"
                }
            ],
            "comment": "",
            "includes": {},
            "excludes": {
                "caps": [
                    "CAP_SYS_ADMIN"
                ],
                "arches": [
                    "s390",
                    "s390x"
                ]
            }
        },
        {
            "names": [
                "clone"
            ],
            "action": "SCMP_ACT_ALLOW",
            "args": [
                {
                    "index": 1,
                    "value": 2080505856,
                    "valueTwo": 0,
                    "op": "SCMP_CMP_MASKED_EQ"
                }
            ],
            "comment": "s390 parameter ordering for clone is different",
            "includes": {
                "arches": [
                    "s390",
                    "s390x"
                ]
            },
            "excludes": {
                "caps": [
                    "CAP_SYS_ADMIN"
                ]
            }
        },
        {
            "names": [
                "reboot"
            ],
            "action": "SCMP_ACT_ALLOW",
            "args": [],
            "comment": "",
            "includes": {
                "caps": [
                    "CAP_SYS_BOOT"
                ]
            },
            "excludes": {}
        },
        {
            "names": [
                "chroot"
            ],
            "action": "SCMP_ACT_ALLOW",
            "args": [],
            "comment": "",
            "includes": {
                "caps": [
                    "CAP_SYS_CHROOT"
                ]
            },
            "excludes": {}
        },
        {
            "names": [
                "delete_module",
                "init_module",
                "finit_module",
                "query_module"
            ],
            "action": "SCMP_ACT_ALLOW",
            "args": [],
            "comment": "",
            "includes": {
                "caps": [
                    "CAP_SYS_MODULE"
                ]
            },
            "excludes": {}
        },
        {
            "names": [
                "get_mempolicy",
                "mbind",
                "name_to_handle_at",
                "set_mempolicy"
            ],
            "action": "SCMP_ACT_ALLOW",
            "args": [],
            "comment": "",
            "includes": {
                "caps": [
                    "CAP_SYS_NICE"
                ]
            },
            "excludes": {}
        },
        {
            "names": [
                "acct"
            ],
            "action": "SCMP_ACT_ALLOW",
            "args": [],
            "comment": "",
            "includes": {
                "caps": [
                    "CAP_SYS_PACCT"
                ]
            },
            "excludes": {}
        },
        {
            "names": [
                "kcmp",
                "process_vm_readv",
                "process_vm_writev",
                "ptrace"
            ],
            "action": "SCMP_ACT_ALLOW",
            "args": [],
            "comment": "",
            "includes": {
                "caps": [
                    "CAP_SYS_PTRACE"
                ]
            },
            "excludes": {}
        },
        {
            "names": [
                "iopl",
                "ioperm"
            ],
            "action": "SCMP_ACT_ALLOW",
            "args": [],
            "comment": "",
            "includes": {
                "caps": [
                    "CAP_SYS_RAWIO"
                ]
            },
            "excludes": {}
        },
        {
            "names": [
                "settimeofday",
                "stime",
                "clock_settime",
                "clock_settime64"
            ],
            "action": "SCMP_ACT_ALLOW",
            "args": [],
            "comment": "",
            "includes": {
                "caps": [
                    "CAP_SYS_TIME"
                ]
            },
            "excludes": {}
        },
        {
            "names": [
                "vhangup"
            ],
            "action": "SCMP_ACT_ALLOW",
            "args": [],
            "comment": "",
            "includes": {
                "caps": [
                    "CAP_SYS_TTY_CONFIG"
                ]
            },
            "excludes": {}
        },
        {
            "names": [
                "socket"
            ],
            "action": "SCMP_ACT_ERRNO",
            "args": [
                {
                    "index": 0,
                    "value": 16,
                    "valueTwo": 0,
                    "op": "SCMP_CMP_EQ"
                },
                {
                    "index": 2,
                    "value": 9,
                    "valueTwo": 0,
                    "op": "SCMP_CMP_EQ"
                }
            ],
            "comment": "",
            "includes": {},
            "excludes": {
                "caps": [
                    "CAP_AUDIT_WRITE"
                ]
            },
            "errnoRet": 22
        },
        {
            "names": [
                "socket"
            ],
            "action": "SCMP_ACT_ALLOW",
            "args": [
                {
                    "index": 2,
                    "value": 9,
                    "valueTwo": 0,
                    "op": "SCMP_CMP_NE"
                }
            ],
            "comment": "",
            "includes": {},
            "excludes": {
                "caps": [
                    "CAP_AUDIT_WRITE"
                ]
            }
        },
        {
            "names": [
                "socket"
            ],
            "action": "SCMP_ACT_ALLOW",
            "args": [
                {
                    "index": 0,
                    "value": 16,
                    "valueTwo": 0,
                    "op": "SCMP_CMP_NE"
                }
            ],
            "comment": "",
            "includes": {},
            "excludes": {
                "caps": [
                    "CAP_AUDIT_WRITE"
                ]
            }
        },
        {
            "names": [
                "socket"
            ],
            "action": "SCMP_ACT_ALLOW",
            "args": [
                {
                    "index": 2,
                    "value": 9,
                    "valueTwo": 0,
                    "op": "SCMP_CMP_NE"
                }
            ],
            "comment": "",
            "includes": {},
            "excludes": {
                "caps": [
                    "CAP_AUDIT_WRITE"
                ]
            }
        },
        {
            "names": [
                "socket"
            ],
            "action": "SCMP_ACT_ALLOW",
            "args": null,
            "comment": "",
            "includes": {
                "caps": [
                    "CAP_AUDIT_WRITE"
                ]
            },
            "excludes": {}
        }
    ]
}
  2. Annotate the pod to use profile.json:
apiVersion: v1
kind: Pod
metadata:
  name: sandbox
  annotations:
    container.seccomp.security.alpha.kubernetes.io/podman: 'localhost/profile.json'

Describe the results you received:

Error: mount /var/lib/containers/storage/overlay:/var/lib/containers/storage/overlay, flags: 0x1000: operation not permitted

Describe the results you expected:

Container image to build

Output of podman version:

+ podman version
Version:      2.2.1
API Version:  2.1.0
Go Version:   go1.15.5
Built:        Tue Dec  8 14:37:50 2020
OS/Arch:      linux/amd64

Output of podman info --debug:

+ podman info --debug
host:
  arch: amd64
  buildahVersion: 1.18.0
  cgroupManager: cgroupfs
  cgroupVersion: v1
  conmon:
    package: conmon-2.0.21-3.fc33.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.21, commit: 0f53fb68333bdead5fe4dc5175703e22cf9882ab'
  cpus: 8
  distribution:
    distribution: fedora
    version: "33"
  eventLogger: file
  hostname: test-ljvvq-zhk4d
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 5.4.80-40.140.amzn2.x86_64
  linkmode: dynamic
  memFree: 17086222336
  memTotal: 33191227392
  ociRuntime:
    name: crun
    package: crun-0.17-1.fc33.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 0.17
      commit: 0e9229ae34caaebcb86f1fde18de3acaf18c6d9a
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  remoteSocket:
    path: /run/podman/podman.sock
  rootless: false
  slirp4netns:
    executable: ""
    package: ""
    version: ""
  swapFree: 0
  swapTotal: 0
  uptime: 120h 55m 15.85s (Approximately 5.00 days)
registries:
  search:

Package info (e.g. output of rpm -q podman or apt list podman):

Using quay.io podman:stable container

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide?

Yes

Additional environment details (AWS, VirtualBox, physical, etc.):

Jenkins on EKS

rhatdan commented 3 years ago

The seccomp rules provided in containers-common (/usr/share/containers/seccomp.json) should allow buildah/podman build to run within a container.

The --isolation=chroot option basically eliminates some of the more privileged requirements for setting up lots of namespaces. From a security point of view, we see it this way: you are already within a container, so there is no need to launch subcontainers. Just use chroot to set up your build environment.

philnalwalker commented 3 years ago

Hi @rhatdan, I specifically tried the seccomp rules provided in containers-common (/usr/share/containers/seccomp.json). I also pasted the file in the original issue. Is it possible that some needed syscalls are missing? Is there a good way for us to efficiently figure out which syscalls to grant?
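One rough way to see how a profile treats a particular syscall is to list every rule that names it, along with the capabilities that gate the rule. The script below is only a sketch: `rules_for` and `SAMPLE` are hypothetical names, and `SAMPLE` is a trimmed stand-in that mirrors how mount, unshare and setns appear in the profile.json pasted in the issue. Keep in mind that a syscall being allowed by seccomp does not by itself grant the privilege the kernel requires for it.

```python
import json


def rules_for(profile, syscall):
    """Summarise every rule in a seccomp profile that names `syscall`."""
    matches = []
    for rule in profile.get("syscalls", []):
        if syscall not in rule.get("names", []):
            continue
        includes = rule.get("includes") or {}
        excludes = rule.get("excludes") or {}
        matches.append({
            "action": rule.get("action"),
            # True when the rule only applies for specific argument values.
            "conditional_on_args": bool(rule.get("args")),
            # Rule is only active when the container has these capabilities.
            "requires_caps": includes.get("caps", []),
            # Rule is dropped when the container has these capabilities.
            "skipped_with_caps": excludes.get("caps", []),
        })
    return matches


# Trimmed, illustrative stand-in for profile.json (not the full file).
SAMPLE = json.loads("""
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "syscalls": [
    {"names": ["mount", "unshare"], "action": "SCMP_ACT_ALLOW",
     "args": [], "includes": {}, "excludes": {}},
    {"names": ["mount", "setns", "unshare"], "action": "SCMP_ACT_ALLOW",
     "args": [], "includes": {"caps": ["CAP_SYS_ADMIN"]}, "excludes": {}}
  ]
}
""")

for name in ("mount", "unshare", "setns"):
    print(name, rules_for(SAMPLE, name))
```

To run it against the real file, load it with `json.load(open(path))` and pass the result to `rules_for` in place of `SAMPLE`.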

rhatdan commented 3 years ago

The syscall that is missing should be listed in the audit.log or in the journal.
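If the denials do end up in the audit log, the syscall numbers can be tallied with a few lines of Python. This is a sketch under several assumptions: the record layout in `TEMPLATE` is an abbreviated, made-up example of a `type=SECCOMP` audit record, `X86_64_NAMES` is a small hand-picked excerpt of the x86_64 syscall table rather than a complete mapping, and whether a denial is logged at all depends on the kernel and audit configuration of the node.

```python
import re
from collections import Counter

# Partial x86_64 syscall number -> name map (excerpt only).
X86_64_NAMES = {
    56: "clone",
    155: "pivot_root",
    165: "mount",
    166: "umount2",
    272: "unshare",
    308: "setns",
}

SYSCALL_FIELD = re.compile(r"\bsyscall=(\d+)")


def denied_syscalls(lines):
    """Count syscalls named in type=SECCOMP audit records."""
    counts = Counter()
    for line in lines:
        if "type=SECCOMP" not in line:
            continue  # ignore other audit record types
        match = SYSCALL_FIELD.search(line)
        if match:
            nr = int(match.group(1))
            counts[X86_64_NAMES.get(nr, f"syscall#{nr}")] += 1
    return counts


# Hypothetical, abbreviated records used only to exercise the parser.
TEMPLATE = ('type=SECCOMP msg=audit(1612386072.101:{seq}): pid=4242 '
            'comm="podman" exe="/usr/bin/podman" sig=0 arch=c000003e '
            'syscall={nr} compat=0 code=0x50000')
SAMPLE_LOG = [
    TEMPLATE.format(seq=1, nr=165),
    TEMPLATE.format(seq=2, nr=165),
    TEMPLATE.format(seq=3, nr=272),
    "type=SYSCALL msg=audit(1612386072.200:4): arch=c000003e syscall=59 success=yes",
]

for name, count in denied_syscalls(SAMPLE_LOG).most_common():
    print(f"{name}: {count}")
```

On a real node the input would be the lines of the audit log or the kernel journal instead of `SAMPLE_LOG`.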

rhatdan commented 3 years ago

BTW, we prefer to just use buildah bud when running builds within a container.

https://developers.redhat.com/blog/2019/08/14/best-practices-for-running-buildah-in-a-container/

rhatdan commented 3 years ago

We have multiple podman-in-container issues open; let's have the discussion in one place: #9015

philnalwalker commented 3 years ago

@rhatdan The seccomp profile you linked actually worked. I made the mistake of removing CAP_SYS_ADMIN from the pod, thinking I no longer needed it once the seccomp profile was in place. For anyone else facing this issue: you need both CAP_SYS_ADMIN in the securityContext and the appropriate seccomp metadata annotation.
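For reference, the combination described above (the seccomp annotation plus SYS_ADMIN in the container's securityContext) can be sketched as a pod manifest. The script below renders it as JSON, which kubectl accepts alongside YAML. The function name `podman_build_pod` is hypothetical, the image name is an assumption, and the container name `podman` is inferred from the suffix of the annotation key used in the issue; adjust all three to your setup.

```python
import json


def podman_build_pod(profile="localhost/profile.json",
                     container="podman",
                     image="quay.io/podman/stable"):  # image name is assumed
    """Build a pod manifest with both the seccomp annotation and SYS_ADMIN."""
    annotation = f"container.seccomp.security.alpha.kubernetes.io/{container}"
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": "sandbox",
            # The annotation key ends with the name of the target container.
            "annotations": {annotation: profile},
        },
        "spec": {
            "containers": [{
                "name": container,
                "image": image,
                # Kubernetes capability names omit the CAP_ prefix.
                "securityContext": {"capabilities": {"add": ["SYS_ADMIN"]}},
            }],
        },
    }


print(json.dumps(podman_build_pod(), indent=2))
```

The output can be saved to a file and applied like any other manifest.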

Thanks for the feedback on using buildah bud over podman. Why do you prefer it over podman in automated builds?

rhatdan commented 3 years ago

buildah bud eliminates some of the features in podman and has been used a lot more for this use case.

But if you are running privileged, it does not matter. If you want to run as locked down as possible, buildah bud --isolation chroot... can be run within a container that has only /dev/fuse, CAP_SETUID and CAP_SETGID and nothing else, since it uses the user namespace.

When running within CRI-O, I would like to experiment with running the entire podman build or buildah bud container within a separate user namespace.