
Podman: A tool for managing OCI containers and pods.
https://podman.io
Apache License 2.0

error running container: error from /usr/bin/crun during build #11951

Closed: philnalwalker closed this issue 2 years ago

philnalwalker commented 2 years ago

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

When attempting to build images we receive the following error:

STEP 2/2: RUN sudo apt-get -y update &&     sudo apt-get -y install ssh
error running container: error from /usr/bin/crun creating container for [/bin/sh -c sudo apt-get -y update &&     sudo apt-get -y install ssh]: creating cgroup directory `/sys/fs/cgroup/perf_event/buildah-buildah172591369`: No such file or directory
: exit status 1
Error: error building at STEP "RUN sudo apt-get -y update &&  sudo apt-get -y install ssh": error while running runtime: exit status 1
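
A rough way to narrow this down from inside the pod (a diagnostic sketch, not the exact path buildah uses) is to try creating a throwaway cgroup by hand:

    # Can we create a cgroup under the controller crun complained about?
    mkdir /sys/fs/cgroup/perf_event/test-build-cgroup \
      && rmdir /sys/fs/cgroup/perf_event/test-build-cgroup \
      || echo "cannot create cgroups under /sys/fs/cgroup/perf_event"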

Our Dockerfile is:

FROM debian
RUN sudo apt-get -y update && sudo apt-get -y install ssh 

Our podman build command is:

podman build --storage-driver=overlay  --ulimit=nofile=1048576:1048576 --events-backend=file .

We have tried the "oci", "rootless", and "chroot" isolation modes. "chroot" works, but we then see odd failures installing certain systemd-related packages on Debian. We would ideally like to run with "oci" isolation.
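
For reference, the variants we tried (same flags as the build command above, with only the isolation mode changed) were roughly:

    # oci isolation: fails with the crun cgroup error above
    podman build --storage-driver=overlay --ulimit=nofile=1048576:1048576 --events-backend=file --isolation=oci .
    # rootless isolation: also fails (see later comments)
    podman build --storage-driver=overlay --ulimit=nofile=1048576:1048576 --events-backend=file --isolation=rootless .
    # chroot isolation: builds, but systemd-related packages misbehave
    podman build --storage-driver=overlay --ulimit=nofile=1048576:1048576 --events-backend=file --isolation=chroot .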

Our implementation runs on Amazon EKS with the fuse-overlayfs driver.

Our Kubernetes.yaml container entry is as follows:

    - name: podman
      image: quay.io/podman/stable:latest
      tty: true
      command: [ "cat" ]
      imagePullPolicy: Always
      volumeMounts:
        - name: container-storage
          mountPath: /home/podman/.local/share/containers
      resources:
        limits:
          github.com/fuse: 1
      securityContext:
        capabilities:
          add:
            - "SYS_ADMIN"
            - "MKNOD"
            - "SYS_CHROOT"
            - "SETFCAP"

Steps to reproduce the issue:

  1. Amazon EKS + fuse overlay + podman build and Dockerfile posted above

Describe the results you received:

We receive the above error unless we use --isolation=chroot. With --isolation=chroot, the build instead fails with systemd-related errors when Debian 9 is the FROM image. The same Dockerfile, which installs PHP, builds fine with Docker.

dpkg: error processing package systemd (--configure):
 subprocess installed post-installation script returned error exit status 1
dpkg: dependency problems prevent configuration of php7.2-fpm:
 php7.2-fpm depends on systemd | systemd-tmpfiles; however:
  Package systemd is not configured yet.
  Package systemd-tmpfiles is not installed.
dpkg: error processing package php7.2-fpm (--configure):
 dependency problems - leaving unconfigured
dpkg: dependency problems prevent configuration of libpam-systemd:amd64:
 libpam-systemd:amd64 depends on systemd (= 232-25+deb9u13); however:
  Package systemd is not configured yet.
dpkg: error processing package libpam-systemd:amd64 (--configure):
 dependency problems - leaving unconfigured
Processing triggers for libc-bin (2.24-11+deb9u4) ...
Processing triggers for dictionaries-common (1.27.2) ...
aspell-autobuildhash: processing: en [en-common].
aspell-autobuildhash: processing: en [en-variant_0].
aspell-autobuildhash: processing: en [en-variant_1].
aspell-autobuildhash: processing: en [en-variant_2].
aspell-autobuildhash: processing: en [en-w_accents-only].
aspell-autobuildhash: processing: en [en-wo_accents-only].
aspell-autobuildhash: processing: en [en_AU-variant_0].
aspell-autobuildhash: processing: en [en_AU-variant_1].
aspell-autobuildhash: processing: en [en_AU-w_accents-only].
aspell-autobuildhash: processing: en [en_AU-wo_accents-only].
aspell-autobuildhash: processing: en [en_CA-variant_0].
aspell-autobuildhash: processing: en [en_CA-variant_1].
aspell-autobuildhash: processing: en [en_CA-w_accents-only].
aspell-autobuildhash: processing: en [en_CA-wo_accents-only].
aspell-autobuildhash: processing: en [en_GB-ise-w_accents-only].
aspell-autobuildhash: processing: en [en_GB-ise-wo_accents-only].
aspell-autobuildhash: processing: en [en_GB-ize-w_accents-only].
aspell-autobuildhash: processing: en [en_GB-ize-wo_accents-only].
aspell-autobuildhash: processing: en [en_GB-variant_0].
aspell-autobuildhash: processing: en [en_GB-variant_1].
aspell-autobuildhash: processing: en [en_US-w_accents-only].
aspell-autobuildhash: processing: en [en_US-wo_accents-only].
Processing triggers for php7.2-cli (7.2.34-25+0~20210923.65+debian9~1.gbpa3cd00) ...
Processing triggers for php7.2-phpdbg (7.2.34-25+0~20210923.65+debian9~1.gbpa3cd00) ...
Processing triggers for php7.2-cgi (7.2.34-25+0~20210923.65+debian9~1.gbpa3cd00) ...
Processing triggers for libapache2-mod-php7.2 (7.2.34-25+0~20210923.65+debian9~1.gbpa3cd00) ...
Processing triggers for dbus (1.10.32-0+deb9u1) ...
Errors were encountered while processing:
 systemd
 php7.2-fpm
 libpam-systemd:amd64
E: Sub-process /usr/bin/dpkg returned an error code (1)
subprocess exited with status 100
subprocess exited with status 100

Describe the results you expected:

--isolation=oci or --isolation=rootless to work

Additional information you deem important (e.g. issue happens only occasionally):

Happens consistently

Output of podman version:

(paste your output here)

Output of podman info --debug:

+ podman info --debug
host:
  arch: amd64
  buildahVersion: 1.23.1
  cgroupControllers:
  - cpuset
  - cpu
  - cpuacct
  - blkio
  - memory
  - devices
  - freezer
  - net_cls
  - perf_event
  - net_prio
  - hugetlb
  - pids
  cgroupManager: cgroupfs
  cgroupVersion: v1
  conmon:
    package: conmon-2.0.30-2.fc34.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.30, commit: '
  cpus: 8
  distribution:
    distribution: fedora
    variant: container
    version: "34"
  eventLogger: file
  hostname: test-qr0t8-6x2fb
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 5.4.141-67.229.amzn2.x86_64
  linkmode: dynamic
  logDriver: k8s-file
  memFree: 8956997632
  memTotal: 32836288512
  ociRuntime:
    name: crun
    package: crun-1.2-1.fc34.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.2
      commit: 4f6c8e0583c679bfee6a899c05ac6b916022561b
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  remoteSocket:
    path: /run/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.1.12-2.fc34.x86_64
    version: |-
      slirp4netns version 1.1.12
      commit: 7a104a101aa3278a2152351a082a6df71f57c9a3
      libslirp: 4.4.0
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.0
  swapFree: 0
  swapTotal: 0
  uptime: 170h 6m 35.87s (Approximately 7.08 days)
plugins:
  log:
  - k8s-file
  - none
  - journald
  network:
  - bridge
  - macvlan
  volume:
  - local
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
  - quay.io
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.imagestore: /var/lib/shared
    overlay.mount_program:
      Executable: /usr/bin/fuse-overlayfs
      Package: fuse-overlayfs-1.7.1-2.fc34.x86_64
      Version: |-
        fusermount3 version: 3.10.4
        fuse-overlayfs: version 1.7.1
        FUSE library version 3.10.4
        using FUSE kernel interface version 7.31
    overlay.mountopt: nodev,fsync=0
  graphRoot: /var/lib/containers/storage
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "false"
  imageStore:
    number: 0
  runRoot: /run/containers/storage
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 3.4.0
  Built: 1633030821
  BuiltTime: Thu Sep 30 19:40:21 2021
  GitCommit: ""
  GoVersion: go1.16.8
  OsArch: linux/amd64
  Version: 3.4.0

Package info (e.g. output of rpm -q podman or apt list podman):

Using official stable container

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/master/troubleshooting.md)

Yes

Additional environment details (AWS, VirtualBox, physical, etc.):

AWS EKS + fuse overlay

mheon commented 2 years ago

Are you running Podman in a Kubernetes container?

@rhatdan PTAL

philnalwalker commented 2 years ago

Yes, we are running it in a Kubernetes container on Jenkins on EKS, using the following YAML:

    - name: podman
      image: quay.io/podman/stable:latest
      tty: true
      command: [ "cat" ]
      imagePullPolicy: Always
      volumeMounts:
        - name: container-storage
          mountPath: /home/podman/.local/share/containers
      resources:
        limits:
          github.com/fuse: 1
      securityContext:
        capabilities:
          add:
            - "SYS_ADMIN"
            - "MKNOD"
            - "SYS_CHROOT"
            - "SETFCAP"

And the following build command:

podman build --storage-driver=overlay  --isolation=chroot --ulimit=nofile=1048576:1048576 --events-backend=file .

I tested running Podman on my Mac laptop using the brew install and was able to build the container successfully there with --isolation=chroot and with --isolation=oci. The specific Dockerfile I tested was:

FROM debian:9

USER root

ENV DEBIAN_FRONTEND=noninteractive

RUN apt-get -y update && apt-get install -y locales \
    && dpkg-reconfigure --frontend=noninteractive locales \
    && echo "LC_ALL=en_US.UTF-8" >> /etc/environment \
    && echo "en_US.UTF-8 UTF-8" >> /etc/locale.gen \
    && echo "LANG=en_US.UTF-8" > /etc/locale.conf \
    && locale-gen en_US.UTF-8

RUN apt-get install -qq curl nano vim netcat telnet less iputils-ping dnsutils wget

RUN apt-get install -qq \
    apt-transport-https lsb-release ca-certificates \
    && wget -O /etc/apt/trusted.gpg.d/php.gpg https://packages.sury.org/php/apt.gpg \
    && echo "deb https://packages.sury.org/php/ $(lsb_release -sc) main" > /etc/apt/sources.list.d/php.list \
    && apt-get -y update

RUN apt-get install -y -qq \
    php7.2 \
    libapache2-mod-php7.2 \
    apache2 \
    xml-core \
    libnet-ssleay-perl \
    libio-socket-ssl-perl \
    mysql-client \
    w3m \
    postfix \
    fontconfig \
    libxrender1 \
    xfonts-base \
    xfonts-75dpi \
    libcap2-bin

# This line fails on k8s Jenkins on EKS
RUN apt-get -y install php7.2-fpm

The error received on k8s Jenkins on EKS:

Adding group `systemd-journal' (GID 105) ...
Done.
chfn: PAM: System error
adduser: `/usr/bin/chfn -f systemd Time Synchronization systemd-timesync' returned error code 1. Exiting.
dpkg: error processing package systemd (--configure):
 subprocess installed post-installation script returned error exit status 1
dpkg: dependency problems prevent configuration of php7.3-fpm:
 php7.3-fpm depends on systemd | systemd-tmpfiles; however:
  Package systemd is not configured yet.
  Package systemd-tmpfiles is not installed.

dpkg: error processing package php7.3-fpm (--configure):
 dependency problems - leaving unconfigured
dpkg: dependency problems prevent configuration of libpam-systemd:amd64:
 libpam-systemd:amd64 depends on systemd (= 232-25+deb9u13); however:
  Package systemd is not configured yet.

dpkg: error processing package libpam-systemd:amd64 (--configure):
 dependency problems - leaving unconfigured
Processing triggers for libc-bin (2.24-11+deb9u4) ...
Processing triggers for dictionaries-common (1.27.2) ...
aspell-autobuildhash: processing: en [en-common].
aspell-autobuildhash: processing: en [en-variant_0].
aspell-autobuildhash: processing: en [en-variant_1].
aspell-autobuildhash: processing: en [en-variant_2].
aspell-autobuildhash: processing: en [en-w_accents-only].
aspell-autobuildhash: processing: en [en-wo_accents-only].
aspell-autobuildhash: processing: en [en_AU-variant_0].
aspell-autobuildhash: processing: en [en_AU-variant_1].
aspell-autobuildhash: processing: en [en_AU-w_accents-only].
aspell-autobuildhash: processing: en [en_AU-wo_accents-only].
aspell-autobuildhash: processing: en [en_CA-variant_0].
aspell-autobuildhash: processing: en [en_CA-variant_1].
aspell-autobuildhash: processing: en [en_CA-w_accents-only].
aspell-autobuildhash: processing: en [en_CA-wo_accents-only].
aspell-autobuildhash: processing: en [en_GB-ise-w_accents-only].
aspell-autobuildhash: processing: en [en_GB-ise-wo_accents-only].
aspell-autobuildhash: processing: en [en_GB-ize-w_accents-only].
aspell-autobuildhash: processing: en [en_GB-ize-wo_accents-only].
aspell-autobuildhash: processing: en [en_GB-variant_0].
aspell-autobuildhash: processing: en [en_GB-variant_1].
aspell-autobuildhash: processing: en [en_US-w_accents-only].
aspell-autobuildhash: processing: en [en_US-wo_accents-only].
Processing triggers for php7.3-cli (7.3.31-1+0~20210923.88+debian9~1.gbpac4058) ...
Processing triggers for php7.3-phpdbg (7.3.31-1+0~20210923.88+debian9~1.gbpac4058) ...
Processing triggers for php7.3-cgi (7.3.31-1+0~20210923.88+debian9~1.gbpac4058) ...
Processing triggers for libapache2-mod-php7.3 (7.3.31-1+0~20210923.88+debian9~1.gbpac4058) ...
Processing triggers for dbus (1.10.32-0+deb9u1) ...
Errors were encountered while processing:
 systemd
 php7.3-fpm
 libpam-systemd:amd64
E: Sub-process /usr/bin/dpkg returned an error code (1)
subprocess exited with status 100
subprocess exited with status 100

When using the following command:

podman build --storage-driver=overlay  --ulimit=nofile=1048576:1048576 --events-backend=file .

We receive the following error on k8s Jenkins on EKS:

STEP 4/8: RUN apt-get -y update && apt-get install -y locales     && dpkg-reconfigure --frontend=noninteractive locales     && echo "LC_ALL=en_US.UTF-8" >> /etc/environment     && echo "en_US.UTF-8 UTF-8" >> /etc/locale.gen     && echo "LANG=en_US.UTF-8" > /etc/locale.conf     && locale-gen en_US.UTF-8
error running container: error from /usr/bin/crun creating container for [/bin/sh -c apt-get -y update && apt-get install -y locales     && dpkg-reconfigure --frontend=noninteractive locales     && echo "LC_ALL=en_US.UTF-8" >> /etc/environment     && echo "en_US.UTF-8 UTF-8" >> /etc/locale.gen     && echo "LANG=en_US.UTF-8" > /etc/locale.conf     && locale-gen en_US.UTF-8]: creating cgroup directory `/sys/fs/cgroup/devices/buildah-buildah571450975`: No such file or directory
: exit status 1
Error: error building at STEP "RUN apt-get -y update && apt-get install -y locales     && dpkg-reconfigure --frontend=noninteractive locales     && echo "LC_ALL=en_US.UTF-8" >> /etc/environment     && echo "en_US.UTF-8 UTF-8" >> /etc/locale.gen     && echo "LANG=en_US.UTF-8" > /etc/locale.conf     && locale-gen en_US.UTF-8": error while running runtime: exit status 1
rhatdan commented 2 years ago

@umohnani8 PTAL

giuseppe commented 2 years ago

I think you need /sys/fs/cgroup to be writeable

philnalwalker commented 2 years ago

I think you need /sys/fs/cgroup to be writeable

Why would it not be? I added all the Linux capabilities needed to run rootful Podman without the privileged flag.

What am I missing?

philnalwalker commented 2 years ago

Ran the amicontained executable in the Jenkins job on the EKS node and saw the following:

+ ./amicontained
Container Runtime: kube
Has Namespaces:
    pid: true
    user: false
AppArmor Profile: system_u:system_r:spc_t:s0
Capabilities:
    BOUNDING -> chown dac_override fowner fsetid kill setgid setuid setpcap net_bind_service net_raw sys_chroot sys_admin mknod audit_write setfcap
Seccomp: disabled
Blocked Syscalls (12):
    MSGRCV SETSID VHANGUP ACCT SETTIMEOFDAY REBOOT INIT_MODULE DELETE_MODULE KEXEC_LOAD OPEN_BY_HANDLE_AT FINIT_MODULE KEXEC_FILE_LOAD
Looking for Docker.sock

It looks like seccomp is disabled and only 12 syscalls are being blocked on the k8s/Docker side. I will try running Docker on the Jenkins node with the Podman seccomp profile just in case. Any ideas what else to look at or try?
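
The rough idea for that test (a sketch; it assumes Podman's seccomp profile has been copied to, or already exists on, the node) is:

    # Run the build image under Docker, but with Podman's seccomp profile applied.
    docker run --rm -it \
      --security-opt seccomp=/usr/share/containers/seccomp.json \
      quay.io/podman/stable:latest bash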

giuseppe commented 2 years ago

The mount point is handled differently than capabilities. Can you run cat /proc/self/mountinfo in the container to confirm it?
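
For example, filtering for just the cgroup mounts and their options:

    # Show only the cgroup mounts; look for ro vs rw on /sys/fs/cgroup/*.
    grep cgroup /proc/self/mountinfo
    # or, more readable:
    findmnt -t cgroup,cgroup2 -o TARGET,OPTIONS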

philnalwalker commented 2 years ago

4476 3998 0:697 / / rw,relatime master:315 - overlay overlay rw,seclabel,lowerdir=/var/lib/docker/overlay2/l/LBFS6MNPEPUN7IKI2P6RKTVTIO:/var/lib/docker/overlay2/l/72LKKRR46QJPFKL7YORZ4JRWG4:/var/lib/docker/overlay2/l/IISAYEKOXSFUGN6YOFF6P3NZR6:/var/lib/docker/overlay2/l/ABXLUXFPEV5U5V5E6WJCP7WNGF:/var/lib/docker/overlay2/l/EYAILJIXC3ZYPP4EY6TLWWR3TV:/var/lib/docker/overlay2/l/VKZ53ZSWAWTUF463Y4BGFGOX2V:/var/lib/docker/overlay2/l/NJ4SZOMR77IIT73FPYUNNP5D4I:/var/lib/docker/overlay2/l/22KE3P5OGDC6UIU7RD7NGXFWWA:/var/lib/docker/overlay2/l/OZZ4VRTPEFSHWQJYTOHRMTZ3YC:/var/lib/docker/overlay2/l/DK2MYOWOQ3DRDKKFFZMPC5NDC3,upperdir=/var/lib/docker/overlay2/da1f7a4a6c347be7088c0eaa91b4bb76b1aafb19507ae6e872720848ffaf4431/diff,workdir=/var/lib/docker/overlay2/da1f7a4a6c347be7088c0eaa91b4bb76b1aafb19507ae6e872720848ffaf4431/work
4477 4476 0:698 / /proc rw,nosuid,nodev,noexec,relatime - proc proc rw
4478 4476 0:699 / /dev rw,nosuid - tmpfs tmpfs rw,seclabel,size=65536k,mode=755
4479 4478 0:700 / /dev/pts rw,nosuid,noexec,relatime - devpts devpts rw,seclabel,gid=5,mode=620,ptmxmode=666
4480 4476 0:686 / /sys ro,nosuid,nodev,noexec,relatime - sysfs sysfs ro,seclabel
4481 4480 0:701 / /sys/fs/cgroup rw,nosuid,nodev,noexec,relatime - tmpfs tmpfs rw,seclabel,mode=755
4482 4481 0:25 /kubepods/burstable/pod6f7d521c-0e09-4282-9962-231c60e1c503/13bfd7be0983bfe129a6da2710c4f0674448c3c0e169b4fa7dd87c22dd01ee70 /sys/fs/cgroup/systemd ro,nosuid,nodev,noexec,relatime master:9 - cgroup cgroup rw,seclabel,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd
4483 4481 0:27 /kubepods/burstable/pod6f7d521c-0e09-4282-9962-231c60e1c503/13bfd7be0983bfe129a6da2710c4f0674448c3c0e169b4fa7dd87c22dd01ee70 /sys/fs/cgroup/cpu,cpuacct ro,nosuid,nodev,noexec,relatime master:10 - cgroup cgroup rw,seclabel,cpu,cpuacct
4484 4481 0:28 /kubepods/burstable/pod6f7d521c-0e09-4282-9962-231c60e1c503/13bfd7be0983bfe129a6da2710c4f0674448c3c0e169b4fa7dd87c22dd01ee70 /sys/fs/cgroup/blkio ro,nosuid,nodev,noexec,relatime master:11 - cgroup cgroup rw,seclabel,blkio
4485 4481 0:29 /kubepods/burstable/pod6f7d521c-0e09-4282-9962-231c60e1c503/13bfd7be0983bfe129a6da2710c4f0674448c3c0e169b4fa7dd87c22dd01ee70 /sys/fs/cgroup/memory ro,nosuid,nodev,noexec,relatime master:12 - cgroup cgroup rw,seclabel,memory
4486 4481 0:30 /kubepods/burstable/pod6f7d521c-0e09-4282-9962-231c60e1c503/13bfd7be0983bfe129a6da2710c4f0674448c3c0e169b4fa7dd87c22dd01ee70 /sys/fs/cgroup/net_cls,net_prio ro,nosuid,nodev,noexec,relatime master:13 - cgroup cgroup rw,seclabel,net_cls,net_prio
4487 4481 0:31 /kubepods/burstable/pod6f7d521c-0e09-4282-9962-231c60e1c503/13bfd7be0983bfe129a6da2710c4f0674448c3c0e169b4fa7dd87c22dd01ee70 /sys/fs/cgroup/devices ro,nosuid,nodev,noexec,relatime master:14 - cgroup cgroup rw,seclabel,devices
4488 4481 0:32 /kubepods/burstable/pod6f7d521c-0e09-4282-9962-231c60e1c503/13bfd7be0983bfe129a6da2710c4f0674448c3c0e169b4fa7dd87c22dd01ee70 /sys/fs/cgroup/pids ro,nosuid,nodev,noexec,relatime master:15 - cgroup cgroup rw,seclabel,pids
4489 4481 0:33 /kubepods/burstable/pod6f7d521c-0e09-4282-9962-231c60e1c503/13bfd7be0983bfe129a6da2710c4f0674448c3c0e169b4fa7dd87c22dd01ee70 /sys/fs/cgroup/perf_event ro,nosuid,nodev,noexec,relatime master:16 - cgroup cgroup rw,seclabel,perf_event
4490 4481 0:34 /kubepods/burstable/pod6f7d521c-0e09-4282-9962-231c60e1c503/13bfd7be0983bfe129a6da2710c4f0674448c3c0e169b4fa7dd87c22dd01ee70 /sys/fs/cgroup/freezer ro,nosuid,nodev,noexec,relatime master:17 - cgroup cgroup rw,seclabel,freezer
4491 4481 0:35 /kubepods/burstable/pod6f7d521c-0e09-4282-9962-231c60e1c503/13bfd7be0983bfe129a6da2710c4f0674448c3c0e169b4fa7dd87c22dd01ee70 /sys/fs/cgroup/hugetlb ro,nosuid,nodev,noexec,relatime master:18 - cgroup cgroup rw,seclabel,hugetlb
4492 4481 0:36 /kubepods/burstable/pod6f7d521c-0e09-4282-9962-231c60e1c503/13bfd7be0983bfe129a6da2710c4f0674448c3c0e169b4fa7dd87c22dd01ee70 /sys/fs/cgroup/cpuset ro,nosuid,nodev,noexec,relatime master:19 - cgroup cgroup rw,seclabel,cpuset
4527 4478 0:677 / /dev/mqueue rw,nosuid,nodev,noexec,relatime - mqueue mqueue rw,seclabel
4528 4478 259:1 /var/lib/kubelet/pods/6f7d521c-0e09-4282-9962-231c60e1c503/containers/podman/c8370b5a /dev/termination-log rw,noatime - xfs /dev/nvme0n1p1 rw,seclabel,attr2,inode64,logbufs=8,logbsize=32k,noquota
4529 4476 259:1 /var/lib/kubelet/pods/6f7d521c-0e09-4282-9962-231c60e1c503/etc-hosts /etc/hosts rw,noatime - xfs /dev/nvme0n1p1 rw,seclabel,attr2,inode64,logbufs=8,logbsize=32k,noquota
4530 4478 0:675 / /dev/shm rw,nosuid,nodev,noexec,relatime - tmpfs shm rw,seclabel,size=65536k
4531 4476 259:1 /var/lib/docker/containers/99ff66ae0a47581a6c3469bfc0bdaab0203ae98f501632344b8723ad6bf33362/resolv.conf /etc/resolv.conf rw,noatime - xfs /dev/nvme0n1p1 rw,seclabel,attr2,inode64,logbufs=8,logbsize=32k,noquota
4532 4476 259:1 /var/lib/docker/containers/99ff66ae0a47581a6c3469bfc0bdaab0203ae98f501632344b8723ad6bf33362/hostname /etc/hostname rw,noatime - xfs /dev/nvme0n1p1 rw,seclabel,attr2,inode64,logbufs=8,logbsize=32k,noquota
4533 4476 259:1 /var/lib/docker/volumes/358704a2d6d072f42281a7bca9c06e55d6ea71190c661aca6926e40c5985db46/_data /var/lib/containers rw,noatime master:1 - xfs /dev/nvme0n1p1 rw,seclabel,attr2,inode64,logbufs=8,logbsize=32k,noquota
4534 4476 259:1 /var/lib/kubelet/pods/6f7d521c-0e09-4282-9962-231c60e1c503/volumes/kubernetes.io~empty-dir/workspace-volume /home/jenkins/agent rw,noatime - xfs /dev/nvme0n1p1 rw,seclabel,attr2,inode64,logbufs=8,logbsize=32k,noquota
4535 4476 259:1 /var/lib/docker/volumes/f9e2eb49cc958ee84efcbbee665214beae2cb525848a6155db6b082bd2746e54/_data /home/podman/.local/share/containers rw,noatime master:1 - xfs /dev/nvme0n1p1 rw,seclabel,attr2,inode64,logbufs=8,logbsize=32k,noquota
4536 4476 0:612 / /run/secrets/kubernetes.io/serviceaccount ro,relatime - tmpfs tmpfs rw,seclabel
4000 4478 0:700 /0 /dev/console rw,nosuid,noexec,relatime - devpts devpts rw,seclabel,gid=5,mode=620,ptmxmode=666
4001 4477 0:698 /bus /proc/bus ro,relatime - proc proc rw
4002 4477 0:698 /fs /proc/fs ro,relatime - proc proc rw
4003 4477 0:698 /irq /proc/irq ro,relatime - proc proc rw
4004 4477 0:698 /sys /proc/sys ro,relatime - proc proc rw
4006 4477 0:698 /sysrq-trigger /proc/sysrq-trigger ro,relatime - proc proc rw
4008 4477 0:702 / /proc/acpi ro,relatime - tmpfs tmpfs ro,seclabel
4009 4477 0:699 /null /proc/kcore rw,nosuid - tmpfs tmpfs rw,seclabel,size=65536k,mode=755
4010 4477 0:699 /null /proc/keys rw,nosuid - tmpfs tmpfs rw,seclabel,size=65536k,mode=755
4011 4477 0:699 /null /proc/latency_stats rw,nosuid - tmpfs tmpfs rw,seclabel,size=65536k,mode=755
4012 4477 0:699 /null /proc/timer_list rw,nosuid - tmpfs tmpfs rw,seclabel,size=65536k,mode=755
4013 4477 0:699 /null /proc/sched_debug rw,nosuid - tmpfs tmpfs rw,seclabel,size=65536k,mode=755
4014 4480 0:703 / /sys/firmware ro,relatime - tmpfs tmpfs ro,seclabel

giuseppe commented 2 years ago

4482 4481 0:25 /kubepods/burstable/pod6f7d521c-0e09-4282-9962-231c60e1c503/13bfd7be0983bfe129a6da2710c4f0674448c3c0e169b4fa7dd87c22dd01ee70 /sys/fs/cgroup/systemd ro,nosuid,nodev,noexec,relatime master:9 - cgroup cgroup rw,seclabel,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd
4483 4481 0:27 /kubepods/burstable/pod6f7d521c-0e09-4282-9962-231c60e1c503/13bfd7be0983bfe129a6da2710c4f0674448c3c0e169b4fa7dd87c22dd01ee70 /sys/fs/cgroup/cpu,cpuacct ro,nosuid,nodev,noexec,relatime master:10 - cgroup cgroup rw,seclabel,cpu,cpuacct
4484 4481 0:28 /kubepods/burstable/pod6f7d521c-0e09-4282-9962-231c60e1c503/13bfd7be0983bfe129a6da2710c4f0674448c3c0e169b4fa7dd87c22dd01ee70 /sys/fs/cgroup/blkio ro,nosuid,nodev,noexec,relatime master:11 - cgroup cgroup rw,seclabel,blkio
4485 4481 0:29 /kubepods/burstable/pod6f7d521c-0e09-4282-9962-231c60e1c503/13bfd7be0983bfe129a6da2710c4f0674448c3c0e169b4fa7dd87c22dd01ee70 /sys/fs/cgroup/memory ro,nosuid,nodev,noexec,relatime master:12 - cgroup cgroup rw,seclabel,memory
4486 4481 0:30 /kubepods/burstable/pod6f7d521c-0e09-4282-9962-231c60e1c503/13bfd7be0983bfe129a6da2710c4f0674448c3c0e169b4fa7dd87c22dd01ee70 /sys/fs/cgroup/net_cls,net_prio ro,nosuid,nodev,noexec,relatime master:13 - cgroup cgroup rw,seclabel,net_cls,net_prio
4487 4481 0:31 /kubepods/burstable/pod6f7d521c-0e09-4282-9962-231c60e1c503/13bfd7be0983bfe129a6da2710c4f0674448c3c0e169b4fa7dd87c22dd01ee70 /sys/fs/cgroup/devices ro,nosuid,nodev,noexec,relatime master:14 - cgroup cgroup rw,seclabel,devices
4488 4481 0:32 /kubepods/burstable/pod6f7d521c-0e09-4282-9962-231c60e1c503/13bfd7be0983bfe129a6da2710c4f0674448c3c0e169b4fa7dd87c22dd01ee70 /sys/fs/cgroup/pids ro,nosuid,nodev,noexec,relatime master:15 - cgroup cgroup rw,seclabel,pids
4489 4481 0:33 /kubepods/burstable/pod6f7d521c-0e09-4282-9962-231c60e1c503/13bfd7be0983bfe129a6da2710c4f0674448c3c0e169b4fa7dd87c22dd01ee70 /sys/fs/cgroup/perf_event ro,nosuid,nodev,noexec,relatime master:16 - cgroup cgroup rw,seclabel,perf_event
4490 4481 0:34 /kubepods/burstable/pod6f7d521c-0e09-4282-9962-231c60e1c503/13bfd7be0983bfe129a6da2710c4f0674448c3c0e169b4fa7dd87c22dd01ee70 /sys/fs/cgroup/freezer ro,nosuid,nodev,noexec,relatime master:17 - cgroup cgroup rw,seclabel,freezer
4491 4481 0:35 /kubepods/burstable/pod6f7d521c-0e09-4282-9962-231c60e1c503/13bfd7be0983bfe129a6da2710c4f0674448c3c0e169b4fa7dd87c22dd01ee70 /sys/fs/cgroup/hugetlb ro,nosuid,nodev,noexec,relatime master:18 - cgroup cgroup rw,seclabel,hugetlb
4492 4481 0:36 /kubepods/burstable/pod6f7d521c-0e09-4282-9962-231c60e1c503/13bfd7be0983bfe129a6da2710c4f0674448c3c0e169b4fa7dd87c22dd01ee70 /sys/fs/cgroup/cpuset ro,nosuid,nodev,noexec,relatime master:19 - cgroup cgroup rw,seclabel,cpuset

All the cgroup controllers are mounted read-only (you can see the 'ro' option).

You could probably change the mount options at runtime from within the container, since you run with CAP_SYS_ADMIN, e.g. mount -o remount,rw /sys/fs/cgroup/$CONTROLLER, but the way to let Kubernetes configure this for you is to run a privileged container, i.e. one with an explicit privileged: true in its spec.
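
A minimal sketch of the remount approach (whether it succeeds depends on how kubelet set up the mounts; see the "mount point is busy" result later in this thread):

    # Remount every cgroup v1 controller hierarchy read-write from inside the container.
    for ctrl in /sys/fs/cgroup/*; do
        [ -d "$ctrl" ] && mount -o remount,rw "$ctrl"
    done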

philnalwalker commented 2 years ago

This article does not mention having to remount /sys/fs/cgroup; see the section "rootful Podman without the privilege flag set":

https://www.redhat.com/sysadmin/podman-inside-kubernetes

Even when I run rootless Podman with --isolation=chroot, I still hit the issue where installing systemd fails on EKS/Jenkins, while the same Dockerfile with FROM debian:9 builds fine with Podman on my laptop, also using --isolation=chroot.

giuseppe commented 2 years ago

@umohnani8 @rhatdan is "podman build --isolation oci" supposed to work without such configuration?

philnalwalker commented 2 years ago

@giuseppe I also tried:

mount -o remount,rw /sys/fs/cgroup/perf_event

inside the Podman container, to see if this works (definitely not something I would want to do in production). It fails:

mount: /sys/fs/cgroup/perf_event: mount point is busy.

We would actually prefer running rootless. I was using a "rootless" configuration and tried going "rootful" because I figured that would get me past the strange issue, detailed earlier, with packages pulling in systemd in Dockerfiles built FROM debian:9.

I think there may be a few issues here related to building containers (I am uncertain whether it is just one underlying issue):

  1. Using "Rootful without privileges" does not seem to work on EKS - this may be by design?
  2. "Rootless" with --isolation=chroot, building FROM debian:9 containers that install packages like ssh or php7.2, tries to install systemd and fails on EKS (detailed in the posts above). These same Dockerfiles build successfully using the same official Podman v3.4.0 container with Podman on macOS, even with --isolation=chroot.
  3. "Rootless" with --isolation=rootless results in: mount /proc to /proc: Operation not permitted

philnalwalker commented 2 years ago

What is even stranger is that if I take the same Dockerfile and switch to debian:10, the systemd install no longer fails.

philnalwalker commented 2 years ago

Any update on this?

rhatdan commented 2 years ago

@giuseppe, any new thoughts? As for whether this should work, I have no idea. The issue is the complexity of how Kubernetes/CRI-O configure the container that Podman is running in.

giuseppe commented 2 years ago

@giuseppe, any new thoughts?

Not really. Personally, I don't see any advantage in running a nested container when we are already in a locked-down environment. To be able to create the nested container, we probably need to grant more permissions than are required just to run --isolation chroot.

So my question is: what are we really trying to protect against, such that --isolation oci works better than --isolation chroot, when we are already in an unprivileged container where the cgroups are mounted read-only (since delegation is not safe on cgroup v1)?

philnalwalker commented 2 years ago

Our specific use case is building containers with Jenkins on EKS, using Podman instead of DinD for security and performance reasons. We would ideally like to run rootless with OCI isolation mode. Please let me know if I can do any additional testing to help troubleshoot. Does Red Hat offer any kind of enterprise support subscription for Podman?

philnalwalker commented 2 years ago

--isolation oci also does not work for us (the error is pasted in earlier posts). Should --isolation oci work when building containers on Amazon EKS with Podman?

rhatdan commented 2 years ago

Depends on the privileges available in EKS. In order to run rootless Podman within a container you need at least CAP_SETUID and CAP_SETGID. In order to run rootful Podman you need CAP_SYS_ADMIN. We are working on running OpenShift/CRI-O containers directly within a user namespace, which would allow you to run without these capabilities, but I don't know whether EKS has anything similar.

Bottom line: to run a container within a container, the inner container almost always needs multiple UIDs and needs capabilities to set up namespaces and mounts.
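
A quick way to check which of those capabilities a pod actually has (a sketch; capsh comes from the libcap tools):

    # Decode the effective capability set of the current process.
    capsh --decode=$(grep CapEff /proc/self/status | awk '{print $2}')
    # Rootless Podman needs at least cap_setuid/cap_setgid here; rootful needs cap_sys_admin.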

philnalwalker commented 2 years ago

Only 12 syscalls are being filtered and seccomp is disabled when running "amicontained" in the k8s pod running Podman. I posted the output in an earlier reply to this issue.

I also tried adding every available Linux capability and see the same issue with Podman. --privileged is a non-starter for us, as reduced privileges on k8s are the main reason we are replacing Docker with Podman for container build operations.

I'm wondering if this has something to do with the AWS EKS nodes being based on an older RHEL-derived distribution while the Podman image is Fedora-based. Perhaps it's cgroup-related?

Would you please try running podman build with --isolation=oci on vanilla AWS EKS + the fuse overlay DaemonSet? Maybe there is something obvious or an easy fix?

giuseppe commented 2 years ago

systemd on cgroupv1 needs the /sys/fs/cgroup/systemd named hierarchy to be writeable.

We have no control over that. You need to ensure it exists in the outer container and that it is usable by systemd in a container.
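
A quick check from inside the outer container (a sketch) for whether that named hierarchy is present and writable:

    # Is the systemd named hierarchy mounted, and is it read-write?
    findmnt /sys/fs/cgroup/systemd -o TARGET,OPTIONS
    test -w /sys/fs/cgroup/systemd && echo writable || echo read-only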