containers / podman

Podman: A tool for managing OCI containers and pods.
https://podman.io
Apache License 2.0

Running rootless Podman inside OpenShift container #6667

Closed: jskov-jyskebank-dk closed this issue 3 years ago

jskov-jyskebank-dk commented 4 years ago

/kind bug

Description

Podman is not able to pull images when running in an OpenShift container.

Elements of this appear in many other reported issues. I hope to get some help tracking down what I am missing, so it can be added to the Podman documentation proper (I will be happy to help with this).

Steps to reproduce the issue:

A full description of the steps taken - with bog-standard images and very little setup - is in this repository: https://github.com/jskovjyskebankdk/openshift-podman

Describe the results you received: Podman fails the image pull with "there might not be enough IDs available in the namespace".

Describe the results you expected: The image pull should complete :)

Additional information you deem important (e.g. issue happens only occasionally):

This may be a side effect of how OpenShift configures containers, compared to running a container locally in docker/podman. It must be similar to whatever people face on cloud hosting, but I have not been able to find anything documenting how to get things running.

Output of podman version:

Version:      2.0.0-dev
API Version:  1
Go Version:   go1.14.3
Git Commit:   d857275901e8c1ea7515360631e5894018e17f30
Built:        Sat Jun  6 00:00:00 2020
OS/Arch:      linux/amd64

Output of podman info --debug:

host:
  arch: amd64
  buildahVersion: 1.15.0-dev
  cgroupVersion: v1
  conmon:
    package: conmon-2.0.18-0.6.dev.git50aeae4.fc33.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.18-dev, commit: 51e91bbc42aaf0676bb4023fb86f00460bf7a0a2'
  cpus: 4
  distribution:
    distribution: fedora
    version: "33"
  eventLogger: file
  hostname: podman-2-67wg2
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 0
      size: 1
    uidmap:
    - container_id: 0
      host_id: 1000590000
      size: 1
  kernel: 4.18.0-147.8.1.el8_1.x86_64
  linkmode: dynamic
  memFree: 358072320
  memTotal: 33726861312
  ociRuntime:
    name: runc
    package: runc-1.0.0-238.dev.git1b97c04.fc33.x86_64
    path: /usr/bin/runc
    version: |-
      runc version 1.0.0-rc10+dev
      commit: 1aa8febe14501045ff2a65ec0c01b0400245cb3c
      spec: 1.0.2-dev
  os: linux
  remoteSocket:
    path: /tmp/run-1000590000/podman/podman.sock
  rootless: true
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.1.1-2.dev.git483e855.fc33.x86_64
    version: |-
      slirp4netns version 1.1.1+dev
      commit: 483e85547b22a6f8b9230e23b3e9815a41347771
      libslirp: 4.3.0
      SLIRP_CONFIG_VERSION_MAX: 3
  swapFree: 0
  swapTotal: 0
  uptime: 560h 7m 38.01s (Approximately 23.33 days)
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - registry.centos.org
  - docker.io
store:
  configFile: /home/.config/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.mount_program:
      Executable: /usr/bin/fuse-overlayfs
      Package: fuse-overlayfs-1.0.0-3.dev.gitf3e4154.fc33.x86_64
      Version: |-
        fusermount3 version: 3.9.1
        fuse-overlayfs: version 1.0.0
        FUSE library version 3.9.1
        using FUSE kernel interface version 7.31
  graphRoot: /home/.local/share/containers/storage
  graphStatus:
    Backing Filesystem: overlayfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "false"
  imageStore:
    number: 0
  runRoot: /tmp/run-1000590000/containers
  volumePath: /home/.local/share/containers/storage/volumes
version:
  APIVersion: 1
  Built: 1591401600
  BuiltTime: Sat Jun  6 00:00:00 2020
  GitCommit: d857275901e8c1ea7515360631e5894018e17f30
  GoVersion: go1.14.3
  OsArch: linux/amd64
  Version: 2.0.0-dev

Package info (e.g. output of rpm -q podman or apt list podman):

podman-2.0.0-0.121.dev.git1fcb678.fc33.x86_64

Additional environment details (AWS, VirtualBox, physical, etc.):

On-premise OpenShift 4.3.21

rhatdan commented 4 years ago

If you are pulling an image with multiple UIDs, then you need to run the container with multiple UIDs. Usually you can do this as root. If you are running in OpenShift without being root, this could cause issues. You could set up a user namespace within the container, but it would still require CAP_SETUID and CAP_SETGID in order to start the user namespace within the container.
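
In pod-spec terms, granting those capabilities would look roughly like this (a sketch; Kubernetes capability names drop the CAP_ prefix, and whether the pod's SCC permits adding them is a separate question):

securityContext:
  capabilities:
    add:
    - SETUID
    - SETGID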

jskov-jyskebank-dk commented 4 years ago

Yes, I am pulling the fedora image.

I do not mind CAP_SETUID/GID as much on their own. But I think running under the 'privileged' SCC would be a deal breaker. I will give it a shot though, just to confirm that I can get it running.

Do I understand the last sentence correctly? Is it possible to set up an additional user namespace inside the container, even though OpenShift only appears to provide it a uidmap of size 1?

Ta!

rhatdan commented 4 years ago

You should be able to get this to work. This is the way the buildah and podman images are configured in quay.io/buildah/stable and quay.io/podman/stable:

https://github.com/containers/libpod/blob/master/contrib/podmanimage/stable/Dockerfile

The idea is to set up a user within the container and then to launch the container as that user. This is still not fully working, and I hope to get back to it once we ship Podman 2.0.
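
The core of that Dockerfile is creating a user with subordinate UID/GID ranges and switching to it; condensed, it looks roughly like this (a sketch with illustrative ranges, not the verbatim file):

FROM fedora:latest
RUN useradd podman && \
    echo podman:10000:5000 > /etc/subuid && \
    echo podman:10000:5000 > /etc/subgid
USER podman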

jskov-jyskebank-dk commented 4 years ago

Just a progress update:

I have tried using the podman:stable image as the base for my OpenShift container, and it does get me past the user-ID problem.

So I can now pull images. Thanks!

But running anything in those pulled images still fails (due to missing fuse module, it seems).

I will explore further on Monday.

jskov-jyskebank-dk commented 4 years ago

Waiting for the rain to end, so I collected a little more info.

Running an image fails with:

$ podman run -it docker.io/library/alpine /bin/sh -c "echo 'hello world!'"
ERRO[0000] error unmounting /home/.local/share/containers/storage/overlay/a8e3377a8f75d187a906823f1d8da6bfe5d37771b2d0a4354444f86f722a854c/merged: invalid argument 
Error: error mounting storage for container c967d9189c3ca165788ca68d069cafd3a3f60fd95eb86c6726c6ef3215a20918: error creating overlay mount to /home/.local/share/containers/storage/overlay/a8e3377a8f75d187a906823f1d8da6bfe5d37771b2d0a4354444f86f722a854c/merged: using mount program /usr/bin/fuse-overlayfs: fuse: device not found, try 'modprobe fuse' first
fuse-overlayfs: cannot mount: No such file or directory
: exit status 1

The kernel is 4.18.0-147.8.1.el8_1.x86_64 (which should be new enough).

But according to https://developers.redhat.com/blog/2019/08/14/best-practices-for-running-buildah-in-a-container/ the container needs to be provided with access to /dev/fuse.

And that matches the "setup" arguments: podman run --device /dev/fuse ... in the link you provided.

Presumably this is a deal breaker in the context of OpenShift?

rhatdan commented 4 years ago

You need to add /dev/fuse to the container. You can do this with CRI-O via crio.conf for every container, which is what I am recommending we do by default. I am no Kubernetes expert, but I believe there is now a way to add devices via Kubernetes, which should also work.
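
In crio.conf that would be something along these lines (a sketch, assuming a CRI-O version with the additional_devices option; the third field is the device-cgroup permission string):

[crio.runtime]
additional_devices = [
    "/dev/fuse:/dev/fuse:rwm",
]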

BTW, great that you are working on this. I have plans to try this out, but I am tied up in releasing Podman 2.0.

jskov-jyskebank-dk commented 4 years ago

I got /dev/fuse added (via hostPath mapping in the Pod), but access to it fails:

$ podman run -it --device /dev/fuse:rw docker.io/library/alpine /bin/sh -c "echo 'hello'"
ERRO[0000] error unmounting /home/.local/share/containers/storage/overlay/06cdc160e71b46ce840709b7567a2bf377c96b51e40c139c437597a012bdef46/merged: invalid argument 
Error: error mounting storage for container 9e3da9c3ae96a85d4315bdb09e93891578eff603dbe0a19a7345326298262c5f: error creating overlay mount to /home/.local/share/containers/storage/overlay/06cdc160e71b46ce840709b7567a2bf377c96b51e40c139c43759
7a012bdef46/merged: using mount program /usr/bin/fuse-overlayfs: fuse: failed to open /dev/fuse: Operation not permitted
fuse-overlayfs: cannot mount: Operation not permitted
: exit status 1

(full debug output at end of https://github.com/jskovjyskebankdk/openshift-podman)
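
For reference, the hostPath mapping is along these lines (a sketch; the container name and image are illustrative). Note that a hostPath volume only bind-mounts the device node; it does not add it to the container's device cgroup, which turns out to matter later in this thread:

spec:
  containers:
  - name: builder
    image: quay.io/podman/stable
    volumeMounts:
    - name: dev-fuse
      mountPath: /dev/fuse
  volumes:
  - name: dev-fuse
    hostPath:
      path: /dev/fuse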

Various issues on libpod suggest this may be an SELinux issue, and our OpenShift installation runs with SELinux enabled (as it should). SELinux is not something we are likely to be able to tweak (unless it can be done from an SCC or something).

Is there maybe something else that could be the cause of this problem?

Thanks!

jskov-jyskebank-dk commented 4 years ago

Adding to this, if I run the container privileged, it actually does work.

sh-5.0$ podman run -it --device /dev/fuse:rw docker.io/library/alpine /bin/sh -c "echo 'hello'"
hello

But that just whets the appetite for running it without high privileges.

Any suggestions appreciated!

rhatdan commented 4 years ago

Run podman as root, not rootless.

rhatdan commented 4 years ago

BTW, this works fine for me:

$ podman run -it --device /dev/fuse:rw docker.io/library/alpine /bin/sh -c "echo 'hello'"
hello

When running rootless do you have a /dev/fuse on your host?

$ ls /dev/fuse -l
crw-rw-rw-. 1 root root 10, 229 Jun 18 15:50 /dev/fuse

jskov-jyskebank-dk commented 4 years ago

Hm, run as root. I assume you mean something like the OpenShift anyuid SCC?

If I do that, it fails on just the info command - because podman no longer uses the rootless configuration entries?

sh-5.0# id
uid=0(root) gid=0(root) groups=0(root)

sh-5.0# podman --log-level debug info
DEBU[0000] Found deprecated file /usr/share/containers/libpod.conf, please remove. Use /etc/containers/containers.conf to override defaults. 
DEBU[0000] Reading configuration file "/usr/share/containers/libpod.conf" 
DEBU[0000] Ignoring lipod.conf EventsLogger setting "journald". Use containers.conf if you want to change this setting and remove libpod.conf files. 
DEBU[0000] Reading configuration file "/usr/share/containers/containers.conf" 
DEBU[0000] Merged system config "/usr/share/containers/containers.conf": &{{[] [] container-default [] host [CAP_AUDIT_WRITE CAP_CHOWN CAP_DAC_OVERRIDE CAP_FOWNER CAP_FSETID CAP_KILL CAP_MKNOD CAP_NET_BIND_SERVICE CAP_NET_RAW CAP_SETFCAP CAP_SETGID CAP_SETPCAP CAP_SETUID CAP_SYS_CHROOT] [] [nproc=1048576:1048576]  [] [] [] false [PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin] false false false  private k8s-file -1 bridge false 2048 private /usr/share/containers/seccomp.json 65536k private host 65536} {false systemd [PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin] [/usr/libexec/podman/conmon /usr/local/libexec/podman/conmon /usr/local/lib/podman/conmon /usr/bin/conmon /usr/sbin/conmon /usr/local/bin/conmon /usr/local/sbin/conmon /run/current-system/sw/bin/conmon] ctrl-p,ctrl-q true /var/run/libpod/events/events.log file [/usr/share/containers/oci/hooks.d] docker:// /pause k8s.gcr.io/pause:3.2 /usr/libexec/podman/catatonit shm   false 2048 crun map[crun:[/usr/bin/crun /usr/sbin/crun /usr/local/bin/crun /usr/local/sbin/crun /sbin/crun /bin/crun /run/current-system/sw/bin/crun] kata:[/usr/bin/kata-runtime /usr/sbin/kata-runtime /usr/local/bin/kata-runtime /usr/local/sbin/kata-runtime /sbin/kata-runtime /bin/kata-runtime /usr/bin/kata-qemu /usr/bin/kata-fc] kata-fc:[/usr/bin/kata-fc] kata-qemu:[/usr/bin/kata-qemu] kata-runtime:[/usr/bin/kata-runtime] runc:[/usr/bin/runc /usr/sbin/runc /usr/local/bin/runc /usr/local/sbin/runc /sbin/runc /bin/runc /usr/lib/cri-o-runc/sbin/runc /run/current-system/sw/bin/runc]] missing [] [crun runc] [crun] {false false false true true true}  false 3 /var/lib/containers/storage/libpod 10 /var/run/libpod /var/lib/containers/storage/volumes} {[/usr/libexec/cni /usr/lib/cni /usr/local/lib/cni /opt/cni/bin] podman /etc/cni/net.d/}} 
DEBU[0000] Reading configuration file "/etc/containers/containers.conf" 
DEBU[0000] Merged system config "/etc/containers/containers.conf": &{{[] [] container-default [] host [CAP_AUDIT_WRITE CAP_CHOWN CAP_DAC_OVERRIDE CAP_FOWNER CAP_FSETID CAP_KILL CAP_MKNOD CAP_NET_BIND_SERVICE CAP_NET_RAW CAP_SETFCAP CAP_SETGID CAP_SETPCAP CAP_SETUID CAP_SYS_CHROOT] [] [nproc=1048576:1048576]  [] [] [] false [PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin] false false false  host k8s-file -1 host false 2048 private /usr/share/containers/seccomp.json 65536k host host 65536} {false cgroupfs [PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin] [/usr/libexec/podman/conmon /usr/local/libexec/podman/conmon /usr/local/lib/podman/conmon /usr/bin/conmon /usr/sbin/conmon /usr/local/bin/conmon /usr/local/sbin/conmon /run/current-system/sw/bin/conmon] ctrl-p,ctrl-q true /var/run/libpod/events/events.log file [/usr/share/containers/oci/hooks.d] docker:// /pause k8s.gcr.io/pause:3.2 /usr/libexec/podman/catatonit shm   false 2048 crun map[crun:[/usr/bin/crun /usr/sbin/crun /usr/local/bin/crun /usr/local/sbin/crun /sbin/crun /bin/crun /run/current-system/sw/bin/crun] kata:[/usr/bin/kata-runtime /usr/sbin/kata-runtime /usr/local/bin/kata-runtime /usr/local/sbin/kata-runtime /sbin/kata-runtime /bin/kata-runtime /usr/bin/kata-qemu /usr/bin/kata-fc] kata-fc:[/usr/bin/kata-fc] kata-qemu:[/usr/bin/kata-qemu] kata-runtime:[/usr/bin/kata-runtime] runc:[/usr/bin/runc /usr/sbin/runc /usr/local/bin/runc /usr/local/sbin/runc /sbin/runc /bin/runc /usr/lib/cri-o-runc/sbin/runc /run/current-system/sw/bin/runc]] missing [] [crun runc] [crun] {false false false true true true}  false 3 /var/lib/containers/storage/libpod 10 /var/run/libpod /var/lib/containers/storage/volumes} {[/usr/libexec/cni /usr/lib/cni /usr/local/lib/cni /opt/cni/bin] podman /etc/cni/net.d/}} 
DEBU[0000] Using conmon: "/usr/bin/conmon"              
DEBU[0000] Initializing boltdb state at /var/lib/containers/storage/libpod/bolt_state.db 
DEBU[0000] Using graph driver overlay                   
DEBU[0000] Using graph root /var/lib/containers/storage 
DEBU[0000] Using run root /var/run/containers/storage   
DEBU[0000] Using static dir /var/lib/containers/storage/libpod 
DEBU[0000] Using tmp dir /var/run/libpod                
DEBU[0000] Using volume path /var/lib/containers/storage/volumes 
DEBU[0000] Set libpod namespace to ""                   
DEBU[0000] [graphdriver] trying provided driver "overlay" 
DEBU[0000] overlay: imagestore=/var/lib/shared          
DEBU[0000] overlay: mount_program=/usr/bin/fuse-overlayfs 
ERRO[0000] could not get runtime: mount /var/lib/containers/storage/overlay:/var/lib/containers/storage/overlay, flags: 0x1000: operation not permitted

sh-5.0# ls -l /var/lib/containers/storage/overlay
total 0
drwx------. 2 root root 6 Jun 23 06:50 l

I see in https://github.com/containers/buildah/issues/867 that you suggest fixing this by mounting another volume at /var/lib/containers.

But when running rootless this was not a problem, and it was using the same drive (at another path, /home/.local/share/containers/storage).

The mount map looks like this:

$ df
Filesystem                           1K-blocks     Used Available Use% Mounted on
overlay                              125277164 60691652  64585512  49% /
tmpfs                                    65536        0     65536   0% /dev
tmpfs                                 16468192        0  16468192   0% /sys/fs/cgroup
shm                                      65536        0     65536   0% /dev/shm
tmpfs                                 16468192     9660  16458532   1% /etc/hostname
devtmpfs                              16430336        0  16430336   0% /dev/fuse
/dev/mapper/coreos-luks-root-nocrypt 125277164 60691652  64585512  49% /etc/hosts
tmpfs                                 16468192       24  16468168   1% /run/secrets/kubernetes.io/serviceaccount
tmpfs                                 16468192        0  16468192   0% /proc/acpi
tmpfs                                 16468192        0  16468192   0% /proc/scsi
tmpfs                                 16468192        0  16468192   0% /sys/firmware

We only have NFS-based PVCs on the platform, and I have found earlier that overlay fails on NFS. The workaround then was to use a /tmp folder from the running image (similar to using /home/.local, I guess).
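
In pod terms, an emptyDir volume is one way to get node-local scratch space for the storage path (a sketch; the thread used a /tmp folder instead, and the mount path below matches the rootless graphRoot above):

spec:
  containers:
  - name: builder
    image: quay.io/podman/stable
    volumeMounts:
    - name: container-storage
      mountPath: /home/.local/share/containers/storage
  volumes:
  - name: container-storage
    emptyDir: {}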

jskov-jyskebank-dk commented 4 years ago

Oh, the warning in the output shows that it reads another config file when running as root. I deleted the file, but the only apparent difference is that the warning is gone.

jskov-jyskebank-dk commented 4 years ago

I have tried mounting a folder at /dev/xx/storage (tmpfs) into the /var/lib/containers/storage path, and it makes no apparent difference.

I am out of ideas for tweaking stuff now :(

jskov-jyskebank-dk commented 4 years ago

Regarding your question about /dev/fuse when rootless:

sh-5.0$ ls -lZ /dev/fuse
crw-rw-rw-. 1 root root system_u:object_r:fuse_device_t:s0 10, 229 Jun 22 11:11 /dev/fuse

Same as when I run as root.

vrothberg commented 4 years ago

Regarding missing UIDs and GIDs in the user namespace: you can set ignore_chown_errors=true in storage.conf (see https://github.com/containers/storage/blob/master/docs/containers-storage.conf.5.md#storage-options-for-overlay-table). This will squash all UIDs and GIDs to the ones available in the user namespace.
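
In storage.conf that would look roughly like this (a sketch; the linked docs list the option under the overlay table, and the value is a quoted string):

[storage]
driver = "overlay"

[storage.options.overlay]
ignore_chown_errors = "true"
mount_program = "/usr/bin/fuse-overlayfs"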

Regarding fuse: although we need to find a way to address the issue, a quick workaround could be using the VFS storage driver. That will have considerable performance impacts, though.
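
For a quick test, the driver can also be selected per invocation instead of via storage.conf, e.g.:

$ podman --storage-driver=vfs pull docker.io/library/alpine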

jskov-jyskebank-dk commented 4 years ago

I will give VFS a shot.

It would be nice to have secure image building on OpenShift. Performance is not a super-critical parameter right now (we are still bringing up the platform). Of course, I do not know how slow it is yet :)

The alternative we had discussed was building images on a dedicated box. But we would obviously prefer to keep all workloads on the OpenShift platform.

Thanks!

jskov-jyskebank-dk commented 4 years ago

It also fails with VFS. The error is something I can only find in https://github.com/containers/libpod/issues/4079, which is ironically worked around by switching to fuse :)

$ id
uid=1000590000(builder) gid=0(root) groups=0(root),1000590000

$ podman --storage-driver=vfs version                                                                     
Version:            1.9.1
RemoteAPI Version:  1
Go Version:         go1.14.2
OS/Arch:            linux/amd64

$ podman --storage-driver=vfs --log-level debug run -it docker.io/library/alpine /bin/sh -c "echo 'hello'"
...
WARN[0000] Error initializing configured OCI runtime kata: no valid executable found for OCI runtime kata: invalid argument 
DEBU[0000] parsed reference into "[vfs@/home/.local/share/containers/storage+/tmp/run-1000590000/containers]docker.io/library/alpine:latest" 
DEBU[0000] parsed reference into "[vfs@/home/.local/share/containers/storage+/tmp/run-1000590000/containers]@a24bb4013296f61e89ba57005a7b3e52274d8edd3ae2077d04395f806b63d83e" 
DEBU[0000] [graphdriver] trying provided driver "vfs"   
DEBU[0000] exporting opaque data as blob "sha256:a24bb4013296f61e89ba57005a7b3e52274d8edd3ae2077d04395f806b63d83e" 
DEBU[0000] Using host netmode                           
DEBU[0000] Loading seccomp profile from "/usr/share/containers/seccomp.json" 
DEBU[0000] created OCI spec and options for new container 
DEBU[0000] Allocated lock 1 for container fea970dd476588d3b2fa34673674edcc00916da44662caa4f750307a201c1911 
DEBU[0000] parsed reference into "[vfs@/home/.local/share/containers/storage+/tmp/run-1000590000/containers]@a24bb4013296f61e89ba57005a7b3e52274d8edd3ae2077d04395f806b63d83e" 
DEBU[0000] exporting opaque data as blob "sha256:a24bb4013296f61e89ba57005a7b3e52274d8edd3ae2077d04395f806b63d83e" 
DEBU[0000] created container "fea970dd476588d3b2fa34673674edcc00916da44662caa4f750307a201c1911" 
DEBU[0000] container "fea970dd476588d3b2fa34673674edcc00916da44662caa4f750307a201c1911" has work directory "/home/.local/share/containers/storage/vfs-containers/fea970dd476588d3b2fa34673674edcc00916da44662caa4f750307a201c1911/userdata" 
DEBU[0000] container "fea970dd476588d3b2fa34673674edcc00916da44662caa4f750307a201c1911" has run directory "/tmp/run-1000590000/containers/vfs-containers/fea970dd476588d3b2fa34673674edcc00916da44662caa4f750307a201c1911/userdata" 
DEBU[0000] New container created "fea970dd476588d3b2fa34673674edcc00916da44662caa4f750307a201c1911" 
DEBU[0000] container "fea970dd476588d3b2fa34673674edcc00916da44662caa4f750307a201c1911" has CgroupParent "/libpod_parent/libpod-fea970dd476588d3b2fa34673674edcc00916da44662caa4f750307a201c1911" 
DEBU[0000] Handling terminal attach                     
DEBU[0000] mounted container "fea970dd476588d3b2fa34673674edcc00916da44662caa4f750307a201c1911" at "/home/.local/share/containers/storage/vfs/dir/3d651c0bc695dbdbac73b64a34431110b4a0eb2f465bd42330744a5a534c35b8" 
DEBU[0000] Created root filesystem for container fea970dd476588d3b2fa34673674edcc00916da44662caa4f750307a201c1911 at /home/.local/share/containers/storage/vfs/dir/3d651c0bc695dbdbac73b64a34431110b4a0eb2f465bd42330744a5a534c35b8 
DEBU[0000] /etc/system-fips does not exist on host, not mounting FIPS mode secret 
DEBU[0000] reading hooks from /usr/share/containers/oci/hooks.d 
DEBU[0000] Created OCI spec for container fea970dd476588d3b2fa34673674edcc00916da44662caa4f750307a201c1911 at /home/.local/share/containers/storage/vfs-containers/fea970dd476588d3b2fa34673674edcc00916da44662caa4f750307a201c1911/userdata/config.json 
DEBU[0000] /usr/bin/conmon messages will be logged to syslog 
DEBU[0000] running conmon: /usr/bin/conmon               args="[--api-version 1 -c fea970dd476588d3b2fa34673674edcc00916da44662caa4f750307a201c1911 -u fea970dd476588d3b2fa34673674edcc00916da44662caa4f750307a201c1911 -r /usr/bin/crun -b /home/.local/share/containers/storage/vfs-containers/fea970dd476588d3b2fa34673674edcc00916da44662caa4f750307a201c1911/userdata -p /tmp/run-1000590000/containers/vfs-containers/fea970dd476588d3b2fa34673674edcc00916da44662caa4f750307a201c1911/userdata/pidfile -l k8s-file:/home/.local/share/containers/storage/vfs-containers/fea970dd476588d3b2fa34673674edcc00916da44662caa4f750307a201c1911/userdata/ctr.log --exit-dir /tmp/run-1000590000/libpod/tmp/exits --socket-dir-path /tmp/run-1000590000/libpod/tmp/socket --log-level debug --syslog -t --conmon-pidfile /tmp/run-1000590000/containers/vfs-containers/fea970dd476588d3b2fa34673674edcc00916da44662caa4f750307a201c1911/userdata/conmon.pid --exit-command /usr/bin/podman --exit-command-arg --root --exit-command-arg /home/.local/share/containers/storage --exit-command-arg --runroot --exit-command-arg /tmp/run-1000590000/containers --exit-command-arg --log-level --exit-command-arg debug --exit-command-arg --cgroup-manager --exit-command-arg cgroupfs --exit-command-arg --tmpdir --exit-command-arg /tmp/run-1000590000/libpod/tmp --exit-command-arg --runtime --exit-command-arg crun --exit-command-arg --storage-driver --exit-command-arg vfs --exit-command-arg --events-backend --exit-command-arg file --exit-command-arg container --exit-command-arg cleanup --exit-command-arg fea970dd476588d3b2fa34673674edcc00916da44662caa4f750307a201c1911]"
WARN[0000] Failed to add conmon to cgroupfs sandbox cgroup: error creating cgroup for cpu: mkdir /sys/fs/cgroup/cpu/libpod_parent: read-only file system 
DEBU[0000] Received: -1                                 
DEBU[0000] Cleaning up container fea970dd476588d3b2fa34673674edcc00916da44662caa4f750307a201c1911 
DEBU[0000] Network is already cleaned up, skipping...   
DEBU[0000] unmounted container "fea970dd476588d3b2fa34673674edcc00916da44662caa4f750307a201c1911" 
DEBU[0000] ExitCode msg: "mount `proc` to '/home/.local/share/containers/storage/vfs/dir/3d651c0bc695dbdbac73b64a34431110b4a0eb2f465bd42330744a5a534c35b8/proc': permission denied: oci runtime permission denied error" 
ERRO[0000] mount `proc` to '/home/.local/share/containers/storage/vfs/dir/3d651c0bc695dbdbac73b64a34431110b4a0eb2f465bd42330744a5a534c35b8/proc': Permission denied: OCI runtime permission denied error 

(full log in https://github.com/jskovjyskebankdk/openshift-podman/blob/master/README.md)

danielkucera commented 4 years ago

@jskovjyskebankdk how did you manage to make /dev/fuse work inside of the container? I am getting:

sh-5.0# ls -lah /dev/fuse 
crw-rw-rw-. 1 root root 10, 229 Jun 13 08:45 /dev/fuse
sh-5.0# id
uid=0(root) gid=0(root) groups=0(root)
sh-5.0# cat /dev/fuse 
cat: /dev/fuse: Operation not permitted

and:

+ buildah bud --build-arg TAG=7.7 -t test:latest .
STEP 1: FROM registry/rhel7:7.7
Getting image source signatures
Copying blob sha256:32be9843afa050552a66345576a59497ba7c81c272aa895d67e6e349841714da
Copying blob sha256:1f1202c893ce2775c72b2a3f42ac33b25231d16ca978244bb0c6d1453dc1f39e
Copying config sha256:6682529ce3faf028687cef4fc6ffb30f51a1eb805b3709d31cb92a54caeb3daf
Writing manifest to image destination
Storing signatures
level=error msg="error unmounting /var/lib/containers/storage/overlay/20eb3e378dd5186b133651b1b23a60d3b3eab611ac1283294a342c1c5a905e42/merged: invalid argument"
error mounting new container: error mounting build container "69f52236781b491f49e742644a8af63e237cad149f32e532ac2530b14720f429": error creating overlay mount to /var/lib/containers/storage/overlay/20eb3e378dd5186b133651b1b23a60d3b3eab611ac1283294a342c1c5a905e42/merged: using mount program /usr/bin/fuse-overlayfs: fuse: failed to open /dev/fuse: Operation not permitted
fuse-overlayfs: cannot mount: Operation not permitted

jskov-jyskebank-dk commented 4 years ago

@danielkucera I am not sure cat /dev/fuse is a valid test. I get that error on a workstation as root.

I only saw it working/failing in context of podman execution.

And podman was only happy when I ran in a container with privileged: true.

So I think you see the same problem as I do; it appears /dev/fuse cannot be used in a non-privileged container.

(or that is my theory - hopefully Daniel has something to add)

rhatdan commented 4 years ago

OK, I am trying to run a buildah container within a privileged container as non-root, and I am failing. I am not sure this is possible if you start inside of a user namespace.

$ podman run --privileged --device=/dev/fuse -ti quay.io/buildah/testing sh
#

At this point I modified the /etc/subuid and /etc/subgid inside of the container to use different UID mappings, since my host account was only able to use 65k UIDs.

# cat > /etc/subuid << _EOF
build:10000:2000
_EOF
# cat > /etc/subgid << _EOF
build:10000:2000
_EOF

Now I will switch to the build user and attempt to pull an image:

# su - build
$ buildah from alpine
Getting image source signatures
Copying blob df20fa9351a1 done  
Copying config a24bb40132 done  
Writing manifest to image destination
Storing signatures
alpine-working-container

So far so good; this means fuse-overlayfs was used to pull into a user namespace. Now I will attempt to run a container in it, using --isolation=chroot:

$ buildah run alpine-working-container ls /
2020-06-23T13:03:08.000474825Z: executable file not found in $PATH: No such file or directory
error running container: error creating container for [ls /]: : exit status 1

I can enter the user namespace and make sure everything is set, then mount the image and attempt to mount the proc file system:

$ buildah unshare
# buildah mount alpine-working-container
/home/build/.local/share/containers/storage/overlay/ea45d5325b23dcff9349d334600e347521bb9ab196981534f2490e2a905575a5/merged
# mount -t proc none /home/build/.local/share/containers/storage/overlay/ea45d5325b23dcff9349d334600e347521bb9ab196981534f2490e2a905575a5/merged/proc
mount: /home/build/.local/share/containers/storage/overlay/ea45d5325b23dcff9349d334600e347521bb9ab196981534f2490e2a905575a5/merged/proc: permission denied.

I do not know what is causing the permission denied here. Basically I am blocked from mounting a proc file system from inside of the user namespace.

rhatdan commented 4 years ago

@rhvgoyal Any ideas? @giuseppe ?

danielkucera commented 4 years ago

I am unable to even open the file when not privileged. With privileged: true:

sh-5.0# exec 3<> /dev/fuse 
sh-5.0# ls -lah /proc/self/fd
total 0
dr-x------. 2 root root  0 Jun 23 13:27 .
dr-xr-xr-x. 9 root root  0 Jun 23 13:27 ..
lrwx------. 1 root root 64 Jun 23 13:27 0 -> /dev/pts/1
lrwx------. 1 root root 64 Jun 23 13:27 1 -> /dev/pts/1
lrwx------. 1 root root 64 Jun 23 13:27 2 -> /dev/pts/1
lrwx------. 1 root root 64 Jun 23 13:27 3 -> /dev/fuse
lr-x------. 1 root root 64 Jun 23 13:27 4 -> /proc/809/fd

With privileged: false:

sh-5.0# exec 3<> /dev/fuse 
sh: /dev/fuse: Operation not permitted

danielkucera commented 4 years ago

@jskovjyskebankdk I was trying to find out how builds triggered by a BuildConfig are executed, and it turns out that they run a pod with the builder serviceaccount and this security context. Guess what:

      securityContext:
        privileged: true

So I presume there is no way to avoid this when even OpenShift's native build mechanism runs privileged...

rhatdan commented 4 years ago

We are in the middle of discussing this right now. We see two choices. One involves a container running with CAP_SETUID, CAP_SETGID, and CAP_SYS_CHROOT; we can get that to run buildah inside of a user namespace as a non-root user.

The second choice is to get OpenShift/Kubernetes/CRI-O to launch the builder container inside of a user namespace, which would be fully locked down from a UID point of view but might have issues dealing with volumes and secrets, since Kubernetes would have to set any content it creates for the container to match the "root" of the container.
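
A sketch of what the first option might look like in a pod's securityContext (illustrative only; at this point in the thread no working combination has been confirmed):

securityContext:
  runAsNonRoot: true
  capabilities:
    drop:
    - ALL
    add:
    - SETUID
    - SETGID
    - SYS_CHROOT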

giuseppe commented 4 years ago

> @rhvgoyal Any ideas? @giuseppe ?

I think you'll first need to create a new PID namespace.

jskov-jyskebank-dk commented 4 years ago

I am happy with the CAP_SYS_CHROOT route (easiest for me to work with, and I will need volumes).

But I fail to make anything work as suggested. I probably need a little more specific guidance :)

What I have done is run the podman:stable image (with buildah installed) using the anyuid SCC, and it shows CAP_SYS_CHROOT:

sh-5.0# echo podman:10000:65536 > /etc/subuid
sh-5.0# echo podman:10000:65536 > /etc/subgid
sh-5.0# su - podman

[podman@podman-8-c5qp7 ~]$ buildah version
Version:         1.14.9
Go Version:      go1.14.2
Image Spec:      1.0.1-dev
Runtime Spec:    1.0.1-dev
CNI Spec:        0.4.0
libcni Version:  
image Version:   5.4.3
Git Commit:      
Built:           Thu Jan  1 00:00:00 1970
OS/Arch:         linux/amd64

[podman@podman-8-rmtgx ~]$ capsh --print
Current: = cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot+i
Bounding set =cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot
Ambient set =
Securebits: 00/0x0/1'b0
 secure-noroot: no (unlocked)
 secure-no-suid-fixup: no (unlocked)
 secure-keep-caps: no (unlocked)
 secure-no-ambient-raise: no (unlocked)
uid=1000(podman)
gid=1000(podman)
groups=1000(podman)

[podman@podman-8-c5qp7 ~]$ buildah --log-level debug run --isolation chroot alpine-working-container ls /
DEBU running [buildah-in-a-user-namespace --log-level debug run --isolation chroot alpine-working-container ls /] with environment [SHELL=/bin/bash HISTCONTROL=ignoredups HISTSIZE=1000 HOSTNAME= PWD=/home/podman LOGNAME=podman HOME=/home/pod
man LANG=C.UTF-8 LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=01;37;41:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31
:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.zst=01;31:*.tzst=01;31:*.bz2=01;31:*.bz
=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.wim=01;31:*.swm=01;31:*.dwm=0
1;31:*.esd=01;31:*.jpg=01;35:*.jpeg=01;35:*.mjpg=01;35:*.mjpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.
pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.webp=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:
*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=01;36:*.au=01;36:*.flac=01;36:*.m4a=01;36:*.mid=01;36:*.midi=01;36:*.mka=01;36:*.mp3=01;36:*.
mpc=01;36:*.ogg=01;36:*.ra=01;36:*.wav=01;36:*.oga=01;36:*.opus=01;36:*.spx=01;36:*.xspf=01;36: BUILDAH_ISOLATION=chroot TERM=xterm USER=podman SHLVL=1 PATH=/home/podman/.local/bin:/home/podman/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/us
r/sbin MAIL=/var/spool/mail/podman _=/usr/bin/buildah TMPDIR=/var/tmp _CONTAINERS_USERNS_CONFIGURED=1], UID map [{ContainerID:0 HostID:1000 Size:1} {ContainerID:1 HostID:10000 Size:65536}], and GID map [{ContainerID:0 HostID:1000 Size:1} {Co
ntainerID:1 HostID:10000 Size:65536}] 
DEBU [graphdriver] trying provided driver "overlay" 
DEBU overlay: mount_program=/usr/bin/fuse-overlayfs 
DEBU backingFs=overlayfs, projectQuotaSupported=false, useNativeDiff=false, usingMetacopy=false 
DEBU using "/var/tmp/buildah994730804" to hold bundle data 
DEBU Resources: &buildah.CommonBuildOptions{AddHost:[]string{}, CgroupParent:"", CPUPeriod:0x0, CPUQuota:0, CPUShares:0x0, CPUSetCPUs:"", CPUSetMems:"", HTTPProxy:true, Memory:0, DNSSearch:[]string{}, DNSServers:[]string{}, DNSOptions:[]stri
ng{}, MemorySwap:0, LabelOpts:[]string(nil), SeccompProfilePath:"/usr/share/containers/seccomp.json", ApparmorProfile:"", ShmSize:"65536k", Ulimit:[]string{"nproc=1048576:1048576"}, Volumes:[]string{}} 
DEBU overlay: mount_data=lowerdir=/home/podman/.local/share/containers/storage/overlay/l/VUBMQEZB7D4VJWLROCODAIR24F,upperdir=/home/podman/.local/share/containers/storage/overlay/1ca1feda8f3a76261656185490fb0faeb6a192fa8a04ac9a4e12ef0082e0ec2
8/diff,workdir=/home/podman/.local/share/containers/storage/overlay/1ca1feda8f3a76261656185490fb0faeb6a192fa8a04ac9a4e12ef0082e0ec28/work 
ERRO error unmounting /home/podman/.local/share/containers/storage/overlay/1ca1feda8f3a76261656185490fb0faeb6a192fa8a04ac9a4e12ef0082e0ec28/merged: invalid argument 
DEBU error running [ls /] in container "alpine-working-container": error mounting container "8563a43b0e4254fa3003b5fafc79c0f6371ca7bc89f3ffb8d61bcb314d80d05b": error mounting build container "8563a43b0e4254fa3003b5fafc79c0f6371ca7bc89f3ffb8d
61bcb314d80d05b": error creating overlay mount to /home/podman/.local/share/containers/storage/overlay/1ca1feda8f3a76261656185490fb0faeb6a192fa8a04ac9a4e12ef0082e0ec28/merged: using mount program /usr/bin/fuse-overlayfs: fuse: failed to open
 /dev/fuse: Operation not permitted
fuse-overlayfs: cannot mount: Operation not permitted
: exit status 1 
error mounting container "8563a43b0e4254fa3003b5fafc79c0f6371ca7bc89f3ffb8d61bcb314d80d05b": error mounting build container "8563a43b0e4254fa3003b5fafc79c0f6371ca7bc89f3ffb8d61bcb314d80d05b": error creating overlay mount to /home/podman/.loc
al/share/containers/storage/overlay/1ca1feda8f3a76261656185490fb0faeb6a192fa8a04ac9a4e12ef0082e0ec28/merged: using mount program /usr/bin/fuse-overlayfs: fuse: failed to open /dev/fuse: Operation not permitted
fuse-overlayfs: cannot mount: Operation not permitted
: exit status 1
ERRO exit status 1 

[podman@podman-8-c5qp7 ~]$ ls -lZ /dev/fuse
crw-rw-rw-. 1 root root system_u:object_r:fuse_device_t:s0 10, 229 Jun 22 11:03 /dev/fuse

Seems to be the same problem as with podman, so I am probably missing some (hopefully not too obvious) magic.

danielkucera commented 4 years ago

This is the current minimal configuration working for me:

      securityContext:
        privileged: false
        runAsUser: 0

command:

buildah --storage-driver vfs bud --isolation chroot -t test:latest .
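
Put together as a pod spec, that looks roughly like this (a sketch; the image, names, and build context are illustrative, while the securityContext and command come from the comment above):

apiVersion: v1
kind: Pod
metadata:
  name: buildah-build
spec:
  restartPolicy: Never
  containers:
  - name: build
    image: quay.io/buildah/stable
    securityContext:
      privileged: false
      runAsUser: 0
    workingDir: /build
    command:
    - buildah
    - --storage-driver
    - vfs
    - bud
    - --isolation
    - chroot
    - -t
    - test:latest
    - .
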
jskov-jyskebank-dk commented 4 years ago

Yes! Obviously I should not have used fuse but should have tried VFS again.

It does indeed work!

Thank you all!

I will make a PR with an OpenShift-specific how-to for setting it up.

rhatdan commented 4 years ago

We believe that you should be able to specify the /dev/fuse device in Kubernetes, but currently the kubelet is only passing in block devices. @nalind is looking into a fix.

jskov-jyskebank-dk commented 4 years ago

I have written this: https://github.com/jskovjyskebankdk/buildah/blob/rootlessBudOpenShift/docs/tutorials/05-openshift-rootless-bud.md

@ashokponkumar would you have a look?

@rhatdan I have written a tutorial for the buildah project, since that is what works right now. When/if it gets working with podman, I will be happy to provide a similar variant for podman. Does that suit you, or would you prefer it somewhere else or in another form?

rhatdan commented 4 years ago

No, that sounds good.

jskov-jyskebank-dk commented 4 years ago

I created https://github.com/containers/buildah/pull/2453

> We believe that you should be able to specify the /dev/fuse device in Kubernetes, but currently the kubelet is only passing in block devices. @nalind is looking into a fix.

Do you have an issue for this so that I can track it? I would like to follow this to the end, if possible, so the project can have a simple tutorial that simpletons like myself can follow :)

Thanks!

nalind commented 4 years ago

In a pod spec, one possibility would be to use a HostPathCharDev volume to mount the device from the node, but those volume devices don't get added to the container's device cgroup. The runtime will add all devices to the container's device cgroup if the container is privileged, but that's what we're trying to avoid requiring here.

https://github.com/kubernetes/kubernetes/pull/79925 attempts to modify the kubelet to add devices to the device cgroup for non-privileged containers. I've not yet personally verified that we can't also get the desired results using one of the other options that were suggested there.

jskov-jyskebank-dk commented 4 years ago

OK, thanks @nalind

github-actions[bot] commented 4 years ago

A friendly reminder that this issue had no activity for 30 days.

jskov-jyskebank-dk commented 4 years ago

I can make builds now (using Buildah).

But we only have NFS-based storage, which makes it really hard to get performant builds (because VFS cannot use NFS).

So I am still very much hoping that there will be some way in the future to make use of /dev/fuse.

elgalu commented 3 years ago

@jskovjyskebankdk did you try https://github.com/flavio/fuse-device-plugin ?

jskov-jyskebank-dk commented 3 years ago

@elgalu I was not aware of that project, no.

And I do not think I will test it - I am a little concerned about its age, and that it is not mentioned as a device plugin in the official documentation.

But thanks for the reference.

jskov-jyskebank-dk commented 3 years ago

By the way, we now use Podman to make builds in non-privileged containers.

Still using NFS backend and VFS, so performance is not great.

But it does work (same instructions as for Buildah: https://github.com/containers/buildah/blob/master/docs/tutorials/05-openshift-rootless-bud.md).

So I will close this issue.