Support running podman containers inside unprivileged (docker) container #4131

Closed: johanbrandhorst closed this issue 3 years ago

johanbrandhorst commented 5 years ago

/kind feature

Description

Very similar to https://github.com/containers/libpod/issues/4056, except that the host container is an unprivileged (docker) container.

The specific use case is being able to programmatically create and destroy containers while running inside an unprivileged container, for automated tests in CI environments such as CircleCI and GitHub Actions.

The comments by @mheon (https://github.com/containers/libpod/issues/4056#issuecomment-535511841) imply this is currently impossible, and may never be possible, but I'd like to explore the feasibility in more detail, separately from that issue.

The stack overflow discussion https://stackoverflow.com/q/56032747 seems to touch on the same problem and unfortunately comes to the same conclusion: --privileged is required at this time, which makes it impossible to use in CircleCI and GitHub Actions.

Steps to reproduce the issue:

  1. Start an unprivileged container (with docker or podman)
  2. Install podman inside the container
  3. Try to run another container using podman inside the first container (a minimal sketch follows).
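
A minimal sketch of these steps (the Fedora base image and dnf are assumptions; any base image that packages podman works):

$ docker run -it --rm fedora bash
# dnf install -y podman          (inside the container, as root)
# podman run --rm -it ubuntu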

Describe the results you received:

At the moment the error I'm getting looks like this:

# podman run --rm -it ubuntu
ERRO[0000] unable to write system event: "write unixgram @00045->/run/systemd/journal/socket: sendmsg: no such file or directory" 
ERRO[0000] unable to write pod event: "write unixgram @00045->/run/systemd/journal/socket: sendmsg: no such file or directory" 
ERRO[0000] error creating network namespace for container fc189c2fb049f6d0955773f86245d7394e0a35181ca97c23782e4b17f8f66fba: mount --make-rshared /var/run/netns failed: "operation not permitted" 
ERRO[0000] unable to write pod event: "write unixgram @00045->/run/systemd/journal/socket: sendmsg: no such file or directory" 
Error: failed to mount shm tmpfs "/home/REDACTED/.local/share/containers/storage/vfs-containers/fc189c2fb049f6d0955773f86245d7394e0a35181ca97c23782e4b17f8f66fba/userdata/shm": operation not permitted

Describe the results you expected:

I expected to be able to run a container inside a container.

Additional information you deem important (e.g. issue happens only occasionally):

Output of podman version:

$ podman version
Version:            1.5.1
RemoteAPI Version:  1
Go Version:         go1.12.8
OS/Arch:            linux/amd64

Output of podman info --debug:

$ podman info --debug
debug:
  compiler: gc
  git commit: ""
  go version: go1.12.8
  podman version: 1.5.1
host:
  BuildahVersion: 1.10.1
  Conmon:
    package: Unknown
    path: /usr/bin/conmon
    version: 'conmon version 2.0.0, commit: e217fdff82e0b1a6184a28c43043a4065083407f'
  Distribution:
    distribution: manjaro
    version: unknown
  MemFree: 198766592
  MemTotal: 16569856000
  OCIRuntime:
    package: Unknown
    path: /usr/bin/runc
    version: |-
      runc version 1.0.0-rc8
      commit: 425e105d5a03fabd737a126ad93d62a9eeede87f
      spec: 1.0.1-dev
  SwapFree: 18179530752
  SwapTotal: 18223570944
  arch: amd64
  cpus: 8
  eventlogger: file
  hostname: REDACTED-x1
  kernel: 4.19.69-1-MANJARO
  os: linux
  rootless: true
  uptime: 22h 29m 13.05s (Approximately 0.92 days)
registries:
  blocked: null
  insecure: null
  search:
  - docker.io
  - registry.fedoraproject.org
  - quay.io
  - registry.access.redhat.com
  - registry.centos.org
store:
  ConfigFile: /home/REDACTED/.config/containers/storage.conf
  ContainerStore:
    number: 1
  GraphDriverName: vfs
  GraphOptions: null
  GraphRoot: /home/REDACTED/.local/share/containers/storage
  GraphStatus: {}
  ImageStore:
    number: 1
  RunRoot: /run/user/1000
  VolumePath: /home/REDACTED/.local/share/containers/storage/volumes

Additional environment details (AWS, VirtualBox, physical, etc.):

Running on bare metal (laptop)

AkihiroSuda commented 5 years ago

This is possible with UML (User-Mode Linux), but extremely slow:

https://github.com/weber-software/diuid

johanbrandhorst commented 5 years ago

Thanks for the link @AkihiroSuda; however, in my testing it doesn't seem to support running containers.

$ podman run -it --rm  --cap-add=SYS_PTRACE weberlars/diuid docker run --rm -it ubuntu
Docker: Docker version 18.09.7, build 2d0083d
Kernel: 5.2.0
Rootfs: Debian GNU/Linux 9.9 (stretch)

Configuration: MEM=2G DISK=10G
[ ok ] Starting OpenBSD Secure Shell server: sshd.
Formatting /persistent/var_lib_docker.img
For better performance, consider mounting a tmpfs on /umlshm like this: `docker run --tmpfs /umlshm:rw,nosuid,nodev,exec,size=8g`
waiting for dockerd ...
$

I'm not sure if I'm doing something wrong, or maybe it doesn't support running inside podman (though there shouldn't be any difference, right?). If it's only for building docker images, it's not nearly as interesting to me.

AkihiroSuda commented 5 years ago

UML should be able to run containers (both podman-in-docker and docker-in-podman).

Something seems wrong with either Podman or sysctl.

johanbrandhorst commented 5 years ago

OK, thanks, I will debug it a bit more!

rhatdan commented 5 years ago

@giuseppe Would this be possible with rootless containers running with fuse-overlayfs? We would need setuid and setgid to handle setting up a namespace.

We could try this out with podman in podman.

I think the issue with running podman in Docker is the tighter seccomp controls. Docker's seccomp.json mistakenly blocks all mount syscalls, even though the unprivileged mount syscall is allowed for procfs, tmpfs, bind, fuse, and sysfs mounts, I believe.
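
A hedged sketch of relaxing that under Docker, assuming you start from Docker's default profile and re-allow the mount-related syscalls (the file name and exact edit are illustrative):

$ curl -o seccomp.json https://raw.githubusercontent.com/moby/moby/master/profiles/seccomp/default.json
$ # edit seccomp.json: add "mount" and "umount2" to the "names" array of the SCMP_ACT_ALLOW rule
$ docker run --security-opt seccomp=./seccomp.json -it quay.io/podman/stable podman info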

mheon commented 5 years ago

I think UID/GID mapping will also be an issue.

You'll either need the storage flag to ignore chown errors, or a separate newuidmap/newgidmap setup within the container - and I suspect you won't have the privileges to run them in a rootless container.
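
Both options have concrete forms. A sketch of each (the "podman" user name and the 100000:65536 subordinate ID range are assumptions about the image):

# Option 1: tell containers/storage to ignore chown errors (storage.conf):
$ cat >> ~/.config/containers/storage.conf <<EOF
[storage.options]
ignore_chown_errors = "true"
EOF

# Option 2: give the in-container user a subordinate ID range for newuidmap/newgidmap:
$ echo "podman:100000:65536" >> /etc/subuid
$ echo "podman:100000:65536" >> /etc/subgid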

c-goes commented 5 years ago

podman in LXD seems to work fine (unprivileged LXD container created with -c security.nesting=true, the same option as for running Docker in LXD). The only problem I have is creating rootless containers; there is an unhelpful error from slirp4netns.

Here is a debug log for rootless if it is any help for developing this feature.

$ podman run -it --log-level=debug k8s.gcr.io/busybox sh
DEBU[0000] using conmon: "/usr/libexec/podman/conmon"   
DEBU[0000] Initializing boltdb state at /home/ubuntu/.local/share/containers/storage/libpod/bolt_state.db 
DEBU[0000] Using graph driver overlay                   
DEBU[0000] Using graph root /home/ubuntu/.local/share/containers/storage 
DEBU[0000] Using run root /run/user/1000                
DEBU[0000] Using static dir /home/ubuntu/.local/share/containers/storage/libpod 
DEBU[0000] Using tmp dir /run/user/1000/libpod/tmp      
DEBU[0000] Using volume path /home/ubuntu/.local/share/containers/storage/volumes 
DEBU[0000] Set libpod namespace to ""                   
DEBU[0000] [graphdriver] trying provided driver "overlay" 
DEBU[0000] overlay: mount_program=/usr/bin/fuse-overlayfs 
DEBU[0000] backingFs=zfs, projectQuotaSupported=false, useNativeDiff=false, usingMetacopy=false 
DEBU[0000] Initializing event backend journald          
DEBU[0000] using runtime "/usr/lib/cri-o-runc/sbin/runc" 
WARN[0000] Error initializing configured OCI runtime crun: no valid executable found for OCI runtime crun: invalid argument 
DEBU[0000] Failed to add podman to systemd sandbox cgroup: dial unix /run/user/0/bus: connect: permission denied 
INFO[0000] running as rootless                          
DEBU[0000] using conmon: "/usr/libexec/podman/conmon"   
DEBU[0000] Initializing boltdb state at /home/ubuntu/.local/share/containers/storage/libpod/bolt_state.db 
DEBU[0000] Using graph driver overlay                   
DEBU[0000] Using graph root /home/ubuntu/.local/share/containers/storage 
DEBU[0000] Using run root /run/user/1000                
DEBU[0000] Using static dir /home/ubuntu/.local/share/containers/storage/libpod 
DEBU[0000] Using tmp dir /run/user/1000/libpod/tmp      
DEBU[0000] Using volume path /home/ubuntu/.local/share/containers/storage/volumes 
DEBU[0000] Set libpod namespace to ""                   
DEBU[0000] Initializing event backend journald          
WARN[0000] Error initializing configured OCI runtime crun: no valid executable found for OCI runtime crun: invalid argument 
DEBU[0000] using runtime "/usr/lib/cri-o-runc/sbin/runc" 
DEBU[0000] parsed reference into "[overlay@/home/ubuntu/.local/share/containers/storage+/run/user/1000:overlay.mount_program=/usr/bin/fuse-overlayfs]k8s.gcr.io/busybox:latest" 
DEBU[0000] parsed reference into "[overlay@/home/ubuntu/.local/share/containers/storage+/run/user/1000:overlay.mount_program=/usr/bin/fuse-overlayfs]@e7d168d7db455c45f4d0315d89dbd18806df4784f803c3cc99f8a2e250585b5b" 
DEBU[0000] [graphdriver] trying provided driver "overlay" 
DEBU[0000] overlay: mount_program=/usr/bin/fuse-overlayfs 
DEBU[0000] backingFs=zfs, projectQuotaSupported=false, useNativeDiff=false, usingMetacopy=false 
DEBU[0000] parsed reference into "[overlay@/home/ubuntu/.local/share/containers/storage+/run/user/1000:overlay.mount_program=/usr/bin/fuse-overlayfs]@e7d168d7db455c45f4d0315d89dbd18806df4784f803c3cc99f8a2e250585b5b" 
DEBU[0000] parsed reference into "[overlay@/home/ubuntu/.local/share/containers/storage+/run/user/1000:overlay.mount_program=/usr/bin/fuse-overlayfs]@e7d168d7db455c45f4d0315d89dbd18806df4784f803c3cc99f8a2e250585b5b" 
DEBU[0000] No hostname set; container's hostname will default to runtime default 
DEBU[0000] Using slirp4netns netmode                    
DEBU[0000] created OCI spec and options for new container 
DEBU[0000] Allocated lock 6 for container 14bd288c9d7b8499371a3ee93f09f4cdcc57a77ff69bb3d4ddbe673d1bbeca85 
DEBU[0000] parsed reference into "[overlay@/home/ubuntu/.local/share/containers/storage+/run/user/1000:overlay.mount_program=/usr/bin/fuse-overlayfs]@e7d168d7db455c45f4d0315d89dbd18806df4784f803c3cc99f8a2e250585b5b" 
DEBU[0000] created container "14bd288c9d7b8499371a3ee93f09f4cdcc57a77ff69bb3d4ddbe673d1bbeca85" 
DEBU[0000] container "14bd288c9d7b8499371a3ee93f09f4cdcc57a77ff69bb3d4ddbe673d1bbeca85" has work directory "/home/ubuntu/.local/share/containers/storage/overlay-containers/14bd288c9d7b8499371a3ee93f09f4cdcc57a77ff69bb3d4ddbe673d1bbeca85/userdata" 
DEBU[0000] container "14bd288c9d7b8499371a3ee93f09f4cdcc57a77ff69bb3d4ddbe673d1bbeca85" has run directory "/run/user/1000/overlay-containers/14bd288c9d7b8499371a3ee93f09f4cdcc57a77ff69bb3d4ddbe673d1bbeca85/userdata" 
DEBU[0000] New container created "14bd288c9d7b8499371a3ee93f09f4cdcc57a77ff69bb3d4ddbe673d1bbeca85" 
DEBU[0000] container "14bd288c9d7b8499371a3ee93f09f4cdcc57a77ff69bb3d4ddbe673d1bbeca85" has CgroupParent "/libpod_parent/libpod-14bd288c9d7b8499371a3ee93f09f4cdcc57a77ff69bb3d4ddbe673d1bbeca85" 
DEBU[0000] Handling terminal attach                     
DEBU[0000] overlay: mount_data=lowerdir=/home/ubuntu/.local/share/containers/storage/overlay/l/S7XHB2WPYOR4RJSE7R7ISPTZXN:/home/ubuntu/.local/share/containers/storage/overlay/l/PZAPRTUCJVM3BFACSLRF3JFFJP:/home/ubuntu/.local/share/containers/storage/overlay/l/WIAN5K2O3J3IKJYKTA2UXGK5VP:/home/ubuntu/.local/share/containers/storage/overlay/l/Y6P37THEEBEEBA2JBV7N2B53KL,upperdir=/home/ubuntu/.local/share/containers/storage/overlay/cd035583578cbe64d5824a90217285db106a083087efff53d5b6e3aa2db983fe/diff,workdir=/home/ubuntu/.local/share/containers/storage/overlay/cd035583578cbe64d5824a90217285db106a083087efff53d5b6e3aa2db983fe/work 
DEBU[0000] mounted container "14bd288c9d7b8499371a3ee93f09f4cdcc57a77ff69bb3d4ddbe673d1bbeca85" at "/home/ubuntu/.local/share/containers/storage/overlay/cd035583578cbe64d5824a90217285db106a083087efff53d5b6e3aa2db983fe/merged" 
DEBU[0000] Created root filesystem for container 14bd288c9d7b8499371a3ee93f09f4cdcc57a77ff69bb3d4ddbe673d1bbeca85 at /home/ubuntu/.local/share/containers/storage/overlay/cd035583578cbe64d5824a90217285db106a083087efff53d5b6e3aa2db983fe/merged 
DEBU[0000] Made network namespace at /run/user/1000/netns/cni-d09ac9e5-1983-ab8b-e33a-9a8e47e69a9b for container 14bd288c9d7b8499371a3ee93f09f4cdcc57a77ff69bb3d4ddbe673d1bbeca85 
DEBU[0000] slirp4netns command: /usr/bin/slirp4netns --disable-host-loopback --mtu 65520 --enable-sandbox -c -e 3 -r 4 --netns-type=path /run/user/1000/netns/cni-d09ac9e5-1983-ab8b-e33a-9a8e47e69a9b tap0 
DEBU[0001] unmounted container "14bd288c9d7b8499371a3ee93f09f4cdcc57a77ff69bb3d4ddbe673d1bbeca85" 
DEBU[0001] Tearing down network namespace at /run/user/1000/netns/cni-d09ac9e5-1983-ab8b-e33a-9a8e47e69a9b for container 14bd288c9d7b8499371a3ee93f09f4cdcc57a77ff69bb3d4ddbe673d1bbeca85 
DEBU[0001] Cleaning up container 14bd288c9d7b8499371a3ee93f09f4cdcc57a77ff69bb3d4ddbe673d1bbeca85 
DEBU[0001] Network is already cleaned up, skipping...   
DEBU[0001] Container 14bd288c9d7b8499371a3ee93f09f4cdcc57a77ff69bb3d4ddbe673d1bbeca85 storage is already unmounted, skipping... 
DEBU[0001] ExitCode msg: "slirp4netns failed"           
ERRO[0001] slirp4netns failed                           
WARN[0001] unable to find /home/ubuntu/.config/containers/registries.conf. some podman (image shortnames) commands may be limited 

rhatdan commented 5 years ago

@AkihiroSuda @giuseppe Ideas?

giuseppe commented 5 years ago

Only problem I have is creating rootless containers. There is an undescriptive error with slirp4netns.

does the container work fine if you use --net=host?

AkihiroSuda commented 5 years ago

I reproduced the LXD issue.

slirp4netns sandbox doesn't seem to work on LXD with -c security.nesting=true.

slirp4netns --configure --mtu=65520 --disable-host-loopback $(cat /tmp/pid) --enable-sandbox tap0
WARNING: Support for sandboxing is experimental
sent tapfd=5 for tap0
received tapfd=5
Starting slirp
* MTU:             65520
* Network:         10.0.2.0
* Netmask:         255.255.255.0
* Gateway:         10.0.2.2
* DNS:             10.0.2.3
* Recommended IP:  10.0.2.100
cannot mount tmpfs on /tmp
create_sandbox failed
do_slirp is exiting
do_slirp failed
parent failed

does the container work fine if you use --net=host?

yes

AkihiroSuda commented 5 years ago

strace:

write(1, "* Recommended IP:  10.0.2.100\n", 30* Recommended IP:  10.0.2.100
) = 30
geteuid()                               = 1001
openat(AT_FDCWD, "/proc/3122/ns/user", O_RDONLY) = 3
setns(3, CLONE_NEWUSER)                 = 0
close(3)                                = 0
close(-1)                               = -1 EBADF (Bad file descriptor)
setresgid(-1, 0, -1)                    = 0
setresuid(-1, 0, -1)                    = 0
openat(AT_FDCWD, "/dev/urandom", O_RDONLY) = 3
read(3, "\357e]\207\35\203)\36@\253\273m\6\2415j", 16) = 16
close(3)                                = 0
futex(0x7f8c29790f68, FUTEX_WAKE_PRIVATE, 2147483647) = 0
brk(0x56140053e000)                     = 0x56140053e000
rt_sigaction(SIGPIPE, {sa_handler=SIG_IGN, sa_mask=[PIPE], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f8c2945ff60}, {sa_handler=SIG_IGN, sa_mask=[], sa_flags=0}, 8) = 0
unshare(CLONE_NEWNS)                    = 0
mount("", "/", 0x5614001c8d77, MS_PRIVATE, NULL) = 0
mount("tmpfs", "/tmp", "tmpfs", MS_NOSUID|MS_NODEV|MS_NOEXEC, "size=1k") = 0
mkdir("/tmp/etc", 0755)                 = 0
mkdir("/tmp/old", 0755)                 = 0
mkdir("/tmp/run", 0755)                 = 0
mount("/etc", "/tmp/etc", 0x5614001c8d77, MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_BIND|MS_REC|MS_SLAVE, NULL) = 0
mount("/etc", "/tmp/etc", 0x5614001c8d77, MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND, NULL) = 0
mount("/run", "/tmp/run", 0x5614001c8d77, MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_BIND|MS_REC|MS_SLAVE, NULL) = 0
mount("/run", "/tmp/run", 0x5614001c8d77, MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND, NULL) = 0
chdir("/tmp")                           = 0
pivot_root(".", "old")                  = 0
chdir("/")                              = 0
umount2("/old", MNT_DETACH)             = 0
rmdir("/old")                           = 0
mount("tmpfs", "/", 0x5614001c827f, MS_RDONLY|MS_REMOUNT, "size=0k") = -1 EACCES (Permission denied)
write(2, "cannot mount tmpfs on /tmp\n", 27cannot mount tmpfs on /tmp
) = 27
write(2, "create_sandbox failed\n", 22create_sandbox failed
) = 22
write(2, "do_slirp is exiting\n", 20do_slirp is exiting
)   = 20
brk(0x56140052e000)                     = 0x56140052e000
write(2, "do_slirp failed\n", 16do_slirp failed
)       = 16
close(5)                                = 0
write(2, "parent failed\n", 14parent failed
)         = 14
exit_group(1)                           = ?
+++ exited with 1 +++

github-actions[bot] commented 4 years ago

This issue had no activity for 30 days. In the absence of activity or the "do-not-close" label, the issue will be automatically closed within 7 days.

mheon commented 4 years ago

I believe Dan is working on this. We may need a few capabilities that aren't available in an unprivileged container; SETUID was mentioned.

rhatdan commented 4 years ago

Yes, I have a working prototype of this now, and will publish a blog post on it shortly. I think we could get some additional support into containers.conf to make this easier to do.

johanbrandhorst commented 4 years ago

Just to be clear, the issue specifically mentions a use case:

The specific use case is being able to programmatically create and destroy containers while running inside an unprivileged container, for automated tests in CI environments such as CircleCI and Github actions.

Is this supported by the prototype?

AkihiroSuda commented 4 years ago

How is this possible? With seccomp=unconfined apparmor=unconfined?

rhatdan commented 4 years ago

Currently you have to disable SELinux, since by default it blocks a few commands. I also have a modified seccomp.json file. BTW, I have been sending out updates on the podman.io mailing list.

Since I don't use AppArmor, I would figure it is similar to SELinux. The main SELinux issues were with mounting file systems.

github-actions[bot] commented 4 years ago

This issue had no activity for 30 days. In the absence of activity or the "do-not-close" label, the issue will be automatically closed within 7 days.

johanbrandhorst commented 4 years ago

Still being worked on

rhatdan commented 4 years ago

It has been back-burnered, at least until after the break.

github-actions[bot] commented 4 years ago

A friendly reminder that this issue had no activity for 30 days.

rhatdan commented 4 years ago

This is the main PR to get this working. https://github.com/containers/libpod/pull/4698

rhatdan commented 4 years ago

Still waiting to get the containers.conf PR merged.

chengkuangan commented 4 years ago

Great one! Testing this in a Docker container with Jenkins. Saw #4698; waiting for it to be merged.

Copying blob sha256:57de4da701b511cba33bbdc424757f7f3b408bea741ca714ace265da9b59191a
Writing manifest to image destination
Storing signatures
time="2020-03-24T09:43:03Z" level=error msg="Error preparing container 12b1b20f183bad5c4215bd53a8dd160e604c62b8c16f566b7f3e99a92ef9619f: error creating network namespace for container 12b1b20f183bad5c4215bd53a8dd160e604c62b8c16f566b7f3e99a92ef9619f: mount --make-rshared /var/run/netns failed: "operation not permitted""

ohumbe commented 4 years ago

At first I was under the impression that #4698 would solve this issue, but I think I was confused. If possible, I'd like to know what steps need to take place in order to get this containerception to work; I'm trying to become more familiar with the internals, and my team does have a use case.

FlorianLudwig commented 4 years ago

@the-humbe

When running inside Docker, make sure you are using an up-to-date podman. Beware: quay.io/podman/stable is NOT up to date.

The following should work:

docker run --privileged -ti quay.io/podman/stable sh -c "dnf update -y && podman run --cgroup-manager=cgroupfs --net=host hello-world"

There are similar discussions with some more details at https://github.com/containers/libpod/issues/4056 and https://github.com/containers/buildah/issues/2175.

Running unprivileged does not seem possible yet: https://github.com/containers/libpod/issues/4056#issuecomment-603808938

johanbrandhorst commented 4 years ago

Note that this sort of thing has been possible for a long time, as mentioned in the issue description:

The stack overflow discussion https://stackoverflow.com/q/56032747 seems to touch on the same problem, and unfortunately come to the same conclusion, that --privileged is required at this time, which makes it impossible to use in CircleCI and Github Actions.

Any solution that requires --privileged is not a solution to this issue.

rhatdan commented 4 years ago

Working on it. I am trying to make this as simple as possible, i.e. using containers.conf to embed the correct options into the container image to make it easier.

I am working on the podman-in-podman issue right now. One problem with podman inside of Docker is that Docker has stricter seccomp rules, which are going to force you to disable seccomp.

smekkley commented 4 years ago

I've found a very dirty workaround (kind of).

podman system service --varlink -t 0

and then from a whatever container with podman-remote inside,

podman-remote --username $USER --remote-host 127.0.0.1

This isn't podman in podman and requires an SSH server, but theoretically it can solve issues like this one, where Docker mounts the Docker socket: https://code.visualstudio.com/docs/remote/containers

However, certain tricks like Docker-in-Docker do not work due to limitations in Podman. This affects the Remote-Containers: Try a Sample... and Remote-Containers: Open repository in container... commands.

In the case of Docker this is a big security issue, but with podman it can be done rootless.
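
Putting those two pieces together, a sketch of the flow (podman 1.x varlink syntax as above; the host address, user, and inner command are assumptions):

$ podman system service --varlink -t 0                 # on the host: serve the varlink API with no timeout
$ podman-remote --username $USER --remote-host 127.0.0.1 run --rm alpine echo hello
                                                       # from inside a container with podman-remote and ssh access to the host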

ShadowJonathan commented 4 years ago

I'm currently looking out for this feature. Any chance that the final "set up" image could be published as something as easily pulled as podman/pind (or, when running from podman, podman/pinp)?

rhatdan commented 4 years ago

I have been sidelined working on the APIV2 support for podman 2.0. I hope to get back to this soon.

rhatdan commented 4 years ago

We have at least one other issue on this as well.

ShadowJonathan commented 4 years ago

https://github.com/containers/libpod/issues/4056, yes, but I think that one has a slightly different scope.

Whereas this issue focuses on making podman work inside an unprivileged docker container, that other issue talks about running podman inside another podman container.

pinp vs pind, so to speak

bryanmacfarlane commented 4 years ago

@ShadowJonathan My scenario is running podman run (after build) inside an unprivileged K8s container in a pod. Hopefully that's the same set of work as getting it working inside a docker container. We're looking at K8s-native CI/CD elastic pools.

ShadowJonathan commented 4 years ago

@bryanmacfarlane I'm sorry, can you clarify what you meant? I don't understand what you're trying to tell me

alippai commented 4 years ago

@ShadowJonathan It's similar to what I need. I have k8s without root capabilities, and we'd like to run podman in it, e.g. to be used by GitLab Runner or VS Code Remote.

rhatdan commented 4 years ago

Depends on your definition of "unprivileged". In order to run containers within a container, you are going to need at least "CAP_SETUID" and "CAP_SETGID" available to the container engine within the container. This will probably also require real root to be possible.
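
In Docker terms, granting just those two capabilities (rather than full --privileged) is a one-line sketch; whether it is sufficient also depends on the seccomp and storage issues discussed above (Docker capability names drop the CAP_ prefix):

$ docker run --cap-add SETUID --cap-add SETGID -it quay.io/podman/stable podman info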

ashokponkumar commented 4 years ago

@rhatdan I went through the comments above, but I might not be well versed in all the core underlying requirements. I was able to get it working in docker:

docker run --privileged -ti quay.io/podman/stable sh -c "dnf update -y && podman run --cgroup-manager=cgroupfs --net=host hello-world"

but I ran into issues when running inside podman:

root@de008df80600:/# podman --storage-driver=vfs info
Error: could not get runtime: database storage graph driver "overlay" does not match our storage graph driver "vfs": database configuration mismatch

My requirement is to run a container image in an OpenShift 4.3 cluster. Does the current status of podman and related features allow us to do it, even if it would mean running it as privileged?

jskov-jyskebank-dk commented 4 years ago

@ashokponkumar See https://github.com/containers/libpod/issues/6667 about buildah running in OpenShift. The specific error you see is because .local/share/containers is already configured for overlay and you are now telling it to use vfs.
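
Two hedged ways out of that mismatch (the first is destructive and assumes nothing in local storage is worth keeping):

$ rm -rf ~/.local/share/containers/storage      # wipe the overlay-configured storage and its database
$ cat > ~/.config/containers/storage.conf <<EOF # or pin the driver in storage.conf instead of on the CLI
[storage]
driver = "vfs"
EOF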

ashokponkumar commented 4 years ago

Thanks @jskovjyskebankdk. There is a lot of useful info in #6667; I will try to follow it. Looking forward to your PR on the OpenShift documentation. It would really help if you could reference the PR in #6667 once it is ready. Also, would the same steps work in any Kubernetes distribution (I assume it would depend on the container runtime used)?

jskov-jyskebank-dk commented 4 years ago

I expect to start working on the PR today, with some luck :) I only have access to (and interest in, really) OpenShift, so I am not sure.

ashokponkumar commented 4 years ago

I expect to start working on the PR today, with some luck :) I only have access to (and interest in, really) OpenShift, so I am not sure.

Sure @jskovjyskebankdk. Thanks! I will try out your instructions once they are ready.

bryanmacfarlane commented 4 years ago

I can help with trying it out in k8s when it's ready...

jskov-jyskebank-dk commented 4 years ago

@bryanmacfarlane It is pretty OpenShift-specific in its use of (at least) DeploymentConfig. It is here: https://github.com/jskovjyskebankdk/buildah/blob/rootlessBudOpenShift/docs/tutorials/05-openshift-rootless-bud.md

If it brings you closer to running on k8s, I guess we could amend it, or clone it for k8s.

pothos commented 4 years ago

According to https://github.com/containers/podman/issues/4056#issuecomment-672905203, this now works without --privileged, needing only --cap-add SYS_ADMIN --cap-add SYS_RESOURCE (and turning off seccomp and SELinux).
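
Spelled out as a single Docker invocation, that combination would look roughly like this sketch (the image and inner command are assumptions; label=disable is Docker's switch for SELinux confinement, and --device /dev/fuse matches the fuse-overlayfs setup shown in the next comment):

$ docker run --cap-add SYS_ADMIN --cap-add SYS_RESOURCE \
    --security-opt seccomp=unconfined \
    --security-opt label=disable \
    --device /dev/fuse \
    quay.io/podman/stable podman run --rm alpine echo hello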

johanbrandhorst commented 4 years ago

Is it possible to add these capabilities to a CircleCI/GitHub Actions container?

rhatdan commented 4 years ago

With the current quay.io/podman/stable I am able to get podman to run in a locked-down podman session, to some extent.

# podman run --user podman --device /dev/fuse quay.io/podman/stable:latest podman version
Version:      2.0.5
API Version:  1
Go Version:   go1.14.6
Built:        Thu Jan  1 00:00:00 1970
OS/Arch:      linux/amd64
# podman run --user podman --device /dev/fuse quay.io/podman/stable:latest podman info
host:
  arch: amd64
  buildahVersion: 1.15.1
  cgroupVersion: v2
  conmon:
    package: conmon-2.0.19-1.fc32.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.19, commit: 5dce9767526ed27f177a8fa3f281889ad509fea7'
  cpus: 8
  distribution:
    distribution: fedora
    version: "32"
  eventLogger: file
  hostname: d8e4d8e46270
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 5.8.4-200.fc32.x86_64
  linkmode: dynamic
  memFree: 199364608
  memTotal: 16416620544
  ociRuntime:
    name: crun
    package: crun-0.14.1-4.fc32.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 0.14.1
      commit: 598ea5e192ca12d4f6378217d3ab1415efeddefa
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +YAJL
  os: linux
  remoteSocket:
    path: /tmp/run-1000/podman/podman.sock
  rootless: true
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.1.4-1.fc32.x86_64
    version: |-
      slirp4netns version 1.1.4
      commit: b66ffa8e262507e37fca689822d23430f3357fe8
      libslirp: 4.3.1
      SLIRP_CONFIG_VERSION_MAX: 2
  swapFree: 3635671040
  swapTotal: 8296329216
  uptime: 257h 55m 47.91s (Approximately 10.71 days)
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - registry.centos.org
  - docker.io
store:
  configFile: /home/podman/.config/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.mount_program:
      Executable: /usr/bin/fuse-overlayfs
      Package: fuse-overlayfs-1.1.2-1.fc32.x86_64
      Version: |-
        fusermount3 version: 3.9.1
        fuse-overlayfs: version 1.1.0
        FUSE library version 3.9.1
        using FUSE kernel interface version 7.31
  graphRoot: /home/podman/.local/share/containers/storage
  graphStatus:
    Backing Filesystem: overlayfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "false"
  imageStore:
    number: 0
  runRoot: /tmp/run-1000/containers
  volumePath: /home/podman/.local/share/containers/storage/volumes
version:
  APIVersion: 1
  Built: 0
  BuiltTime: Thu Jan  1 00:00:00 1970
  GitCommit: ""
  GoVersion: go1.14.6
  OsArch: linux/amd64
  Version: 2.0.5

But running a container still fails:

# podman run --user podman --device /dev/fuse quay.io/podman/stable:latest podman run alpine ls 
Trying to pull registry.fedoraproject.org/alpine...
  manifest unknown: manifest unknown
Trying to pull registry.access.redhat.com/alpine...
  name unknown: Repo not found
Trying to pull registry.centos.org/alpine...
  manifest unknown: manifest unknown
Trying to pull docker.io/library/alpine...
Getting image source signatures
Copying blob sha256:df20fa9351a15782c64e6dddb2d4a6f50bf6d3688060a34c4014b0d9a752eb4c
Copying config sha256:a24bb4013296f61e89ba57005a7b3e52274d8edd3ae2077d04395f806b63d83e
Writing manifest to image destination
Storing signatures
Error: cannot chown /home/podman/.local/share/containers/storage/overlay/a961612a554f99da2616ae7ace0210f23eaf479b5a356321d1390eae4f523a37/merged to 0:0: chown /home/podman/.local/share/containers/storage/overlay/a961612a554f99da2616ae7ace0210f23eaf479b5a356321d1390eae4f523a37/merged: operation not permitted

jskov-jyskebank-dk commented 4 years ago

FWIW I see the same inside a rootless OpenShift container:

I can use podman build to build images (using VFS).

But I still have to use buildah to run a dynamic command line (same problem as Daniel noted, as I recall).

smekkley commented 4 years ago

I'm curious about the implementation. Is creating a new cgroup disabled? I assume it's different from buildah's chroot isolation mode, if it's not working properly.

I'd also like to add that most users don't need network, IPC, and PID isolation, if that makes development easier for you. As for the filesystem, it doesn't have to be completely isolated either; buildah's chroot mode (selected as sketched below) is quite fine.
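
For reference, the buildah chroot mode mentioned here is selected with the --isolation flag (the image tag and build context are placeholders):

$ buildah bud --isolation chroot -t myimage .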

tvvignesh commented 4 years ago

@rhatdan @AkihiroSuda Hi. I wanted to use Podman to build images in my GitLab CI pipeline using the Kubernetes executor, and I have set a restricted PSP (using exactly this: https://raw.githubusercontent.com/kubernetes/website/master/content/en/examples/policy/restricted-psp.yaml) where no capabilities are added and root access is disabled.

Since the build is running in Kubernetes directly, there is no Docker involved; it is running directly within unprivileged containerd with no root access in Kubernetes.

I get the same error as mentioned:

[screenshot of the error output]

And this is how the sample pipeline looks:

image: "quay.io/podman/stable"

buildah:
  tags:
    - development
    - ops
  variables:
    STORAGE_DRIVER: "vfs"
    BUILDAH_FORMAT: "docker"
    IMAGE_TAG: $CI_REGISTRY_IMAGE:edge
  script:
    - podman version
    - whoami
    - echo "Logging into $CI_REGISTRY"
    - podman login -u gitlab-ci-token -p $CI_BUILD_TOKEN $CI_REGISTRY
    - podman build -t ${IMAGE_TAG} .
    - podman images
    - podman push ${IMAGE_TAG}
    - podman logout $CI_REGISTRY

May I know how I can get this to work? Would it not work with the current PSP? Should I be changing something else? Thanks.
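
Per @rhatdan's earlier comment, the container engine needs at least CAP_SETUID and CAP_SETGID, which the linked restricted PSP drops (requiredDropCapabilities: ALL). A hedged sketch of the minimal change; whether this alone is enough is exactly what this issue tracks:

# PodSecurityPolicy (policy/v1beta1) fragment:
allowedCapabilities:
- SETUID
- SETGID

# matching container securityContext in the runner pod spec:
securityContext:
  capabilities:
    add:
    - SETUID
    - SETGID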