containers / bubblewrap

Low-level unprivileged sandboxing tool used by Flatpak and similar projects

bubblewrap inside unprivileged docker #505

Open aurium opened 2 years ago

aurium commented 2 years ago

bubblewrap is becoming a popular sandbox tool, so we need to be able to use it inside unprivileged Docker containers to containerize solutions.

As you may know, bwrap works correctly in a privileged container:

$ docker run \
    --privileged \
    -v $HOME/SteamHome:/myself \
    -e HOME=/myself \
    -w /myself \
    -ti --entrypoint /bin/bash \
    ubuntu:jammy
# Ok! We are inside a privileged docker
root@2b7dfc1b3179:~# bwrap --ro-bind /usr /usr --ro-bind /bin /bin --ro-bind /etc /etc --ro-bind /lib /lib --ro-bind /lib32 /lib32 --ro-bind /lib64 /lib64 --dir /tmp --dir /var --proc /proc --dev /dev --unshare-all --share-net --die-with-parent --dir /run/user/$(id -u) --bind /tmp /SteamHome --chdir /SteamHome /bin/bash
root@2b7dfc1b3179:/SteamHome#
# Great! Well, not so great: privileged Docker containers are not really jailed.
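For readability, the bwrap invocation used throughout this thread can be kept in a bash array with one option per line, so each part can be commented or dropped while experimenting. This is only a sketch: `RUN` is a hypothetical switch (the command is printed, not executed, unless `RUN=1`), and using `--ro-bind-try` for /lib32 and /lib64 is an assumption for systems where those directories may be missing; the original command uses plain `--ro-bind`.

```shell
#!/usr/bin/env bash
# Sketch: the bwrap command from this thread, one option group per line.
# RUN is a hypothetical switch; without RUN=1 the command is only printed.
set -euo pipefail

bwrap_args=(
  # Read-only views of the system directories the shell needs.
  --ro-bind /usr /usr
  --ro-bind /bin /bin
  --ro-bind /etc /etc
  --ro-bind /lib /lib
  # Assumption: --ro-bind-try silently skips a source that does not exist.
  --ro-bind-try /lib32 /lib32
  --ro-bind-try /lib64 /lib64
  # Empty directories in the new root, plus fresh /proc and /dev mounts.
  --dir /tmp
  --dir /var
  --proc /proc
  --dev /dev
  # New namespaces for everything except the network.
  --unshare-all
  --share-net
  # Kill the sandboxed command if bwrap's parent goes away.
  --die-with-parent
  # Empty per-user runtime directory.
  --dir "/run/user/$(id -u)"
  # Expose the outer /tmp as the sandbox's working directory.
  --bind /tmp /SteamHome
  --chdir /SteamHome
)

if [[ "${RUN:-0}" == 1 ]]; then
  exec bwrap "${bwrap_args[@]}" /bin/bash
else
  printf '%q ' bwrap "${bwrap_args[@]}" /bin/bash
  echo
fi
```

Keeping the options in an array also avoids re-quoting problems when the same invocation is pasted into several containers, as happens below.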

You may also know that it won't work if you simply remove --privileged:

$ docker run \
    -v $HOME/SteamHome:/myself \
    -e HOME=/myself \
    -w /myself \
    -ti --entrypoint /bin/bash \
    ubuntu:jammy
# Now we are inside an unprivileged docker
root@2b7dfc1b3179:~# bwrap --ro-bind /usr /usr --ro-bind /bin /bin --ro-bind /etc /etc --ro-bind /lib /lib --ro-bind /lib32 /lib32 --ro-bind /lib64 /lib64 --dir /tmp --dir /var --proc /proc --dev /dev --unshare-all --share-net --die-with-parent --dir /run/user/$(id -u) --bind /tmp /SteamHome --chdir /SteamHome /bin/bash
bwrap: No permissions to create new namespace, likely because the kernel does not allow non-privileged user namespaces. See <https://deb.li/bubblewrap> or <file:///usr/share/doc/bubblewrap/README.Debian.gz>.
# Expected fail
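Because the error message only says that the namespace could not be created, it can help to check whether the same environment allows a user namespace at all, without bwrap in the picture. A minimal probe, assuming util-linux's `unshare` is available in the container; `probe_userns` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Sketch: can the current user create a user namespace here at all?
# probe_userns is a hypothetical helper; it relies on util-linux unshare.
set -uo pipefail

probe_userns() {
  if ! command -v unshare >/dev/null 2>&1; then
    echo "unshare not found"
    return 0
  fi
  # Create a user namespace (mapping the current user to root inside it)
  # and run `true` there; any refusal makes unshare exit non-zero.
  if unshare --user --map-root-user true 2>/dev/null; then
    echo "user namespaces: allowed"
  else
    echo "user namespaces: blocked"
  fi
}

probe_userns
```

In the default unprivileged container above, one would expect "blocked", matching bwrap's error.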

Now let's try granting every permission; once that succeeds, we can remove them one by one and keep only the necessary capabilities:

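DEVICES='--device=/dev/rtc'   # /dev/rtc is a symlink, which the loop skips
for dev in /dev/*; do
  if [ -h "$dev" ]; then        # skip symlinks; test -d would follow them
    echo "Not shared: $(ls -l "$dev")"
  elif [ -d "$dev" ]; then      # bind-mount directories such as /dev/pts
    DEVICES="$DEVICES -v=$dev:$dev"
  else                          # pass device nodes through
    DEVICES="$DEVICES --device=$dev"
  fi
done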
Not shared: lrwxrwxrwx 1 root root 11 abr  1 08:58 /dev/core -> /proc/kcore
Not shared: lrwxrwxrwx 1 root root 13 abr  1 08:58 /dev/fd -> /proc/self/fd
Not shared: lrwxrwxrwx 1 root root 12 abr  1 08:58 /dev/initctl -> /run/initctl
Not shared: lrwxrwxrwx 1 root root 28 abr  1 08:58 /dev/log -> /run/systemd/journal/dev-log
Not shared: lrwxrwxrwx 1 root root 4 abr  1 08:58 /dev/rtc -> rtc0
Not shared: lrwxrwxrwx 1 root root 15 abr  1 08:58 /dev/stderr -> /proc/self/fd/2
Not shared: lrwxrwxrwx 1 root root 15 abr  1 08:58 /dev/stdin -> /proc/self/fd/0
Not shared: lrwxrwxrwx 1 root root 15 abr  1 08:58 /dev/stdout -> /proc/self/fd/1
docker run \
    --cap-add SYS_CHROOT \
    --cap-add SYS_ADMIN \
    --cap-add SETUID \
    --cap-add SETGID \
    --cap-add SYS_PTRACE \
    --cap-add NET_ADMIN \
    --cap-add AUDIT_WRITE \
    --cap-add CHOWN \
    --cap-add DAC_OVERRIDE \
    --cap-add FOWNER \
    --cap-add FSETID \
    --cap-add KILL \
    --cap-add MKNOD \
    --cap-add NET_BIND_SERVICE \
    --cap-add NET_RAW \
    --cap-add SETFCAP \
    --cap-add SETPCAP \
    --cap-add AUDIT_CONTROL \
    --cap-add AUDIT_READ \
    --cap-add BLOCK_SUSPEND \
    --cap-add DAC_READ_SEARCH \
    --cap-add IPC_LOCK \
    --cap-add IPC_OWNER \
    --cap-add LEASE \
    --cap-add LINUX_IMMUTABLE \
    --cap-add MAC_ADMIN \
    --cap-add MAC_OVERRIDE \
    --cap-add NET_BROADCAST \
    --cap-add SYS_BOOT \
    --cap-add SYS_MODULE \
    --cap-add SYS_NICE \
    --cap-add SYS_PACCT \
    --cap-add SYS_RAWIO \
    --cap-add SYS_RESOURCE \
    --cap-add SYS_TIME \
    --cap-add SYS_TTY_CONFIG \
    --cap-add SYSLOG \
    --cap-add WAKE_ALARM \
    $DEVICES \
    -v $HOME/SteamHome:/myself \
    -e HOME=/myself \
    -w /myself \
    -ti --entrypoint /bin/bash \
    ubuntu:jammy
# Ok... It looks like the `--privileged` result.
root@335ec51ae632:~# bwrap --ro-bind /usr /usr --ro-bind /bin /bin --ro-bind /etc /etc --ro-bind /lib /lib --ro-bind /lib32 /lib32 --ro-bind /lib64 /lib64 --dir /tmp --dir /var --proc /proc --dev /dev --unshare-all --share-net --die-with-parent --dir /run/user/$(id -u) --bind /tmp /SteamHome --chdir /SteamHome /bin/bash
bwrap: Failed to make / slave: Permission denied
# Oh! Unexpected fail!
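Docker also accepts `ALL` as a `--cap-add` value, which should give the same bounding set as the long explicit list above with much less typing. A sketch of that variant follows, with the device loop collected into an array; `RUN` is again a hypothetical switch, so the command is only printed by default. Since this changes only capabilities, it would be expected to hit the same "Failed to make / slave" error: the comments below point to Docker's default AppArmor profile and seccomp filter as the remaining obstacles.

```shell
#!/usr/bin/env bash
# Sketch: the "grant every capability" experiment using Docker's ALL value.
# RUN is a hypothetical switch; by default the command is only printed.
set -eo pipefail
shopt -s nullglob

# Same device handling as the loop above, but collected in an array.
device_args=(--device=/dev/rtc)     # /dev/rtc itself is a symlink
for dev in /dev/*; do
  if [[ -h "$dev" ]]; then
    continue                        # skip symlinks
  elif [[ -d "$dev" ]]; then
    device_args+=(-v "$dev:$dev")   # bind-mount directories
  else
    device_args+=(--device="$dev")  # pass device nodes through
  fi
done

docker_args=(
  run
  --cap-add ALL                     # every capability, like the explicit list
  "${device_args[@]}"
  -v "$HOME/SteamHome:/myself"
  -e HOME=/myself
  -w /myself
  -ti --entrypoint /bin/bash
  ubuntu:jammy
)

if [[ "${RUN:-0}" == 1 ]]; then
  exec docker "${docker_args[@]}"
else
  printf '%q ' docker "${docker_args[@]}"
  echo
fi
```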

To be sure, I ran capsh --print in both the --privileged attempt and the attempt with every --cap-add. Both gave me the same result:

Current: =
Bounding set =cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,cap_wake_alarm,cap_block_suspend,cap_audit_read
Ambient set =
Current IAB: !cap_perfmon,!cap_bpf,!cap_checkpoint_restore
Securebits: 00/0x0/1'b0
 secure-noroot: no (unlocked)
 secure-no-suid-fixup: no (unlocked)
 secure-keep-caps: no (unlocked)
 secure-no-ambient-raise: no (unlocked)
uid=1000(myself) euid=1000(myself)
gid=1000(myself)
groups=
Guessed mode: UNCERTAIN (0)
thelamer commented 1 year ago

You are missing the AppArmor option:

--security-opt apparmor=unconfined

The setuid variant can potentially run as root in the container, but I have not gotten that working. I discovered this while putting together a SteamOS container.

s-hamann commented 1 year ago

You also need to disable the seccomp filter:

--security-opt seccomp=unconfined

Docker's default seccomp filter blocks the clone and unshare syscalls when they are used to create namespaces (among other restrictions), and bubblewrap needs them to create its new namespaces. Podman's seccomp filter is more permissive: bubblewrap works (with a few limitations) in an unprivileged Podman container.
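To see which of the two layers mentioned in these comments is actually active, a process can inspect itself from inside the container. The sketch below reads the `Seccomp` field of /proc/self/status (0 = disabled, 1 = strict, 2 = filter) and the LSM label in /proc/self/attr/current, which on AppArmor hosts names the confining profile (for example `docker-default (enforce)`). `report_confinement` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: report the seccomp mode and LSM label of the current process.
# report_confinement is a hypothetical helper name.
set -uo pipefail

report_confinement() {
  local mode label
  # awk reads its own /proc/self/status, but it inherits the shell's
  # seccomp mode, so the value is the same as for the caller.
  mode=$(awk '/^Seccomp:/ {print $2}' /proc/self/status 2>/dev/null)
  case "${mode:-}" in
    0) echo "seccomp: disabled" ;;
    1) echo "seccomp: strict" ;;
    2) echo "seccomp: filter active" ;;
    *) echo "seccomp: unknown" ;;
  esac

  # On AppArmor hosts this is the profile name, e.g. "unconfined".
  if label=$( { tr -d '\0' < /proc/self/attr/current; } 2>/dev/null ) \
     && [[ -n "$label" ]]; then
    echo "lsm label: ${label}"
  else
    echo "lsm label: not available"
  fi
}

report_confinement
```

With both `--security-opt` options applied, one would expect "seccomp: disabled" and "lsm label: unconfined".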

horihel commented 1 year ago

I'm seeing the same behaviour when trying bubblewrap inside a k8s pod, even with seccomp set to Unconfined.

mulawamichal commented 7 months ago

For me it was enough to add

--cap-add SYS_ADMIN --security-opt apparmor=unconfined --security-opt seccomp=unconfined

as advised by @s-hamann and @thelamer.
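Put together, the flags reported in this comment give a much shorter command than the all-capabilities attempt. The sketch below assembles them, reusing the volume and entrypoint options from the original report. It assumes bubblewrap gets installed in the image (the stock ubuntu:jammy image does not include it), and `RUN` is a hypothetical switch, so by default the command is only printed.

```shell
#!/usr/bin/env bash
# Sketch: the smallest flag set reported in this thread for running bwrap
# in Docker. RUN is a hypothetical switch; by default the command is printed.
set -eo pipefail

docker_args=(
  run
  --cap-add SYS_ADMIN                  # mount() and most namespace types need it
  --security-opt apparmor=unconfined   # drop Docker's default AppArmor profile
  --security-opt seccomp=unconfined    # drop Docker's default seccomp filter
  -v "$HOME/SteamHome:/myself"
  -e HOME=/myself
  -w /myself
  -ti --entrypoint /bin/bash
  ubuntu:jammy                         # bubblewrap must be installed in it first
)

if [[ "${RUN:-0}" == 1 ]]; then
  exec docker "${docker_args[@]}"
else
  printf '%q ' docker "${docker_args[@]}"
  echo
fi
```

This still loosens the container's confinement compared with Docker's defaults, but less than --privileged, which additionally grants every capability and device.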

smcv commented 7 months ago

bubblewrap cannot work if it's run inside a container that doesn't allow the necessary syscalls, mount operations, etc. to let bubblewrap do its job. The precise permissions that are required are not obvious, partly because the kernel gives us very little diagnostic information when we don't have them ("Permission denied" is as much as we get).

This isn't a bubblewrap bug: doing impossible things is out-of-scope for this project.

horihel commented 7 months ago

I think this is probably a common request; my team and I have been looking for it too. People would like to use bubblewrap (or something similar) in a confined environment (like OpenShift in its default configuration, for example). I guess documenting clearly what's required might both help and cut down the noise.

andrew-aladjev commented 6 months ago

I am experiencing the same problems on Ubuntu 24.04, using bwrap in a Docker container. apparmor=unconfined (which --privileged also implies) is not enough, because it only disables some AppArmor profiles, and those profiles are far from ideal, to put it mildly. Honestly, the AppArmor profiles look like one bug stacked on another, with the main bug driving the whole construction. The solution is the following:

abi <abi/4.0>,
include <tunables/global>

profile bwrap /usr/bin/bwrap flags=(unconfined) {
  userns,
  include if exists <local/bwrap>
}

You need to put this code in /etc/apparmor.d/usr.bin.bwrap (on the host machine, not in the container) and run systemctl restart apparmor.service.
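The manual steps above can be scripted. This sketch writes the same profile text, with the file named in the usr.bin.bwrap form (path with slashes turned into dots), and, only when `RELOAD=1`, loads it with `apparmor_parser -r`, which replaces just this one profile instead of restarting the whole service. `PROFILE_DIR`, `RELOAD` and `write_userns_profile` are hypothetical names. The program path defaults to /usr/bin/bwrap but can point at another binary, which matters for the per-program approach discussed in the next comment.

```shell
#!/usr/bin/env bash
# Sketch: write the userns AppArmor profile from the comment above.
# write_userns_profile, PROFILE_DIR and RELOAD are hypothetical names.
set -euo pipefail

write_userns_profile() {
  local program=${1:-/usr/bin/bwrap}
  local dir=${PROFILE_DIR:-/etc/apparmor.d}
  local base=${program##*/}          # e.g. bwrap
  local name=${program#/}            # e.g. usr/bin/bwrap
  name=${name//\//.}                 # e.g. usr.bin.bwrap
  local path="$dir/$name"

  # Same profile text as in the comment, with the program substituted.
  cat > "$path" <<EOF
abi <abi/4.0>,
include <tunables/global>

profile ${base} ${program} flags=(unconfined) {
  userns,
  include if exists <local/${base}>
}
EOF

  # Load only this profile (needs root) instead of restarting the service.
  if [[ "${RELOAD:-0}" == 1 ]]; then
    apparmor_parser -r "$path"
  fi
  echo "$path"
}
```

As root on the host, `RELOAD=1 write_userns_profile` would install and load the bwrap profile, while `PROFILE_DIR=/tmp write_userns_profile /usr/bin/mkosi` would only write /tmp/usr.bin.mkosi for inspection (the mkosi path here is just an example).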

smcv commented 6 months ago

@andrew-aladjev:

I am experiencing the same problems in ubuntu 24.04

Not really: you are experiencing a new, different problem that has a similar symptom.

Ubuntu has changed the Ubuntu 24.04 kernel so that programs like bubblewrap are not allowed to create a new user namespace unless they are given an AppArmor profile that contains the userns permission. This is their choice, and if it's causing a problem for you, please report it to them. Changes in bubblewrap are not going to solve this.
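Whether an unprivileged process may create a user namespace also depends on a few kernel settings, and which ones exist varies by distribution. The sketch below only reads them: /proc/sys/user/max_user_namespaces (0 disables user namespaces), kernel.unprivileged_userns_clone (a Debian kernel patch; 0 forbids unprivileged use), and kernel.apparmor_restrict_unprivileged_userns (Ubuntu's restriction described here; 1 means a profile with the userns permission is required). `print_userns_knobs` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: print kernel knobs that commonly decide whether unprivileged
# user namespaces are allowed. Knobs missing on this kernel show "absent".
set -uo pipefail

userns_knobs=(
  # Upper bound on user namespaces per user; 0 disables them.
  /proc/sys/user/max_user_namespaces
  # Debian-patched kernels: 0 forbids unprivileged user namespaces.
  /proc/sys/kernel/unprivileged_userns_clone
  # Ubuntu AppArmor restriction: 1 requires a profile granting "userns".
  /proc/sys/kernel/apparmor_restrict_unprivileged_userns
)

print_userns_knobs() {
  local knob
  for knob in "${userns_knobs[@]}"; do
    if [[ -r "$knob" ]]; then
      printf '%s = %s\n' "$knob" "$(<"$knob")"
    else
      printf '%s = absent\n' "$knob"
    fi
  done
}

print_userns_knobs
```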

A relevant Ubuntu bug is https://bugs.launchpad.net/ubuntu/+source/apparmor/+bug/2046844.

Ubuntu developers have said that they are intentionally not adding a profile like the one you've suggested (reference: https://bugs.launchpad.net/ubuntu/+source/apparmor/+bug/2046844/comments/90, https://bugs.launchpad.net/ubuntu/+source/apparmor/+bug/2046844/comments/91). What they are doing instead is adding a profile for each program that uses bubblewrap, including Flatpak, Steam, nautilus/GNOME Files (via libgnome-desktop), epiphany/GNOME Web (via WebKitGTK) and so on, as well as adding a profile for each program that does not use bubblewrap but does similar things a different way, such as Firefox and Chrome. If you are using some different program that invokes bwrap - for example mkosi - my understanding is that they would tell you to add a profile for that program instead of a profile for bwrap.

I personally think their stated reasoning is flawed: they say that the reason is that giving bwrap a profile like this would allow for an arbitrary bypass of their restriction, but programs like the ones for which they are adding profiles are not designed to impose a security boundary that distrusts their caller either, so it's straightforward for an unprivileged user to bypass their restriction anyway. But I didn't design their security model, and what they choose to do in their distro is not my decision.

Maryse47 commented 6 months ago

Ubuntu has changed the Ubuntu 24.04 kernel so that programs like bubblewrap are not allowed to create a new user namespace unless they are given an AppArmor profile that contains the userns permission. This is their choice

I wonder how long it would take for them to reconsider that choice.

smcv commented 6 months ago

I wonder how long it would take for them to reconsider that choice.

This is not Ubuntu's issue tracker and we have no control over what they do, so please take any speculation or advocacy about this to Ubuntu/Canonical issue trackers rather than here.

andrew-aladjev commented 6 months ago
What they are doing instead is adding a profile for each program that uses bubblewrap, including Flatpak, Steam, nautilus/GNOME Files (via libgnome-desktop), epiphany/GNOME Web (via WebKitGTK) and so on, as well as adding a profile for each program that does not use bubblewrap but does similar things a different way, such as Firefox and Chrome. If you are using some different program that invokes bwrap - for example mkosi - my understanding is that they would tell you to add a profile for that program instead of a profile for bwrap.

It would be good to add this information to the bwrap docs, even though it is specific to Ubuntu. Thank you.