containers / bubblewrap

Low-level unprivileged sandboxing tool used by Flatpak and similar projects

Running systemd, or openrc or another init process with bwrap #668

Open amirouche opened 1 day ago

amirouche commented 1 day ago

Within an Alpine mini root filesystem (a.k.a. rootfs), where I installed openrc, openssh, and networking tools, I am trying to start OpenRC with rootless bwrap, and it fails.
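For context, a rootfs like this can be prepared roughly as follows (a sketch only; the tarball name, target path, and package selection are assumptions, not taken from this report):

```shell
# Hypothetical rootfs preparation: unpack an Alpine minirootfs tarball,
# then install the service packages from a shell inside the tree.
mkdir -p ~/rootfs/init/alpine
tar -xzf alpine-minirootfs-*-x86_64.tar.gz -C ~/rootfs/init/alpine
# From a shell inside the rootfs (via chroot or a plain bwrap shell):
apk add openrc openssh iproute2
```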

Here are debug logs:


~/rootfs/init/alpine $ bwrap --die-with-parent --as-pid-1 --clearenv \
    --unshare-uts --unshare-ipc --unshare-pid --unshare-cgroup --share-net \
    --cap-add ALL --uid 0 --gid 0 --tmpfs /tmp/ --dev-bind $(pwd) / \
    --proc /proc --ro-bind /sys /sys --chdir /root --hostname nested -- /sbin/init

   OpenRC 0.54 is starting up Linux 6.6.58-1-lts (x86_64)

 * /proc is already mounted
 * Mounting /run ... [ ok ]
 * /run/openrc: creating directory
 * /run/lock: creating directory
 * /run/lock: correcting owner
 * checkpath: chown: Invalid argument
Service `hwdrivers' needs non existent service `dev'
Service `machine-id' needs non existent service `dev'
 * Caching service dependencies ... [ ok ]
mkdir: can't create directory '/sys/fs/cgroup/openrc.fsck': Read-only file system
 * Checking local filesystems  ... [ ok ]
mkdir: can't create directory '/sys/fs/cgroup/openrc.root': Read-only file system
 * Remounting filesystems ... [ ok ]
mkdir: can't create directory '/sys/fs/cgroup/openrc.localmount': Read-only file system
 * Mounting local filesystems ... [ ok ]
mkdir: can't create directory '/sys/fs/cgroup/openrc.hostname': Read-only file system
 * Setting hostname ... [ ok ]
mkdir: can't create directory '/sys/fs/cgroup/openrc.networking': Read-only file system
 * Starting networking ... *   eth0 ...ip: ioctl 0x8913 failed: No such device
 [ !! ]
 * ERROR: networking failed to start
/lib/rc/sh/openrc-run.sh: line 197: can't create /sys/fs/cgroup/openrc.sshd/cgroup.procs: Read-only file system
 * Starting sshd ... [ ok ]
mkdir: can't create directory '/sys/fs/cgroup/openrc.networking': Read-only file system
 * Starting networking ... *   eth0 ...ip: ioctl 0x8913 failed: No such device
 [ !! ]
 * ERROR: networking failed to start
/lib/rc/sh/openrc-run.sh: line 197: can't create /sys/fs/cgroup/openrc.sshd/cgroup.procs: Read-only file system
 * Starting sshd ... [ ok ]
^C
~/rootfs/init/alpine $
pandaninjas commented 1 day ago

You're binding /sys read-only, which means the init process cannot create directories in it (which it seems to need to do)

amirouche commented 1 day ago

I removed --ro-bind /sys /sys (and switched --dev-bind to --bind), kept --share-net, and added ::askfirst:/bin/sh to /etc/inittab, and now I can log in inside my Alpine rootfs:

~/rootfs/init/alpine $ bwrap --die-with-parent --as-pid-1 --clearenv \
    --unshare-uts --unshare-ipc --unshare-pid --unshare-cgroup --share-net \
    --cap-add ALL --uid 0 --gid 0 --tmpfs /tmp/ --bind $(pwd) / \
    --proc /proc --hostname nested -- /sbin/init

   OpenRC 0.54 is starting up Linux 6.6.58-1-lts (x86_64)

 * /proc is already mounted
 * Mounting /run ... [ ok ]
 * /run/openrc: creating directory
 * /run/lock: creating directory
 * /run/lock: correcting owner
 * checkpath: chown: Invalid argument
Service `hwdrivers' needs non existent service `dev'
Service `machine-id' needs non existent service `dev'
 * Caching service dependencies ... [ ok ]
 * Starting sshd ... [ ok ]
 * Starting sshd ... [ ok ]

Please press Enter to activate this console. 
/bin/sh: can't access tty; job control turned off
/ # ps aux
PID   USER     TIME  COMMAND
    1 root      0:00 /sbin/init
  188 root      0:00 /bin/sh
  195 root      0:00 /sbin/getty 38400 tty1
  196 root      0:00 /sbin/getty 38400 tty2
  197 root      0:00 /sbin/getty 38400 tty3
  198 root      0:00 /sbin/getty 38400 tty4
  199 root      0:00 /sbin/getty 38400 tty5
  200 root      0:00 /sbin/getty 38400 tty6
  201 root      0:00 ps aux
/ # ping github.com
PING github.com (140.82.121.4): 56 data bytes
64 bytes from 140.82.121.4: seq=0 ttl=42 time=11.710 ms
64 bytes from 140.82.121.4: seq=1 ttl=42 time=13.878 ms
^C

Do you know what the following message means:

/bin/sh: can't access tty; job control turned off
smcv commented 12 hours ago

This is not really what bwrap is designed for; I'd recommend a more fully-featured container system like podman, lxc, or Incus.

bwrap is primarily designed for "app containers" where a single leaf application runs inside the sandbox: for example, it was originally written to be used as part of Flatpak, which uses a separate instance of bwrap to run each Flatpak app instance. You can think of these as "like a chroot, but better" or alternatively "a bit like an Android app".

If you're running a complete system from the init system up, behaving almost like a lightweight virtual machine, then what you have there is a "system container", which has rather different requirements.

smcv commented 11 hours ago

/bin/sh: can't access tty; job control turned off

This means that the shell inside the container was unable to gain full control of the terminal it found itself running on, probably because the /dev/pts inside your container is either missing or a separate instance of the devpts pseudo-filesystem. The usual trick for resolving this is to bind-mount the pseudo-terminal that you want to use as the system console onto the container's /dev/console; and if the container is going to behave like a system container, then you almost certainly need an instance of /dev/pts as well.

bwrap provides all the building blocks for putting together a container, but it's up to you to use them correctly in a way that matches your requirements and security model.
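One such building block is bwrap's --dev DEST option, which mounts a fresh minimal /dev at DEST, including a new devpts instance at DEST/pts and, when bwrap is started from a terminal, that terminal bound onto DEST/console. A sketch of how it could be combined with the flags from the commands earlier in this thread (untested; whether OpenRC's gettys are then satisfied depends on what else the rootfs expects):

```shell
# Sketch only: same invocation as above, but with --dev /dev mounted over
# the rootfs bind, so the container gets its own devpts instance and a
# /dev/console bound to the invoking terminal.
bwrap --die-with-parent --as-pid-1 --clearenv \
    --unshare-uts --unshare-ipc --unshare-pid --unshare-cgroup --share-net \
    --cap-add ALL --uid 0 --gid 0 \
    --bind "$(pwd)" / \
    --dev /dev \
    --proc /proc \
    --tmpfs /tmp \
    --hostname nested \
    -- /sbin/init
```

Note that mount order matters: the rootfs bind comes first, and --dev /dev is then stacked on top of it.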

smcv commented 11 hours ago

chown: Invalid argument

This is probably because bubblewrap is limited to what's available to an unprivileged user without making use of setuid binaries, so the sandbox can only have a single uid (in your case this seems to be root). In practice this is going to make your system container very limited.

Instead of fighting with bubblewrap, I would recommend using something that is designed for the job you're doing. podman, lxc, Incus and systemd-nspawn are all reasonable choices for a system container, although I don't know which of those are supported/supportable on an Alpine host.

Using Docker (with some special non-default configuration, because it isn't really designed for system containers) is also an option.
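For comparison, booting the same tree with the system-container tools mentioned above might look like the following (hedged sketches; the paths and image names are placeholders, and neither command is taken from this thread):

```shell
# systemd-nspawn: boot a directory tree as a container; nspawn sets up
# /dev, devpts, cgroups, and the console for you.
sudo systemd-nspawn -D ~/rootfs/init/alpine --boot

# podman: run an image whose init is PID 1 (assumes a hypothetical image
# with OpenRC preinstalled). Rootless podman maps a uid range via
# newuidmap/newgidmap, so the container can have more than one uid.
podman run --rm -it localhost/alpine-openrc /sbin/init
```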