Can we avoid CVE-2019-5736-like attacks generically?

frankm42 commented 5 years ago

runc through 1.0-rc6, as used in Docker before 18.09.2 and other products, allows attackers to overwrite the host runc binary (and consequently obtain host root access) by leveraging the ability to execute a command as root within one of these types of containers: (1) a new container with an attacker-controlled image, or (2) an existing container, to which the attacker previously had write access, that can be attached with docker exec. This occurs because of file-descriptor mishandling, related to /proc/self/exe.

Is bubblewrap affected by this as well?

smcv commented 5 years ago

bubblewrap can be affected if the command inside the container runs attacker-supplied code as real uid 0 (i.e. uid 0 outside the container, not just uid 0 in a different namespace).

My understanding is that this is not considered to be a bubblewrap vulnerability, and larger programs that use bubblewrap in this way (like Flatpak when installing apps with extra_data system-wide) are responsible for solving it. For example, this was CVE-2019-8308 in flatpak, fixed in flatpak 1.2.3 and 1.0.7.

smcv commented 5 years ago

Normally, bubblewrap users run code inside the container as a uid that differs from the owner of the bubblewrap executable, in which case there's no problem. For instance, if uid 1000 runs a Flatpak app, or a bubblewrap-based sandbox like the one in libgnome-desktop, it will normally use a bubblewrap executable owned by root to run code as uid 1000.

frankm42 commented 5 years ago

If I understand this correctly https://github.com/flatpak/flatpak/commit/cd2142888fc4c199723a0dfca1f15ea8788a5483 using --proc will resolve this issue. Is that correct?

Maybe there should be a warning in the bubblewrap documentation/manpage about this issue and how to resolve it.

If I an application in bubblewrap as root, this will affect me (without the use of --proc). Should something like this really be fixed by every single application that uses bubblewrap instead of fixing it here?

smcv commented 5 years ago

If I understand this correctly flatpak/flatpak@cd21428 using --proc will resolve this issue. Is that correct?

I think so, but I don't have a reproducer for the Flatpak vulnerability, so I'm not completely clear about how an attacker makes use of it to overwrite something (bwrap itself?) on the host system.

Maybe there should be a warning in the bubblewrap documentation/manpage about this issue and how to resolve it.

Please propose a patch/MR that adds a suitable warning somewhere that you would have looked for it?

If I an application in bubblewrap as root, this will affect me (without the use of --proc). Should something like this really be fixed by every single application that uses bubblewrap instead of fixing it here?

Quite possibly yes. bubblewrap can't know what privileges you want the sandboxed process to have or not have, and there are many ways you can configure a bubblewrap command-line that result in the sandboxed process having some or all of the caller's privileges outside the sandbox (for instance --bind /run/user /run/user usually gives you access to the D-Bus session bus and systemd --user sockets, which give you arbitrary code execution outside the container). We don't second-guess whether this is what the caller wanted.

In particular, I would recommend not using real uid 0 inside containers, i.e. not running bwrap as root (by which I mean real uid 0 - effective uid 0, as caused by a setuid root bwrap executable, is OK).

Having said that, if you can propose a concrete change that would prevent this privilege escalation route without causing regressions, we can consider merging it. I think runc's solution to CVE-2019-5736 (cloning its binary into a memfd and exec'ing that) is probably too intrusive - the amount of code involved in doing this would be a significant proportion of the size of bubblewrap, which would partially defeat the goal of making bubblewrap straightforward to audit.

My understanding is that bubblewrap's general security policy is:

on systems where bubblewrap is setuid root, it's a serious bubblewrap bug if an unprivileged user can use it to get privileges that they didn't already have access to;
but if a process in a sandbox can escalate its privileges to those of the user outside the sandbox, that's a bug in how the caller is using bubblewrap (it isn't restricting the process in the sandbox enough), not a bug in bubblewrap itself.

smcv commented 5 years ago

If I understand this correctly flatpak/flatpak@cd21428 using --proc will resolve this issue. Is that correct?

Looking again:

No, it's the opposite. The solution used in Flatpak was ensuring that procfs is not mounted in the container, for the one case where Flatpak routinely runs something as real uid 0 (I think Flatpak 1.3.x might have stopped doing this, but 1.2.x and older still need to do that). You can't exploit the special properties of /proc/self/exe if it doesn't exist in the sandbox.

procfs is mounted in the container if either of these is true: a procfs was explicitly mounted with --proc, or the host system procfs (normally /proc) was exposed in the container by a --bind (or one of the same family of options, such as --dev-bind).

smcv commented 5 years ago

One feature request that might help, if people need to run bwrap as real uid 0, would be an option to map the --uid inside the container to a different uid outside the container.

When bwrap's real uid is some nonzero number, for example 1000, our only option is to map uid 1000 to the --uid (default 1000), and map all other uids to the overflowuid.

When bwrap's real uid is 0, at the moment we would map uid 0 to the --uid (default 0) and all other uids to the overflowuid, but we could equally well map (for example) uid 123 to the --uid (default 0 again) and all other uids to the overflowuid. However, to do that, we would need the caller to tell us a "safe" system uid on the host system (for example 123), that doesn't own anything important, and doesn't run any processes on the host system (because if it did, they could kill bwrap's child process, causing DoS).

frankm42 commented 5 years ago

If I understand this correctly flatpak/flatpak@cd21428 using --proc will resolve this issue. Is that correct?

Looking again:

No, it's the opposite. The solution used in Flatpak was ensuring that procfs is not mounted in the container, for the one case where Flatpak routinely runs something as real uid 0 (I think Flatpak 1.3.x might have stopped doing this, but 1.2.x and older still need to do that). You can't exploit the special properties of /proc/self/exe if it doesn't exist in the sandbox.

procfs is mounted in the container if either of these is true: a procfs was explicitly mounted with --proc, or the host system procfs (normally /proc) was exposed in the container by a --bind (or one of the same family of options, such as --dev-bind).

Seems like I misunderstood the --proc option. I assumed that it worked like the --dev option and provided a reduced /proc environment. If it simply mound /proc into the new mount namespace this will not reduce but widen the attack surface. This seems easily understood (or is it me?) especially since it looks the same as "--dev /dev" which does provide a reduced dev environment.

I think the --proc option should have a warning or clarification that this option should only be used where necessary and does not tighen the sandboxed environement but does the opposite.

Edit: After experienting with the --proc option, I think many applications need this. Firefox would not run without this, complaining not being able to find the application directory.

smcv commented 5 years ago

Seems like I misunderstood the --proc option. I assumed that it worked like the --dev option and provided a reduced /proc environment.

It sort of does, but there's a difference between "reduced" and "completely safe for any possible use".

It mounts a procfs that is appropriate for the namespaces used for the sandboxed app. If you're using --unshare-pid, then the new procfs only lists processes in the new pid namespace. If you're not, then the new procfs would effectively be equivalent to the real /proc on the host system (and in fact bwrap bind-mounts the real /proc instead in this case).

--proc also covers up the sys, sysrq-trigger, irq and bus directories, which a plain procfs mount wouldn't.

However, unlike --dev /dev, even a procfs that only lists processes in the new pid namespace has some attack surface (enough for CVE-2019-5736, if you're running as real uid 0, since bubblewrap's init process has to exist as pid 1 in the sandbox pid namespace).

I think the --proc option should have a warning or clarification that this option should only be used where necessary and does not tighen the sandboxed environement but does the opposite.

The documentation is terse, but accurate: it says "mount procfs" and that's what the option does. If you think it should say more, please propose a pull request with what you think it should say: only you can know whether a particular clarification would have guided you in the right direction.

Note that adding any option that mounts a filesystem could either tighten or loosen the sandbox, depending what is mounted and what they are mounted over. If you start from a read-only container image with an empty /proc, and then add --proc /proc, then you are loosening the sandbox: previously the sandboxed process couldn't list or manipulate processes, and now it can. Conversely, if your starting point is a --bind of the host system, and you are using --unshare-pid, adding --proc /proc would tighten the sandbox: previously the sandboxed process could list all processes on the host system, but now it can only list processes in its own pid namespace. So it isn't really true to say that --proc always makes the sandbox more strict, or that it always makes the sandbox less strict: the only thing you can say for sure is that it mounts a procfs in the requested location.

I agree that the interactions between pseudo-file-systems and "magic" files are not obvious, but there's a limit to what bubblewrap can do about that. bubblewrap provides a toolbox to set up sandboxes, but appropriate uses of those tools are facts about the Linux kernel and the particular sandbox behaviour that you want, more than they are facts about bubblewrap.

DemiMarie commented 5 years ago

However, unlike --dev /dev, even a procfs that only lists processes in the new pid namespace has some attack surface (enough for CVE-2019-5736, if you're running as real uid 0, since bubblewrap's init process has to exist as pid 1 in the sandbox pid namespace).

Not quite.

When running bubblewrap as PID 1 in the PID namespace, it is impossible to write to the bwrap executable, regardless of privileges (one will get -ETXTBSY): if PID 1 in the container dies, so will every other process there. That said, there might be race conditions in the kernel, and it is still possible to use chmod or chattr on it, resulting in a denial of service.

In particular, I would recommend not using real uid 0 inside containers, i.e. not running bwrap as root (by which I mean real uid 0 - effective uid 0, as caused by a setuid root bwrap executable, is OK).

What about the case where the container does have real UID 0, but Mandatory Access Control (SELinux/AppArmor/SMACK/etc) is enforcing and prevents the container from having access to things that it should not? This is the case for the SELinux policy in Docker, for example: it stopped CVE-2019-5736 cold, as container_t cannot write to container_runtime_exec_t. In fact, SELinux, when used with Docker, forbids a container from writing anywhere outside of itself (including other containers), regardless of DAC rules and UIDs.

smcv commented 5 years ago

What about the case where the container does have real UID 0, but Mandatory Access Control (SELinux/AppArmor/SMACK/etc) is enforcing and prevents the container from having access to things that it should not?

I would still recommend not using uid 0 inside containers, but if you have to use uid 0 for some other reason, then MAC could certainly help to mitigate that.

bubblewrap and Flatpak are distribution-neutral and cannot really rely on any particular MAC or policy (the conservative assumption is that they aim to be secure on every distribution, even those that don't use LSMs), but distributions that use SELinux or SMACK might be able to use those to protect against processes inside the container escaping. If you have a generic implementation that can be included in bubblewrap or Flatpak, please propose it.

AppArmor is considerably more difficult to deal with because it's path-based, but the things bubblewrap does with mount namespaces and pivot_root change the meaning of a path, so any applicable AppArmor policy needs to be written with both the "outside" and "inside" filesystem layouts in mind.

containers / bubblewrap

Can we avoid CVE-2019-5736-like attacks generically? #313