I've experimented a bit with runc's rootless containers plus additional Linux capabilities in sarus. I think rootless containers will get quite popular with the next major release of Docker, and I think they provide the perfect trade-off between flexibility and security. (Rootless in this context means dropping the setuid privileges before executing the container command.)
The main reason to look into this is being able to build images inside a container running in the sarus runtime, which is currently impossible (#10). It's also currently impossible to run package manager commands like apt-get [...] inside an Ubuntu container with sarus.
To solve these two problems, it seems we need a few Linux capabilities, to be precise: CAP_CHOWN, CAP_SETUID, CAP_SETGID, CAP_FOWNER, and CAP_DAC_OVERRIDE.
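To check whether a process actually holds these five capabilities, one option is to decode the effective capability mask from /proc. The following is a minimal sketch, not part of sarus: the helper name missing_caps is made up for illustration, and the bit numbers are the ones defined in <linux/capability.h>.

```shell
# Print the names of the capabilities listed above that are NOT set in a
# hex capability mask (the format of the CapEff line in /proc/<pid>/status).
# missing_caps is a hypothetical helper name, not a sarus or runc command.
missing_caps() {
    mask=$1
    missing=""
    # name:bit pairs, bit numbers taken from <linux/capability.h>
    for cap in CAP_CHOWN:0 CAP_DAC_OVERRIDE:1 CAP_FOWNER:3 CAP_SETGID:6 CAP_SETUID:7; do
        name=${cap%%:*}
        bit=${cap##*:}
        # test whether bit number $bit is set in the mask
        if [ $(( (0x$mask >> $bit) & 1 )) -eq 0 ]; then
            missing="$missing $name"
        fi
    done
    # strip the leading space; prints an empty line if nothing is missing
    echo "${missing# }"
}

# Example (run inside the container):
#   missing_caps "$(awk '/^CapEff:/ { print $2 }' /proc/self/status)"
```

An empty result means all five capabilities are in the effective set of the process that read /proc/self/status.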
In the current situation we cannot have those capabilities in sarus because they are too powerful. E.g. a user can chown a root-owned file from a mounted directory to make themselves the owner, and there are probably more issues.
With user namespaces, however, this is not an issue anymore. We can drop the elevated effective UID and GID (via seteuid and setegid) right before executing the container command, so that the container is executed as the current user, and then use a user namespace with a user mapping to map the current user to root inside the container. This solves at least the obvious issues with mounting root-owned files (even when the user has the CAP_CHOWN capability):
# create a file owned by root that cannot be read by others, and verify it cannot be chown'ed when mounted inside the container
test-escalation-sarus $ echo "hi" > root-owned-file.txt
test-escalation-sarus $ sudo chown root:root root-owned-file.txt
test-escalation-sarus $ sudo chmod go-rw root-owned-file.txt
test-escalation-sarus $ sarus run --mount=type=bind,src=`pwd`,destination=/workspace -t ubuntu:18.04 /bin/bash
root@harmen-desktop:/# cd workspace/
root@harmen-desktop:/workspace# id
uid=0(root) gid=0(root) groups=0(root),65534(nogroup)
root@harmen-desktop:/workspace# cat root-owned-file.txt
cat: root-owned-file.txt: Permission denied
root@harmen-desktop:/workspace# chown harmen:harmen root-owned-file.txt
chown: changing ownership of 'root-owned-file.txt': Operation not permitted
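As far as I understand, the chown fails because capabilities held inside a user namespace only apply to files whose owner is mapped into that namespace, and the real root (host UID 0) is not. The sketch below mimics that translation using the three-column "inside outside length" format of /proc/<pid>/uid_map. The function name host_to_ns_uid and the host UID 1000 are assumptions for illustration, and 65534 is the kernel's default overflow ID (the same number that shows up as nogroup in the id output above).

```shell
# Translate a host UID to the UID seen inside a user namespace.
# Arguments: host UID, then one or more map entries "inside outside length"
# (the same format the kernel uses in /proc/<pid>/uid_map).
# host_to_ns_uid is a hypothetical helper, not a sarus or runc command.
host_to_ns_uid() {
    host_uid=$1
    shift
    for entry in "$@"; do
        # intentional word splitting: "inside outside length" -> $1 $2 $3
        set -- $entry
        inside=$1
        outside=$2
        length=$3
        if [ "$host_uid" -ge "$outside" ] && [ "$host_uid" -lt "$((outside + length))" ]; then
            echo "$((inside + host_uid - outside))"
            return 0
        fi
    done
    # no entry covers this UID: it appears as the overflow ID (default 65534)
    echo 65534
}

# Assumed mapping: the current user (host UID 1000) becomes root in the container.
host_to_ns_uid 1000 "0 1000 1"   # the current user, seen as UID 0 inside
host_to_ns_uid 0 "0 1000 1"      # the real root is unmapped, seen as 65534
```

With only the single-entry mapping, every host UID other than the current user's is unmapped, so root inside the container has no power over files owned by anyone else.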
Another great feature of user namespaces is that files created as root inside a mounted directory are in fact owned by the current user outside the container.
The only potential issue at the moment seems to be that cgroups are not yet handled well with rootless containers, but the runc folks seem to have a workaround using cgroups v2, which is nearly finished.
Also note that this seems like a step towards sarus no longer being a setuid binary. Because we have to mount things, we can probably never get rid of setuid entirely, but with rootless containers we can at least drop the privileges before executing container commands.
I have a working example of everything here: https://github.com/eth-cscs/sarus/compare/develop...haampie:rootless (not too many changes). If you want to compile it, you need to copy some hard-coded values from /etc/subuid and /etc/subgid.
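To look up the values to copy, something like the following could be used. It is only a sketch: subid_range is a made-up helper name, and it assumes the usual name:start:count line format of /etc/subuid and /etc/subgid.

```shell
# Print "start count" for the first subordinate ID range of a user.
# Arguments: user name, path to a subuid/subgid-style file with one
# name:start:count entry per line.
# subid_range is a hypothetical helper, not part of sarus.
subid_range() {
    awk -F: -v user="$1" '$1 == user { print $2, $3; exit }' "$2"
}

# Example:
#   subid_range "$(id -un)" /etc/subuid
#   subid_range "$(id -un)" /etc/subgid
```

If the user has no entry in the file, the function prints nothing.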
With this branch I can make sarus do all the things I would wish to do :) e.g.