domdom82 closed this 7 years ago
With --pids-limit coming in Docker 1.11 and Kernel 4.3, we no longer need the nproc ulimit. This makes the problem much easier to handle, as we don't need a different user id for every container. Instead I suggest we use an unprivileged user id like "nobody" to run the containers and see where we land.
Team meeting decision from Oct 4:
So non-root will only apply to the host. The container will still think it is running as root.
Initial design proposal:
The current problem is that the ansible docker module does not yet support the --userns option. There is an open issue here: https://github.com/ansible/ansible-modules-core/issues/5054
So we currently cannot deploy OpenWhisk w/ this feature using ansible. As a mitigation we can fall back to using shell commands, which is uncool.
Estimated steps for this feature:
--userns=host
for regular containers (modulo step 1 works)
--userns-remap=default
Investigation results:
Following my experiments we will probably need two more steps:
Since docker on startup creates a folder `/var/lib/docker/<uid>.<gid>` to hold the namespaced containers and images, and the invoker mounts `/var/lib/docker/containers`, we will have to add that indirection to the invoker.
- [ ] Provide a well-known uid+gid range in `/etc/subuid` and `/etc/subgid` for the `dockremap` namespace. We must know the mapped root uid (e.g. 100000).
- [ ] Adjust invoker mount from `/var/lib/docker/containers` to `/var/lib/docker/100000.100000/containers` (see below)
This should all be doable by touching only the invoker ansible playbooks.
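As a rough illustration of the first step, the mapped root uid could be read from the subordinate id files. The sketch below assumes the standard `name:start:count` line format of `/etc/subuid` and `/etc/subgid`, and the `dockremap` user that Docker uses for `--userns-remap=default`. The helper names `first_subid` and `remapped_containers_dir` are made up for this example and are not part of the invoker or its playbooks.

```python
from typing import Optional


def first_subid(subid_text: str, user: str = "dockremap") -> Optional[int]:
    """Return the start of the first subordinate id range for `user`.

    `subid_text` is the content of /etc/subuid or /etc/subgid, whose
    lines have the form `name:start:count`.
    """
    for line in subid_text.splitlines():
        parts = line.strip().split(":")
        if len(parts) == 3 and parts[0] == user:
            return int(parts[1])
    return None


def remapped_containers_dir(subuid_text: str, subgid_text: str,
                            docker_root: str = "/var/lib/docker") -> str:
    """Build the containers path the invoker would have to mount (hypothetical helper)."""
    uid = first_subid(subuid_text)
    gid = first_subid(subgid_text)
    if uid is None or gid is None:
        # No dockremap entry found: assume the plain, non-namespaced layout.
        return f"{docker_root}/containers"
    return f"{docker_root}/{uid}.{gid}/containers"


if __name__ == "__main__":
    subuid = "dockremap:100000:65536\n"
    subgid = "dockremap:100000:65536\n"
    print(remapped_containers_dir(subuid, subgid))
```

With the example range above, the mapped root uid is 100000 and the resulting path is `/var/lib/docker/100000.100000/containers`.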
After discussion w/ @jeremiaswerner we have decided to change the design to feature detection instead of having to know subuid / subgid up front:
docker info
/var/lib/docker/<uid>.<gid>/containers
accordingly
Ran into a couple of problems during testing, likely found minor bugs in Docker. Opened issues:
https://github.com/docker/docker/issues/27775 https://github.com/docker/docker/issues/27740
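The feature-detection idea mentioned earlier (deriving the containers directory from `docker info` rather than from known subuid / subgid values) might look roughly like this. The sketch assumes that the human-readable `docker info` output contains a `Docker Root Dir:` line and that, with user namespaces enabled, this directory ends in `<uid>.<gid>`. The function names are hypothetical, and the output text is passed in as a string so the example does not depend on a running daemon.

```python
import re
from typing import Optional, Tuple


def parse_docker_root(info_text: str) -> Optional[str]:
    """Extract the value of the `Docker Root Dir` line from `docker info` output."""
    for line in info_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Docker Root Dir":
            return value.strip()
    return None


def remapped_ids(root_dir: str) -> Optional[Tuple[int, int]]:
    """Return (uid, gid) if the root dir ends in `<uid>.<gid>`, else None."""
    match = re.search(r"/(\d+)\.(\d+)/?$", root_dir)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def containers_dir(info_text: str, default_root: str = "/var/lib/docker") -> str:
    """Directory the invoker would mount, with or without user namespaces."""
    root = parse_docker_root(info_text) or default_root
    return root.rstrip("/") + "/containers"


if __name__ == "__main__":
    # Sample text only; real `docker info` output has many more lines.
    sample = "Server Version: 1.12.1\nDocker Root Dir: /var/lib/docker/100000.100000\n"
    print(containers_dir(sample))
    print(remapped_ids(parse_docker_root(sample)))
```

The point of this design is that the playbook does not need to fix the uid / gid range in advance: whatever the daemon reports is what gets mounted.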
Discussion with @estesp has led me to abandon the original path of least change and embrace user namespaces wherever possible. The implication is that all containers are to run namespaced, and consequently mounted directories like `wsklogs` need to be writeable by non-root (preferably only dockremap) users.
However, I feel this is the better decision because:
This is a design-proposal for security hardening of the invoker.
I have been testing ways to achieve better security against forkbomb attacks using AppArmor. One of the main problems with applying the `nproc` limit is that it does not apply to root, no matter whether you apply the limit via ulimit or AppArmor (which uses ulimit under the hood anyway).
The second problem is that the `nproc` limit applies only per user, not per container. It is, after all, a kernel feature and way older than lxc, docker or any other container tech. So to make this work for docker < 1.10 (i.e. without user namespaces) you have to use a new user id for every container you run. This makes sure that each process gets a fresh `nproc` limit.
If you use the same user id for all containers, you will run into the problems described here as they will all share the same limit.
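To make the two problems concrete, here is a toy model of per-user process accounting. It is a deliberate simplification and not how the kernel is implemented: the class `NprocModel`, the limit of 64 and the uids are all invented for illustration, and root is modelled simply as uid 0 being exempt.

```python
from collections import Counter


class NprocModel:
    """Toy model: one process counter per uid, one shared limit value."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.procs = Counter()  # uid -> number of live processes

    def fork(self, uid: int) -> bool:
        """Try to create one more process for `uid`; return whether it succeeded."""
        # Simplification: uid 0 (root) is never refused.
        if uid != 0 and self.procs[uid] >= self.limit:
            return False
        self.procs[uid] += 1
        return True


def forkbomb(model: NprocModel, uid: int, attempts: int) -> int:
    """Attempt `attempts` forks as `uid` and return how many succeeded."""
    return sum(1 for _ in range(attempts) if model.fork(uid))


if __name__ == "__main__":
    shared = NprocModel(limit=64)
    forkbomb(shared, uid=1000, attempts=1000)       # container A misbehaves
    print("B, same uid:", shared.fork(1000))        # B is starved as well

    separate = NprocModel(limit=64)
    forkbomb(separate, uid=100001, attempts=1000)   # container A misbehaves
    print("B, own uid:", separate.fork(100002))     # B is unaffected

    root = NprocModel(limit=64)
    print("forks as root:", forkbomb(root, uid=0, attempts=1000))
```

In this model a forkbomb in one container exhausts the budget of every container that shares its uid, a container with its own uid keeps a full budget, and a container running as root is not limited at all.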
My proposal is this:
When running a user container, the invoker runs the container using a high UID and passes it using the `docker run -u <uid>` option. The user does not have to exist on the host for this to work. Inside the container, the user will still be a member of the root group but will no longer be root itself, so all ulimits like nproc apply. This will effectively protect against forkbomb attacks.
This is how it would work:
- When `makeContainer` gets called, the invoker first calls `getFreeUID(containerName)`
- `makeContainer` then starts the action container like so
- `rmContainer` finally unblocks the user id to make it available again.
There are two drawbacks I see for this approach: