User containers should run as non-root user

domdom82 commented 8 years ago

This is a design-proposal for security hardening of the invoker.

I have been testing with ways to achieve better security against forkbomb attacks using AppArmor. One of the main problem with applying the nproc limit is that it does not apply to root, no matter whether you apply the limit via ulimit or AppArmor (which uses ulimit under the hood anyway).

The second problem is that the nproc limit applies only per user, not per container. It is, after all, a kernel feature and way older than lxc, docker or any other container tech. So to make this work for docker < 1.10 (i.e. without user namespaces) you have to use a new user id for every container you run. This makes sure that each process gets a fresh nproc limit.

If you use the same user id for all containers, you will run into the problems described here as they will all share the same limit.

My proposal is this:

When running a user container, the invoker runs the container using a high UID and passes it using the docker run -u <uid> option. The user does not have to exist on the host for this to work. Inside the container, the user will still be member of the root group, however no longer be root itself, thus all ulimits like nproc apply. This will effectively protect against forkbomb attacks.

This is how it would work:

when makeContainer gets called, the invoker first calls getFreeUID(containerName)
getFreeUID returns an unused user id and blocks this user id for other containers.

makeContainer then starts the action container like so

docker run -d -u <freeUID> --security-stuff whisk/nodeJsAction

rmContainer finally unblocks the user id to make it available again.

There are two drawbacks I see for this approach:

We have to keep track of the user ids. So the invoker needs to make sure every new container gets a unique uid to prevent nproc limit collision. This is basically doing user namespaces manually.
The author of a blackbox container can no longer make the "I am root" assumption. Their stuff needs to be able to run as non-root users.

domdom82 commented 8 years ago

With --pids-limit coming in Docker 1.11 and Kernel 4.3 we no longer need nproc ulimit. This makes this problem much easier to handle as we don't need different user ids for every container. Instead I suggest we use an unprivileged user id like "nobody" to run the containers and see where we land.

domdom82 commented 8 years ago

Team meeting decision from Oct 4:

we will go forward with user namespaces
we will not mount containers read-only
we will limit storage using overlay2 + xfs using docker 1.13 (modulo stemcell being able to mount root fs as xfs)
if xfs not possible, pursue other solutions like io-limits or watchdogs until ext4 quota arrives

domdom82 commented 8 years ago

So non-root will only apply to the host. The container will still think it is running as root.

domdom82 commented 8 years ago

Initial design proposal:

Start docker daemon with --userns-remap=default
This will by default start all containers in mapped uid (0 -> 100000)
Disable mapping for regular containers such as controller, invoker etc. by passing --userns=host to docker run commands
Action containers will run as mapped containers and thus be protected by non-root uids.

domdom82 commented 8 years ago

Current problem is that the ansible docker module does not yet support the --userns option. There is an open issue here https://github.com/ansible/ansible-modules-core/issues/5054

So we currently can not deploy OpenWhisk w/ this feature using ansible. As a mitigation we can fall back to using shell commands, which is uncool.

domdom82 commented 8 years ago

Estimated steps for this feature:

[x] Investigate how docker namespaces behave if the kernel does not support it (should gracefully ignore and continue). We need this to support old kernels in travis and vagrant.
[x] Update ansible playbooks with --userns=host for regular containers (modulo step 1 works)
[x] Update docker daemon settings with --userns-remap=default
[x] Make sure invoker runs actions in namespaced containers
[x] Review tests & doc

domdom82 commented 8 years ago

Investigation results:

critical scenario A: daemon not configured w/ userns-remap but container started w/ --userns=host option Outcome: The option is ignored. Passing userns=host to a non-namespaced daemon equals passing userns= to a namespaced daemon. I.e. the container is run as root.
critical scenario B: daemon configured but kernel does not support user namespaces. Outcome: N/A because user namespaces are actually supported fully since kernel 3.12. See http://man7.org/linux/man-pages/man7/user_namespaces.7.html. Vagrant and travis kernels are currently at 3.13 and 3.19 respectively.

domdom82 commented 8 years ago

Following my experiments we will probably need two more steps:

Since docker on startup creates a folder /var/lib/docker/<uid>.<gid> to hold the namespaced containers and images and invoker mounts /var/lib/docker/containers, we will have to add that indirection to the invoker.

~~- [ ] Provide a well-known uid+gid range in /etc/subuid and /etc/subgid for the dockremap namespace. We must know the mapped root uid (e.g. 100000)~~ ~~- [ ] Adjust invoker mount from /var/lib/docker/containers to /var/lib/docker/100000.100000/containers~~~ (see below) This should all be doable with touching invoker ansible playbooks only.

domdom82 commented 8 years ago

Discussion w/ @jeremiaswerner we have decided to change the design to feature-detection instead of having to know subuid / subgid up front:

[ ] On invoker deployment, detect Docker root dir using docker info
[ ] Adjust mount point for /var/lib/docker/<uid>.<gid>/containers accordingly

domdom82 commented 8 years ago

Ran into a couple of problems during testing, likely found minor bugs in Docker. Opened issues:

https://github.com/docker/docker/issues/27775 https://github.com/docker/docker/issues/27740

domdom82 commented 8 years ago

Discussion with @estesp has lead me to turn around the original way of least change and embrace user namespaces wherever possible. This has the implication that all containers are to run namespaced and subsequently mounted directories like wsklogs need to be writeable by non-root (preferably only dockremap) users. However, I feel this is the better decision because:

Using user namespaces in general increases security, also for non-action containers.
Privileged or non-namespaced containers should become the exception over time.
The way user namespaces are designed in Docker today creates hurdles with non-single user containers like Zookeeper or Kafka which have their own users inside the container.

apache / openwhisk

User containers should run as non-root user #898

This is a design-proposal for security hardening of the invoker.