apache / openwhisk

Apache OpenWhisk is an open source serverless cloud platform
https://openwhisk.apache.org/
Apache License 2.0
6.54k stars 1.17k forks source link

User containers should run as non-root user #898

Closed domdom82 closed 7 years ago

domdom82 commented 8 years ago

This is a design-proposal for security hardening of the invoker.

I have been testing with ways to achieve better security against forkbomb attacks using AppArmor. One of the main problem with applying the nproc limit is that it does not apply to root, no matter whether you apply the limit via ulimit or AppArmor (which uses ulimit under the hood anyway).

The second problem is that the nproc limit applies only per user, not per container. It is, after all, a kernel feature and way older than lxc, docker or any other container tech. So to make this work for docker < 1.10 (i.e. without user namespaces) you have to use a new user id for every container you run. This makes sure that each process gets a fresh nproc limit.

If you use the same user id for all containers, you will run into the problems described here as they will all share the same limit.

My proposal is this:

When running a user container, the invoker runs the container using a high UID and passes it using the docker run -u <uid> option. The user does not have to exist on the host for this to work. Inside the container, the user will still be member of the root group, however no longer be root itself, thus all ulimits like nproc apply. This will effectively protect against forkbomb attacks.

This is how it would work:

  1. when makeContainer gets called, the invoker first calls getFreeUID(containerName)
  2. getFreeUID returns an unused user id and blocks this user id for other containers.
  3. makeContainer then starts the action container like so

    docker run -d -u <freeUID> --security-stuff whisk/nodeJsAction
  4. rmContainer finally unblocks the user id to make it available again.

There are two drawbacks I see for this approach:

  1. We have to keep track of the user ids. So the invoker needs to make sure every new container gets a unique uid to prevent nproc limit collision. This is basically doing user namespaces manually.
  2. The author of a blackbox container can no longer make the "I am root" assumption. Their stuff needs to be able to run as non-root users.
domdom82 commented 8 years ago

With --pids-limit coming in Docker 1.11 and Kernel 4.3 we no longer need nproc ulimit. This makes this problem much easier to handle as we don't need different user ids for every container. Instead I suggest we use an unprivileged user id like "nobody" to run the containers and see where we land.

domdom82 commented 8 years ago

Team meeting decision from Oct 4:

domdom82 commented 8 years ago

So non-root will only apply to the host. The container will still think it is running as root.

domdom82 commented 8 years ago

Initial design proposal:

domdom82 commented 8 years ago

Current problem is that the ansible docker module does not yet support the --userns option. There is an open issue here https://github.com/ansible/ansible-modules-core/issues/5054

So we currently can not deploy OpenWhisk w/ this feature using ansible. As a mitigation we can fall back to using shell commands, which is uncool.

domdom82 commented 8 years ago

Estimated steps for this feature:

domdom82 commented 8 years ago

Investigation results:

domdom82 commented 8 years ago

Following my experiments we will probably need two more steps:

Since docker on startup creates a folder /var/lib/docker/<uid>.<gid> to hold the namespaced containers and images and invoker mounts /var/lib/docker/containers, we will have to add that indirection to the invoker.

- [ ] Provide a well-known uid+gid range in /etc/subuid and /etc/subgid for the dockremap namespace. We must know the mapped root uid (e.g. 100000) - [ ] Adjust invoker mount from /var/lib/docker/containers to /var/lib/docker/100000.100000/containers~ (see below) This should all be doable with touching invoker ansible playbooks only.

domdom82 commented 8 years ago

Discussion w/ @jeremiaswerner we have decided to change the design to feature-detection instead of having to know subuid / subgid up front:

domdom82 commented 8 years ago

Ran into a couple of problems during testing, likely found minor bugs in Docker. Opened issues:

https://github.com/docker/docker/issues/27775 https://github.com/docker/docker/issues/27740

domdom82 commented 8 years ago

Discussion with @estesp has lead me to turn around the original way of least change and embrace user namespaces wherever possible. This has the implication that all containers are to run namespaced and subsequently mounted directories like wsklogs need to be writeable by non-root (preferably only dockremap) users. However, I feel this is the better decision because: