JanitorTechnology / janitor

The fastest development system in the world.
https://janitor.technology
GNU Affero General Public License v3.0

Allow namespace changes in Janitor containers #232

Open beaufortfrancois opened 6 years ago

beaufortfrancois commented 6 years ago

If the Docker image supported namespace changes, I would be able to run Chrome with proper sandboxing.

Background thread:

3:15:27 PM <fbeaufort> I'm trying to set a proper sandbox for Chromium following https://chromium.googlesource.com/chromium/src/+/master/docs/linux_suid_sandbox_development.md
3:15:27 PM <fbeaufort> But I still get this error when running chromium:
3:15:27 PM <fbeaufort> ./out/Default/chrome 
3:15:27 PM <fbeaufort> Failed to move to new namespace: PID namespaces supported, Network namespace supported, but failed: errno = Operation not permitted
3:15:27 PM <fbeaufort> [29206:29206:0116/141032.979244:FATAL:zygote_host_impl_linux.cc(199)] Check failed: ReceiveFixedMessage(fds[0], kZygoteBootMessage, sizeof(kZygoteBootMessage), &boot_pid). 
3:15:27 PM <fbeaufort> #0 0x7f2ef9bfaacd base::debug::StackTrace::StackTrace()
3:15:27 PM <fbeaufort> #1 0x7f2ef9bf8f0c base::debug::StackTrace::StackTrace()
3:15:27 PM <fbeaufort> #2 0x7f2ef9c8068a logging::LogMessage::~LogMessage()
3:15:27 PM <fbeaufort> ...
3:16:18 PM <fbeaufort> Is that a known issue from Docker images?
3:16:18 PM <fbeaufort> See for instance: https://github.com/jessfraz/dockerfiles/issues/65
3:23:50 PM <@janx> fbeaufort: I think that's because our Docker containers don't allow namespace changes, because it would allow becoming root on the host
3:25:01 PM <fbeaufort> Agh ;( I can run Chrome with a special flag to disable sandbox but I can't run my layout tests this way sadly.
3:26:15 PM <fbeaufort> Is there a workaround?
3:26:15 PM <@janx> fbeaufort: this also prevents the Firefox sandbox from working, and also from using the amazing rr debugger :/
3:27:21 PM <@janx> fbeaufort: actually, I think we could run trusted containers with this capability

jankeromnes commented 6 years ago

Thanks for this feature request!

I think that's because our Docker containers don't have the CAP_SYS_ADMIN capability, for security reasons.

This also prevents Firefox from running with a sandbox (which it apparently does in Debug mode, as @whimboo found out), and it also prevents us from using rr in our containers.

I don't think we'll want to add CAP_SYS_ADMIN to all Janitor containers (because this allows becoming root on the host), but maybe we could grant it to certain trusted containers, on a case-by-case basis, to enable the valuable use cases listed above?

@notriddle what do you think?

jld commented 6 years ago

Neither Firefox sandboxing nor Chromium's namespace sandbox should need capabilities in the namespace they're launched in (nor any enclosing namespace), but they do need to be able to create new user namespaces.

Normally this is allowed for unprivileged users, but there are concerns about it due to the possibility of exposing exploitable kernel bugs that unprivileged callers normally couldn't reach, so sandboxes usually block those system calls. That seems to be what's going on in Mozilla bug 1430756: unshare(0) is a no-op that's normally allowed unconditionally, but it fails.
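To check this from inside a container, something like the following might help. It is only a minimal sketch: it calls unshare(0) through ctypes and reports whether the call was rejected. The function name `probe_unshare_noop` is made up for illustration, and it assumes a Linux libc that exports `unshare`.

```python
import ctypes
import errno


def probe_unshare_noop():
    """Call unshare(0), which requests no new namespaces.

    Returns (True, None) if the call succeeds, else (False, errno_name).
    """
    libc = ctypes.CDLL(None, use_errno=True)
    unshare = getattr(libc, "unshare", None)
    if unshare is None:
        # Assumption: treat a libc without unshare() like an unsupported syscall.
        return False, "ENOSYS"
    ctypes.set_errno(0)
    if unshare(0) == 0:
        return True, None
    err = ctypes.get_errno()
    return False, errno.errorcode.get(err, str(err))


if __name__ == "__main__":
    ok, err = probe_unshare_noop()
    print("unshare(0) allowed" if ok else f"unshare(0) blocked: {err}")
```

If this reports a failure such as EPERM, the call is probably being rejected by a syscall filter rather than by a capability check, since unshare(0) does not ask for anything privileged.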

Docker's documentation mentions a seccomp-bpf policy that would do this. It also links to the policy, in a JSON format, which mentions allowing the syscalls in question in connection with CAP_SYS_ADMIN, and I think what's going on here is that the seccomp-bpf program varies based on the capabilities granted to the container. But, if I'm right about this, it should be possible to edit that profile to allow unshare and clone normally, without capabilities.
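As a rough illustration of that kind of edit, here is a sketch that adds an unconditional allow rule to a profile in Docker's JSON layout. The `sample` profile is a tiny stand-in written for this example, not Docker's real default profile, and `allow_unconditionally` is a hypothetical helper name.

```python
import copy
import json


def allow_unconditionally(profile, names):
    """Return a copy of `profile` with one extra rule that allows `names`
    without any capability condition (no "includes" key)."""
    patched = copy.deepcopy(profile)
    patched.setdefault("syscalls", []).append(
        {"names": sorted(names), "action": "SCMP_ACT_ALLOW", "args": []}
    )
    return patched


# Illustrative stand-in profile: everything is denied except three syscalls
# that are only allowed when the container has CAP_SYS_ADMIN.
sample = {
    "defaultAction": "SCMP_ACT_ERRNO",
    "syscalls": [
        {
            "names": ["clone", "setns", "unshare"],
            "action": "SCMP_ACT_ALLOW",
            "args": [],
            "includes": {"caps": ["CAP_SYS_ADMIN"]},
        }
    ],
}

patched = allow_unconditionally(sample, {"clone", "unshare"})
print(json.dumps(patched, indent=2))
```

The resulting JSON could then be saved to a file and passed to a container with `docker run --security-opt seccomp=<file>`. Whether allowing these calls for every container is safe is a separate question that this sketch does not answer.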

notriddle commented 6 years ago

Yeah, unshare and clone are allowed without ADMIN. It's only setns that requires a privileged container.

jankeromnes commented 6 years ago

Thank you for these details! https://github.com/jessfraz/dockerfiles/issues/65#issuecomment-145731454 prompted me to consult man clone, which seems to indicate that CAP_SYS_ADMIN is required for the following flags:

Also, man unshare seems to indicate that some unshare options correspond to certain clone flags (although I don't know whether that means they need CAP_SYS_ADMIN to work):

I guess my questions here are:

  1. Does Chromium's sandbox error message "Failed to move to new namespace: PID namespaces supported, Network namespace supported, but failed: errno = Operation not permitted" mean that it tried unsuccessfully to use setns, clone or unshare?
  2. @notriddle If unshare is really allowed without ADMIN, then why is Firefox's sandbox choking on "unshare nothing"? https://searchfox.org/mozilla-central/source/security/sandbox/linux/SandboxInfo.cpp#168 [0]
  3. Would it be reasonable to edit Janitor's seccomp-bpf policy to allow unshare and clone for every container, without requiring CAP_SYS_ADMIN, or is this too dangerous from a security standpoint?

[0] This Docker seccomp profile page linked by @jld mentions that for unshare it will "Deny cloning new namespaces for processes. Also gated by CAP_SYS_ADMIN, with the exception of unshare --user." Maybe Firefox is using unshare (gated by ADMIN) instead of unshare --user (not gated)?
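One way to narrow these questions down empirically might be to try a few flag combinations from inside a container. The sketch below forks a child for each attempt, so the calling process never changes namespaces. The flag values come from the Linux headers; the helper name `try_unshare` is hypothetical, and the combined user + PID + network case is only a guess at what Chromium requests, based on the namespaces named in its error message.

```python
import ctypes
import errno
import os

# Namespace flags from <linux/sched.h>.
CLONE_NEWUSER = 0x10000000
CLONE_NEWPID = 0x20000000
CLONE_NEWNET = 0x40000000


def try_unshare(flags):
    """Attempt unshare(flags) in a forked child.

    Returns None on success, otherwise an errno name (or a short
    description if the child was killed by a signal).
    """
    pid = os.fork()
    if pid == 0:
        # Child: report the errno to the parent through the exit status.
        code = 255
        try:
            libc = ctypes.CDLL(None, use_errno=True)
            if libc.unshare(flags) == 0:
                code = 0
            else:
                code = ctypes.get_errno() or 255
        finally:
            os._exit(code & 0xFF)
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        return f"killed by signal {os.WTERMSIG(status)}"
    code = os.WEXITSTATUS(status)
    if code == 0:
        return None
    return errno.errorcode.get(code, str(code))


if __name__ == "__main__":
    attempts = {
        "user namespace only": CLONE_NEWUSER,
        "PID namespace only": CLONE_NEWPID,
        "user + PID + network": CLONE_NEWUSER | CLONE_NEWPID | CLONE_NEWNET,
    }
    for label, flags in attempts.items():
        result = try_unshare(flags)
        print(f"{label}: {'ok' if result is None else result}")
```

If the user-namespace-only attempt fails with EPERM while running as an unprivileged user, that would point at the seccomp profile (or a host-level restriction on user namespaces) rather than at a missing capability.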

jankeromnes commented 6 years ago

Random note, https://github.com/docker/docker-bench-security and Lynis can help us audit the security of our Docker configurations and dockerfiles.

jankeromnes commented 6 years ago

Other random note, this Docker docs page says:

By default Docker drops all capabilities except those needed, a whitelist instead of a blacklist approach. You can see a full list of available capabilities in Linux manpages.

Now we just need to know which capabilities we need to grant to our containers to support gdb, rr and Firefox/Chromium namespace-changing sandboxes, and if granting them is reasonably secure, or if we should only grant them to a select few containers upon special request.
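As a starting point for that audit, a small sketch like this could show what a given container currently has. It decodes the effective capability mask from /proc/self/status. The two capabilities listed are just the ones that seem most relevant to this thread (CAP_SYS_ADMIN from the discussion above, CAP_SYS_PTRACE as a guess for debuggers); whether they are the right set for gdb and rr is not established here. The helper name `effective_caps` is made up.

```python
# Bit positions from <linux/capability.h>.
CAP_BITS = {"CAP_SYS_PTRACE": 19, "CAP_SYS_ADMIN": 21}


def effective_caps(status_text):
    """Parse the CapEff line of a /proc/<pid>/status dump.

    Returns {capability_name: bool} for the capabilities in CAP_BITS.
    """
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            mask = int(line.split()[1], 16)
            return {name: bool((mask >> bit) & 1) for name, bit in CAP_BITS.items()}
    raise ValueError("no CapEff line found")


if __name__ == "__main__":
    with open("/proc/self/status") as f:
        print(effective_caps(f.read()))
```

Running this inside a Janitor container before and after changing its Docker options would at least confirm which capabilities were actually granted.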