docker / for-linux

Docker Engine for Linux
https://docs.docker.com/engine/installation/

Allow setns() in container, or add flag to allow it specifically #496

Open mercmobily opened 5 years ago

mercmobily commented 5 years ago

The Docker update from 1.11.x to 1.12.x seems to have broken setns() calls inside containers. setns() is used by Chrome for creating namespaces. I figured this out after reading this SO post.

The only solution right now is to run chrome with --no-sandbox but that's way way less than ideal. Another "solution" is to run the container with --cap-add=SYS_ADMIN -- which is a rather broad thing to do.

Expected behavior

I expect to EITHER have a flag to enable setns() in the container (so that Chrome can run securely), OR allow setns() in docker containers.

Actual behavior

Right now, the whole world is effectively using --no-sandbox to run Chrome in containers. Seriously.

Steps to reproduce the behavior

Output of docker version:

    Client:
     Version:      1.13.1
     API version:  1.26
     Go version:   go1.8.3
     Git commit:   092cba3
     Built:        Thu Oct 12 22:34:44 2017
     OS/Arch:      linux/amd64

    Server:
     Version:      1.13.1
     API version:  1.26 (minimum version 1.12)
     Go version:   go1.8.3
     Git commit:   092cba3
     Built:        Thu Oct 12 22:34:44 2017
     OS/Arch:      linux/amd64
     Experimental: false

Output of docker info:


Containers: 1
 Running: 1
 Paused: 0
 Stopped: 0
Images: 23
Server Version: 1.13.1
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 23
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins: 
 Volume: local
 Network: bridge host macvlan null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: aa8187dbd3b7ad67d8e5e3a15115d3eef43a7ed1
runc version: 9df8b306d01f59d3a8029be411de015b7304dd8f
init version: N/A (expected: 949e6facb77383876aeff8a6944dde66b3089574)
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.13.0-46-generic
Operating System: Ubuntu 17.10
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 7.68 GiB
Name: merc-B250M-D3H
ID: 5VQF:HZG3:ULIM:TQOZ:ITG2:SUGX:HFZ2:QBZH:HJR6:GABW:COXR:CY3E
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

cpuguy83 commented 5 years ago

Most likely this is the seccomp policy blocking setns. You can supply a custom seccomp policy.

But also, you are already running chrome in a container, what is the need to do another "setns"?

mercmobily commented 5 years ago

I can see this page that explains how to set Seccomp profiles for dockers, and especially the option --security-opt seccomp=/path/to/seccomp/profile.json. What would the profile.json have to contain to just whitelist setns()?

As per your second question, imagine a complex application that needs Chrome in headless mode, to run some client-side testing or to generate PDFs -- or whatever. In this case, running Chrome without a sandbox would imply that a hacker could exploit some of Chrome's vulnerabilities to gain access to the instance.

If you let me know, and I see that it works, I will make sure I tell pretty much everybody online (that would include the Selenium people, but also countless people out there on SO and various forums) that there is a solution other than disabling the sandbox, which in many cases is a really bad idea.

thaJeztah commented 5 years ago

You may be interested in @jessfraz's Dockerfile for chrome https://github.com/jessfraz/dockerfiles/blob/master/chrome/stable/Dockerfile

and the corresponding seccomp profile https://github.com/jessfraz/dotfiles/blob/master/etc/docker/seccomp/chrome.json

mercmobily commented 5 years ago

Ah, I even looked into those repos a lot in order to figure out what was going on... but never together, and without knowing about the seccomp flag. So... The chrome json file seems to be listing a lot of calls to allow. I guess it's because it basically overrides the full default seccomp setting. Is there no way to make this future-proof, and say "apply the default setting, with this difference" so to speak?

mercmobily commented 5 years ago

@jessfraz I saw a few tickets on your repos where people asked you about the Chrome issue; you referred them to your dotfiles. However, I believe a more detailed explanation would help when this problem arises. Just my humble 2c -- thank you for everything!

thaJeztah commented 5 years ago

Is there no way to make this future-proof, and say "apply the default setting, with this difference" so to speak?

Yes, the seccomp profile is unfortunately quite verbose. This was by design, because the default profile is configurable on the daemon, so if a container would only specify the "diff" (assuming the daemon runs the default profile), the result would be unpredictable. So for that reason, the seccomp profile requires you to specify exactly what the profile should look like.

Perhaps it would be a fun "pet" project to create a seccomp-profile generator, i.e. something like;

seccomp-bake \
  --default-profile=profiles/seccomp/default.json \
  --whitelist-add=foo,bar,baz \
  -o ./my-profile.json

(although probably could be done with, e.g., jq)
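
For anyone who wants to try the generator idea, here is a minimal Python sketch, not an existing tool. It assumes the profile is JSON with a top-level "syscalls" list whose rules carry "names" (or the older single "name") and an "action". The helper names add_allowed_syscalls and bake are hypothetical, and the file paths are placeholders.

```python
import copy
import json


def add_allowed_syscalls(profile, names):
    """Return a copy of `profile` with one extra allow rule for `names`.

    Assumption: rules look like {"names": [...], "action": "SCMP_ACT_ALLOW"}.
    Syscalls that are already allowed unconditionally are skipped.
    """
    baked = copy.deepcopy(profile)
    already = set()
    for rule in baked.get("syscalls", []):
        unconditional = not (
            rule.get("includes") or rule.get("excludes") or rule.get("args")
        )
        if rule.get("action") == "SCMP_ACT_ALLOW" and unconditional:
            already.update(rule.get("names", []))
            if "name" in rule:  # older one-syscall-per-rule layout
                already.add(rule["name"])
    missing = sorted(set(names) - already)
    if missing:
        baked.setdefault("syscalls", []).append(
            {"names": missing, "action": "SCMP_ACT_ALLOW"}
        )
    return baked


def bake(default_path, extra_names, out_path):
    """Read a base profile, add the extra syscalls, write the result."""
    with open(default_path) as f:
        profile = json.load(f)
    with open(out_path, "w") as f:
        json.dump(add_allowed_syscalls(profile, extra_names), f, indent=2)
```

Something like bake("profiles/seccomp/default.json", ["setns"], "./my-profile.json") would then produce a file to pass to --security-opt seccomp=. Note that the result is still a full profile, so it has to be regenerated whenever the base profile changes.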

Another improvement would be this proposal; https://github.com/moby/moby/issues/32801 (adding "entitlements"), which would make setting security options more user-friendly

mercmobily commented 5 years ago

Hi,

alright... I will test this out on my own machine (mainly making sure that setns() is the only thing Chrome needs, and if it isn't, wrestling permissions till I get it right, possibly checking @jessfraz's settings in Chrome) and will then proceed to mass-answering people with the same issue in the gazillion places I've found (probably just pointing to this issue, which right now is pure gold to a lot of people out there)

mercmobily commented 5 years ago

Hi,

So, it's not just setns -- as I imagined. After cutting and sorting and diffing, here is the list of calls that are NOT whitelisted in the default config file but are listed in @jessfraz's Chrome config file.

    > arch_prctl
    > chroot
    > clone
    > fanotify_init
    > name_to_handle_at
    > open_by_handle_at
    > setdomainname
    > sethostname
    > syslog
    > unshare
    > vhangup
    > setns
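
The "cutting and sorting and diffing" can also be sketched in Python instead of shell tools. This is only an illustration under assumptions: both profiles are already loaded with json.load, rules use "names" (or the older "name") plus an "action", and the function names are made up for this example.

```python
def allowed_syscalls(profile):
    """Collect every syscall name that appears in an allow rule."""
    names = set()
    for rule in profile.get("syscalls", []):
        if rule.get("action") != "SCMP_ACT_ALLOW":
            continue
        names.update(rule.get("names", []))
        if "name" in rule:  # older one-syscall-per-rule layout
            names.add(rule["name"])
    return names


def extra_syscalls(default_profile, custom_profile):
    """Syscalls allowed by the custom profile but not by the default one."""
    return sorted(
        allowed_syscalls(custom_profile) - allowed_syscalls(default_profile)
    )
```

One caveat: this counts a syscall as allowed even if its rule is conditional (for example, only when a capability is present), so the diff can under-report differences.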

I frankly don't know if all of them are needed. I assume @jessfraz would have straced chrome and checked which calls were called... maybe?

So, at this stage if somebody wants to run Chrome in a docker container, they can basically:

I think the Selenium people are the first ones that must be warned, since right now basically anybody running Travis/Selenium is running an insecure sandbox-less Chrome. That's planet-wide.

Before I go out there and tell everybody, may I ask: I realise that the list above is the full list that will make it work with Chrome. But... can it be shortened? How was it worked out? Trial & error? Strace? Grepping Chrome's source?

I guess the best person to answer would be @jessfraz -- any hints?

cpuguy83 commented 5 years ago

Almost certainly strace

mercmobily commented 5 years ago

@cpuguy83 If that is the case, there is no point in trying to shorten it.

Do you think it's worthwhile trying my luck, and see if the Docker people would accept a pull request adding CHROME the same way CAP_SYS_ADMIN is?

This would be to help out all those people out there trying to get headless chrome to do software testing in a container...

cpuguy83 commented 5 years ago

Sorry no. Those capabilities are actual Linux capabilities

thaJeztah commented 5 years ago

I think the Selenium people are the first ones that must be warned, since right now basically anybody running Travis/Selenium is running an insecure sandbox-less Chrome. That's planet-wide.

Chrome will be sandboxed as a whole by the container; if those containers are minimal (only contain chrome, and the bare minimum required) and follow best practices (such as running as a non-privileged user, running with a read-only filesystem, having --security-opt=no-new-privileges set, as well as memory and CPU constraints), no damage could be done beyond what's inside the container. Possibly the profile could be tightened further, as the default profile is a "generic" profile for common use.

Note that @jessfraz's Dockerfile (and seccomp profile) is targeted at desktop / interactive use of the Chrome container, and therefore may be more permissive than required for your use case (running Selenium tests in headless mode).

Given that more syscalls are whitelisted in the Chrome seccomp profile, that actually means the profile is less restrictive than the default, thus introducing more risks if the container gets compromised.
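
To make the best practices listed above concrete, here is a rough Python sketch that assembles a docker run command line with those options. The flags themselves (--user, --read-only, --memory, --cpus, --security-opt) are regular docker run options; the function name, the default values, and the image name are placeholders invented for this example, and a real Chrome container will likely need further tuning.

```python
def hardened_run_args(
    image,
    seccomp_profile=None,
    user="1000:1000",      # placeholder non-root uid:gid
    memory="1g",           # placeholder memory limit
    cpus="1.0",            # placeholder CPU limit
    no_new_privileges=True,
):
    """Build an argv list for `docker run` with the restrictions applied."""
    args = [
        "docker", "run",
        "--user", user,       # run as a non-privileged user
        "--read-only",        # read-only root filesystem
        "--memory", memory,   # memory constraint
        "--cpus", cpus,       # CPU constraint
    ]
    if no_new_privileges:
        args.append("--security-opt=no-new-privileges")
    if seccomp_profile:
        # Custom profile replaces the daemon's default one for this container.
        args.append(f"--security-opt=seccomp={seccomp_profile}")
    args.append(image)
    return args
```

The list can be handed to subprocess.run(args, check=True). no_new_privileges is kept as a switch because a sandbox that relies on a setuid helper cannot work when that option is set.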

mercmobily commented 5 years ago

@thaJeztah Yours is a compelling argument. However, if the container must be able to connect to a database server, for example, a non-sandboxed Chrome might become the gateway to gain read access to the database and get credentials. If a shell is obtained, the intruder will be able to reach hosts that would normally be unreachable. So, while it's true that a malicious user exploiting a Chrome vulnerability would "only" be able to access the container, there are many cases where access to that container's data -- and even just having a shell in that container -- might be a bigger problem than expected. You can surely think of several dangerous scenarios if you have an application server that needs to run headless Chrome (to create PDFs, for example).

Your comment on the possibility of headless Chrome not needing all of these:

> arch_prctl
> chroot
> clone
> fanotify_init
> name_to_handle_at
> open_by_handle_at
> setdomainname
> sethostname
> syslog
> unshare
> vhangup
> setns

Is interesting; by looking at them, I doubt headless Chrome would need much less. But, it would need investigation for sure.

mercmobily commented 5 years ago

@thaJeztah Any thoughts? I don't want to recommend anything to anybody unless it's sound advice, and your message cast some doubts on my reasoning. When you write "if those containers are minimal (only contain chrome, and the bare minimum required)", I think that those "minimum requirements" for server-side Chrome will inevitably have to report the results to another host, possibly have access to hosts otherwise protected, and have some privileges to do so. On the other hand, do you think the syscalls above have security implications? (arch_prctl jumps to mind)

thaJeztah commented 5 years ago

I think that those "minimum requirements" for server-side Chrome will inevitably have to report the results to another host, possibly have access to hosts otherwise protected, and have some privileges to do so.

If this is in a CI environment, you should assume the content you're running is compromised, and configure what the container is allowed to access based on that assumption. In a Docker setup, that could also mean; connect the container to a network that only allows it to connect to those services/containers that you want it to be able to reach. (If this is about "results", and you don't want it to be able to "push" those changes, perhaps writing to a file, and collect those changes would be an option). That said; I don't have a lot of experience with setting up Selenium, so not sure I can give more advice on that part 😅

On the other hand, do you think the syscalls above have security implications? (arch_prctl jumps to mind)

I'll defer that one to @justincormack and @jessfraz, who are probably better at answering that.

mercmobily commented 5 years ago

My use case is not actually Selenium/CI. That's just a common use case that requires Chrome to run. In my specific case, my server uses headless Chrome to create PDF files. I realise it's not very common, but there are cases where headless Chrome needs to run as part of a complex server application.

lucifer1004 commented 5 years ago

My use case is running Chrome (headful) with AWS Fargate, where neither --cap-add nor --security-opt can be used. Does this mean I can only run Chrome with --no-sandbox?

thaJeztah commented 5 years ago

My use case is running Chrome (headful) with AWS Fargate, where neither --cap-add nor --security-opt can be used. Does this mean I can only run Chrome with --no-sandbox?

If there's no option to customize those (or the daemon configuration), then probably: yes.

If a dedicated option was added for this, then you'd probably also not be able to configure that in that case, so it may be better to open a feature request with AWS

nick-kang commented 1 year ago

I think the Selenium people are the first ones that must be warned, since right now basically anybody running Travis/Selenium is running an insecure sandbox-less Chrome. That's planet-wide.

Chrome will be sandboxed as a whole by the container; if those containers are minimal (only contain chrome, and the bare minimum required) and follow best practices (such as running as a non-privileged user, running with a read-only filesystem, having --security-opt=no-new-privileges set, as well as memory and CPU constraints), no damage could be done beyond what's inside the container. Possibly the profile could be tightened further, as the default profile is a "generic" profile for common use.

Note that @jessfraz's Dockerfile (and seccomp profile) is targeted at desktop / interactive use of the Chrome container, and therefore may be more permissive than required for your use case (running Selenium tests in headless mode).

Given that more syscalls are whitelisted in the Chrome seccomp profile, that actually means the profile is less restrictive than the default, thus introducing more risks if the container gets compromised.

Just want to point out that when I run dockerized chromium with --security-opt=no-new-privileges, I get the following error:

       The setuid sandbox is not running as root. Common causes:
         * A parent process set prctl(PR_SET_NO_NEW_PRIVS, ...)
       Failed to move to new namespace: PID namespaces supported, Network namespace supported, but failed: errno = Operation not permitted

Workaround we're using is leaving out no-new-privileges.
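
To confirm from inside the container whether no-new-privileges and a seccomp filter are actually in effect, the NoNewPrivs and Seccomp fields of /proc/self/status can be read. A small Python sketch follows; the function names are made up for illustration, and it assumes a kernel recent enough to expose both fields.

```python
def sandbox_relevant_status(status_text):
    """Extract NoNewPrivs and Seccomp from the text of /proc/<pid>/status.

    NoNewPrivs: 1 means prctl(PR_SET_NO_NEW_PRIVS) has been set.
    Seccomp: 0 = disabled, 1 = strict mode, 2 = filter mode.
    """
    wanted = {"NoNewPrivs", "Seccomp"}
    found = {}
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key in wanted:
            found[key] = int(value.strip())
    return found


def read_own_status():
    """Read the fields for the current process (Linux only)."""
    with open("/proc/self/status") as f:
        return sandbox_relevant_status(f.read())
```

If NoNewPrivs comes back as 1, that matches the "A parent process set prctl(PR_SET_NO_NEW_PRIVS, ...)" cause named in the error message.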

CryptoKiddies commented 1 year ago

@thaJeztah Yours is a compelling argument. However, if the container must be able to connect to a database server, for example, a non-sandboxed Chrome might become the gateway to gain read access to the database and get credentials. If a shell is obtained, the intruder will be able to reach hosts that would normally be unreachable. So, while it's true that a malicious user exploiting a Chrome vulnerability would "only" be able to access the container, there are many cases where access to that container's data -- and even just having a shell in that container -- might be a bigger problem than expected. You can surely think of several dangerous scenarios if you have an application server that needs to run headless Chrome (to create PDFs, for example).

Your comment on the possibility of headless Chrome not needing all of these:

> arch_prctl
> chroot
> clone
> fanotify_init
> name_to_handle_at
> open_by_handle_at
> setdomainname
> sethostname
> syslog
> unshare
> vhangup
> setns

Is interesting; by looking at them, I doubt headless Chrome would need much less. But, it would need investigation for sure.

What I'm unsure about is how the no-sandbox option entirely bypasses the need for some of these system calls. I can understand how setns would be needed to create the boundaries between chrome processes/windows, but how about all the rest of these calls? Wouldn't chromium still need a call like setdomainname even without sandboxing?

dusty-1 commented 1 year ago

Currently, I am able to run Chromium in a container if I pass either "--no-sandbox" or "--security-opt=seccomp=unconfined" command line arguments. I, however, would prefer to have sandboxes working properly as they appear necessary for Chromium's "Site Isolation" Design. Site Isolation "helps defend against... UXSS and fully compromised renderer processes."

There are two (short) Chromium design documents that I found helpful, as I just started researching this. Links to these follow, in the hopes that they may be helpful to others: https://github.com/chromium/chromium/blob/main/docs/linux/sandboxing.md

Linux Sandboxing

Chromium uses a multiprocess model, which allows to give different privileges and restrictions to different parts of the browser. For instance, we want renderers to run with a limited set of privileges since they process untrusted input and are likely to be compromised. Renderers will use an IPC mechanism to request access to resource from a more privileged (browser process). ...

https://sites.google.com/a/chromium.org/dev/developers/design-documents/site-isolation

Chrome's multi-process architecture provides many benefits for speed, stability, and security. It allows web pages in unrelated tabs to run in parallel, and it allows users to continue using the browser and other tabs when a renderer process crashes. Because the renderer processes don't require direct access to disk, network, or devices, Chrome can also run them inside a restricted sandbox. This limits the damage that attackers can cause if they exploit a vulnerability in the renderer, including making it difficult for attackers to access the user's filesystem or devices, as well as privileged pages (e.g., settings or extensions) and pages in other profiles (e.g., Incognito mode)....

My goal is to learn how to create a seccomp profile that is optimal for Chromium, where neither Docker nor the browser are less secure as a result. I should probably start with something current. Does anyone know where the source is located for the official "builtin" profile reported by "docker info" command?

Security Options:
  apparmor
  seccomp
   Profile: builtin
  cgroupns

thaJeztah commented 1 year ago

Code to generate the default (builtin) profile is found here; https://github.com/moby/moby/blob/v24.0.2/profiles/seccomp/default_linux.go

That code generates the default.json file that's in the same directory; https://github.com/moby/moby/tree/v24.0.2/profiles/seccomp
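
Once default.json is downloaded, a quick way to see how a given syscall such as setns is treated is to list every rule that mentions it, together with its conditions. A minimal Python sketch, assuming rules use "names" (or the older "name"), "action", and optional "includes", "excludes", and "args" keys; rules_for is a hypothetical helper name.

```python
def rules_for(profile, syscall):
    """Return a summary of every rule in `profile` that names `syscall`."""
    matches = []
    for rule in profile.get("syscalls", []):
        names = list(rule.get("names", []))
        if "name" in rule:  # older one-syscall-per-rule layout
            names.append(rule["name"])
        if syscall in names:
            matches.append(
                {
                    "action": rule.get("action"),
                    "includes": rule.get("includes") or {},
                    "excludes": rule.get("excludes") or {},
                    "args": rule.get("args") or [],
                }
            )
    return matches
```

With the profile loaded via json.load, rules_for(profile, "setns") returns an empty list if the syscall falls through to the profile's defaultAction; a non-empty "includes" means the rule only applies under that condition (for example, when the container has a particular capability).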