apyrgio commented 6 days ago

Problem

Our recent gVisor integration (#590) requires allowlisting the ptrace(2) system call in the outer container, in order to spawn the inner container with runsc. Nowadays, container runtimes allow this system call by default [1], but we have encountered systems that don't allow it, and thus Dangerzone cannot run on them, at least out of the box.

Affected systems are:

- Ubuntu Focal (via OpenSUSE's repo), with Podman version 3.4.2
- Ubuntu Jammy, with Podman version 3.4.4
- Debian Bullseye, with Podman version 3.0.1
- Older Docker Desktop releases, e.g., with runc version 1.1.5

Background

Before explaining how we plan to fix this issue, we'll give some background on the ptrace(2) system call.

First of all, why is this syscall dangerous? The main reason is that a malicious process can use it to escalate its privileges or bypass system protections. A real-life example is CVE-2019-2054, which allowed bypassing seccomp policies via ptrace(2) on Linux kernels < 4.8. This CVE is the reason why container runtimes do not allow ptrace(2) in their seccomp policies on kernels < 4.8, but it's not the only ptrace-related CVE that has been reported.

In order to control the scope of the ptrace(2) system call, the Linux kernel offers the following mechanisms:

- The CAP_SYS_PTRACE Linux capability. If a process has this capability, it has full tracing capabilities, such as tracing processes that it has not started. If this capability is not granted, the usage of ptrace(2) is still allowed, but restricted through the mechanisms listed below.
- Disabling the system call (or specific arguments to it) via a seccomp policy. For instance:
  - Docker originally disabled ptrace(2) in its seccomp policy, and then re-enabled it for kernels >= 4.8.
  - Podman similarly lifted this restriction a few years later.
  - Containerd did so around the same time.
- The YAMA Linux Security Module ptrace_scope setting. This setting controls the behavior of ptrace(2) system-wide. In the Linux platforms we support, the default seems to be 1, i.e., a process can only trace processes it has a direct relationship with (e.g., its child processes).
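To check the YAMA setting on a given host, one can read the ptrace_scope sysctl, which the kernel exposes at /proc/sys/kernel/yama/ptrace_scope when YAMA is enabled. The following is a minimal Python sketch, not part of Dangerzone; the function names are hypothetical, and the value descriptions follow the kernel's YAMA documentation.

```python
from pathlib import Path
from typing import Optional

# Sysctl exposed by the YAMA LSM; the file is absent if YAMA is not enabled.
YAMA_PTRACE_SCOPE = Path("/proc/sys/kernel/yama/ptrace_scope")

# Meaning of each ptrace_scope value, as described in the kernel's YAMA docs.
SCOPE_DESCRIPTIONS = {
    0: "classic: trace any process with the same UID",
    1: "restricted: by default, trace only descendants",
    2: "admin-only: tracing requires CAP_SYS_PTRACE",
    3: "no attach: PTRACE_ATTACH and PTRACE_TRACEME are disabled",
}


def describe_ptrace_scope(value: int) -> str:
    """Return a human-readable description of a ptrace_scope value."""
    return SCOPE_DESCRIPTIONS.get(value, f"unknown value {value}")


def read_ptrace_scope(path: Path = YAMA_PTRACE_SCOPE) -> Optional[int]:
    """Return the current ptrace_scope, or None if YAMA is not available."""
    try:
        return int(path.read_text().strip())
    except FileNotFoundError:
        return None


if __name__ == "__main__":
    scope = read_ptrace_scope()
    if scope is None:
        print("YAMA is not enabled, so ptrace_scope does not apply")
    else:
        print(f"ptrace_scope = {scope} ({describe_ptrace_scope(scope)})")
```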

[1] See Podman's seccomp policy, Docker's seccomp policy, and containerd's seccomp policy.

apyrgio commented 6 days ago

Requirements

Our solution must take into account the following:

1. It must work on kernels >= 4.8.
2. It must work with the default ptrace_scope on Linux systems.
3. It must work on older Podman and Docker Desktop releases.
   - Yes, these releases may be insecure by now, but if we don't support them and our users cannot update to newer ones, they will simply open the suspicious file without Dangerzone.
4. The user must not have to take any manual steps on their system in order to make Dangerzone work.

On (1), we have verified that none of the systems we support has a Linux kernel < 4.8. This also applies to Windows (WSL2) and macOS (HyperKit). On (2), we have seen that the default ptrace_scope is 1 on the platforms we support, and this scope is supported by gVisor.
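Requirement (1) boils down to comparing the (major, minor) kernel version against 4.8. Here is a minimal Python sketch of that comparison; the helper names are hypothetical. On macOS and Windows the relevant kernel is the one inside the Linux VM (e.g., the KernelVersion that Docker reports, such as 6.6.12-linuxkit), not the host's, so the sketch only inspects the host kernel on Linux.

```python
import platform
import re
from typing import Tuple

# Docker's seccomp policy, for instance, only allows ptrace(2) on kernels >= 4.8.
MIN_KERNEL = (4, 8)


def kernel_version(release: str) -> Tuple[int, int]:
    """Extract (major, minor) from a release string such as '6.6.12-linuxkit'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"Unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def kernel_is_supported(release: str) -> bool:
    """True if the kernel is recent enough for ptrace(2) to be allowlisted."""
    return kernel_version(release) >= MIN_KERNEL


if __name__ == "__main__":
    # platform.release() reports the Linux kernel release only on Linux hosts.
    if platform.system() == "Linux":
        print(kernel_is_supported(platform.release()))
```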

Solution

For Podman versions < 4, we already have a workaround in our code that starts the container with Podman's default seccomp policy as of June 6th, 2024 (see seccomp.json):

https://github.com/freedomofpress/dangerzone/blob/c2a47ec46b077798b371da7624f6c78121105569/dangerzone/isolation_provider/container.py#L117-L119
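The Podman workaround amounts to adding a --security-opt seccomp=<path> option to the podman run invocation when the Podman version is older than 4. The following Python sketch illustrates the idea; it is not the code at the link above, and both the helper names and the policy path are hypothetical.

```python
import re
from pathlib import Path
from typing import List, Tuple

# Hypothetical location of the seccomp.json policy shipped alongside Dangerzone.
SECCOMP_POLICY = Path("seccomp.json")


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a version string like '3.4.4' or '4.9.4-dev' into a tuple of ints."""
    match = re.match(r"\d+(?:\.\d+)*", version.strip())
    if match is None:
        raise ValueError(f"Unrecognized version: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def podman_seccomp_args(podman_version: str, policy: Path = SECCOMP_POLICY) -> List[str]:
    """Extra `podman run` arguments: force our own seccomp policy on Podman < 4."""
    if parse_version(podman_version) < (4,):
        return ["--security-opt", f"seccomp={policy}"]
    # Assumption, in line with the workaround: Podman >= 4 allows ptrace(2) by default.
    return []


if __name__ == "__main__":
    # The 3.x versions come from the affected systems above; 4.0.0 is for contrast.
    for version in ["3.0.1", "3.4.2", "3.4.4", "4.0.0"]:
        print(version, podman_seccomp_args(version))
```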

For Docker Desktop, we don't have a similar workaround, because we don't know exactly when this restriction was lifted. We do know that Containerd 1.6.7 first allowed the ptrace(2) syscall, and that Docker Desktop 4.12.0 included this Containerd version. However, we have tested Docker Desktop 4.19.0 on macOS, and the ptrace(2) syscall was disabled, so we cannot rely on the Containerd version alone.

So, our suggestion is to:

1. Check if the Docker Desktop release is recent. For example, we have had good results with Docker Desktop 4.27.0.
2. If the release is older, spawn the container with the same seccomp.json file that we use for Podman.

This way, older releases will use our Podman seccomp policy, which guarantees that ptrace(2) is allowed. If an older Docker Desktop release already allows the ptrace(2) system call, our seccomp policy will take precedence over its default one, but the differences between the two should be negligible.

Newer releases will use their own default seccomp policy, so we will not mask any future security-related fixes to it.
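The decision for Docker Desktop can be sketched in the same way. In this Python sketch, the 4.27.0 threshold is an assumption based on the good results reported with that release (and the failure with 4.19.0); the real cut-off is unknown, and the helper name and policy path are hypothetical.

```python
from pathlib import Path
from typing import List, Tuple

# Hypothetical threshold: 4.27.0 worked in our tests, 4.19.0 did not.
MIN_DESKTOP_VERSION = (4, 27, 0)
# Hypothetical location of the seccomp.json policy we already use for Podman.
SECCOMP_POLICY = Path("seccomp.json")


def docker_seccomp_args(
    desktop_version: Tuple[int, int, int], policy: Path = SECCOMP_POLICY
) -> List[str]:
    """Extra `docker run` arguments for a given Docker Desktop release."""
    if desktop_version >= MIN_DESKTOP_VERSION:
        # Recent release: keep Docker's default policy, so that future
        # security fixes to it are not masked.
        return []
    # Older release: replace the default policy with ours, which allows ptrace(2).
    return ["--security-opt", f"seccomp={policy}"]


if __name__ == "__main__":
    print(docker_seccomp_args((4, 19, 0)))
    print(docker_seccomp_args((4, 27, 2)))
```

Keeping the threshold in a single constant makes it easy to lower once the exact release that lifted the restriction is known.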

Alternatives

Docker also allows the ptrace(2) system call if the container is started with the CAP_SYS_PTRACE capability (e.g., via --cap-add SYS_PTRACE). Note that we don't add this Linux capability in the current implementation:

https://github.com/freedomofpress/dangerzone/blob/c2a47ec46b077798b371da7624f6c78121105569/dangerzone/isolation_provider/container.py#L123-L124

Why is that? Because with the CAP_SYS_PTRACE capability, the outer container would be able to trace any process, which significantly increases our attack surface.

For this reason, we choose not to go down that path, and simply pass our own seccomp policy.
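One way to double-check that the outer container does not hold CAP_SYS_PTRACE is to inspect the effective capability mask (the CapEff field of /proc/self/status) from inside it. CAP_SYS_PTRACE is capability number 19 in <linux/capability.h>. The Python sketch below is illustrative only and not part of Dangerzone; the function names are hypothetical.

```python
import platform
from pathlib import Path

# Capability number of CAP_SYS_PTRACE, as defined in <linux/capability.h>.
CAP_SYS_PTRACE = 19


def has_capability(cap_mask_hex: str, cap: int) -> bool:
    """Check whether bit `cap` is set in a hex capability mask (e.g. CapEff)."""
    return bool((int(cap_mask_hex, 16) >> cap) & 1)


def effective_capabilities(status: Path = Path("/proc/self/status")) -> str:
    """Return the CapEff hex mask of the current process (Linux only)."""
    for line in status.read_text().splitlines():
        if line.startswith("CapEff:"):
            return line.split(":", 1)[1].strip()
    raise RuntimeError(f"No CapEff field in {status}")


if __name__ == "__main__" and platform.system() == "Linux":
    # Meant to run inside the outer container, which we start without SYS_PTRACE.
    mask = effective_capabilities()
    print("CAP_SYS_PTRACE:", has_capability(mask, CAP_SYS_PTRACE))
```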

apyrgio commented 6 days ago

It seems that the output of docker version is not parsing-friendly if we just want the Docker Desktop release (i.e., the 4.27.2 part):

$ docker version -f {{.Server.Platform.Name}}
Docker Desktop 4.27.2 (137060)
$ docker version -f json
{
    "Client": {
        "CloudIntegration": "v1.0.35+desktop.10",
        "Version": "25.0.3",
        "ApiVersion": "1.44",
        "DefaultAPIVersion": "1.44",
        "GitCommit": "4debf41",
        "GoVersion": "go1.21.6",
        "Os": "darwin",
        "Arch": "arm64",
        "BuildTime": "Tue Feb  6 21:13:26 2024",
        "Context": "default"
    },
    "Server": {
        "Platform": {
            "Name": "Docker Desktop 4.27.2 (137060)"
        },
        "Components": [
            {
                "Name": "Engine",
                "Version": "25.0.3",
                "Details": {
                    "ApiVersion": "1.44",
                    "Arch": "arm64",
                    "BuildTime": "Tue Feb  6 21:14:22 2024",
                    "Experimental": "false",
                    "GitCommit": "f417435",
                    "GoVersion": "go1.21.6",
                    "KernelVersion": "6.6.12-linuxkit",
                    "MinAPIVersion": "1.24",
                    "Os": "linux"
                }
            },
            {
                "Name": "containerd",
                "Version": "1.6.28",
                "Details": {
                    "GitCommit": "ae07eda36dd25f8a1b98dfbf587313b99c0190bb"
                }
            },
            {
                "Name": "runc",
                "Version": "1.1.12",
                "Details": {
                    "GitCommit": "v1.1.12-0-g51d5e94"
                }
            },
            {
                "Name": "docker-init",
                "Version": "0.19.0",
                "Details": {
                    "GitCommit": "de40ad0"
                }
            }
        ],
        "Version": "25.0.3",
        "ApiVersion": "1.44",
        "MinAPIVersion": "1.24",
        "GitCommit": "f417435",
        "GoVersion": "go1.21.6",
        "Os": "linux",
        "Arch": "arm64",
        "KernelVersion": "6.6.12-linuxkit",
        "BuildTime": "2024-02-06T21:14:22.000000000+00:00"
    }
}
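The JSON output is easy to load, but the Docker Desktop release is only available inside the free-form Platform.Name string, which requires an ad-hoc regular expression. The Python sketch below, with hypothetical helper names, illustrates both; SAMPLE_OUTPUT is a trimmed excerpt of the output above, and Platform.Name may be absent or formatted differently outside Docker Desktop.

```python
import json
import re
from typing import Dict, Optional, Tuple

# Trimmed excerpt of `docker version -f json` from Docker Desktop 4.27.2.
SAMPLE_OUTPUT = """
{"Server": {"Platform": {"Name": "Docker Desktop 4.27.2 (137060)"},
            "Components": [{"Name": "Engine", "Version": "25.0.3"},
                           {"Name": "containerd", "Version": "1.6.28"},
                           {"Name": "runc", "Version": "1.1.12"}],
            "Version": "25.0.3"}}
"""


def server_info(version_json: str) -> dict:
    """Return the "Server" object of `docker version -f json` output."""
    return json.loads(version_json)["Server"]


def component_versions(version_json: str) -> Dict[str, str]:
    """Map each server component (Engine, containerd, runc, ...) to its version."""
    components = server_info(version_json).get("Components", [])
    return {c["Name"]: c["Version"] for c in components}


def desktop_version(version_json: str) -> Optional[Tuple[int, ...]]:
    """Extract the Docker Desktop release from Platform.Name, if present."""
    name = server_info(version_json).get("Platform", {}).get("Name", "")
    match = re.search(r"Docker Desktop (\d+)\.(\d+)\.(\d+)", name)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


if __name__ == "__main__":
    print(component_versions(SAMPLE_OUTPUT))
    print(desktop_version(SAMPLE_OUTPUT))
```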

We can use the Docker Engine version instead:

$ docker version -f {{.Server.Version}}
25.0.3

Most likely, we can consider any Docker Engine version greater than 25.0 as safe.
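Under that assumption, the check reduces to parsing the {{.Server.Version}} output and comparing its (major, minor) part against 25.0. The Python sketch below is hypothetical: the 25.0 threshold is the tentative estimate from this comment, not a confirmed cut-off, and the helper names are illustrative. It treats 25.0.x itself as safe, matching the 25.0.3 engine shown above.

```python
import re
import subprocess
from typing import Tuple

# Tentative threshold from this discussion: Docker Engine 25.0.x and later.
SAFE_ENGINE_VERSION = (25, 0)


def parse_engine_version(raw: str) -> Tuple[int, int]:
    """Parse `docker version -f {{.Server.Version}}` output, e.g. '25.0.3'."""
    match = re.match(r"(\d+)\.(\d+)", raw.strip())
    if match is None:
        raise ValueError(f"Unexpected Docker Engine version: {raw!r}")
    return int(match.group(1)), int(match.group(2))


def needs_custom_seccomp(raw_engine_version: str) -> bool:
    """True if we should pass our own seccomp.json instead of Docker's default."""
    return parse_engine_version(raw_engine_version) < SAFE_ENGINE_VERSION


def docker_engine_version() -> str:
    """Query the Docker Engine version (requires a running Docker daemon)."""
    result = subprocess.run(
        ["docker", "version", "-f", "{{.Server.Version}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Since the format string is passed as a separate list element rather than through a shell, the Go template braces need no quoting.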

Handle seccomp policies that don't include ptrace(2) #846
