Our solution must take into account the following:

1. The Linux kernel version.
2. The `ptrace_scope` setting on Linux systems.

On (1), we have verified that none of the systems we support has a Linux kernel older than 4.8. This also applies to Windows (WSL2) and macOS (HyperKit). On (2), we have seen that the default `ptrace_scope` is `1` on the platforms we support. This scope is supported by gVisor.
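As a rough illustration of the two checks above, here is a minimal Python sketch that tests whether a kernel release string is at least 4.8 and reads the Yama `ptrace_scope` sysctl. The function names are hypothetical, and the sysctl path (`/proc/sys/kernel/yama/ptrace_scope`) only exists on Linux hosts with the Yama LSM enabled. On Docker Desktop, containers run in a VM, so the relevant kernel is the one reported as `KernelVersion` by `docker version` (e.g., `6.6.12-linuxkit`), not the host's release.

```python
import platform
import re
from pathlib import Path
from typing import Optional, Tuple

# Yama's ptrace_scope sysctl; absent on non-Linux hosts or without Yama.
PTRACE_SCOPE_PATH = Path("/proc/sys/kernel/yama/ptrace_scope")

# Meaning of each value, following the kernel's Yama documentation.
PTRACE_SCOPE_MEANING = {
    0: "classic ptrace permissions",
    1: "restricted ptrace (descendants only)",
    2: "admin-only attach (requires CAP_SYS_PTRACE)",
    3: "no attach (ptrace attaching disabled)",
}


def parse_kernel_version(release: str) -> Tuple[int, int]:
    """Extract (major, minor) from a release such as '6.6.12-linuxkit'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"Unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def kernel_is_recent_enough(release: str) -> bool:
    """Container engines' seccomp policies allow ptrace(2) only on kernels >= 4.8."""
    # Compare integer tuples, so that e.g. 4.10 correctly counts as newer than 4.8.
    return parse_kernel_version(release) >= (4, 8)


def read_ptrace_scope(path: Path = PTRACE_SCOPE_PATH) -> Optional[int]:
    """Return the current ptrace_scope value, or None if it cannot be read."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__" and platform.system() == "Linux":
    # Only meaningful on a Linux host; on Docker Desktop, check the VM's
    # KernelVersion from `docker version` instead of the host's release.
    release = platform.release()
    print(f"kernel {release}: >= 4.8? {kernel_is_recent_enough(release)}")
    scope = read_ptrace_scope()
    print(f"ptrace_scope: {scope} ({PTRACE_SCOPE_MEANING.get(scope, 'unknown')})")
```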
For Podman versions < 4, we already have a workaround in our code that starts the process with Podman's default seccomp policy as of June 6th, 2024 (see `seccomp.json`).
For Docker Desktop, we don't have a similar workaround, because we don't know exactly when this restriction was lifted. We do know that containerd 1.6.7 was the first release to allow the `ptrace(2)` syscall, and that Docker Desktop 4.12.0 included this containerd version. However, we have tested Docker Desktop 4.19.0 on macOS, and the `ptrace(2)` syscall was disabled, so we're not sure.
So, our suggestion is to use the `seccomp.json` file we have for Podman for older Docker Desktop releases as well.

This way, older releases will use our Podman seccomp policy, which guarantees that `ptrace(2)` will be allowed. If an older Docker Desktop release already allows the `ptrace(2)` system call, our seccomp policy will replace its default one, but the differences should be negligible.

Newer releases will use their default seccomp policy, and thus we will not mask any security-related fixes made to it in the future.
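The suggestion above could be sketched as follows. This is a hypothetical helper, not Dangerzone's actual code: the function name, the policy path, and the version thresholds (Podman 4, Docker Engine 25.0) are assumptions drawn from this discussion. It relies on the `--security-opt seccomp=<file>` option, which both `docker run` and `podman run` accept.

```python
from pathlib import Path
from typing import Dict, List, Tuple

Version = Tuple[int, ...]

# Hypothetical path to the seccomp policy we already ship for Podman.
SECCOMP_POLICY = Path("share/seccomp.json")

# Runtime versions from which we assume the default seccomp policy allows
# ptrace(2): Podman 4 (per our existing workaround) and Docker Engine 25.0
# (an assumption, see the `docker version` discussion).
SAFE_VERSIONS: Dict[str, Version] = {
    "podman": (4,),
    "docker": (25, 0),
}


def security_args(runtime: str, version: Version, policy: Path = SECCOMP_POLICY) -> List[str]:
    """Return extra `run` arguments so that ptrace(2) is available in the container.

    Older releases get our own seccomp policy; newer ones keep their default
    policy, so future security fixes in it are not masked. We deliberately
    never add `--cap-add SYS_PTRACE`, which would widen the attack surface.
    """
    if runtime not in SAFE_VERSIONS:
        raise ValueError(f"Unsupported runtime: {runtime!r}")
    # Tuple comparison: (3, 4, 4) < (4,) and (25, 0, 3) >= (25, 0).
    if version >= SAFE_VERSIONS[runtime]:
        return []
    return ["--security-opt", f"seccomp={policy}"]
```

For example, `security_args("docker", (24, 0, 7))` yields `["--security-opt", "seccomp=share/seccomp.json"]`, which would be spliced into the `docker run` command line, while for Engine 25.0.3 it yields an empty list and the runtime's default policy applies.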
Docker also allows the `ptrace(2)` system call if `CAP_SYS_PTRACE` is specified in the container invocation. Note that we don't add this Linux capability in the current implementation.

Why is that? Because with the `CAP_SYS_PTRACE` capability, the outer container would be able to trace any process, which significantly increases our attack surface. For this reason, we choose not to go down that path, and simply pass our own seccomp policy.
It seems that `docker version` gives an output that is not friendly to parsing if we just want the Docker Desktop release (i.e., the `4.27.2` part):

```console
$ docker version -f {{.Server.Platform.Name}}
Docker Desktop 4.27.2 (137060)
```
```console
$ docker version -f json
{
  "Client": {
    "CloudIntegration": "v1.0.35+desktop.10",
    "Version": "25.0.3",
    "ApiVersion": "1.44",
    "DefaultAPIVersion": "1.44",
    "GitCommit": "4debf41",
    "GoVersion": "go1.21.6",
    "Os": "darwin",
    "Arch": "arm64",
    "BuildTime": "Tue Feb 6 21:13:26 2024",
    "Context": "default"
  },
  "Server": {
    "Platform": {
      "Name": "Docker Desktop 4.27.2 (137060)"
    },
    "Components": [
      {
        "Name": "Engine",
        "Version": "25.0.3",
        "Details": {
          "ApiVersion": "1.44",
          "Arch": "arm64",
          "BuildTime": "Tue Feb 6 21:14:22 2024",
          "Experimental": "false",
          "GitCommit": "f417435",
          "GoVersion": "go1.21.6",
          "KernelVersion": "6.6.12-linuxkit",
          "MinAPIVersion": "1.24",
          "Os": "linux"
        }
      },
      {
        "Name": "containerd",
        "Version": "1.6.28",
        "Details": {
          "GitCommit": "ae07eda36dd25f8a1b98dfbf587313b99c0190bb"
        }
      },
      {
        "Name": "runc",
        "Version": "1.1.12",
        "Details": {
          "GitCommit": "v1.1.12-0-g51d5e94"
        }
      },
      {
        "Name": "docker-init",
        "Version": "0.19.0",
        "Details": {
          "GitCommit": "de40ad0"
        }
      }
    ],
    "Version": "25.0.3",
    "ApiVersion": "1.44",
    "MinAPIVersion": "1.24",
    "GitCommit": "f417435",
    "GoVersion": "go1.21.6",
    "Os": "linux",
    "Arch": "arm64",
    "KernelVersion": "6.6.12-linuxkit",
    "BuildTime": "2024-02-06T21:14:22.000000000+00:00"
  }
}
```
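To illustrate why this is awkward, here is a Python sketch that extracts the Docker Desktop release from the `docker version -f json` output. It has to pattern-match the free-form `Server.Platform.Name` string, and it assumes that string keeps the `Docker Desktop X.Y.Z (build)` shape seen above, which is exactly the kind of assumption we'd rather not depend on. The function name is hypothetical.

```python
import json
import re
from typing import Optional

# The Desktop release is only embedded in a display string such as
# "Docker Desktop 4.27.2 (137060)"; we assume this shape stays stable.
DESKTOP_NAME_RE = re.compile(r"^Docker Desktop (\d+\.\d+\.\d+)")


def desktop_release(version_json: str) -> Optional[str]:
    """Return e.g. '4.27.2' from `docker version -f json` output, or None."""
    data = json.loads(version_json)
    # Guard against a missing or null "Server" / "Platform" entry.
    server = data.get("Server") or {}
    name = (server.get("Platform") or {}).get("Name", "")
    match = DESKTOP_NAME_RE.match(name)
    return match.group(1) if match else None
```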
We can use the Docker Engine version instead:

```console
$ docker version -f {{.Server.Version}}
25.0.3
```

Most likely, we can consider any Docker Engine version greater than 25.0 as safe, i.e., as allowing `ptrace(2)` in its default seccomp policy.
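A minimal Python sketch of this check, assuming `docker` is on the `PATH` and the daemon is running: it runs `docker version -f {{.Server.Version}}`, parses the result into a tuple of integers, and compares it against 25.0. The helper names are hypothetical, and whether 25.0 itself counts as safe is an assumption (the text says "greater than 25.0").

```python
import re
import subprocess
from typing import Tuple

# Assumption: Docker Engine 25.0 and later allow ptrace(2) in their default
# seccomp policy. Adjust this threshold if testing shows otherwise.
SAFE_ENGINE_VERSION: Tuple[int, ...] = (25, 0)


def parse_engine_version(text: str) -> Tuple[int, ...]:
    """Turn output such as '25.0.3' into (25, 0, 3), ignoring any suffix."""
    match = re.match(r"\s*(\d+(?:\.\d+)*)", text)
    if match is None:
        raise ValueError(f"Unrecognized Docker Engine version: {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def query_engine_version() -> Tuple[int, ...]:
    """Ask the Docker daemon for its Engine version (requires a running daemon)."""
    result = subprocess.run(
        ["docker", "version", "-f", "{{.Server.Version}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_engine_version(result.stdout)


def engine_allows_ptrace_by_default(version: Tuple[int, ...]) -> bool:
    """True if we assume the default seccomp policy already allows ptrace(2)."""
    return version >= SAFE_ENGINE_VERSION
```

On the machine above, `parse_engine_version("25.0.3")` gives `(25, 0, 3)`, and since `(25, 0, 3) >= (25, 0)`, the default seccomp policy would be kept.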
Problem

Our recent gVisor integration (#590) requires allowlisting the `ptrace(2)` system call in the outer container, in order to spawn the inner container with `runsc`. Nowadays, this is the default [1], but we have encountered systems that don't allow this system call, and thus Dangerzone cannot run on them, at least not out of the box. Affected systems are:

- `runc` version 1.1.5

Background
Before explaining how we plan to fix this issue, we'll give some background on the `ptrace(2)` system call.

First of all, why is this syscall dangerous? The main reason is that a malicious process can use it to escalate its privileges or to bypass some system protections. A real-life example is CVE-2019-2054, a seccomp bypass through `ptrace(2)`. This CVE is the reason why `ptrace(2)` is not allowed in containers running on Linux kernels < 4.8, but it's not the only ptrace-related CVE that has been reported.

In order to control the scope of the `ptrace(2)` system call, the Linux kernel offers the following mechanisms:

1. The `CAP_SYS_PTRACE` Linux capability. If this capability is granted, the process has full tracing capabilities, such as tracing processes that it has not started. If this capability is not granted, using `ptrace(2)` is still allowed, but restricted through the mechanisms listed below.
2. Seccomp policies. Container engines disabled `ptrace(2)` in their seccomp policy, and then re-enabled it for kernels >= 4.8.
3. The `ptrace_scope` setting. This setting controls the behavior of `ptrace(2)` system-wide. On the Linux platforms we support, the default seems to be `1`, i.e., a process may use `ptrace(2)` only on processes it has a direct relationship with (e.g., its child processes).

[1] See Podman's seccomp policy, Docker's seccomp policy, and `containerd`'s seccomp policy.