concourse / concourse-chart

Helm chart to install Concourse
Apache License 2.0

Kubernetes 1.26.1 - Linux Capabilities - starting container process caused: apply caps: operation not permitted #330

Open MysticalMount opened 1 year ago

MysticalMount commented 1 year ago

Describe the bug

I've deployed the workers to a privileged namespace:

Namespace: cc

apiVersion: v1
kind: Namespace
metadata:
  name: cc
  labels:
    pod-security.kubernetes.io/enforce: privileged
    pod-security.kubernetes.io/enforce-version: v1.26
    pod-security.kubernetes.io/audit: privileged
    pod-security.kubernetes.io/audit-version: v1.26
    pod-security.kubernetes.io/warn: privileged
    pod-security.kubernetes.io/warn-version: v1.26

On Kubernetes 1.26.1

When trying to run a hello-world pipeline, I get this from Guardian inside the worker pod:

{"timestamp":"2023-04-09T16:51:45.884909106Z","level":"error","source":"guardian","message":"guardian.api.garden-server.create.failed","data":{"error":"runc run: exit status 1: container_linux.go:380: starting container process caused: apply caps: operation not permitted","request":{"Handle":"54e0c267-01e1-4e01-690f-df2cff3b5bf8","GraceTime":0,"RootFSPath":"raw:///concourse-work-dir/volumes/live/68c0ccae-e204-453a-6365-4e8b36d6e541/volume","BindMounts":[{"src_path":"/concourse-work-dir/volumes/live/63e92077-b1e9-428d-5172-fca9332f4ac1/volume","dst_path":"/scratch","mode":1}],"Network":"","Privileged":true,"Limits":{"bandwidth_limits":{},"cpu_limits":{},"disk_limits":{},"memory_limits":{},"pid_limits":{}}},"session":"3.1.4548"}
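To pick this failure out of a busy worker log, the Guardian JSON line can be parsed and checked for the runc capability error. This is a minimal sketch: the struct models only the fields visible in the log line above, the helper name `isCapsFailure` is made up for illustration, and the sample line in `main` is an abbreviated copy of the real one.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// gardenLogEntry models only the fields of the Guardian log line used here.
type gardenLogEntry struct {
	Message string `json:"message"`
	Data    struct {
		Error   string `json:"error"`
		Request struct {
			Handle     string `json:"Handle"`
			Privileged bool   `json:"Privileged"`
		} `json:"request"`
	} `json:"data"`
}

// isCapsFailure (hypothetical helper) reports whether a log line is a
// container-create failure caused by runc failing to apply capabilities,
// and whether the failed create request asked for a privileged container.
func isCapsFailure(line string) (bool, bool, error) {
	var entry gardenLogEntry
	if err := json.Unmarshal([]byte(line), &entry); err != nil {
		return false, false, err
	}
	failed := strings.HasSuffix(entry.Message, "create.failed") &&
		strings.Contains(entry.Data.Error, "apply caps: operation not permitted")
	return failed, entry.Data.Request.Privileged, nil
}

func main() {
	// Abbreviated copy of the log line from the issue.
	line := `{"level":"error","source":"guardian","message":"guardian.api.garden-server.create.failed","data":{"error":"runc run: exit status 1: container_linux.go:380: starting container process caused: apply caps: operation not permitted","request":{"Handle":"54e0c267-01e1-4e01-690f-df2cff3b5bf8","Privileged":true}}}`
	capsFailure, privileged, err := isCapsFailure(line)
	fmt.Println(capsFailure, privileged, err)
}
```

The `Privileged` field shows that the failing request was for a privileged task container, which is the code path that asks runc for the larger capability list.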

I'm fairly new to Concourse, so if I'm missing something, sorry!

I can see that securityContext: privileged: true is set on the workers StatefulSet in the source YAML, and it also appears to be set in the resulting StatefulSet:

        securityContext:
          capabilities:
            add:
            - all
          privileged: true

(I've been adding the capabilities to try to resolve the issue.)

As far as I can tell the container is privileged. I am also using TalosCtl, but so far I can't find anything to suggest it is Talos-related.

Any steps, help, or advice on where to go next, or on what I've missed, are welcome.

Reproduction steps

  1. Deploy Kubernetes v1.26.1
  2. Deploy Helm Chart with mostly default settings with Web and Worker
  3. Connect to web, deploy example pipeline using fly ...

Expected behavior

The container image should pull and start successfully.

Additional context

In my setup I'm using custom registries, so I expect some extra setup there, but I suspect we are hitting this issue before that becomes a problem.

flokli commented 1 year ago

I dug a bit through the error messages and ended up dropping CAP_SYS_MODULE from concourse worker/runtime/spec/capabilities.go, but then I get a slightly different error message from runc:

runc run: exit status 1: runc run failed: unable to start container process: unable to apply caps: operation not permitted

This was essentially that patch:

commit af3cebb55c01a298b69243517e72b268665b9e2b
Author: Florian Klink <flokli@flokli.de>
Date:   Thu Jul 13 14:27:28 2023 +0300

    worker: drop CAP_SYS_MODULE from the list of capabilities

    `worker/runtime/spec/spec.go@defaultGardenOCISpec` calls out to
    `OciCapabilities(privileged bool)`, returning a list of capabilities to
    put in the OCI spec, which is then passed to runc.

    Note this is independent of what the container payload might actually
    need, it always asks for these capabilities.

    This causes problems when running concourse-worker in a Talos cluster,
    which does not allow asking for CAP_SYS_MODULE and CAP_SYS_BOOT
    (Concourse doesn't  ask for the latter):
    ```
    concourse-worker-2 concourse-worker {"timestamp":"2023-07-13T11:17:47.529591744Z","level":"error","source":"guardian","message":"guardian.api.garden-server.create.failed","data":{"error":"runc run: exit status 1: container_linux.go:380: starting container process caused: apply caps: operation not permitted","request":{"Handle":"af712415-e9aa-4ba7-639f-b291f6e2caaf","GraceTime":0,"RootFSPath":"raw:///concourse-work-dir/volumes/live/e5bce4ac-4d45-45c5-6338-38aaaaf27e72/volume","BindMounts":[{"src_path":"/concourse-work-dir/volumes/live/1b925d7a-e33b-41dc-6f4f-9cdc701583f0/volume","dst_path":"/scratch","mode":1}],"Network":"","Privileged":true,"Limits":{"bandwidth_limits":{},"cpu_limits":{},"disk_limits":{},"memory_limits":{},"pid_limits":{}}},"session":"3.1.140807"}}
    ```

    See https://www.talos.dev/v1.4/learn-more/process-capabilities/ for
    details.

    Removing that CAP from the list should get runc to successfully execute
    in Talos clusters. It might cause problems for people trying to modprobe
    kernel modules inside Concourse, but I hope noone does that ;-)

    Signed-off-by: Florian Klink <flokli@flokli.de>

diff --git a/worker/runtime/spec/capabilities.go b/worker/runtime/spec/capabilities.go
index b38c32f4a..6443c1fb0 100644
--- a/worker/runtime/spec/capabilities.go
+++ b/worker/runtime/spec/capabilities.go
@@ -72,7 +72,6 @@ var (
 "CAP_SYS_ADMIN",
 "CAP_SYS_BOOT",
 "CAP_SYS_CHROOT",

A version of this was pushed to flokli/concourse:20230713-01.
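One way to check the claim in the commit message from inside the worker pod is to read the capability bounding set from /proc/self/status. The sketch below is not part of the patch. It assumes the standard Linux bit numbers (CAP_SYS_MODULE is 16, CAP_SYS_BOOT is 22), and the helper names are made up for illustration. A privileged pod can still lack a capability if the host has removed it from the bounding set, which would be consistent with runc reporting "apply caps: operation not permitted".

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// Bit positions as defined in linux/capability.h.
const (
	capSysModule = 16
	capSysBoot   = 22
)

// parseCapMask parses the hex value of a Cap* line from /proc/<pid>/status.
func parseCapMask(hex string) (uint64, error) {
	return strconv.ParseUint(strings.TrimSpace(hex), 16, 64)
}

// hasCap reports whether capability bit n is set in mask.
func hasCap(mask uint64, n uint) bool {
	return mask&(uint64(1)<<n) != 0
}

// boundingSet reads the CapBnd line from a /proc status file.
func boundingSet(path string) (uint64, error) {
	f, err := os.Open(path)
	if err != nil {
		return 0, err
	}
	defer f.Close()

	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		line := scanner.Text()
		if strings.HasPrefix(line, "CapBnd:") {
			return parseCapMask(strings.TrimPrefix(line, "CapBnd:"))
		}
	}
	if err := scanner.Err(); err != nil {
		return 0, err
	}
	return 0, fmt.Errorf("no CapBnd line in %s", path)
}

func main() {
	mask, err := boundingSet("/proc/self/status")
	if err != nil {
		fmt.Println("could not read bounding set:", err)
		return
	}
	fmt.Printf("CapBnd=%016x CAP_SYS_MODULE=%v CAP_SYS_BOOT=%v\n",
		mask, hasCap(mask, capSysModule), hasCap(mask, capSysBoot))
}
```

For comparison, a mask with all 41 currently defined capabilities set is 000001ffffffffff, while 000001ffffbeffff has exactly bits 16 and 22 cleared.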

I went ahead and patched OciCapabilities to always return UnprivilegedContainerCapabilities, just to see how far it'd get:

commit 043babd9347f4e671e5e03f22b1a3d9065fac5bb
Author: Florian Klink <flokli@flokli.de>
Date:   Thu Jul 13 15:37:16 2023 +0300

    HACK

diff --git a/worker/runtime/spec/capabilities.go b/worker/runtime/spec/capabilities.go
index 6443c1fb0..2600f0001 100644
--- a/worker/runtime/spec/capabilities.go
+++ b/worker/runtime/spec/capabilities.go
@@ -3,11 +3,7 @@ package spec
 import "github.com/opencontainers/runtime-spec/specs-go"

 func OciCapabilities(privileged bool) specs.LinuxCapabilities {
-   if !privileged {
-       return UnprivilegedContainerCapabilities
-   }
-
-   return PrivilegedContainerCapabilities
+   return UnprivilegedContainerCapabilities
 }

 var (

A version of this was pushed to flokli/concourse:20230713-02.

With that, runc fails with "runc run failed: unable to start container process: can't get final child's PID from pipe: EOF".

It looks like the Concourse model of running runc inside privileged pods is becoming more and more incompatible with recent, more secure versions of Kubernetes.

I'm not sure how much further time I'm willing to spend on trying to get this working - https://github.com/concourse/concourse/issues/5682 sounds like a more sustainable long-term solution.

flokli commented 1 year ago

Hmm, Concourse adds both CAP_SYS_BOOT and CAP_SYS_MODULE; I was tricked by the Talos documentation, which got this wrong (fixed in https://github.com/siderolabs/talos/pull/7473). I'll re-roll the first patch and see what dropping both capabilities does:

commit 92d624adbb1c7d4e855602703f6a81387a8868d8 (HEAD)
Author: Florian Klink <flokli@flokli.de>
Date:   Thu Jul 13 14:27:28 2023 +0300

    worker: drop CAP_SYS_{BOOT,MODULE} from the list of capabilities

    `worker/runtime/spec/spec.go@defaultGardenOCISpec` calls out to
    `OciCapabilities(privileged bool)`, returning a list of capabilities to
    put in the OCI spec, which is then passed to runc.

    Note this is independent of what the container payload might actually
    need, it always asks for these capabilities.

    This causes problems when running concourse-worker in a Talos cluster,
    which does not allow asking for CAP_SYS_MODULE and CAP_SYS_BOOT
    (Concourse doesn't  ask for the latter):
    ```
    concourse-worker-2 concourse-worker {"timestamp":"2023-07-13T11:17:47.529591744Z","level":"error","source":"guardian","message":"guardian.api.garden-server.create.failed","data":{"error":"runc run: exit status 1: container_linux.go:380: starting container process caused: apply caps: operation not permitted","request":{"Handle":"af712415-e9aa-4ba7-639f-b291f6e2caaf","GraceTime":0,"RootFSPath":"raw:///concourse-work-dir/volumes/live/e5bce4ac-4d45-45c5-6338-38aaaaf27e72/volume","BindMounts":[{"src_path":"/concourse-work-dir/volumes/live/1b925d7a-e33b-41dc-6f4f-9cdc701583f0/volume","dst_path":"/scratch","mode":1}],"Network":"","Privileged":true,"Limits":{"bandwidth_limits":{},"cpu_limits":{},"disk_limits":{},"memory_limits":{},"pid_limits":{}}},"session":"3.1.140807"}}
    ```

    See https://www.talos.dev/v1.4/learn-more/process-capabilities/ for
    details.

    Removing these capabilities from the list should get runc to
    successfully execute in Talos clusters. It might cause problems for
    people trying to modprobe kernel modules inside Concourse, but I hope
    noone does that ;-)

    Signed-off-by: Florian Klink <flokli@flokli.de>

diff --git a/worker/runtime/spec/capabilities.go b/worker/runtime/spec/capabilities.go
index b38c32f4a..9819650a4 100644
--- a/worker/runtime/spec/capabilities.go
+++ b/worker/runtime/spec/capabilities.go
@@ -70,9 +70,7 @@ var (
 "CAP_SETUID",
 "CAP_SYSLOG",
 "CAP_SYS_ADMIN",
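Since the diff is cut off in this copy, here is a rough sketch of the effect the patch is after: the capability list handed to runc no longer contains CAP_SYS_BOOT or CAP_SYS_MODULE. The actual patch deletes the entries from the static list in capabilities.go rather than filtering at runtime. The `dropCaps` helper and the four-entry list below are hypothetical and only illustrate the result.

```go
package main

import "fmt"

// dropCaps (hypothetical helper) returns caps without any entry listed in
// denied, preserving the original order.
func dropCaps(caps []string, denied ...string) []string {
	deny := make(map[string]struct{}, len(denied))
	for _, d := range denied {
		deny[d] = struct{}{}
	}
	kept := make([]string, 0, len(caps))
	for _, c := range caps {
		if _, blocked := deny[c]; !blocked {
			kept = append(kept, c)
		}
	}
	return kept
}

func main() {
	// Illustrative subset only; not the full list from capabilities.go.
	requested := []string{"CAP_SYS_ADMIN", "CAP_SYS_BOOT", "CAP_SYS_CHROOT", "CAP_SYS_MODULE"}
	fmt.Println(dropCaps(requested, "CAP_SYS_BOOT", "CAP_SYS_MODULE"))
}
```

As the commit message notes, the list is requested regardless of what the task actually needs, so trimming it only affects workloads that rely on those two capabilities.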

flokli commented 1 year ago

OK, with the new patch applied (pushed to flokli/concourse:20230713-03), which removes both caps from the list, and with all capabilities added in the pod spec, I get the same error: "runc run failed: unable to start container process: can't get final child's PID from pipe: EOF".

That smells like an incompatibility, either with the cgroup structure in Talos, or with an assumption that Docker is the outer container runtime.

flokli commented 1 year ago

https://github.com/moby/moby/issues/40835#issuecomment-663397714 suggests this might be an issue with what mountpoints are seen inside the container, or with user namespace support, even though I'm a bit unsure where runc itself is emitting that error message…

flokli commented 1 year ago

I sent a PR containing the first patch to https://github.com/concourse/concourse/pull/8791.