actions / runner

The Runner for GitHub Actions :rocket:
https://github.com/features/actions
MIT License

Kubernetes container pods fail with EACCES when using a custom user #3290

Open kwohlfahrt opened 6 months ago

kwohlfahrt commented 6 months ago

Describe the bug

I am running the actions-runner-controller to host GitHub workflows. When I combine this with a container action where the container user is neither root nor the same as the runner user (UID 1001), the run fails with EACCES: permission denied, open '/__w/_temp/_runner_file_commands/set_env_8e7dea0f-bec9-4fd6-9b11-824b0bb16a6c'.

The issue persists, even if I set fsGroup: 1001 on both the runner and the workflow container. This is because the runner pre-creates the output files with -rw-r--r-- permissions, so group membership is insufficient for writes:

$ ls -l /home/runner/_work/_temp/_runner_file_commands
total 0
-rw-r--r-- 1 runner runner 0 May 14 17:23 add_path_3da96eb5-2ed4-41a6-b402-c9f1be15a554
-rw-r--r-- 1 runner runner 0 May 14 17:23 add_path_8e7dea0f-bec9-4fd6-9b11-824b0bb16a6c
-rw-r--r-- 1 runner runner 0 May 14 17:23 save_state_3da96eb5-2ed4-41a6-b402-c9f1be15a554
-rw-r--r-- 1 runner runner 0 May 14 17:23 save_state_8e7dea0f-bec9-4fd6-9b11-824b0bb16a6c
-rw-r--r-- 1 runner runner 0 May 14 17:23 set_env_3da96eb5-2ed4-41a6-b402-c9f1be15a554
-rw-r--r-- 1 runner runner 0 May 14 17:23 set_env_8e7dea0f-bec9-4fd6-9b11-824b0bb16a6c
-rw-r--r-- 1 runner runner 0 May 14 17:23 set_output_3da96eb5-2ed4-41a6-b402-c9f1be15a554
-rw-r--r-- 1 runner runner 0 May 14 17:23 set_output_8e7dea0f-bec9-4fd6-9b11-824b0bb16a6c
-rw-r--r-- 1 runner runner 0 May 14 17:23 step_summary_3da96eb5-2ed4-41a6-b402-c9f1be15a554
-rw-r--r-- 1 runner runner 0 May 14 17:23 step_summary_8e7dea0f-bec9-4fd6-9b11-824b0bb16a6c
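The listing above can be read against the usual POSIX rule that only one permission class (owner, group, or other) applies to a given process. The following is a minimal sketch of that rule, not the kernel's actual implementation: it ignores ACLs and capabilities. It assumes the `runner` group has GID 1001 (the issue only states the UID and the fsGroup value), and `CUSTOM_UID = 2000` is a hypothetical stand-in for the image's user.

```python
import stat


def can_write(mode, file_uid, file_gid, proc_uid, proc_gids):
    """Simplified POSIX write check: exactly one permission class applies."""
    if proc_uid == 0:
        return True  # root normally bypasses ordinary permission bits
    if proc_uid == file_uid:
        return bool(mode & stat.S_IWUSR)  # owner class
    if file_gid in proc_gids:
        return bool(mode & stat.S_IWGRP)  # group class
    return bool(mode & stat.S_IWOTH)  # everyone else


RUNNER_UID = 1001  # stated in the issue
RUNNER_GID = 1001  # assumption: the "runner" group uses the same ID as fsGroup
CUSTOM_UID = 2000  # hypothetical UID baked into the workflow image

# The runner itself owns the file, so rw-r--r-- is enough for it.
print(can_write(0o644, RUNNER_UID, RUNNER_GID, RUNNER_UID, {RUNNER_GID}))  # True
# A custom user that only shares the group cannot write under rw-r--r--.
print(can_write(0o644, RUNNER_UID, RUNNER_GID, CUSTOM_UID, {RUNNER_GID}))  # False
# With the group write bit set (rw-rw-r--), the same user can write.
print(can_write(0o664, RUNNER_UID, RUNNER_GID, CUSTOM_UID, {RUNNER_GID}))  # True
```

Under these assumptions, sharing the group via fsGroup only helps once the files carry the group write bit.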

If I set runAsUser: 1001 on the workflow container, the run gets further, but eventually (expectedly) fails because our image assumes the runtime user is the same as the user the image was built with.

To Reproduce

  1. Deploy the runner controller
  2. Deploy a runner scale-set, using the kubernetes containerMode. Configure spec.securityContext.fsGroup: 1001:
     a. On the runner, using the template property of the Helm chart
     b. On the worker, using ACTIONS_RUNNER_CONTAINER_HOOK_TEMPLATE
  3. Launch a workflow in a custom container that specifies a user that is neither root nor 1001

The full installation manifests are included at the end of this report.

Expected behavior

I expect the workflow container to be able to write its output, if it has the same fsGroup as the runner container. I think the best solution is for the runner container to add group write permissions to the output files it creates.
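To illustrate the kind of change proposed here, the sketch below creates an empty file and then sets its mode to rw-rw-r-- explicitly. It is only an illustration of the permission change in Python, not the runner's actual code; the function name `create_group_writable` and the file name `set_env_demo` are hypothetical.

```python
import os
import stat
import tempfile


def create_group_writable(path):
    """Create an empty file and make it group-writable (rw-rw-r--)."""
    fd = os.open(path, os.O_CREAT | os.O_WRONLY | os.O_TRUNC, 0o664)
    try:
        # The mode passed to os.open is filtered by the process umask
        # (commonly 022, which yields rw-r--r--), so set the bits explicitly.
        os.fchmod(fd, 0o664)
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "set_env_demo")  # hypothetical file name
    create_group_writable(demo_path)
    demo_mode = stat.filemode(os.stat(demo_path).st_mode)
    print(demo_mode)  # -rw-rw-r--
```

Setting the mode after creation keeps the result independent of whatever umask the creating process happens to run with.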

Runner Version and Platform

Version of your runner? 2.315.0

OS of the machine running the runner? Linux (Ubuntu 22.04) + Kubernetes

What's not working?

Container workflows cannot write their output as expected if the container sets a custom user.

Job Log Output

Controller and runner pod logs can be found here: https://gist.github.com/kwohlfahrt/1d45d62aa963e4a4eec2ca6b04c2cc19

Runner values.yaml:

containerMode:
  kubernetesModeWorkVolumeClaim:
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: 14Gi
    storageClassName: exclusive
  type: kubernetes
controllerServiceAccount:
  name: actions-runner-system-d3d990a5
  namespace: actions-runner-system-be6fdde6
githubConfigSecret: actions-runner-81cb830f
githubConfigUrl: https://github.com/CHARM-Tx
maxRunners: 3
minRunners: 1
template:
  spec:
    containers:
    - command:
      - /home/runner/run.sh
      env:
      - name: ACTIONS_RUNNER_REQUIRE_JOB_CONTAINER
        value: "false"
      - name: ACTIONS_RUNNER_CONTAINER_HOOK_TEMPLATE
        value: /home/runner/templates/worker.yaml
      image: <snip>.dkr.ecr.eu-central-1.amazonaws.com/github-runner:2.315.0
      name: runner
      resources:
        limits:
          cpu: "1"
      volumeMounts:
      - mountPath: /home/runner/templates
        name: templates
    securityContext:
      fsGroup: 1001
    volumes:
    - configMap:
        name: templates-3892142c
      name: templates

templates ConfigMap:

apiVersion: v1
data:
  worker.yaml: '{"spec":{"securityContext":{"fsGroup":1001}}}'
kind: ConfigMap
metadata:
  name: templates-3892142c
  namespace: actions-runner-66769bad

kwohlfahrt commented 6 months ago

I had previously filed this in the wrong repo, in actions/actions-runner-controller#3517.

gdubicki commented 5 months ago

In case someone didn't catch this, the workaround for this problem is to force the non-root user that you use in your Docker image to have UID 1001.

kwohlfahrt commented 5 months ago

Unfortunately, I don't think we can apply this workaround. We don't have control over this step of the build, as the base image is from a vendor (which sets up the home directory with some configuration files necessary to make the software run), so we can't set the UID during the build.

Overriding the UID at runtime then also fails, because the permissions associated with the files don't apply to the new UID, as described in the issue. The only thing we can inject is fsGroup, but that doesn't allow the worker to write to the GitHub Actions files, hence the issue.

We could probably do some recursive chown as part of our build steps, but that's starting to get into quite hacky territory.
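For reference, a recursive chown of the kind mentioned above could look roughly like the sketch below. The helper name `chown_tree` is hypothetical, and the demo changes ownership to the current user only so that it runs without root; a real build step would pass the target UID and GID (for example 1001) and would need sufficient privileges.

```python
import os
import tempfile


def chown_tree(root, uid, gid):
    """Recursively set owner and group on root and everything below it."""
    os.chown(root, uid, gid, follow_symlinks=False)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            # Do not follow symlinks, to avoid touching files outside the tree.
            os.chown(os.path.join(dirpath, name), uid, gid, follow_symlinks=False)


# Demo on a throwaway directory, using the current user's own IDs.
with tempfile.TemporaryDirectory() as tmp:
    nested = os.path.join(tmp, "config")  # hypothetical directory name
    os.makedirs(nested)
    demo_file = os.path.join(nested, "settings")  # hypothetical file name
    open(demo_file, "w").close()
    chown_tree(tmp, os.getuid(), os.getgid())
    demo_owner = os.stat(demo_file).st_uid
```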