actions / runner


Kubernetes mode: Multi-Attach error for volume XXX Volume is already used by pod(s) RUNNER POD #3325

Open aladdin-atypon opened 5 months ago

aladdin-atypon commented 5 months ago

Describe the bug

The GitHub runner in Kubernetes mode expects a kubernetesModeWorkVolumeClaim. The default is accessModes: ["ReadWriteOnce"], and nearly all of the documentation uses accessModes: ["ReadWriteOnce"] as well.

However, in Kubernetes mode the runner container hook is expected to create a new pod that takes the work volume from the runner pod and mounts it. That doesn't work for me, because ReadWriteOnce allows only one pod to mount the PVC, and that pod is the runner pod. I've seen many examples that use the AWS gp3 storage class, and no one seems to have reported the issue I'm facing.

I've tried EFS and it works fine, but it is about 15x slower than EBS. Regardless, how can the default be ReadWriteOnce and be expected to work when, by definition, ReadWriteOnce doesn't work with more than one pod, yet the hook uses the same PVC as the runner?
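For comparison, here is a minimal sketch of what the EFS variant of the work volume might look like. It reuses the `work` volume from the template in the reproduction steps and changes only the access mode and storage class. The storage class name `efs-sc` is hypothetical, and the use of ReadWriteMany is an assumption about how the EFS setup was configured, not something copied from the actual cluster.

```yaml
# Sketch only: the `work` volume switched to a shared (NFS-like) file system.
# "efs-sc" is a hypothetical storage class name; substitute the one in your cluster.
volumes:
  - name: work
    ephemeral:
      volumeClaimTemplate:
        spec:
          # Assumption: ReadWriteMany lets the runner pod and the
          # hook-created workflow pod mount the same claim together.
          accessModes: ["ReadWriteMany"]
          storageClassName: "efs-sc"
          resources:
            requests:
              storage: 1Gi
```

The trade-off is the one described above: the shared file system avoids the Multi-Attach error but is much slower than an EBS-backed gp3 volume.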

To Reproduce

Steps to reproduce the behavior:

  1. Use the following template:

         githubConfigUrl: XXX
         githubConfigSecret: XXX
         runnerGroup: XXX
         template:
           spec:
             securityContext:
               fsGroup: 1001
             containers:
               - name: runner
                 image: ghcr.io/actions/actions-runner:latest
                 command: ["/home/runner/run.sh"]
                 env:
                   - name: ACTIONS_RUNNER_CONTAINER_HOOKS
                     value: /home/runner/k8s/index.js
                   - name: ACTIONS_RUNNER_POD_NAME
                     valueFrom:
                       fieldRef:
                         fieldPath: metadata.name
                   - name: ACTIONS_RUNNER_REQUIRE_JOB_CONTAINER
                     value: "false"
                   - name: ACTIONS_RUNNER_USE_KUBE_SCHEDULER
                     value: "true"
                 volumeMounts:
                   - name: work
                     mountPath: /home/runner/_work
             volumes:
               - name: work
                 ephemeral:
                   volumeClaimTemplate:
                     spec:
                       accessModes: ["ReadWriteOnce"]
                       storageClassName: "gp3"
                       resources:
                         requests:
                           storage: 1Gi

  2. Install the ARC chart, version 0.9.2, and run any job that creates a container.
  3. The created pod with the XXX-workflow suffix fails with: Multi-Attach error for volume "pvc-XXX" Volume is already used by pod(s) XXX-s6jpq

Expected behavior

Explain how ReadWriteOnce can work for both pods at the same time, or change the default in the documentation so that an NFS-like file system can be used.

Runner Version and Platform

runner: latest gha-runner-scale-set-0.9.2

Job Log Output

If applicable, include the relevant part of the job / step log output here. All sensitive information should already be masked out, but please double-check before pasting here.