Open aladdin-atypon opened 3 weeks ago
I have recently started to experience this issue as well. It is not clear why it has only just started to appear; it was not present before.
The way to solve this is to make sure that the pods spawned by the k8s job (in kube mode) are co-located on the same k8s node as the pod running the main workflow gh job. In that case the issue goes away, because the ReadWriteOnce mode set on the PVC works when every pod mounting it is on one node.
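As a rough sketch of the co-location idea, a pod affinity rule on the job pods could look like the fragment below. The label example.com/role: workflow-runner is hypothetical (it assumes the runner pod carries such a label), and how this fragment gets applied to the job pods depends on your setup and is not shown here. Note that if several runner pods share the label, this only pins the job pod to a node that hosts some runner, so a per-runner label would be needed to target the exact one.

```yaml
# Sketch only: fragment of a pod spec for the job pods.
spec:
  affinity:
    podAffinity:
      # Hard requirement: schedule only onto a node that already
      # runs a pod matching the selector below.
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              example.com/role: workflow-runner  # hypothetical label on the runner pod
          # "Same node" is expressed through the hostname topology key.
          topologyKey: kubernetes.io/hostname
```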
The GitHub runner in kubernetes mode expects kubernetesModeWorkVolumeClaim. The default is accessModes: ["ReadWriteOnce"], and almost all of the docs use accessModes: ["ReadWriteOnce"] as well.
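For reference, the setting in question sits in the runner scale set Helm values roughly as follows. The storage class name and the requested size are placeholder assumptions, not values taken from this thread's cluster.

```yaml
containerMode:
  type: "kubernetes"
  kubernetesModeWorkVolumeClaim:
    accessModes: ["ReadWriteOnce"]  # the default discussed here
    storageClassName: "gp3"         # assumption: an EBS-backed storage class
    resources:
      requests:
        storage: 1Gi                # assumption: placeholder size
```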
However, in kubernetes mode the runner container hook is expected to create a new pod, take the volume from the runner pod, and use it there. That doesn't work, since ReadWriteOnce only allows the PVC to be mounted from a single node, and it is already mounted by the runner pod. I've seen a lot of examples where the AWS gp3 storage class is used, and no one has complained about the issue I'm facing!
I've tried EFS and it works fine, but it's 15x slower than EBS. Regardless, how can the default be ReadWriteOnce and still be expected to work? By definition, ReadWriteOnce cannot be shared by pods on different nodes, yet the hook actually uses the same PVC as the runner: https://github.com/actions/runner-container-hooks/blob/9705deeb083452f326cb790231645a2618955bfa/packages/k8s/src/k8s/index.ts#L103
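The EFS workaround amounts to switching the work volume claim to ReadWriteMany on a storage class that supports it. A minimal sketch, assuming an EFS-backed storage class named efs-sc exists in the cluster (the name and the size are hypothetical):

```yaml
containerMode:
  type: "kubernetes"
  kubernetesModeWorkVolumeClaim:
    accessModes: ["ReadWriteMany"]  # can be mounted from several nodes at once
    storageClassName: "efs-sc"      # hypothetical EFS-backed storage class
    resources:
      requests:
        storage: 1Gi                # placeholder size
```

This removes the single-node restriction at the cost of the slower storage described above.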