actions / runner-container-hooks

Runner Container Hooks for GitHub Actions

Workflow Pods Fail Immediately When ENV Variable ACTIONS_RUNNER_CONTAINER_HOOK_TEMPLATE is Enabled #174

kanakaraju17 closed this issue 3 months ago

kanakaraju17 commented 3 months ago

Hey, I'm trying to deploy the GHA runner scale set with the environment variable ACTIONS_RUNNER_CONTAINER_HOOK_TEMPLATE set. The variable points the container hooks at a pod template, mounted from a ConfigMap, that adds resource requests and limits to the workflow pods.

Here is the configuration where the runner scale set is deployed with the specified settings:

## template is the PodSpec for each runner Pod
## For reference: https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/#PodSpec
template:
  template:
    spec:
      containers:
      - name: runner
        image: ghcr.io/actions/actions-runner:latest
        command: ["/home/runner/run.sh"]
        env:
          - name: ACTIONS_RUNNER_CONTAINER_HOOKS
            value: /home/runner/k8s/index.js
          - name: ACTIONS_RUNNER_CONTAINER_HOOK_TEMPLATE
            value: /etc/config/runner-template.yaml
          - name: ACTIONS_RUNNER_POD_NAME
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
          - name: ACTIONS_RUNNER_REQUIRE_JOB_CONTAINER
            value: "true"
        volumeMounts:
          - name: work
            mountPath: /home/runner/_work
          - mountPath: /etc/config
            name: hook-template
      volumes:
        - name: hook-template
          configMap:
            name: runner-config
        - name: work
          ephemeral:
            volumeClaimTemplate:
              spec:
                accessModes: [ "ReadWriteOnce" ]
                storageClassName: "local-path"
                resources:
                  requests:
                    storage: 1Gi          
  spec:
    securityContext:
      fsGroup: 1001
    containers:
      - name: runner
        image: ghcr.io/actions/actions-runner:latest
        command: ["/home/runner/run.sh"]
        env:
        - name: ACTIONS_RUNNER_REQUIRE_JOB_CONTAINER
          value: "false"
    nodeSelector:
      purpose: github-actions
    tolerations:
      - key: purpose
        operator: Equal
        value: github-actions
        effect: NoSchedule       
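
For comparison, the sketch below places the same settings under a single template.spec, which is the layout the gha-runner-scale-set chart's values.yaml documents for Kubernetes container mode. It assumes the nested template.template block above was meant to be merged with the outer spec; whether that nesting is related to the failure is not established here. ACTIONS_RUNNER_REQUIRE_JOB_CONTAINER is set to "true" only because the first block uses that value.

```yaml
# Sketch only: merges the two blocks above into one template.spec.
# All values are copied from the configuration in this issue.
template:
  spec:
    securityContext:
      fsGroup: 1001
    containers:
      - name: runner
        image: ghcr.io/actions/actions-runner:latest
        command: ["/home/runner/run.sh"]
        env:
          - name: ACTIONS_RUNNER_CONTAINER_HOOKS
            value: /home/runner/k8s/index.js
          - name: ACTIONS_RUNNER_CONTAINER_HOOK_TEMPLATE
            value: /etc/config/runner-template.yaml
          - name: ACTIONS_RUNNER_POD_NAME
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
          # Assumption: keep "true" (first block) rather than "false" (second block).
          - name: ACTIONS_RUNNER_REQUIRE_JOB_CONTAINER
            value: "true"
        volumeMounts:
          - name: work
            mountPath: /home/runner/_work
          - name: hook-template
            mountPath: /etc/config
    volumes:
      - name: hook-template
        configMap:
          name: runner-config
      - name: work
        ephemeral:
          volumeClaimTemplate:
            spec:
              accessModes: ["ReadWriteOnce"]
              storageClassName: "local-path"
              resources:
                requests:
                  storage: 1Gi
    nodeSelector:
      purpose: github-actions
    tolerations:
      - key: purpose
        operator: Equal
        value: github-actions
        effect: NoSchedule
```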

I added an ENV variable to enable resource requests and limits using the container hooks:

          - name: ACTIONS_RUNNER_CONTAINER_HOOK_TEMPLATE
            value: /etc/config/runner-template.yaml
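
The issue does not show the ConfigMap behind /etc/config/runner-template.yaml. A minimal sketch of what it might contain follows. The ConfigMap name runner-config and the key runner-template.yaml are taken from the volume and path above. The resource figures are placeholders, and the `$job` container name follows the hook extension convention for targeting the job container as described in the runner-container-hooks documentation; treat both as assumptions to verify against the hook version in use.

```yaml
# Hypothetical ConfigMap backing the hook-template volume.
apiVersion: v1
kind: ConfigMap
metadata:
  name: runner-config
data:
  # The key becomes the file name under the /etc/config mount path.
  runner-template.yaml: |
    spec:
      containers:
        # "$job" is assumed to select the workflow's job container.
        - name: $job
          resources:
            # Placeholder values, not taken from the issue.
            requests:
              cpu: 500m
              memory: 512Mi
            limits:
              cpu: "1"
              memory: 1Gi
```

If the file is missing from the mount, or the key name differs from the file name in ACTIONS_RUNNER_CONTAINER_HOOK_TEMPLATE, the hook has no template to read at that path, so checking the mounted file inside the runner pod is a reasonable first step.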

However, when the workflow runs, the pods fail immediately, entering a terminating state with the following error:

[Screenshot 2024-07-19 at 1:00:31 PM]

The workflow run shows the following, without any useful error message:

[Screenshot 2024-07-19 at 1:02:08 PM]

Here is the GitHub workflow file I'm trying to run:

name: Build Image and Trigger Webhook

jobs:
  arm-build:
    runs-on: [test-runners]
    container:
      image: gcr.io/kaniko-project/executor:debug 
    permissions:
      contents: read
      packages: write

    steps:
      - name: sleep
        run: |
          sleep 100000