actions / actions-runner-controller

Kubernetes controller for GitHub Actions self-hosted runners
Apache License 2.0

GHA runner scale set pods are not coming up with the desired ENV variables set (ACTIONS_RUNNER_CONTAINER_HOOK_TEMPLATE) #3665

Open kanakaraju17 opened 4 months ago

kanakaraju17 commented 4 months ago

Checks

Controller Version

0.9.3

Deployment Method

Helm

To Reproduce

1. Deploy the gha-runner-scale-set with the env variables below, which are meant to enable resource limits and requests for the workflow pods through the ACTIONS_RUNNER_CONTAINER_HOOK_TEMPLATE environment variable.

template:
  template:
    spec:
      containers:
      - name: runner
        image: ghcr.io/actions/actions-runner:latest
        command: ["/home/runner/run.sh"]
        env:
          - name: ACTIONS_RUNNER_CONTAINER_HOOKS
            value: /home/runner/k8s/index.js
          - name: ACTIONS_RUNNER_CONTAINER_HOOK_TEMPLATE
            value: /home/runner/pod-template/content
          - name: ACTIONS_RUNNER_POD_NAME
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
          - name: ACTIONS_RUNNER_REQUIRE_JOB_CONTAINER
            value: "true" 
Additional ENV Variable Added:
          - name: ACTIONS_RUNNER_CONTAINER_HOOK_TEMPLATE
            value: /home/runner/pod-template/content

The workflow pods should include the ACTIONS_RUNNER_CONTAINER_HOOK_TEMPLATE environment variable and the corresponding volume mount, but neither shows up when describing the pods.

Steps to Reproduce:

1. Deploy the gha-runner-scale-set with the above configuration.
2. Describe the workflow pods to check the environment variables.

Expected Result: The ACTIONS_RUNNER_CONTAINER_HOOK_TEMPLATE environment variable should be present along with the volume mounts in the workflow pods.

Describe the bug

I'm trying to add resource requests and limits for the workflow pods by setting the ACTIONS_RUNNER_CONTAINER_HOOK_TEMPLATE environment variable. However, the pods are not being updated with the desired values.

Below is the values.yaml template used to append the environment variable:

template:
  template:
    spec:
      containers:
      - name: runner
        image: ghcr.io/actions/actions-runner:latest
        command: ["/home/runner/run.sh"]
        env:
          - name: ACTIONS_RUNNER_CONTAINER_HOOKS
            value: /home/runner/k8s/index.js
          - name: ACTIONS_RUNNER_CONTAINER_HOOK_TEMPLATE
            value: /home/runner/pod-template/content
          - name: ACTIONS_RUNNER_POD_NAME
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
          - name: ACTIONS_RUNNER_REQUIRE_JOB_CONTAINER
            value: "true"
        volumeMounts:
          - name: work
            mountPath: /home/runner/_work
          - name: pod-template
            mountPath: /home/runner/pod-template
            readOnly: true  
      volumes:
        - name: work
          ephemeral:
            volumeClaimTemplate:
              spec:
                accessModes: [ "ReadWriteOnce" ]
                storageClassName: "local-path"
                resources:
                  requests:
                    storage: 1Gi
        - name: pod-template
          configMap:
            name: hook-extension           

Problem: The pods should have the volumes mounted with the config map and the specified environment variables added. However, this is not happening as expected.

Current Output:

When describing the workflow pods, the environment variables and volumes are missing:

[Screenshot 2024-07-16 at 4:27:23 PM]

Describing the AutoscalingRunnerSet doesn't show the added ENV variables either:

Name:         arc-runner-kubernetes-ci-arm-large
Namespace:    arc-runners-kubernetes-arm
Labels:       actions.github.com/organization=curefit
              actions.github.com/scale-set-name=arc-runner-kubernetes-ci-arm-large
              actions.github.com/scale-set-namespace=arc-runners-kubernetes-arm
              app.kubernetes.io/component=autoscaling-runner-set
              app.kubernetes.io/instance=arc-runner-kubernetes-ci-arm-large
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=arc-runner-kubernetes-ci-arm-large
              app.kubernetes.io/part-of=gha-rs
              app.kubernetes.io/version=0.9.3
              helm.sh/chart=gha-rs-0.9.3
Annotations:  actions.github.com/cleanup-kubernetes-mode-role-binding-name: arc-runner-kubernetes-ci-arm-large-gha-rs-kube-mode
              actions.github.com/cleanup-kubernetes-mode-role-name: arc-runner-kubernetes-ci-arm-large-gha-rs-kube-mode
              actions.github.com/cleanup-kubernetes-mode-service-account-name: arc-runner-kubernetes-ci-arm-large-gha-rs-kube-mode
              actions.github.com/cleanup-manager-role-binding: arc-runner-kubernetes-ci-arm-large-gha-rs-manager
              actions.github.com/cleanup-manager-role-name: arc-runner-kubernetes-ci-arm-large-gha-rs-manager
              actions.github.com/runner-group-name: arc-runner-kubernetes-ci-arm-large
              actions.github.com/runner-scale-set-name: arc-runner-kubernetes-ci-arm-large
              actions.github.com/values-hash: 8b5caae634d958cc7d295b3166c151d036c7896d2b6165bf908a6a24aec5320
              meta.helm.sh/release-name: arc-runner-set-kubernetes-arm-large
              meta.helm.sh/release-namespace: arc-runners-kubernetes-arm
              runner-scale-set-id: 76
API Version:  actions.github.com/v1alpha1
Kind:         AutoscalingRunnerSet
Metadata:
  Creation Timestamp:  2024-07-16T09:49:56Z
  Finalizers:
    autoscalingrunnerset.actions.github.com/finalizer
  Generation:        1
  Resource Version:  577760766
  UID:               165f74f7-875c-4b8f-a214-96948ec38467
Spec:
  Github Config Secret:  github-token
  Github Config URL:     https://github.com/curefit
  Listener Template:
    Spec:
      Containers:
        Name:  listener
        Resources:
          Limits:
            Cpu:     500m
            Memory:  500Mi
          Requests:
            Cpu:     250m
            Memory:  250Mi
      Node Selector:
        Purpose:  github-actions
      Tolerations:
        Effect:           NoSchedule
        Key:              purpose
        Operator:         Equal
        Value:            github-actions
  Min Runners:            2
  Runner Group:           arc-runner-kubernetes-ci-arm-large
  Runner Scale Set Name:  arc-runner-kubernetes-ci-arm-large
  Template:
    Spec:
      Containers:
        Command:
          /home/runner/run.sh
        Env:
          Name:   ACTIONS_RUNNER_REQUIRE_JOB_CONTAINER
          Value:  false
          Name:   ACTIONS_RUNNER_CONTAINER_HOOKS
          Value:  /home/runner/k8s/index.js
          Name:   ACTIONS_RUNNER_POD_NAME
          Value From:
            Field Ref:
              Field Path:  metadata.name
        Image:             ghcr.io/actions/actions-runner:latest
        Name:              runner
        Volume Mounts:
          Mount Path:  /home/runner/_work
          Name:        work
      Node Selector:
        Purpose:       github-actions
      Restart Policy:  Never
      Security Context:
        Fs Group:            1001
      Service Account Name:  arc-runner-kubernetes-ci-arm-large-gha-rs-kube-mode
      Tolerations:
        Effect:    NoSchedule
        Key:       purpose
        Operator:  Equal
        Value:     github-actions
      Volumes:
        Ephemeral:
          Volume Claim Template:
            Spec:
              Access Modes:
                ReadWriteOnce
              Resources:
                Requests:
                  Storage:         5Gi
              Storage Class Name:  gp3
        Name:                      work
Status:
  Current Runners:            2
  Pending Ephemeral Runners:  2
Events:                       <none>

Below is the ConfigMap that is being used:

apiVersion: v1
kind: ConfigMap
metadata:
  name: hook-extension
  namespace: arc-runners-kubernetes-arm
data:
  content: |
    spec:
      containers:
        - name: "$job"
          resources:
          limits:
            cpu: "3000m"
          requests:
            cpu: "3000m"
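
As pasted above, `limits` and `requests` sit at the same indentation as `resources`, which would leave `resources` empty and place the other two keys directly on the container. This may only be a copy/paste artifact. A minimal sketch of the same ConfigMap with the values nested under `resources`, as the Kubernetes container spec expects, could look like this (names and values are taken from the ConfigMap above; nothing else is changed):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: hook-extension
  namespace: arc-runners-kubernetes-arm
data:
  content: |
    spec:
      containers:
        - name: "$job"        # same job-container placeholder as in the ConfigMap above
          resources:
            limits:           # nested under resources, not beside it
              cpu: "3000m"
            requests:
              cpu: "3000m"
```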

Describe the expected behavior

The ACTIONS_RUNNER_CONTAINER_HOOK_TEMPLATE ENV variable should be added, along with the volume mounts, to the pods that come up.

Additional Context

## githubConfigUrl is the GitHub url for where you want to configure runners
## ex: https://github.com/myorg/myrepo or https://github.com/myorg
githubConfigUrl: "https://github.com/"

## githubConfigSecret is the k8s secrets to use when auth with GitHub API.
## You can choose to use GitHub App or a PAT token
githubConfigSecret:
  ### GitHub Apps Configuration
  ## NOTE: IDs MUST be strings, use quotes
  #github_app_id: ""
  #github_app_installation_id: ""
  #github_app_private_key: |

  ### GitHub PAT Configuration
  # github_token: ""
## If you have a pre-define Kubernetes secret in the same namespace the gha-runner-scale-set is going to deploy,
## you can also reference it via `githubConfigSecret: pre-defined-secret`.
## You need to make sure your predefined secret has all the required secret data set properly.
##   For a pre-defined secret using GitHub PAT, the secret needs to be created like this:
##   > kubectl create secret generic pre-defined-secret --namespace=my_namespace --from-literal=github_token='ghp_your_pat'
##   For a pre-defined secret using GitHub App, the secret needs to be created like this:
##   > kubectl create secret generic pre-defined-secret --namespace=my_namespace --from-literal=github_app_id=123456 --from-literal=github_app_installation_id=654321 --from-literal=github_app_private_key='-----BEGIN CERTIFICATE-----*******'
githubConfigSecret: github-token

## proxy can be used to define proxy settings that will be used by the
## controller, the listener and the runner of this scale set.
#
# proxy:
#   http:
#     url: http://proxy.com:1234
#     credentialSecretRef: proxy-auth # a secret with `username` and `password` keys
#   https:
#     url: http://proxy.com:1234
#     credentialSecretRef: proxy-auth # a secret with `username` and `password` keys
#   noProxy:
#     - example.com
#     - example.org

# maxRunners is the max number of runners the autoscaling runner set will scale up to.
# maxRunners: 5

# minRunners is the min number of idle runners. The target number of runners created will be
# calculated as a sum of minRunners and the number of jobs assigned to the scale set.
minRunners: 2

runnerGroup: "arc-runner-kubernetes-ci-arm-large"

# ## name of the runner scale set to create.  Defaults to the helm release name
runnerScaleSetName: "arc-runner-kubernetes-ci-arm-large"

## A self-signed CA certificate for communication with the GitHub server can be
## provided using a config map key selector. If `runnerMountPath` is set, for
## each runner pod ARC will:
## - create a `github-server-tls-cert` volume containing the certificate
##   specified in `certificateFrom`
## - mount that volume on path `runnerMountPath`/{certificate name}
## - set NODE_EXTRA_CA_CERTS environment variable to that same path
## - set RUNNER_UPDATE_CA_CERTS environment variable to "1" (as of version
##   2.303.0 this will instruct the runner to reload certificates on the host)
##
## If any of the above had already been set by the user in the runner pod
## template, ARC will observe those and not overwrite them.
## Example configuration:
#
# githubServerTLS:
#   certificateFrom:
#     configMapKeyRef:
#       name: config-map-name
#       key: ca.crt
#   runnerMountPath: /usr/local/share/ca-certificates/

## Container mode is an object that provides out-of-box configuration
## for dind and kubernetes mode. Template will be modified as documented under the
## template object.
##
## If any customization is required for dind or kubernetes mode, containerMode should remain
## empty, and configuration should be applied to the template.
containerMode:
  type: "kubernetes"  ## type can be set to dind or kubernetes
  ## the following is required when containerMode.type=kubernetes
  kubernetesModeWorkVolumeClaim:
    accessModes: ["ReadWriteOnce"]
    # For local testing, use https://github.com/openebs/dynamic-localpv-provisioner/blob/develop/docs/quickstart.md to provide dynamic provision volume with storageClassName: openebs-hostpath
    storageClassName: "gp3"
    resources:
      requests:
        storage: 5Gi
#   kubernetesModeServiceAccount:
#     annotations:

## listenerTemplate is the PodSpec for each listener Pod
## For reference: https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/#PodSpec
listenerTemplate:
  spec:
    nodeSelector:
      purpose: github-actions
    tolerations:
      - key: purpose
        operator: Equal
        value: github-actions
        effect: NoSchedule   
    containers:
    # Use this section to append additional configuration to the listener container.
    # If you change the name of the container, the configuration will not be applied to the listener,
    # and it will be treated as a side-car container.
    - name: listener
      resources:
        limits:
          cpu: "500m"
          memory: "500Mi" 
        requests:
          cpu: "250m"
          memory: "250Mi"
      # securityContext:
        # runAsUser: 1000
#     # Use this section to add the configuration of a side-car container.
#     # Comment it out or remove it if you don't need it.
#     # Spec for this container will be applied as is without any modifications.
#     - name: side-car
#       image: example-sidecar

## template is the PodSpec for each runner Pod
## For reference: https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/#PodSpec
template:
  template:
    spec:
      containers:
      - name: runner
        image: ghcr.io/actions/actions-runner:latest
        command: ["/home/runner/run.sh"]
        env:
          - name: ACTIONS_RUNNER_CONTAINER_HOOKS
            value: /home/runner/k8s/index.js
          - name: ACTIONS_RUNNER_CONTAINER_HOOK_TEMPLATE
            value: /home/runner/pod-template/content
          - name: ACTIONS_RUNNER_POD_NAME
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
          - name: ACTIONS_RUNNER_REQUIRE_JOB_CONTAINER
            value: "true"
        volumeMounts:
          - name: work
            mountPath: /home/runner/_work
          - name: pod-template
            mountPath: /home/runner/pod-template
            readOnly: true  
      volumes:
        - name: work
          ephemeral:
            volumeClaimTemplate:
              spec:
                accessModes: [ "ReadWriteOnce" ]
                storageClassName: "local-path"
                resources:
                  requests:
                    storage: 1Gi
        - name: pod-template
          configMap:
            name: hook-extension            
  spec:
    securityContext:
      fsGroup: 1001
    containers:
      - name: runner
        image: ghcr.io/actions/actions-runner:latest
        command: ["/home/runner/run.sh"]
        env:
        - name: ACTIONS_RUNNER_REQUIRE_JOB_CONTAINER
          value: "false"
    nodeSelector:
      purpose: github-actions
    tolerations:
      - key: purpose
        operator: Equal
        value: github-actions
        effect: NoSchedule       

## Optional controller service account that needs to have required Role and RoleBinding
## to operate this gha-runner-scale-set installation.
## The helm chart will try to find the controller deployment and its service account at installation time.
## In case the helm chart can't find the right service account, you can explicitly pass in the following value
## to help it finish RoleBinding with the right service account.
## Note: if your controller is installed to only watch a single namespace, you have to pass these values explicitly.
# controllerServiceAccount:
#   namespace: arc-system
#   name: test-arc-gha-runner-scale-set-controller
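
One thing that stands out in the values file above: the `template` key contains both a nested `template.template.spec` block (with ACTIONS_RUNNER_CONTAINER_HOOK_TEMPLATE and the `pod-template` volume) and a direct `template.spec` block (with only ACTIONS_RUNNER_REQUIRE_JOB_CONTAINER). The chart's own comment describes `template` as the PodSpec for each runner Pod, and the AutoscalingRunnerSet output above contains only the values from the direct `template.spec` block. A possible explanation, offered as an assumption rather than a confirmed diagnosis, is that the extra `template:` level is simply not read. A sketch that merges both blocks into a single `template.spec` might look like this:

```yaml
# Sketch only: one template.spec, without the second `template:` level.
# Assumptions: REQUIRE_JOB_CONTAINER is set to "true" (the two original
# blocks disagree), and the work volume reuses the gp3 / 5Gi settings
# from containerMode instead of local-path / 1Gi.
template:
  spec:
    securityContext:
      fsGroup: 1001
    containers:
      - name: runner
        image: ghcr.io/actions/actions-runner:latest
        command: ["/home/runner/run.sh"]
        env:
          - name: ACTIONS_RUNNER_CONTAINER_HOOKS
            value: /home/runner/k8s/index.js
          - name: ACTIONS_RUNNER_CONTAINER_HOOK_TEMPLATE
            value: /home/runner/pod-template/content
          - name: ACTIONS_RUNNER_POD_NAME
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
          - name: ACTIONS_RUNNER_REQUIRE_JOB_CONTAINER
            value: "true"
        volumeMounts:
          - name: work
            mountPath: /home/runner/_work
          - name: pod-template
            mountPath: /home/runner/pod-template
            readOnly: true
    nodeSelector:
      purpose: github-actions
    tolerations:
      - key: purpose
        operator: Equal
        value: github-actions
        effect: NoSchedule
    volumes:
      - name: work
        ephemeral:
          volumeClaimTemplate:
            spec:
              accessModes: ["ReadWriteOnce"]
              storageClassName: "gp3"
              resources:
                requests:
                  storage: 5Gi
      - name: pod-template
        configMap:
          name: hook-extension
```

Whether `containerMode.type: "kubernetes"` should stay set next to a fully spelled-out template is not settled here. The comment in the values file says that if any customization is required for kubernetes mode, `containerMode` should remain empty and the configuration should be applied to the template.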

Controller Logs

Sharing the logs below, even though the issue is not related to the controller being unavailable.

https://gist.github.com/kanakaraju17/c5cf9efc50bfad1c97662b17533b9ca5

Runner Pod Logs

https://gist.github.com/kanakaraju17/756a724cb48d5d27aacbf6789a940e30