actions / actions-runner-controller

Kubernetes controller for GitHub Actions self-hosted runners
Apache License 2.0

ARC Runners Not Being Created in EKS Cluster #3810

Open omerap12 opened 3 hours ago

omerap12 commented 3 hours ago

Controller Version

ghcr.io/actions/gha-runner-scale-set-controller:0.9.3

Deployment Method

Helm

To Reproduce

1. Install the controller using Helm.

NAME    NAMESPACE   REVISION    UPDATED                                 STATUS      CHART                                   APP VERSION
arc     arc-systems 1           2024-07-02 12:48:26.029989 +0300 IDT    deployed    gha-runner-scale-set-controller-0.9.3   0.9.3   
2. Install a simple runner set (the values file is attached below).

NAME                    NAMESPACE   REVISION    UPDATED                                 STATUS      CHART                       APP VERSION
arc-runner-set-mis      arc-runners 1           2024-11-15 10:04:35.538126 +0200 IST    deployed    gha-runner-scale-set-0.9.3  0.9.3

Describe the bug

We have installed an ARC runner set in our EKS cluster:

 ~/ k get autoscalingrunnersets.actions.github.com -n arc-runners mis                                                                                                                                                           
NAME   MINIMUM RUNNERS   MAXIMUM RUNNERS   CURRENT RUNNERS   STATE   PENDING RUNNERS   RUNNING RUNNERS   FINISHED RUNNERS   DELETING RUNNERS
mis    4                 5                 5                         4                                                      

As you can see, CURRENT RUNNERS is 5. However, when I list the pods, I get the following:

 ~/ k get pods -n arc-runners                                                                                                                                                                                                 
No resources found in arc-runners namespace.

This indicates that the minimum runner count isn't being respected, and runners aren't being created in response to GitHub hooks.

What's puzzling is that I have another runner set, using the same values file (with only a different githubConfigUrl and runnerScaleSetName), and it's working perfectly fine.

Describe the expected behavior

Runners should be created.

Additional Context

USER-SUPPLIED VALUES:
githubConfigSecret: pre-defined-secret-mis
githubConfigUrl: https://github.com/Company-MIS
maxRunners: 5
minRunners: 0
requests:
  cpu: 500m
  memory: 1Gi
runnerScaleSetName: mis
template:
  spec:
    containers:
    - command:
      - /home/runner/run.sh
      env:
      - name: DOCKER_HOST
        value: unix:///var/run/docker.sock
      image: company/gh-actions-runner:latest
      imagePullPolicy: IfNotPresent
      name: runner
      resources:
        requests:
          cpu: 1400m
          memory: 1Gi
      volumeMounts:
      - mountPath: /home/runner/_work
        name: work
      - mountPath: /var/run
        name: dind-sock
      - mountPath: /home/runner/.docker
        name: docker
    imagePullSecrets:
    - name: regsecret
    initContainers:
    - command:
      - cp
      - /home/runner/.docker2/config.json
      - /home/runner/.docker/config.json
      image: ghcr.io/actions/actions-runner:latest
      name: custom-init
      volumeMounts:
      - mountPath: /home/runner/.docker
        name: docker
      - mountPath: /home/runner/.docker2/config.json
        name: regsecret
        subPath: config.json
    - args:
      - dockerd
      - --host=unix:///var/run/docker.sock
      - --group=$(DOCKER_GROUP_GID)
      env:
      - name: DOCKER_GROUP_GID
        value: "123"
      image: docker:dind
      name: dind
      restartPolicy: OnFailure
      securityContext:
        privileged: true
      volumeMounts:
      - mountPath: /home/runner/_work
        name: work
      - mountPath: /var/run
        name: dind-sock
      - mountPath: /home/runner/externals
        name: dind-externals
      - mountPath: /root/.docker/config.json
        name: regsecret
        subPath: config.json
    - command:
      - cp
      - -r
      - -v
      - /home/runner/externals/.
      - /home/runner/tmpDir/
      image: ghcr.io/actions/actions-runner:latest
      name: init-dind-externals
      restartPolicy: Always
      volumeMounts:
      - mountPath: /home/runner/tmpDir
        name: dind-externals
    volumes:
    - emptyDir: {}
      name: work
    - emptyDir: {}
      name: docker
    - emptyDir: {}
      name: dind-sock
    - emptyDir: {}
      name: dind-externals
    - name: regsecret
      secret:
        items:
        - key: .dockerconfigjson
          path: config.json
        secretName: regsecret
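
Note that the values above set minRunners: 0, while the `k get` output reports MINIMUM RUNNERS as 4. One way to compare the two is sketched below. The function names are hypothetical, and the `.spec.minRunners` field path is an assumption about the AutoscalingRunnerSet resource rather than something shown in this issue.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (names are illustrative); assume helm and kubectl access.

# Print the user-supplied values of a Helm release.
release_values() {
  local release=$1 ns=$2
  helm get values "$release" -n "$ns"
}

# Print minRunners as stored on the live AutoscalingRunnerSet object.
# Assumption: the field lives at .spec.minRunners.
live_min_runners() {
  local name=$1 ns=$2
  kubectl get autoscalingrunnersets.actions.github.com "$name" -n "$ns" \
    -o jsonpath='{.spec.minRunners}'
}
```

For this report the calls would be `release_values arc-runner-set-mis arc-runners` and `live_min_runners mis arc-runners`.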

Controller Logs

2024-11-15T08:00:45Z    INFO    EphemeralRunner Cleaning up resources after after ephemeral runner termination  {"version": "0.9.3", "ephemeralrunner": {"name":"mis-r4pfg-runner-rxkjl","namespace":"arc-runners"}, "phase": "Failed"}
2024-11-15T08:00:45Z    INFO    EphemeralRunner Cleaning up the runner pod  {"version": "0.9.3", "ephemeralrunner": {"name":"mis-r4pfg-runner-rxkjl","namespace":"arc-runners"}}
2024-11-15T08:00:45Z    INFO    EphemeralRunner Pod is deleted  {"version": "0.9.3", "ephemeralrunner": {"name":"mis-r4pfg-runner-rxkjl","namespace":"arc-runners"}}
2024-11-15T08:00:45Z    INFO    EphemeralRunner Cleaning up the runner jitconfig secret {"version": "0.9.3", "ephemeralrunner": {"name":"mis-r4pfg-runner-rxkjl","namespace":"arc-runners"}}
2024-11-15T08:00:45Z    INFO    EphemeralRunnerSet  Ephemeral runner counts {"version": "0.9.3", "ephemeralrunnerset": {"name":"mis-r4pfg","namespace":"arc-runners"}, "pending": 0, "running": 0, "finished": 0, "failed": 2, "deleting": 0}
2024-11-15T08:00:45Z    INFO    EphemeralRunnerSet  Scaling comparison  {"version": "0.9.3", "ephemeralrunnerset": {"name":"mis-r4pfg","namespace":"arc-runners"}, "current": 2, "desired": 0}
2024-11-15T08:00:45Z    INFO    EphemeralRunnerSet  Deleting ephemeral runners (scale down) {"version": "0.9.3", "ephemeralrunnerset": {"name":"mis-r4pfg","namespace":"arc-runners"}, "count": 2}
2024-11-15T08:00:45Z    INFO    EphemeralRunnerSet  No pending or running ephemeral runners running at this time for scale down {"version": "0.9.3", "ephemeralrunnerset": {"name":"mis-r4pfg","namespace":"arc-runners"}}
2024-11-15T08:00:45Z    INFO    EphemeralRunner Secret is deleted   {"version": "0.9.3", "ephemeralrunner": {"name":"mis-r4pfg-runner-rxkjl","namespace":"arc-runners"}}
2024-11-15T08:00:45Z    INFO    EphemeralRunner EphemeralRunner has already finished. Stopping reconciliation and waiting for EphemeralRunnerSet to clean it up {"version": "0.9.3", "ephemeralrunner": {"name":"mis-r4pfg-runner-rxkjl","namespace":"arc-runners"}, "phase": "Failed"}
2024-11-15T08:00:45Z    INFO    EphemeralRunner Cleaning up resources after after ephemeral runner termination  {"version": "0.9.3", "ephemeralrunner": {"name":"mis-r4pfg-runner-tsw5z","namespace":"arc-runners"}, "phase": "Failed"}
2024-11-15T08:00:45Z    INFO    EphemeralRunner Cleaning up the runner pod  {"version": "0.9.3", "ephemeralrunner": {"name":"mis-r4pfg-runner-tsw5z","namespace":"arc-runners"}}
2024-11-15T08:00:45Z    INFO    EphemeralRunner Pod is deleted  {"version": "0.9.3", "ephemeralrunner": {"name":"mis-r4pfg-runner-tsw5z","namespace":"arc-runners"}}
2024-11-15T08:00:45Z    INFO    EphemeralRunner Cleaning up the runner jitconfig secret {"version": "0.9.3", "ephemeralrunner": {"name":"mis-r4pfg-runner-tsw5z","namespace":"arc-runners"}}
2024-11-15T08:00:45Z    INFO    EphemeralRunner Secret is deleted   {"version": "0.9.3", "ephemeralrunner": {"name":"mis-r4pfg-runner-tsw5z","namespace":"arc-runners"}}
2024-11-15T08:00:45Z    INFO    EphemeralRunner EphemeralRunner has already finished. Stopping reconciliation and waiting for EphemeralRunnerSet to clean it up {"version": "0.9.3", "ephemeralrunner": {"name":"mis-r4pfg-runner-tsw5z","namespace":"arc-runners"}, "phase": "Failed"}
2024-11-15T08:00:46Z    INFO    AutoscalingRunnerSet    Find existing ephemeral runner set  {"version": "0.9.3", "autoscalingrunnerset": {"name":"mis","namespace":"arc-runners"}, "name": "mis-r4pfg", "specHash": "58997d4cf6"}
2024-11-15T08:02:25Z    INFO    AutoscalingListener Listener pod is terminated  {"version": "0.9.3", "autoscalinglistener": {"name":"mis-754b578d-listener","namespace":"arc-systems"}, "namespace": "arc-systems", "name": "mis-754b578d-listener", "reason": "Error", "message": ""}
2024-11-15T08:02:26Z    INFO    AutoscalingListener Listener pod is terminated  {"version": "0.9.3", "autoscalinglistener": {"name":"mis-754b578d-listener","namespace":"arc-systems"}, "namespace": "arc-systems", "name": "mis-754b578d-listener", "reason": "Error", "message": ""}
2024-11-15T08:02:26Z    INFO    AutoscalingListener Listener pod is terminated  {"version": "0.9.3", "autoscalinglistener": {"name":"mis-754b578d-listener","namespace":"arc-systems"}, "namespace": "arc-systems", "name": "mis-754b578d-listener", "reason": "Error", "message": ""}
2024-11-15T08:02:26Z    INFO    AutoscalingListener Creating a listener pod {"version": "0.9.3", "autoscalinglistener": {"name":"mis-754b578d-listener","namespace":"arc-systems"}}
2024-11-15T08:19:26Z    INFO    EphemeralRunnerSet  Scaling comparison  {"version": "0.9.3", "ephemeralrunnerset": {"name":"mis-v7mzn","namespace":"arc-runners"}, "current": 5, "desired": 4}
2024-11-15T08:19:26Z    INFO    EphemeralRunnerSet  Deleting ephemeral runners (scale down) {"version": "0.9.3", "ephemeralrunnerset": {"name":"mis-v7mzn","namespace":"arc-runners"}, "count": 1}
2024-11-15T08:19:26Z    INFO    EphemeralRunnerSet  No pending or running ephemeral runners running at this time for scale down {"version": "0.9.3", "ephemeralrunnerset": {"name":"mis-v7mzn","namespace":"arc-runners"}}
2024-11-15T08:19:26Z    INFO    AutoscalingRunnerSet    Find existing ephemeral runner set  {"version": "0.9.3", "autoscalingrunnerset": {"name":"mis","namespace":"arc-runners"}, "name": "mis-v7mzn", "specHash": "58997d4cf6"}
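
The "Ephemeral runner counts" line above reports failed runners even though no pods remain. A minimal sketch for pulling that count out of a longer controller log follows. The function name is hypothetical, and the pattern assumes the log format shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read controller log lines on stdin and print the
# "failed" count from each "Ephemeral runner counts" line.
failed_counts() {
  grep 'Ephemeral runner counts' \
    | sed -E 's/.*"failed": ([0-9]+).*/\1/'
}
```

It could be fed from `kubectl logs -n arc-systems <controller-pod> | failed_counts`, where `<controller-pod>` is a placeholder for the controller pod name.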

Runner Pod Logs

2024-11-15T08:08:22Z    INFO    listener-app    app initialized
2024-11-15T08:08:22Z    INFO    listener-app    Starting listener
2024-11-15T08:08:22Z    INFO    listener-app    refreshing token    {"githubConfigUrl": "https://github.com/Company-MIS"}
2024-11-15T08:08:22Z    INFO    listener-app    getting access token for GitHub App auth    {"accessTokenURL": "https://api.github.com/app/installations/52408526/access_tokens"}
2024-11-15T08:08:22Z    INFO    listener-app    getting runner registration token   {"registrationTokenURL": "https://api.github.com/orgs/Company-MIS/actions/runners/registration-token"}
2024-11-15T08:08:22Z    INFO    listener-app    getting Actions tenant URL and JWT  {"registrationURL": "https://api.github.com/actions/runner-registration"}
2024-11-15T08:08:22Z    INFO    listener-app.listener   Current runner scale set statistics.    {"statistics": "{\"totalAvailableJobs\":0,\"totalAcquiredJobs\":0,\"totalAssignedJobs\":1,\"totalRunningJobs\":0,\"totalRegisteredRunners\":0,\"totalBusyRunners\":0,\"totalIdleRunners\":0}"}
2024-11-15T08:08:22Z    INFO    listener-app.worker.kubernetesworker    Calculated target runner count  {"assigned job": 1, "decision": 5, "min": 4, "max": 5, "currentRunnerCount": 5, "jobsCompleted": 0}
2024-11-15T08:08:22Z    INFO    listener-app.worker.kubernetesworker    Compare {"original": "{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":-1,\"patchID\":-1,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}", "patch": "{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":5,\"patchID\":0,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}"}
2024-11-15T08:08:22Z    INFO    listener-app.worker.kubernetesworker    Preparing EphemeralRunnerSet update {"json": "{\"spec\":{\"patchID\":0,\"replicas\":5}}"}
2024-11-15T08:08:22Z    INFO    listener-app.worker.kubernetesworker    Ephemeral runner set scaled.    {"namespace": "arc-runners", "name": "mis-v7mzn", "replicas": 5}
2024-11-15T08:08:22Z    INFO    listener-app.listener   Getting next message    {"lastMessageID": 0}
2024-11-15T08:28:36Z    INFO    listener-app.worker.kubernetesworker    Calculated target runner count  {"assigned job": 0, "decision": 4, "min": 4, "max": 5, "currentRunnerCount": 4, "jobsCompleted": 0}
2024-11-15T08:28:36Z    INFO    listener-app.worker.kubernetesworker    Compare {"original": "{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":-1,\"patchID\":-1,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}", "patch": "{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":4,\"patchID\":0,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}"}
2024-11-15T08:28:36Z    INFO    listener-app.worker.kubernetesworker    Preparing EphemeralRunnerSet update {"json": "{\"spec\":{\"patchID\":0,\"replicas\":4}}"}
2024-11-15T08:28:36Z    INFO    listener-app.worker.kubernetesworker    Ephemeral runner set scaled.    {"namespace": "arc-runners", "name": "mis-v7mzn", "replicas": 4}
2024-11-15T08:28:36Z    INFO    listener-app.listener   Getting next message    {"lastMessageID": 2}
2024-11-15T08:29:26Z    INFO    listener-app.worker.kubernetesworker    Calculated target runner count  {"assigned job": 0, "decision": 4, "min": 4, "max": 5, "currentRunnerCount": 4, "jobsCompleted": 0}
2024-11-15T08:29:26Z    INFO    listener-app.worker.kubernetesworker    Compare {"original": "{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":-1,\"patchID\":-1,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}", "patch": "{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":4,\"patchID\":0,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}"}
2024-11-15T08:29:26Z    INFO    listener-app.worker.kubernetesworker    Preparing EphemeralRunnerSet update {"json": "{\"spec\":{\"patchID\":0,\"replicas\":4}}"}
2024-11-15T08:29:27Z    INFO    listener-app.worker.kubernetesworker    Ephemeral runner set scaled.    {"namespace": "arc-runners", "name": "mis-v7mzn", "replicas": 4}
2024-11-15T08:29:27Z    INFO    listener-app.listener   Getting next message    {"lastMessageID": 2}
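
The "Calculated target runner count" lines above are consistent with a target of min runners plus assigned jobs, capped at max runners. The sketch below reproduces that arithmetic under this reading of the logs. The function name is hypothetical and this is not taken from the ARC source.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reproduce the "decision" value from the listener logs,
# assuming decision = min(min_runners + assigned_jobs, max_runners).
target_runner_count() {
  local assigned=$1 min=$2 max=$3
  local target=$(( min + assigned ))
  if (( target > max )); then
    target=$max
  fi
  echo "$target"
}

target_runner_count 1 4 5   # inputs from the 08:08:22 log line
target_runner_count 0 4 5   # inputs from the 08:28:36 log line
```

With the logged inputs this yields 5 and 4, matching the "decision" fields, so the listener appears to be requesting replicas as expected and the gap is between the requested replicas and the pods that actually exist.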

omerap12 commented 3 hours ago

How can I find the reason for the following status?

 ~/ k describe autoscalingrunnersets.actions.github.com -n arc-runners mis
Status:
  Current Runners:            5
  Failed Ephemeral Runners:   1
  Pending Ephemeral Runners:  4
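
One possible starting point for investigating this status is to look at the EphemeralRunner objects themselves, since the controller logs show them being tracked separately from pods. The helpers below are a sketch. The function names are hypothetical, and they assume kubectl access and the `arc-runners` namespace used above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the resource kind comes from the controller logs above.

# List EphemeralRunner objects, which can exist even when no pods do.
list_ephemeral_runners() {
  local ns=${1:-arc-runners}
  kubectl get ephemeralrunners.actions.github.com -n "$ns"
}

# Dump one EphemeralRunner as YAML, including its status block.
show_ephemeral_runner() {
  local name=$1 ns=${2:-arc-runners}
  kubectl get ephemeralrunners.actions.github.com "$name" -n "$ns" -o yaml
}

# Show events in the namespace, sorted by last timestamp.
recent_events() {
  local ns=${1:-arc-runners}
  kubectl get events -n "$ns" --sort-by=.lastTimestamp
}
```

Running `list_ephemeral_runners`, then `show_ephemeral_runner <name>` on a failed or pending entry, then `recent_events` may surface pod creation or scheduling errors that the summary status hides.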