actions / actions-runner-controller

Kubernetes controller for GitHub Actions self-hosted runners
Apache License 2.0

RunnerDeployment pods in NotReady state after GHA workflow completion #3254

Open gera-aldama opened 7 months ago

gera-aldama commented 7 months ago

Controller Version

0.27.6

Deployment Method

Helm

To Reproduce

1. Runners are deployed using `RunnerDeployment` and `HorizontalRunnerAutoscaler`.
2. Runners pick up and execute a workflow.
3. The workflow finishes successfully.
4. The pods that ran the workflow get stuck in the `NotReady` state.

Describe the bug

This is the same behavior described in https://github.com/actions/actions-runner-controller/issues/1515. After the workflow completes, the runners stay in the Running state, but the pods remain NotReady. I'm using the RunnerDeployment resource with Helm chart version 0.23.7 and ARC 0.27.6. The CRDs were also upgraded.

kubectl get pods | grep -i notready
runner-deployment-ksfjw-c5nnd           1/2     NotReady   0          4h11m
runner-deployment-ksfjw-ffzbm           1/2     NotReady   0          3h45m
runner-deployment-ksfjw-hc7b5           1/2     NotReady   0          4h11m
runner-deployment-ksfjw-khf6b           1/2     NotReady   0          4h11m
runner-deployment-ksfjw-rnws2           1/2     NotReady   0          3h45m
runner-deployment-ksfjw-w7bln           1/2     NotReady   0          3h51m
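
For anyone triaging a larger cluster, the listing above can be summarized with a small script. This is only a sketch: the function name `parse_not_ready` is hypothetical, and it assumes the default `kubectl get pods` column order (NAME, READY, STATUS, RESTARTS, AGE). The header row in the sample is added for illustration; the two pod rows are copied from the output above.

```python
def parse_not_ready(listing: str) -> list[dict]:
    """Parse `kubectl get pods` text rows and keep those whose STATUS is NotReady."""
    stuck = []
    for line in listing.splitlines():
        fields = line.split()
        # Assumed default columns: NAME READY STATUS RESTARTS AGE.
        # The header row and blank lines fail this check and are skipped.
        if len(fields) < 5 or fields[2] != "NotReady":
            continue
        ready, total = (int(n) for n in fields[1].split("/"))
        stuck.append(
            {"name": fields[0], "ready": ready, "total": total, "age": fields[-1]}
        )
    return stuck


SAMPLE = """\
NAME                                    READY   STATUS     RESTARTS   AGE
runner-deployment-ksfjw-c5nnd           1/2     NotReady   0          4h11m
runner-deployment-ksfjw-ffzbm           1/2     NotReady   0          3h45m
"""

for pod in parse_not_ready(SAMPLE):
    print(f'{pod["name"]}: {pod["ready"]}/{pod["total"]} containers ready, age {pod["age"]}')
```

Every affected pod reports 1/2 containers ready, which suggests that one container has exited while the other is still running.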

Describe the expected behavior

Pods should be terminated after execution.

Additional Context

Example pod YAML output:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    actions-runner-controller/token-expires-at: "2024-01-29T13:29:33-06:00"
    actions-runner/id: "1453"
    kubernetes.io/psp: privileged
    sync-time: "2024-01-29T18:29:33Z"
  creationTimestamp: "2024-01-29T18:29:33Z"
  finalizers:
  - actions.summerwind.dev/runner-pod
  labels:
    actions-runner: ""
    actions-runner-controller/inject-registration-token: "true"
    pod-template-hash: f8546db97
    runner-deployment-name: runner-deployment
    runner-template-hash: f7674645d
  name: runner-deployment-ksfjw-c5nnd
  ownerReferences:
  - apiVersion: actions.summerwind.dev/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: Runner
    name: runner-deployment-ksfjw-c5nnd
    uid: 800c2f97-9ce8-4e14-8733-734254047e58
  resourceVersion: "434077453"
  uid: 273eba64-3fa3-4358-b3a2-b770ae4c8ab6
spec:
  containers:
  - env:
    - name: RUNNER_ORG
    - name: RUNNER_REPO
      value: my_repo
    - name: RUNNER_ENTERPRISE
    - name: RUNNER_LABELS
      value: label_1,label_2
    - name: RUNNER_GROUP
    - name: DOCKER_ENABLED
      value: "true"
    - name: DOCKERD_IN_RUNNER
      value: "false"
    - name: GITHUB_URL
      value: https://github.com/
    - name: RUNNER_WORKDIR
      value: /runner/_work
    - name: RUNNER_EPHEMERAL
      value: "true"
    - name: RUNNER_STATUS_UPDATE_HOOK
      value: "false"
    - name: GITHUB_ACTIONS_RUNNER_EXTRA_USER_AGENT
      value: actions-runner-controller/v0.27.6
    - name: DOCKER_HOST
      value: unix:///run/docker.sock
    - name: RUNNER_NAME
      value: runner-deployment-ksfjw-c5nnd
    - name: RUNNER_TOKEN
      value: token
    image: summerwind/actions-runner:latest
    imagePullPolicy: IfNotPresent
    name: runner
    resources: {}
    securityContext: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /runner
      name: runner
    - mountPath: /runner/_work
      name: work
    - mountPath: /run
      name: var-run
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-rnt9m
      readOnly: true
  - image: docker:dind
    imagePullPolicy: IfNotPresent
    lifecycle:
      preStop:
        exec:
          command:
          - /bin/sh
          - -c
          - timeout "${RUNNER_GRACEFUL_STOP_TIMEOUT:-15}" /bin/sh -c "echo 'Prestop
            hook started'; while [ -f /runner/.runner ]; do sleep 1; done; echo 'Waiting
            for dockerd to start'; while ! pgrep -x dockerd; do sleep 1; done; echo
            'Prestop hook stopped'" >/proc/1/fd/1 2>&1
    name: docker
    resources: {}
    securityContext:
      privileged: true
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /etc/docker/certs.d
      name: certs
    - mountPath: /runner
      name: runner
    - mountPath: /run
      name: var-run
    - mountPath: /runner/_work
      name: work
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-rnt9m
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: aia-sdp-gnr-689773
  nodeSelector:
    node-type: gnr
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Never
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - emptyDir: {}
    name: runner
  - emptyDir: {}
    name: work
  - emptyDir:
      medium: Memory
      sizeLimit: 1M
    name: var-run
  - emptyDir: {}
    name: certs
  - name: default-token-rnt9m
    projected:
      defaultMode: 420

status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2024-01-29T18:29:33Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2024-01-29T18:49:22Z"
    message: 'containers with unready status: [runner]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2024-01-29T18:49:22Z"
    message: 'containers with unready status: [runner]'
    reason: ContainersNotReady
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2024-01-29T18:29:33Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: docker://49fbad503a4e71112578e4d5b5d16e2c4b095145d11548cecc6a755f63218b51
    image: docker:dind
    imageID: docker-pullable://docker@sha256:1dfc375736e448806602211e09a9b1390eb110548dcb839eef374da357ca5f5d
    lastState: {}
    name: docker
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2024-01-29T18:29:35Z"
  - containerID: docker://5170c36d647c50340f579f2fe19afb5b1f80e27f6020686dcd588f89c9097a34
    image: summerwind/actions-runner:latest
    imageID: docker-pullable://summerwind/actions-runner@sha256:4b0eb7ec68aec459ce5d69585675f40a2dd13eb69646fa786ab9809aaf33b75e
    lastState: {}
    name: runner
    ready: false
    restartCount: 0
    started: false
    state:
      terminated:
        containerID: docker://5170c36d647c50340f579f2fe19afb5b1f80e27f6020686dcd588f89c9097a34
        exitCode: 0
        finishedAt: "2024-01-29T18:49:21Z"
        reason: Completed
        startedAt: "2024-01-29T18:29:35Z"
  hostIP: 10.23.152.242
  phase: Running
  podIP: 100.64.6.127
  podIPs:
  - ip: 100.64.6.127
  qosClass: BestEffort
  startTime: "2024-01-29T18:29:33Z"
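
The status above shows the pattern behind `NotReady`: the `runner` container has terminated with exit code 0, the `docker` sidecar is still running, and the pod phase therefore stays `Running`. The check below is a minimal sketch for spotting that pattern in the output of `kubectl get pod <name> -o json`. The function name `runner_finished_but_pod_running` is hypothetical, this is not the controller's own logic, and the sample dictionary is trimmed from the status shown above.

```python
def runner_finished_but_pod_running(pod: dict, runner_name: str = "runner") -> bool:
    """Return True when the runner container exited with code 0 while another
    container is still running and the pod phase is still Running."""
    status = pod.get("status", {})
    if status.get("phase") != "Running":
        return False
    runner_done = False
    other_running = False
    for cs in status.get("containerStatuses", []):
        state = cs.get("state", {})
        if cs.get("name") == runner_name:
            terminated = state.get("terminated")
            runner_done = terminated is not None and terminated.get("exitCode") == 0
        elif "running" in state:
            other_running = True
    return runner_done and other_running


# Trimmed from the pod status in this report.
SAMPLE_POD = {
    "status": {
        "phase": "Running",
        "containerStatuses": [
            {"name": "docker", "ready": True,
             "state": {"running": {"startedAt": "2024-01-29T18:29:35Z"}}},
            {"name": "runner", "ready": False,
             "state": {"terminated": {"exitCode": 0, "reason": "Completed",
                                      "finishedAt": "2024-01-29T18:49:21Z"}}},
        ],
    }
}

print(runner_finished_but_pod_running(SAMPLE_POD))
```

In practice the dictionary would come from `json.load` on the `kubectl` output rather than a literal. The default container name `runner` matches the pod spec in this report and may differ in other setups.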

Controller Logs

kubectl get pods -n actions-runner-system
NAME                                               READY   STATUS    RESTARTS        AGE
actions-runner-controller-74988b64f9-st5rz         2/2     Running   4 (3d17h ago)   3d21h

Runner Pod Logs

kubectl logs runner-deployment-ksfjw-c5nnd
Defaulted container "runner" out of: runner, docker
2024-01-29 18:29:35.235  NOTICE --- Runner init started with pid 7
2024-01-29 18:29:35.245  DEBUG --- Github endpoint URL https://github.com/
2024-01-29 18:29:38.97  DEBUG --- Passing --ephemeral to config.sh to enable the ephemeral runner.
2024-01-29 18:29:38.102  DEBUG --- Configuring the runner.

--------------------------------------------------------------------------------
|        ____ _ _   _   _       _          _        _   _                      |
|       / ___(_) |_| | | |_   _| |__      / \   ___| |_(_) ___  _ __  ___      |
|      | |  _| | __| |_| | | | | '_ \    / _ \ / __| __| |/ _ \| '_ \/ __|     |
|      | |_| | | |_|  _  | |_| | |_) |  / ___ \ (__| |_| | (_) | | | \__ \     |
|       \____|_|\__|_| |_|\__,_|_.__/  /_/   \_\___|\__|_|\___/|_| |_|___/     |
|                                                                              |
|                       Self-hosted runner registration                        |
|                                                                              |
--------------------------------------------------------------------------------

# Authentication

√ Connected to GitHub

# Runner Registration

√ Runner successfully added
√ Runner connection is good

# Runner settings

√ Settings Saved.

2024-01-29 18:29:43.981  DEBUG --- Runner successfully configured.
{
  "agentId": 1453,
  "agentName": "runner-deployment-any-ksfjw-c5nnd",
  "poolId": 1,
  "poolName": "Default",
  "ephemeral": true,
  "serverUrl": "https://pipelinesghubeus26.actions.githubusercontent.com/4k72J58r6zbOt2ltvDDZQpRxMDTHuVQKd5NYXjBRmfUGlMtUVy/",
  "gitHubUrl": "https://github.com/my_repo/runner-deployment",
  "workFolder": "/runner/_work"
2024-01-29 18:29:43.993  DEBUG --- Docker enabled runner detected and Docker daemon wait is enabled
2024-01-29 18:29:43.997  DEBUG --- Waiting until Docker is available or the timeout of 120 seconds is reached
}CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES
2024-01-29 18:29:44.43  NOTICE --- WARNING LATEST TAG HAS BEEN DEPRECATED. SEE GITHUB ISSUE FOR DETAILS:
2024-01-29 18:29:44.46  NOTICE --- https://github.com/actions/actions-runner-controller/issues/2056

√ Connected to GitHub

Current runner version: '2.311.0'
2024-01-29 18:29:48Z: Listening for Jobs
Runner update in progress, do not shutdown runner.
Downloading 2.312.0 runner
Waiting for current job finish running.
Generate and execute update script.
Runner will exit shortly for update, should be back online within 10 seconds.
Runner update process finished.
Runner listener exit because of updating, re-launch runner after successful update
Update finished successfully.
Restarting runner...

√ Connected to GitHub

Current runner version: '2.312.0'
2024-01-29 18:30:27Z: Listening for Jobs
2024-01-29 18:30:29Z: Running job: my-job
2024-01-29 18:49:20Z: Job my-job completed with result: Succeeded
√ Removed .credentials
√ Removed .runner
Runner listener exit with 0 return code, stop the service, no retry needed.
Exiting runner...
2024-01-29 18:49:21.336  NOTICE --- Runner init exited. Exiting this process with code 0 so that the container and the pod is GC'ed Kubernetes soon.

github-actions[bot] commented 7 months ago

Hello! Thank you for filing an issue.

The maintainers will triage your issue shortly.

In the meantime, please take a look at the troubleshooting guide for bug reports.

If this is a feature request, please review our contribution guidelines.