actions / actions-runner-controller

Kubernetes controller for GitHub Actions self-hosted runners
Apache License 2.0

Listener pod failing after scale-set upgrade #3726

Open albertollamaso opened 2 months ago

albertollamaso commented 2 months ago

Checks

Controller Version

2.318.0

Deployment Method

Helm

To Reproduce

1. Upgrade `gha-runner-scale-set` from one version to another, for example 2.317.0 -> 2.318.0.
2. Check the logs of the listener pod, for example:

kubectl logs -f self-hosted-hide-7ff847bf-listener

Logs:

2024-08-28T09:43:33Z    INFO    listener-app.listener   Current runner scale set statistics.    {"statistics": "{\"totalAvailableJobs\":0,\"totalAcquiredJobs\":1,\"totalAssignedJobs\":1,\"totalRunningJobs\":0,\"totalRegisteredRunners\":0,\"totalBusyRunners\":0,\"totalIdleRunners\":0}"}
2024-08-28T09:43:33Z    INFO    listener-app.worker.kubernetesworker    Calculated target runner count  {"assigned job": 1, "decision": 1, "min": 0, "max": 5, "currentRunnerCount": 1, "jobsCompleted": 0}
2024-08-28T09:43:33Z    INFO    listener-app.worker.kubernetesworker    Compare {"original": "{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":-1,\"patchID\":-1,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}", "patch": "{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":1,\"patchID\":0,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}"}
2024-08-28T09:43:33Z    INFO    listener-app.worker.kubernetesworker    Preparing EphemeralRunnerSet update {"json": "{\"spec\":{\"patchID\":0,\"replicas\":1}}"}
2024-08-28T09:43:33Z    INFO    listener-app.listener   Deleting message session
2024/08/28 09:43:34 Application returned an error: handling initial message failed: could not patch ephemeral runner set , patch JSON: {"spec":{"patchID":0,"replicas":1}}, error: ephemeralrunnersets.actions.github.com "self-hosted-hide-rhtjx" not found
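
For triage, the name of the missing resource can be pulled straight out of the error line above. This is a minimal sketch, assuming the error message keeps the wording shown in these logs; `missing_runner_set` is a hypothetical helper name, not part of the controller or chart.

```shell
# Hypothetical helper: read listener log lines on stdin and print the name of
# every EphemeralRunnerSet reported as "not found", once each.
missing_runner_set() {
  sed -n 's/.*ephemeralrunnersets\.actions\.github\.com "\([^"]*\)" not found.*/\1/p' | sort -u
}

# Example usage (needs cluster access, so it is left commented out):
# kubectl logs self-hosted-hide-7ff847bf-listener | missing_runner_set
```

The function only filters text, so it can also be run against a saved log file.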

### Describe the bug

It looks like the listener is looking for an `ephemeralrunnersets` resource that does not exist. Checking the properties of the `autoscalinglisteners` custom resource, I could confirm that it is tied to `ephemeralrunnersets.actions.github.com "self-hosted-hide-rhtjx"`:

kubectl describe autoscalinglisteners self-hosted-hide-7ff847bf-listener -n github-self-hosted-runners

Name:         self-hosted-hide-7ff847bf-listener
Namespace:    github-self-hosted-runners
Labels:       actions.github.com/organization=hidehide
              actions.github.com/scale-set-name=self-hosted-hide
              actions.github.com/scale-set-namespace=github-self-hosted-scale-set
              app.kubernetes.io/component=runner-scale-set-listener
              app.kubernetes.io/instance=self-hosted-hide
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=self-hosted-hide
              app.kubernetes.io/part-of=gha-runner-scale-set
              app.kubernetes.io/version=0.9.3
              helm.sh/chart=gha-rs-0.9.3
...

Ephemeral Runner Set Name: self-hosted-hide-rhtjx
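
The dangling reference can be checked by comparing the name the listener points at with the `ephemeralrunnersets` that actually exist. This is a rough sketch: `check_reference` is a hypothetical helper, and the `.spec.ephemeralRunnerSetName` JSON path is an assumption inferred from the `Ephemeral Runner Set Name` field in the describe output, so it may need adjusting.

```shell
# Hypothetical helper: print "ok" if the referenced EphemeralRunnerSet name
# (first argument) appears among the existing names (one per line on stdin),
# otherwise print "stale".
check_reference() {
  ref="$1"
  if [ -z "$ref" ]; then
    echo "usage: check_reference <ephemeral-runner-set-name>" >&2
    return 2
  fi
  if grep -Fxq -- "$ref"; then
    echo ok
  else
    echo stale
  fi
}

# Example usage (needs cluster access; the JSON path is an assumption):
# ref=$(kubectl get autoscalinglisteners self-hosted-hide-7ff847bf-listener \
#   -n github-self-hosted-runners -o jsonpath='{.spec.ephemeralRunnerSetName}')
# kubectl get ephemeralrunnersets -A \
#   -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}' | check_reference "$ref"
```

A result of `stale` would match the behavior described here, where the listener keeps patching a runner set that is gone.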


As a workaround, I currently have to delete the `autoscalinglisteners` resource every time I upgrade to a new version.

kubectl delete autoscalinglisteners self-hosted-hide-7ff847bf-listener
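
Until this is fixed, the manual step can be wrapped so it is easy to preview and repeat after each upgrade. This is only a sketch of the workaround described in this report: `recreate_listener` and the `KUBECTL` override are hypothetical names, and it assumes the controller creates a fresh listener once the old one is deleted, which is what I observe.

```shell
# Hypothetical wrapper around the manual workaround: delete the
# AutoscalingListener in the given namespace.
# Set KUBECTL=echo to print the command instead of running it.
recreate_listener() {
  listener="$1"
  namespace="$2"
  ${KUBECTL:-kubectl} delete autoscalinglisteners "$listener" -n "$namespace"
}

# Preview the command without touching the cluster:
# KUBECTL=echo recreate_listener self-hosted-hide-7ff847bf-listener github-self-hosted-runners
```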


### Describe the expected behavior

The listener should not fail after a version upgrade of the scale set.

### Additional Context

```yaml
n/a
```

### Controller Logs

2024-08-28T09:14:16Z    INFO    AutoscalingListener Listener pod is terminated  {"version": "0.9.3", "autoscalinglistener": {"name":"self-hosted-hide-7ff847bf-listener","namespace":"github-self-hosted-runners"}, "namespace": "github-self-hosted-runners", "name": "self-hosted-hide-7ff847bf-listener", "reason": "Error", "message": ""}
2024-08-28T09:14:17Z    INFO    AutoscalingListener Listener pod is terminated  {"version": "0.9.3", "autoscalinglistener": {"name":"self-hosted-hide-7ff847bf-listener","namespace":"github-self-hosted-runners"}, "namespace": "github-self-hosted-runners", "name": "self-hosted-hide-7ff847bf-listener", "reason": "Error", "message": ""}
2024-08-28T09:14:18Z    INFO    AutoscalingListener Listener pod is terminated  {"version": "0.9.3", "autoscalinglistener": {"name":"self-hosted-hide-7ff847bf-listener","namespace":"github-self-hosted-runners"}, "namespace": "github-self-hosted-runners", "name": "self-hosted-hide-7ff847bf-listener", "reason": "Error", "message": ""}

### Runner Pod Logs

No runner pods are started, because the listener crashes.