actions / actions-runner-controller

Kubernetes controller for GitHub Actions self-hosted runners
Apache License 2.0

Unable to run in high-availability mode #3382

Closed joshuabaird closed 2 months ago

joshuabaird commented 3 months ago

Checks

Controller Version

0.6.1

Deployment Method

ArgoCD

To Reproduce

1. Install a new deployment of ARC in a new cluster with a unique `runnerGroup` and the same `runnerScaleSetName` as an existing ARC Deployment
2. See this error in the new ARC deployment's listener log:

```
platform-utility-754b578d-listener autoscaler 2024-03-25T16:49:04Z  INFO  auto_scaler  unable to create message session. Will try again in 30 seconds  {"error": "409 - had issue communicating with Actions backend: The runner scale set platform-utility already has an active session for owner platform-utility-754b578d-listener."}
```

In this log from the new ARC instance, `platform-utility-754b578d-listener` is the name of the listener on the old instance.

Describe the bug

I'm trying to deploy an HA instance of ARC in a new cluster following these instructions.

My first instance of ARC scalesets (which is already running) uses this config:

```yaml
runnerGroup: platform-use1-utility-01
runnerScaleSetName: platform-utility
```

My second instance of ARC scalesets uses this config:

```yaml
runnerGroup: platform-use1-sandbox-01
runnerScaleSetName: platform-utility
```

As described in the docs, the two instances share the same `runnerScaleSetName` but use different `runnerGroup`s.
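For reference, the intended HA layout from the configs above can be sketched as two Helm values files, one per cluster; the filenames are hypothetical and any fields beyond the two shown (such as `githubConfigUrl`) are elided:

```yaml
# values-cluster-1.yaml (hypothetical filename) -- existing instance
runnerGroup: platform-use1-utility-01    # unique runner group per cluster
runnerScaleSetName: platform-utility     # same scale set name in both clusters
---
# values-cluster-2.yaml (hypothetical filename) -- new instance
runnerGroup: platform-use1-sandbox-01    # different runner group
runnerScaleSetName: platform-utility     # same scale set name
```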

In this configuration, the listener on the new/second instance of ARC fails to start with the following error:

```
2024-03-25T16:52:01Z  INFO  auto_scaler  unable to create message session. Will try again in 30 seconds  {"error": "409 - had issue communicating with Actions backend: The runner scale set platform-utility already has an active session for owner platform-utility-754b578d-listener."}
```

`platform-utility-754b578d-listener` is the name of the listener on the first instance of ARC. It seems the two cannot coexist.

What am I doing incorrectly here?

Describe the expected behavior

The listener on the second instance of ARC should start, and GitHub should distribute jobs across both instances.

Additional Context

N/A

Controller Logs

N/A

Runner Pod Logs

N/A
joshuabaird commented 3 months ago

I just noticed that the second instance of ARC came online (no changes were made) -- but now both runners are configured in the same runner group, even though the second instance of ARC is configured to use a separate runner group (`platform-use1-sandbox-01`):

[screenshot: both runners listed under the same runner group]

In this configuration (with the runner groups apparently the same), jobs do seem to be getting distributed across both workers. This contradicts the documentation, so I'm a bit confused.

nikola-jokic commented 3 months ago

Hey @joshuabaird,

I'm failing to reproduce the issue. Can you please post the output of `kubectl get autoscalingrunnersets -n $NS -o yaml` for both of your scale sets? I created two kind clusters, applied the controller to both of them, and installed scale sets with different runner groups.

nikola-jokic commented 2 months ago

Closing this one until we hear back from you :relaxed: Please let us know if this issue is resolved.