actions / actions-runner-controller

Kubernetes controller for GitHub Actions self-hosted runners
Apache License 2.0

Listener pod names conflict when using the same runnerScaleSetName in multiple orgs. #3587

Open akhilp6 opened 3 months ago

akhilp6 commented 3 months ago

Checks

Controller Version

0.9.0

Deployment Method

Helm

To Reproduce

gha-scale-set-controller is installed with Helm chart defaults in the action-runner-system namespace. The gha-runner-scale-set chart is installed for org1 in the runners namespace with this values.yaml:

githubConfigUrl: 'https://github.com/org1'
runnerGroup: staging
runnerScaleSetName: staging-centos7

The gha-runner-scale-set chart is also installed for org2 in the runners namespace with this values.yaml:

githubConfigUrl: 'https://github.com/org2'
runnerGroup: staging
runnerScaleSetName: staging-centos7

Describe the bug

The controller creates AutoscalingListener/staging-centos7-7c5bdbdc-listener in the local action-runner-system namespace for org1, but the org2 listener gets stuck in an error/crash loop because the controller creates it with the same name. I believe this is due to a continuous conflict over which org the listener is trying to register with.

The listener name is computed here. It includes a hash of the namespace, and since both scale sets are deployed in the runners namespace, they end up with the same listener name.

Listener pod logs: https://gist.github.com/akhilp6/9f22d2da2da9a52ed4b02e9aeb8ae0d1
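The collision described above can be illustrated with a small Python sketch. This is not the controller's actual code: the function names, the FNV-1a hash, and the hex formatting are assumptions chosen only to mirror the `<scale set name>-<namespace hash>-listener` shape seen in this issue. The point is that the org in `githubConfigUrl` never enters the name.

```python
def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a hash (illustrative stand-in for the controller's hash)."""
    h = 0x811C9DC5
    for b in data:
        h ^= b
        h = (h * 0x01000193) & 0xFFFFFFFF
    return h


def listener_name(scale_set_name: str, namespace: str) -> str:
    """Hypothetical helper: the name depends only on the scale set name and namespace."""
    return f"{scale_set_name}-{fnv1a_32(namespace.encode()):08x}-listener"


# The two releases from this issue differ only in githubConfigUrl,
# which is not an input to the name.
releases = [
    {"githubConfigUrl": "https://github.com/org1", "name": "staging-centos7", "namespace": "runners"},
    {"githubConfigUrl": "https://github.com/org2", "name": "staging-centos7", "namespace": "runners"},
]

names = {listener_name(r["name"], r["namespace"]) for r in releases}
print(len(names))  # 1: both orgs map to the same listener object name
```

Because the name is a pure function of the scale set name and the namespace, any two releases that share both will collide, whichever org they point at.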

This also happens when we have multiple clusters with different controllers and try to deploy scale sets with the same name: the scale set is created on one cluster, and the other cluster then throws an error:

Application returned an error: createSession failed: failed to create session: 409 - had issue communicating with Actions backend: The runner scale set staging-centos7 already has an active session for owner

We can probably work around this by creating these scale sets in different namespaces, but that approach has its own issues. One is that we would need to copy the secrets into every namespace we create, which we would like to avoid.

Describe the expected behavior

We would like either the org added to the AutoscalingListener object name, or the listener placed in the runner namespace instead of the controller's local namespace.
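A minimal Python sketch of the first option, assuming the org is taken from the path of `githubConfigUrl` and mixed into the hashed part of the name. The function `proposed_listener_name` and the use of SHA-256 are hypothetical and only illustrate the idea; they are not how the controller is implemented.

```python
import hashlib
from urllib.parse import urlparse


def proposed_listener_name(scale_set_name: str, namespace: str, github_config_url: str) -> str:
    """Hypothetical naming scheme: hash the namespace together with the GitHub owner."""
    # For https://github.com/org1 the path is "/org1", so owner becomes "org1".
    owner = urlparse(github_config_url).path.strip("/")
    digest = hashlib.sha256(f"{namespace}/{owner}".encode()).hexdigest()[:8]
    return f"{scale_set_name}-{digest}-listener"


org1 = proposed_listener_name("staging-centos7", "runners", "https://github.com/org1")
org2 = proposed_listener_name("staging-centos7", "runners", "https://github.com/org2")
print(org1 != org2)  # the two orgs no longer share a listener name
```

A real implementation would also have to keep the result within Kubernetes object name length limits, for example by truncating a long scale set name before appending the hash.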

Additional Context

org1 values.yaml

githubConfigUrl: 'https://github.com/org1'
runnerGroup: staging
runnerScaleSetName: staging-centos7

org2 values.yaml

githubConfigUrl: 'https://github.com/org2'
runnerGroup: staging
runnerScaleSetName: staging-centos7

Controller Logs

https://gist.github.com/akhilp6/6d759fa1f1bd26b427e8c3c225dc0e73

Runner Pod Logs

N/A

aarondstone commented 2 months ago

Hi team! LinkedIn is asking for an update on where things stand with this issue.

Thanks!

maeghan-porter commented 2 weeks ago

I'll throw my hat in and say I'm also hitting an issue with the same root cause (the hash in the name is based only on the namespace). We don't have two listeners, but we do canary releases of the cluster that hosts the runners, so when the new cluster comes up its listener has exactly the same name and won't register with GitHub because of the name conflict. Making this hash unique would be ideal.