Closed: katarzynainit closed this issue 6 months ago
Hey @katarzynainit,
Could you please share the controller's values.yaml file so I can try to reproduce this issue?
Hi, we are using a forked arc-controller. The code changes only concern skipping the creation of the controller and listener SAs and RBAC, based on three flags. The code that processes ephemeral runners is unchanged from 0.9.0.
https://gist.github.com/katarzynainit/d9e6ed4d3c6b95e929d73e2b1e8f7cc1 (flags for internal changes are marked in the values)
We started to observe this issue on a faster cluster. We did not see it before with the same configuration on a different, slower cluster.
It also only happens from time to time, so it might be difficult to reproduce.
Checks
Controller Version
0.9.0
Deployment Method
Helm
To Reproduce
Describe the bug
In the controller logs I see the message "Found the runner with the same name". It looks like the controller reconciles the same ephemeralrunner twice at almost the same time; the second run "removes" the runner and leaves it hung.
As a result, the runner is never created, and the ephemeral runner reaches the Succeeded stage and stays stuck until the workflow is cancelled.
We started to observe this behavior when we moved to a faster cluster.
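To make the suspected sequence easier to discuss, here is a minimal simulation of it. This is not the controller's code: `FakeRunnerService`, `reconcile_naive`, `reconcile_guarded`, and the `cached_has_runner_id` flag are all hypothetical names, and the assumption that the second pass works from a stale view of the resource is only a guess based on the log message above.

```python
from dataclasses import dataclass, field


@dataclass
class FakeRunnerService:
    """Hypothetical stand-in for the remote runner registration API."""
    registered: set = field(default_factory=set)

    def register(self, name: str) -> bool:
        """Return True if newly registered, False if the name already exists."""
        if name in self.registered:
            return False
        self.registered.add(name)
        return True

    def remove(self, name: str) -> None:
        self.registered.discard(name)


def reconcile_naive(service: FakeRunnerService, name: str,
                    cached_has_runner_id: bool) -> str:
    """One reconcile pass that trusts a possibly stale cached status."""
    if cached_has_runner_id:
        return "noop"
    if service.register(name):
        return "registered"
    # Assumed behavior: a name conflict is treated as a leftover
    # registration and removed, which deletes the runner from pass one.
    service.remove(name)
    return "removed-existing"


def reconcile_guarded(service: FakeRunnerService, name: str,
                      cached_has_runner_id: bool, in_flight: set) -> str:
    """Same pass, but skipped if this process just registered the name."""
    if name in in_flight:
        return "skipped"
    result = reconcile_naive(service, name, cached_has_runner_id)
    if result == "registered":
        in_flight.add(name)
    return result


if __name__ == "__main__":
    svc = FakeRunnerService()
    # Both passes see the stale "no runner ID yet" status.
    print(reconcile_naive(svc, "runner-abc", False))
    print(reconcile_naive(svc, "runner-abc", False))
    print(sorted(svc.registered))
```

In this sketch the two naive passes leave no registration behind, which would match the stuck runner described above. The guarded variant is only one possible way to make the second pass harmless, not a proposed patch.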
Describe the expected behavior
The controller should always create a runner when an ephemeral runner is created.
Additional Context
Controller Logs
Runner Pod Logs