Closed alexgaganashvili closed 5 months ago
cc: @nikola-jokic
Hey @alexgaganashvili,
The steps you took to upgrade ARC were wrong. Please follow the guide we documented here.
Basically, you need to uninstall every scale set, wait for the resources to be cleaned up, uninstall the controller, and then start the installation at the target version. I don't understand how you deployed your application with Argo CD, but it is worth mentioning that the controller and the scale set should be managed separately.
Closing this issue now, but feel free to comment on it if you need more information :relaxed:
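The upgrade order described above can be sketched as a small planner that only builds the command list and runs nothing. This is a rough illustration, not the documented procedure: the release names and namespaces in the usage example are hypothetical, the chart references are assumed to be the public ARC OCI charts, and the values flags a real scale set install needs (for example the GitHub config URL and secret) are omitted.

```python
import shlex

# Assumed chart locations (the public ARC OCI charts); verify against your setup.
CONTROLLER_CHART = (
    "oci://ghcr.io/actions/actions-runner-controller-charts/"
    "gha-runner-scale-set-controller"
)
SCALE_SET_CHART = (
    "oci://ghcr.io/actions/actions-runner-controller-charts/"
    "gha-runner-scale-set"
)


def upgrade_plan(scale_sets, controller, target_version):
    """Return the upgrade commands in the order described in the thread.

    scale_sets: list of (helm_release, namespace) tuples.
    controller: a single (helm_release, namespace) tuple.
    """
    plan = []
    # 1. Uninstall every scale set first.
    for release, namespace in scale_sets:
        plan.append(["helm", "uninstall", release, "--namespace", namespace])
    # 2. Check that no AutoscalingRunnerSet resources linger before moving on.
    plan.append(
        ["kubectl", "get", "autoscalingrunnersets.actions.github.com",
         "--all-namespaces"]
    )
    # 3. Uninstall the controller.
    release, namespace = controller
    plan.append(["helm", "uninstall", release, "--namespace", namespace])
    # 4. Install the controller, then the scale sets, at the target version.
    #    Values flags for the scale sets are intentionally left out here.
    plan.append(
        ["helm", "install", release, CONTROLLER_CHART,
         "--namespace", namespace, "--create-namespace",
         "--version", target_version]
    )
    for release, namespace in scale_sets:
        plan.append(
            ["helm", "install", release, SCALE_SET_CHART,
             "--namespace", namespace, "--create-namespace",
             "--version", target_version]
        )
    return plan


if __name__ == "__main__":
    # Hypothetical release names and namespaces.
    for cmd in upgrade_plan([("arc-runner-set", "arc-runners")],
                            ("arc", "arc-systems"), "0.9.3"):
        print(shlex.join(cmd))
```

Keeping the controller and the scale sets as separate inputs mirrors the advice that the two should be managed separately.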
Thanks, @nikola-jokic . If I deploy ARC in HA mode in two different clusters and later upgrade it in one of the K8s clusters, will the jobs executing on runners of the scaleset being uninstalled be simply aborted?
That is a good question. When you uninstall the scale set, we start by removing the listener. The second cluster keeps acquiring jobs and continues to work normally. The cluster you are removing the scale set from keeps its busy ephemeral runners up until their jobs are finished, and it kills the runners that are not busy. The controller should never abort runners while they are busy.
Thanks for clarifying that, @nikola-jokic. By the way, when I was upgrading ARC and hit that issue the first time, I did uninstall both the runner controller and the scaleset. However, when I installed the new version, I still saw the same error. In light of this, step 2 of the upgrade instructions ("Wait for resources cleanup") makes me wonder whether I should wait longer for all relevant resources to be cleaned up. One such custom resource would be autoscalingrunnerset, which seems to sit there for a while until I end up setting its finalizer to an empty array. I then decided to uninstall the CRDs and install them again, even though I had diffed the CRDs from the two versions and did not see any changes. That allowed me to upgrade.
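One way to make the "wait for resources cleanup" step explicit is to poll until no autoscalingrunnerset objects remain, rather than guessing how long to wait. The sketch below assumes kubectl is on the PATH and that the CRDs are still installed (otherwise the kubectl call fails). The function names, timeout, and polling interval are illustrative choices, not part of ARC.

```python
import json
import subprocess
import time


def list_autoscaling_runner_sets():
    """Return 'namespace/name' for every AutoscalingRunnerSet in the cluster."""
    result = subprocess.run(
        ["kubectl", "get", "autoscalingrunnersets.actions.github.com",
         "--all-namespaces", "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    items = json.loads(result.stdout)["items"]
    return [
        f'{item["metadata"]["namespace"]}/{item["metadata"]["name"]}'
        for item in items
    ]


def wait_for_cleanup(list_fn=list_autoscaling_runner_sets, timeout=300.0,
                     interval=5.0, sleep=time.sleep, clock=time.monotonic):
    """Poll until list_fn() returns nothing; return False if the timeout passes.

    list_fn, sleep, and clock are injectable so the loop can be exercised
    without a cluster.
    """
    deadline = clock() + timeout
    while True:
        if not list_fn():
            return True   # nothing left, safe to continue with the upgrade
        if clock() >= deadline:
            return False  # resources still present, e.g. stuck on a finalizer
        sleep(interval)


if __name__ == "__main__":
    if wait_for_cleanup():
        print("No AutoscalingRunnerSet resources remain.")
    else:
        print("Still present:", list_autoscaling_runner_sets())
```

A False result points at the situation described above, where the resource sits there until its finalizer is cleared.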
Hey @alexgaganashvili,
Yes, that is a known issue that has been fixed with this PR and will be part of the next release. The controller was slow to react to resource deletion events. This can cause some confusing behavior, especially when running with containerMode=kubernetes. Hopefully, after the next release, this issue will be resolved completely :relaxed:
@nikola-jokic, I have reinstalled the CRDs and tried installing version 0.9.3 of ARC. After I installed a scaleset, the runner reported the same error. Before that, I had even deleted the namespaces where I deploy the controller and the scaleset.
@nikola-jokic, could this issue be reopened and addressed? I'm running into the same problem with version 0.9.3. Thx.
Checks
Controller Version
0.9.2
Deployment Method
Helm
To Reproduce
2024-05-31T22:07:05Z ERROR AutoscalingRunnerSet Failed to update autoscaling runner set with finalizer added {"version": "0.9.2", "autoscalingrunnerset": {"name":"my-runner-scaleset","namespace":"my-namespace"}, "error": "autoscalingrunnersets.actions.github.com \"my-runner-scaleset\" not found"}
github.com/actions/actions-runner-controller/controllers/actions%2egithub%2ecom.(AutoscalingRunnerSetReconciler).Reconcile
    github.com/actions/actions-runner-controller/controllers/actions.github.com/autoscalingrunnerset_controller.go:182
sigs.k8s.io/controller-runtime/pkg/internal/controller.(Controller).Reconcile
    sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:119
sigs.k8s.io/controller-runtime/pkg/internal/controller.(Controller).reconcileHandler
    sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:316
sigs.k8s.io/controller-runtime/pkg/internal/controller.(Controller).processNextWorkItem
    sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(Controller).Start.func2.2
    sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:227
2024-05-31T22:07:05Z ERROR Reconciler error {"controller": "autoscalingrunnerset", "controllerGroup": "actions.github.com", "controllerKind": "AutoscalingRunnerSet", "AutoscalingRunnerSet": {"name":"my-runner-scaleset","namespace":"my-namespace"}, "namespace": "my-namespace", "name": "my-runner-scaleset", "reconcileID": "ac974287-9a8c-4477-bf87-78713c03104d", "error": "autoscalingrunnersets.actions.github.com \"my-runner-scaleset\" not found"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(Controller).reconcileHandler
    sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:329
sigs.k8s.io/controller-runtime/pkg/internal/controller.(Controller).processNextWorkItem
    sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(Controller).Start.func2.2
    sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:227
Describe the bug
See the "To Reproduce" field. In addition, no listener or runner pods are created (I have minRunners set to 1).
Describe the expected behavior
I should be able to transparently upgrade the controller and the scaleset.
Additional Context
Controller Logs
Runner Pod Logs