The CRDs wouldn't delete, but I tried upgrading to 0.9.0 anyway. The listener wouldn't start, so I tried deleting again, and the CRDs are still stuck. That is why the timestamp on the CRDs is today.
How can I clean this mess up?
Hey @rwlove,
Usually, CRDs are left hanging when there is still a resource described by that CRD. Could you please check whether there are any such resources left in the cluster? Most likely there is an ephemeral runner whose finalizer was interrupted during cleanup.
@nikola-jokic I'm unsure how to list the instantiated resources in a CRD.
I guess it's this stuff:
Names:
Kind: AutoscalingRunnerSet
List Kind: AutoscalingRunnerSetList
Plural: autoscalingrunnersets
Singular: autoscalingrunnerset
Scope: Namespaced
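From what I can tell, the Kind or the Plural name from the CRD is what kubectl wants, so something like this should list the instances (the namespace here is just my setup):
kubectl api-resources | grep actions.github.com
kubectl -n dev get autoscalingrunnersets.actions.github.com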
➜ home-ops git:(main) ✗ kubectl -n dev get AutoscalingRunnerSet
NAME MINIMUM RUNNERS MAXIMUM RUNNERS CURRENT RUNNERS STATE PENDING RUNNERS RUNNING RUNNERS FINISHED RUNNERS DELETING RUNNERS
arc-runner-set-home-ops 1 6 1 0 1
➜ home-ops git:(main) ✗ kubectl -n dev delete AutoscalingRunnerSet arc-runner-set-home-ops
autoscalingrunnerset.actions.github.com "arc-runner-set-home-ops" deleted (HANGS)
kubectl -n dev delete ephemeralrunnersets.actions.github.com arc-runner-set-home-ops-xxkww
ephemeralrunnerset.actions.github.com "arc-runner-set-home-ops-xxkww" deleted (HANGS)
➜ home-ops git:(main) ✗ kubectl -n dev edit ephemeralrunnersets.actions.github.com arc-runner-set-home-ops-xxkww
error: ephemeralrunnersets.actions.github.com "arc-runner-set-home-ops-xxkww" is invalid
A copy of your changes has been stored to "/tmp/kubectl-edit-2170329029.yaml"
error: Edit cancelled, no valid changes were saved.
If I remove the finalizer and save/exit the editor, it re-opens the editor. Saving/exiting again gives the above error.
I don't actually see any ephemeralrunners
➜ home-ops git:(main) ✗ kubectl get ephemeralrunners -A
error: the server doesn't have a resource type "ephemeralrunners"
however,
➜ home-ops git:(main) ✗ kubectl -n dev get ephemeralrunnersets.actions.github.com
NAME DESIREDREPLICAS CURRENTREPLICAS PENDING RUNNERS RUNNING RUNNERS FINISHED RUNNERS DELETING RUNNERS
arc-runner-set-home-ops-xxkww 1 1 0 1
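I assume this would show which of the ARC CRDs are still installed (it looks like ephemeralrunners is already gone, while ephemeralrunnersets is still there):
kubectl get crd | grep actions.github.com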
Should I remove the runner and set from the GH Settings UI?
I think you would have to patch it. The problem is that you removed the controller before it was able to finalize things. So try something like the following:
kubectl patch $RESOURCE -n $NS -p '{"metadata":{"finalizers":[]}}' --type=merge
The cluster is in a really bad state right now, and cleaning it up is going to be difficult. You may try re-installing everything and then issuing a delete request for each scale set, but I can't say for sure whether it will work after this many manual interventions...
Removing them from the GitHub UI would not solve it on the cluster level... You still have to clean up resources from your cluster.
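Roughly, and only as a sketch (the resource names, namespace, and release names below are examples taken from this thread, not exact commands), the cleanup might look like:
# see what is left and which finalizer is blocking it
kubectl -n dev get ephemeralrunnersets.actions.github.com,autoscalingrunnersets.actions.github.com
kubectl -n dev get ephemeralrunnersets.actions.github.com arc-runner-set-home-ops-xxkww -o jsonpath='{.metadata.finalizers}'
# strip the finalizers so the delete can complete
kubectl patch ephemeralrunnersets.actions.github.com arc-runner-set-home-ops-xxkww -n dev -p '{"metadata":{"finalizers":[]}}' --type=merge
# or re-install the controller so it can finish the cleanup itself, then delete each scale set release
helm install arc oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set-controller -n dev
helm uninstall arc-runner-set-home-ops -n dev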
@nikola-jokic ack, I was worried I'd have to rebuild, but thanks for your pointers.
I got rid of the ephemeralrunnerset with this command.
➜ home-ops git:(main) ✗ kubectl patch ephemeralrunnersets.actions.github.com arc-runner-set-home-ops-xxkww -n dev -p '{"metadata":{"finalizers":[]}, "spec":{"patchID":1}}' --type=merge
ephemeralrunnerset.actions.github.com/arc-runner-set-home-ops-xxkww patched
I was able to edit away the AutoscalingRunnerSet arc-runner-set-home-ops
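For reference, I assume the same patch pattern would have worked for that one too, something like:
kubectl patch autoscalingrunnersets.actions.github.com arc-runner-set-home-ops -n dev -p '{"metadata":{"finalizers":[]}}' --type=merge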
I just ensured all the CRDs are deleted with kubectl delete -f <4x CRD yaml files>
I guess now I'll try to create again.
Every 2.0s: kubectl -n dev get all rover: Mon Apr 1 13:05:43 2024
NAME READY STATUS RESTARTS AGE
pod/actions-runner-controller-gha-rs-controller-f4d5889c7-6hn4l 1/1 Running 0 38s
pod/arc-runner-set-home-ops-79cf85d9-listener 1/1 Running 0 19s
pod/arc-runner-set-home-ops-t6xjg-runner-sq5xx 2/2 Running 0 17s
pod/arc-runner-set-home-ops-t6xjg-runner-ttzpt 2/2 Running 0 9s
Thanks a ton, @nikola-jokic!
No problem! I'm glad you resolved it :relaxed: !
Hi @rwlove @nikola-jokic, I am also encountering the exact same issue. After running the kubectl patch command, I am running into another problem: the controller deployed in the new namespace is still trying to connect to the old listener, which was running in the old namespace before the upgrade. How do I remove these old listener scale set references from the controller so it creates a new one?
I had to create a false patchID
kubectl patch ephemeralrunnersets.actions.github.com arc-runner-set-home-ops-xxkww -n dev -p '{"metadata":{"finalizers":[]}, "spec":{"patchID":1}}' --type=merge
@rwlove, that worked for me as well. Now I am running into a new issue.
Before knowing about this false patchID approach, I tried yesterday to rebuild everything in a new namespace on the same cluster. Now the controller deployed in the new namespace is still trying to connect to the scale set listener that was in the old namespace.
kubectl -n new-ns get ephemeralrunnersets.actions.github.com
No resources found in new-ns namespace.
kubectl -n old-ns get ephemeralrunnersets.actions.github.com
No resources found in old-ns namespace.
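Do I maybe also need to check for leftover AutoscalingListener resources in the controller namespaces? I believe the listener is its own resource, so something like:
kubectl get autoscalinglisteners.actions.github.com -A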
I'm not sure about your specific error; I'm just a user and that's a bit outside my understanding.
What I would say, and maybe this is already obvious to you, is that you need to make sure all four of the CRDs are actually deleted before doing the install.
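As far as I know, these are the four (all under the actions.github.com group): autoscalinglisteners, autoscalingrunnersets, ephemeralrunners and ephemeralrunnersets. Something like this should come back empty before you install:
kubectl get crd | grep actions.github.com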
Controller Version
0.8.3
Deployment Method
Helm
Describe the bug
Can't remove CRDs to upgrade.
Describe the expected behavior
CRDs remove cleanly.