actions / actions-runner-controller

Kubernetes controller for GitHub Actions self-hosted runners
Apache License 2.0

Can't Remove autoscalingrunnersets or ephemeralrunnersets CRDs #3395

Closed: rwlove closed this issue 3 months ago

rwlove commented 3 months ago

Checks

Controller Version

0.8.3

Deployment Method

Helm

Checks

To Reproduce

  1. Delete the Helm charts
  2. Delete the CRDs
➜  home-ops git:(main) ✗ kubectl delete -f actions-runner-controller-crds-delme/actions.github.com_autoscalinglisteners.yaml 
customresourcedefinition.apiextensions.k8s.io "autoscalinglisteners.actions.github.com" deleted (SUCCEEDS)
➜  home-ops git:(main) ✗ kubectl delete -f actions-runner-controller-crds-delme/actions.github.com_autoscalingrunnersets.yaml 
customresourcedefinition.apiextensions.k8s.io "autoscalingrunnersets.actions.github.com" deleted (HANGS)
➜  home-ops git:(main) ✗ kubectl delete -f actions-runner-controller-crds-delme/actions.github.com_ephemeralrunners.yaml (SUCCEEDS)
➜  home-ops git:(main) ✗ kubectl delete -f actions-runner-controller-crds-delme/actions.github.com_ephemeralrunnersets.yaml 
customresourcedefinition.apiextensions.k8s.io "ephemeralrunnersets.actions.github.com" deleted (HANGS)

Describe the bug

The autoscalingrunnersets and ephemeralrunnersets CRDs can't be removed in order to upgrade; the delete commands hang.

Describe the expected behavior

CRDs remove cleanly.

Additional Context

Name:         ephemeralrunnersets.actions.github.com
Namespace:    
Labels:       helm.toolkit.fluxcd.io/name=actions-runner-controller
              helm.toolkit.fluxcd.io/namespace=dev
Annotations:  controller-gen.kubebuilder.io/version: v0.14.0
API Version:  apiextensions.k8s.io/v1
Kind:         CustomResourceDefinition
Metadata:
  Creation Timestamp:  2024-04-01T14:13:32Z
  Deletion Timestamp:  2024-04-01T14:39:25Z
  Finalizers:
    customresourcecleanup.apiextensions.k8s.io
  Generation:        1
  Resource Version:  25131759
  UID:               4bb87b29-0c91-4b58-95b6-25045b8904a6
Name:         autoscalingrunnersets.actions.github.com
Namespace:    
Labels:       helm.toolkit.fluxcd.io/name=actions-runner-controller
              helm.toolkit.fluxcd.io/namespace=dev
Annotations:  controller-gen.kubebuilder.io/version: v0.14.0
API Version:  apiextensions.k8s.io/v1
Kind:         CustomResourceDefinition
Metadata:
  Creation Timestamp:  2024-04-01T14:13:31Z
  Deletion Timestamp:  2024-04-01T14:39:06Z
  Finalizers:
    customresourcecleanup.apiextensions.k8s.io
  Generation:        1
  Resource Version:  25132657
  UID:               c0242b49-793a-404e-a7db-a0b99ba3ba51

Controller Logs

n/a

Runner Pod Logs

n/a
github-actions[bot] commented 3 months ago

Hello! Thank you for filing an issue.

The maintainers will triage your issue shortly.

In the meantime, please take a look at the troubleshooting guide for bug reports.

If this is a feature request, please review our contribution guidelines.

rwlove commented 3 months ago

The CRDs wouldn't delete, but I tried upgrading to 0.9.0 anyway. The listener wouldn't start, so I tried deleting again, and the CRDs are still stuck. That is why the deletion timestamps on the CRDs are from today.

How can I clean this mess up?

nikola-jokic commented 3 months ago

Hey @rwlove,

Usually, a CRD will be left hanging if there are still resources of that kind in the cluster. Could you please check whether any such resources are left? Most likely there is an ephemeral runner whose finalizer cleanup was interrupted.
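
For example, the leftover custom resources can be listed with the plural names from the CRDs shown above (this is just a generic kubectl query; adjust the names to your install):

kubectl get autoscalingrunnersets.actions.github.com,ephemeralrunnersets.actions.github.com,ephemeralrunners.actions.github.com -A

If one of the types errors with "the server doesn't have a resource type", its CRD has already been removed and can be ignored.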

rwlove commented 3 months ago

@nikola-jokic I'm unsure how to list the instantiated resources in a CRD.

rwlove commented 3 months ago

I guess it's this stuff:

  Names:
    Kind:       AutoscalingRunnerSet
    List Kind:  AutoscalingRunnerSetList
    Plural:     autoscalingrunnersets
    Singular:   autoscalingrunnerset
  Scope:        Namespaced
rwlove commented 3 months ago
➜  home-ops git:(main) ✗ kubectl -n dev get AutoscalingRunnerSet                                                                               
NAME                      MINIMUM RUNNERS   MAXIMUM RUNNERS   CURRENT RUNNERS   STATE   PENDING RUNNERS   RUNNING RUNNERS   FINISHED RUNNERS   DELETING RUNNERS
arc-runner-set-home-ops   1                 6                 1                         0                 1                                    
➜  home-ops git:(main) ✗ kubectl -n dev delete AutoscalingRunnerSet arc-runner-set-home-ops                                                    
autoscalingrunnerset.actions.github.com "arc-runner-set-home-ops" deleted (HANGS)
kubectl -n dev delete ephemeralrunnersets.actions.github.com arc-runner-set-home-ops-xxkww 
ephemeralrunnerset.actions.github.com "arc-runner-set-home-ops-xxkww" deleted (HANGS)
rwlove commented 3 months ago
➜  home-ops git:(main) ✗ kubectl -n dev edit ephemeralrunnersets.actions.github.com arc-runner-set-home-ops-xxkww
error: ephemeralrunnersets.actions.github.com "arc-runner-set-home-ops-xxkww" is invalid
A copy of your changes has been stored to "/tmp/kubectl-edit-2170329029.yaml"
error: Edit cancelled, no valid changes were saved.

If I remove the finalizer and save/exit the editor, it re-opens the editor. Saving/exiting again gives the above error.

I don't actually see any ephemeralrunners

➜  home-ops git:(main) ✗ kubectl get ephemeralrunners -A 
error: the server doesn't have a resource type "ephemeralrunners"

however,

➜  home-ops git:(main) ✗ kubectl -n dev get ephemeralrunnersets.actions.github.com                      
NAME                            DESIREDREPLICAS   CURRENTREPLICAS   PENDING RUNNERS   RUNNING RUNNERS   FINISHED RUNNERS   DELETING RUNNERS
arc-runner-set-home-ops-xxkww   1                 1                 0                 1                                    
rwlove commented 3 months ago

Should I remove the runner and set from the GH Settings UI?

nikola-jokic commented 3 months ago

I think you would have to patch it. The problem is that you removed the controller before it was able to finalize things. So try something like: kubectl patch $RESOURCE -n $NS -p '{"metadata":{"finalizers":[]}}' --type=merge.

The cluster is in a really bad state right now and cleaning it up is going to be difficult. You may try re-installing everything, and then issue a delete request for each scale set, but I can't say for sure if it is going to work with this many manual interventions...

Removing them from the GitHub UI would not solve it on the cluster level... You still have to clean up resources from your cluster.
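
If the scale set was installed as a Helm release (the release name here is taken from the AutoscalingRunnerSet above and may differ in your setup), the "re-install, then delete each scale set" idea would look roughly like this, with the controller running so it can process the finalizers:

helm uninstall arc-runner-set-home-ops -n dev

This is only a sketch, not an official cleanup procedure.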

rwlove commented 3 months ago

@nikola-jokic ack, I was worried I'd have to rebuild, but thanks for your pointers.

I got rid of the ephemeralrunnerset with this command.

➜  home-ops git:(main) ✗ kubectl patch ephemeralrunnersets.actions.github.com arc-runner-set-home-ops-xxkww -n dev -p '{"metadata":{"finalizers":[]}, "spec":{"patchID":1}}' --type=merge 
ephemeralrunnerset.actions.github.com/arc-runner-set-home-ops-xxkww patched

I was also able to remove the AutoscalingRunnerSet arc-runner-set-home-ops by editing away its finalizer.

I then ensured all the CRDs were deleted by re-running kubectl delete -f <4x CRD yaml files>.
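
To double-check that nothing from the actions.github.com group is left before re-installing (the grep is just a quick filter):

kubectl get crd | grep actions.github.com
kubectl api-resources --api-group=actions.github.com

Neither should list anything once the CRDs are really gone.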

I guess now I'll try to create again.

rwlove commented 3 months ago
Every 2.0s: kubectl -n dev get all                                                                                                             rover: Mon Apr  1 13:05:43 2024

NAME                                                              READY   STATUS    RESTARTS        AGE
pod/actions-runner-controller-gha-rs-controller-f4d5889c7-6hn4l   1/1     Running   0               38s
pod/arc-runner-set-home-ops-79cf85d9-listener                     1/1     Running   0               19s
pod/arc-runner-set-home-ops-t6xjg-runner-sq5xx                    2/2     Running   0               17s
pod/arc-runner-set-home-ops-t6xjg-runner-ttzpt                    2/2     Running   0               9s

Thanks a ton, @nikola-jokic!

nikola-jokic commented 3 months ago

No problem! I'm glad you resolved it :relaxed: !

rekha-prakash-maersk commented 3 months ago

Hi @rwlove @nikola-jokic, I am also encountering the exact same issue. After running the kubectl patch command, I ran into another problem: the controller deployed in the new namespace is still trying to connect to the old listener that was running in the old namespace before the upgrade. How do I remove these old listener scale set references from the controller so it creates new ones from the code?

rwlove commented 3 months ago

I had to include a dummy patchID in the patch as well:

kubectl patch ephemeralrunnersets.actions.github.com arc-runner-set-home-ops-xxkww -n dev -p '{"metadata":{"finalizers":[]}, "spec":{"patchID":1}}' --type=merge
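
If more than one EphemeralRunnerSet is stuck, the same patch can be applied to each of them with a small loop (namespace and dummy patchID as in the command above; treat this as a sketch, not an official cleanup procedure):

for r in $(kubectl -n dev get ephemeralrunnersets.actions.github.com -o name); do
  kubectl -n dev patch "$r" -p '{"metadata":{"finalizers":[]}, "spec":{"patchID":1}}' --type=merge
done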
rekha-prakash-maersk commented 3 months ago

@rwlove, that worked for me as well. Now I am running into a new issue.

Before I knew about this dummy patchID approach, I tried yesterday to rebuild everything in a new namespace on the same cluster. Now the controller deployed in the new namespace is still trying to connect to the scale set listener that was in the old namespace.

kubectl -n new-ns get ephemeralrunnersets.actions.github.com
No resources found in new-ns namespace.
kubectl -n old-ns get ephemeralrunnersets.actions.github.com
No resources found in old-ns namespace.
rwlove commented 3 months ago

I'm not sure about your specific error; I'm just a user and that's a bit outside of my understanding.

What I would say, and maybe this is already obvious to you, is that you need to make sure you can delete all four of the CRD files before doing the install.