actions / actions-runner-controller

Kubernetes controller for GitHub Actions self-hosted runners
Apache License 2.0

Helm uninstall of the runner set results in controller not able to delete all resources for the set #3512

Closed: samip5 closed this issue 6 months ago

samip5 commented 6 months ago

Checks

Controller Version

0.9.1

Deployment Method

Helm

Checks

To Reproduce

1. Install Flux v2
2. Install actions-runner-controller and the runner scale sets via Helm
3. Try to uninstall any of the scale sets
4. Observe that it waits indefinitely for "dependent resources to be deleted"

Describe the bug

The controller appears to be stuck on "waiting for dependent resources to be deleted" for the scale set.

Describe the expected behavior

I expected the scale sets to be deleted without issue.

Additional Context

---
apiVersion: helm.toolkit.fluxcd.io/v2beta2
kind: HelmRelease
metadata:
  name: arc-sol
  namespace: ci
spec:
  interval: 30m
  chart:
    spec:
      chart: gha-runner-scale-set
      version: 0.9.1
      sourceRef:
        kind: HelmRepository
        name: actions-runner-controller
        namespace: flux-system
      interval: 30m

  values:
    # Cannot have nice things because of https://github.com/actions/actions-runner-controller/issues/2697
    runnerScaleSetName: arc-sol

    githubConfigUrl: https://github.com/samipsolutions

    minRunners: 1
    maxRunners: 5

    containerMode:
      type: "dind"

    template:
      spec:
        containers:
          - name: runner
            image: ghcr.io/joryirving/actions-runner:2.316.1
            command: ["/home/runner/run.sh"]
        nodeSelector:
          kubernetes.io/arch: amd64

  valuesFrom:
    - kind: Secret
      name: actions-runner-controller-auth-secret
      valuesKey: github_app_id
      targetPath: githubConfigSecret.github_app_id
    - kind: Secret
      name: actions-runner-controller-auth-secret
      valuesKey: github_app_installation_id
      targetPath: githubConfigSecret.github_app_installation_id
    - kind: Secret
      name: actions-runner-controller-auth-secret
      valuesKey: github_app_private_key
      targetPath: githubConfigSecret.github_app_private_key

Controller Logs

https://gist.github.com/samip5/71eac7eb80d41d74111f2c404082231f

Runner Pod Logs

N/A
nikola-jokic commented 6 months ago

Hey @samip5,

Can you please check which resources are still present while it is in this loop? When you uninstall the chart, all resources are deleted, so we just remove the finalizer. Is it possible that some resources we apply with Helm are not being deleted, causing this loop?
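For context on the loop being discussed: a Kubernetes object whose `metadata.finalizers` list is non-empty is not removed when it is deleted. The API server sets `metadata.deletionTimestamp` and keeps the object until every finalizer has been removed. A stuck object would look roughly like the fragment below. The kind, name, and timestamp are hypothetical, and the finalizer string is an assumption (believed to be the one the gha-runner-scale-set chart puts on its RBAC objects); check the live object for the actual values.

```yaml
# Illustrative metadata of an object stuck in deletion (hypothetical values)
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: arc-sol-example-role                  # hypothetical name
  namespace: ci
  # Set by the API server once deletion has been requested
  deletionTimestamp: "<timestamp>"            # placeholder
  # Deletion cannot complete while this list is non-empty
  finalizers:
    - actions.github.com/cleanup-protection   # assumed finalizer name
```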

samip5 commented 6 months ago

The issue seems to be with the various RBAC-related objects (roles, service accounts, role bindings), some secrets, AND the AutoscalingRunnerSets themselves. I always have to manually patch them to remove the finalizer(s) before the uninstall actually completes.
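The manual workaround above amounts to clearing `metadata.finalizers` on each stuck object. A minimal sketch, assuming a JSON merge patch applied with something like `kubectl patch <kind> <name> -n ci --type merge --patch-file clear-finalizers.yaml` (the file name is hypothetical). In a merge patch a `null` value deletes the field, so the whole list is removed. This skips whatever cleanup the finalizer was guarding, so it is a last resort rather than a fix.

```yaml
# clear-finalizers.yaml (hypothetical file name)
# JSON merge patch (RFC 7386): null deletes the field, which removes the
# entire finalizers list and lets the pending deletion complete.
metadata:
  finalizers: null
```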

rteeling-evernorth commented 6 months ago

The same issue exists when installing the chart via ArgoCD (see #3440)

In Argo, you can annotate the manifests with an Argo-specific annotation that defines the order in which resources are applied/destroyed. I've proposed the ArgoCD fix in #3447. I've never gotten to work with Flux, but I'd guess it has some similar functionality.

To add more context, I've found that when you delete the scale set, ARC (the controller) deletes the resources carrying the finalizer along with the scale set CR. So I added the Argo annotations to ignore the finalizer-protected resources and to make the scale set be applied last and deleted first.
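The ordering part of that approach can be sketched with Argo CD's `argocd.argoproj.io/sync-wave` annotation: unannotated resources sit in wave 0, and lower waves are applied first, so a higher wave on the AutoscalingRunnerSet makes it apply after the RBAC objects. This is a hypothetical fragment, not the content of #3447; the wave value and the object name (assumed to match `runnerScaleSetName`) are assumptions, and how the annotation reaches the chart-rendered object is left open.

```yaml
apiVersion: actions.github.com/v1alpha1
kind: AutoscalingRunnerSet
metadata:
  name: arc-sol        # assumed to match runnerScaleSetName
  namespace: ci
  annotations:
    # Hypothetical wave: higher than the default wave 0, so Argo CD
    # applies this object after the unannotated RBAC resources.
    argocd.argoproj.io/sync-wave: "1"
```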

The process also works manually, although it is considerably more tedious: disable Flux/Argo auto-sync and delete the scale set, which makes the ARC controller destroy the RBAC resources; then uninstall the chart/app in its entirety and it will go quietly.
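For Flux, the "disable auto-sync" step roughly corresponds to suspending the HelmRelease, which stops helm-controller from reconciling it (`flux suspend helmrelease arc-sol -n ci` sets the same field). A minimal sketch against the HelmRelease from the issue, showing only the relevant field. If the HelmRelease is itself applied by a Flux Kustomization, that Kustomization may also need suspending, or the change committed to Git, so it is not reverted.

```yaml
apiVersion: helm.toolkit.fluxcd.io/v2beta2
kind: HelmRelease
metadata:
  name: arc-sol
  namespace: ci
spec:
  # Pause reconciliation while the scale set is deleted by hand;
  # the remaining spec fields from the original manifest are omitted here.
  suspend: true
```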

samip5 commented 6 months ago

> I've never gotten to work with Flux, but I'd guess it has some similar functionality.

I'm not sure about that; as far as I understand, Flux really doesn't have similar functionality when it comes to the order of deletion/apply.