evryfs / github-actions-runner-operator

K8S operator for scheduling github actions runner pods
Apache License 2.0

Finalization of pods not run when CR is deleted #212

Open aroemen opened 3 years ago

aroemen commented 3 years ago

Running kubectl apply -f .\gh-runners-linux.yaml creates the runners as expected in my GitHub organization. When I delete them though (using kubectl delete -f .\gh-runners-linux.yaml), the pods that contained the runners get stuck in a "Terminating" status.

NAMESPACE                        NAME                                              READY   STATUS        RESTARTS   AGE
github-action-runners            runner-pool-pod-fhthp                             0/3     Terminating   0          4m50s
github-action-runners            runner-pool-pod-wfs62                             0/3     Terminating   0          4m50s
github-actions-runner-operator   github-actions-runner-operator-59b9d486b5-t2p62   1/1     Running       0          5m26s

If I edit the pod and remove the finalizer (garo.tietoevry.com/runner-registration), the pod successfully deletes after saving that change. The runner is not being removed from my list of GitHub self hosted runners though as I would expect. Am I missing something here?

davidkarlsen commented 3 years ago

Then there is a problem with unregistration, please provide logs from the operator to enable me to help you.

aroemen commented 3 years ago

@davidkarlsen I don't see any mention of the delete in the operator log. The delete command was issued at 12:34:23 which is the last time there is anything in the operator logs here:

2021-03-11T18:30:34.050Z    INFO    controller-runtime.metrics  metrics server is starting to listen    {"addr": ":8080"}
2021-03-11T18:30:34.051Z    INFO    controller-runtime.injectors-warning    Injectors are deprecated, and will be removed in v0.10.x
2021-03-11T18:30:34.051Z    INFO    controller-runtime.injectors-warning    Injectors are deprecated, and will be removed in v0.10.x
2021-03-11T18:30:34.051Z    INFO    controller-runtime.injectors-warning    Injectors are deprecated, and will be removed in v0.10.x
2021-03-11T18:30:34.051Z    INFO    controller-runtime.injectors-warning    Injectors are deprecated, and will be removed in v0.10.x
2021-03-11T18:30:34.051Z    INFO    controller-runtime.injectors-warning    Injectors are deprecated, and will be removed in v0.10.x
2021-03-11T18:30:34.051Z    INFO    controller-runtime.injectors-warning    Injectors are deprecated, and will be removed in v0.10.x
2021-03-11T18:30:34.051Z    INFO    controller-runtime.injectors-warning    Injectors are deprecated, and will be removed in v0.10.x
2021-03-11T18:30:34.051Z    INFO    controller-runtime.injectors-warning    Injectors are deprecated, and will be removed in v0.10.x
2021-03-11T18:30:34.051Z    INFO    setup   starting manager
I0311 18:30:34.052860       1 leaderelection.go:243] attempting to acquire leader lease github-actions-runner-operator/4ef9cd91.tietoevry.com...
2021-03-11T18:30:34.052Z    INFO    controller-runtime.manager  starting metrics server {"path": "/metrics"}
I0311 18:30:51.471375       1 leaderelection.go:253] successfully acquired lease github-actions-runner-operator/4ef9cd91.tietoevry.com
2021-03-11T18:30:51.471Z    DEBUG   controller-runtime.manager.events   Normal  {"object": {"kind":"ConfigMap","namespace":"github-actions-runner-operator","name":"4ef9cd91.tietoevry.com","uid":"830a98c7-1d79-4fd4-8b16-27048338c333","apiVersion":"v1","resourceVersion":"156761"}, "reason": "LeaderElection", "message": "github-actions-runner-operator-59b9d486b5-hbsrz_a1bc3d27-328e-490c-86e3-4e6033887fbf became leader"}
2021-03-11T18:30:51.472Z    INFO    controller-runtime.manager.controller.githubactionrunner    Starting EventSource    {"reconciler group": "garo.tietoevry.com", "reconciler kind": "GithubActionRunner", "source": "kind source: /, Kind="}
2021-03-11T18:30:51.573Z    INFO    controller-runtime.manager.controller.githubactionrunner    Starting EventSource    {"reconciler group": "garo.tietoevry.com", "reconciler kind": "GithubActionRunner", "source": "kind source: /, Kind="}
2021-03-11T18:30:51.674Z    INFO    controller-runtime.manager.controller.githubactionrunner    Starting EventSource    {"reconciler group": "garo.tietoevry.com", "reconciler kind": "GithubActionRunner", "source": "kind source: /, Kind="}
2021-03-11T18:30:51.775Z    INFO    controller-runtime.manager.controller.githubactionrunner    Starting Controller {"reconciler group": "garo.tietoevry.com", "reconciler kind": "GithubActionRunner"}
2021-03-11T18:30:51.775Z    INFO    controller-runtime.manager.controller.githubactionrunner    Starting workers    {"reconciler group": "garo.tietoevry.com", "reconciler kind": "GithubActionRunner", "worker count": 1}
2021-03-11T18:30:51.775Z    INFO    controllers.GithubActionRunner  Reconciling GithubActionRunner  {"githubactionrunner": "github-action-runners/runner-pool"}
2021-03-11T18:30:52.172Z    INFO    controllers.GithubActionRunner  Scaling up  {"githubactionrunner": "github-action-runners/runner-pool", "numInstances": 2}
2021-03-11T18:30:52.182Z    INFO    controllers.GithubActionRunner  Creating a new Pod  {"githubactionrunner": "github-action-runners/runner-pool", "Pod.Namespace": "github-action-runners", "Pod.Name": "runner-pool-pod-4ts8j", "result": "created"}
2021-03-11T18:30:52.182Z    DEBUG   controller-runtime.manager.events   Normal  {"object": {"kind":"GithubActionRunner","namespace":"github-action-runners","name":"runner-pool","uid":"377cc688-b76c-4862-b268-3e306e2dc484","apiVersion":"garo.tietoevry.com/v1alpha1","resourceVersion":"156732"}, "reason": "Scaling", "message": "Created pod github-action-runners/runner-pool-pod-4ts8j"}
2021-03-11T18:30:52.186Z    INFO    controllers.GithubActionRunner  Creating a new Pod  {"githubactionrunner": "github-action-runners/runner-pool", "Pod.Namespace": "github-action-runners", "Pod.Name": "runner-pool-pod-779pp", "result": "created"}
2021-03-11T18:30:52.186Z    DEBUG   controller-runtime.manager.events   Normal  {"object": {"kind":"GithubActionRunner","namespace":"github-action-runners","name":"runner-pool","uid":"377cc688-b76c-4862-b268-3e306e2dc484","apiVersion":"garo.tietoevry.com/v1alpha1","resourceVersion":"156732"}, "reason": "Scaling", "message": "Created pod github-action-runners/runner-pool-pod-779pp"}
2021-03-11T18:30:52.256Z    INFO    controllers.GithubActionRunner  Reconciling GithubActionRunner  {"githubactionrunner": "github-action-runners/runner-pool"}
2021-03-11T18:30:52.401Z    INFO    controllers.GithubActionRunner  Pods and runner API not in sync, returning early    {"githubactionrunner": "github-action-runners/runner-pool"}
2021-03-11T18:31:52.256Z    INFO    controllers.GithubActionRunner  Reconciling GithubActionRunner  {"githubactionrunner": "github-action-runners/runner-pool"}
2021-03-11T18:32:52.502Z    INFO    controllers.GithubActionRunner  Reconciling GithubActionRunner  {"githubactionrunner": "github-action-runners/runner-pool"}
2021-03-11T18:33:52.687Z    INFO    controllers.GithubActionRunner  Reconciling GithubActionRunner  {"githubactionrunner": "github-action-runners/runner-pool"}
2021-03-11T18:34:22.734Z    INFO    controllers.GithubActionRunner  Reconciling GithubActionRunner  {"githubactionrunner": "github-action-runners/runner-pool"}
2021-03-11T18:34:23.141Z    INFO    controllers.GithubActionRunner  Reconciling GithubActionRunner  {"githubactionrunner": "github-action-runners/runner-pool"}
2021-03-11T18:34:52.876Z    INFO    controllers.GithubActionRunner  Reconciling GithubActionRunner  {"githubactionrunner": "github-action-runners/runner-pool"}

aroemen commented 3 years ago

Sorry, I just noticed I put this on the wrong project. This should probably be on the github-actions-runner-operator project rather than here. Let me know if you want me to move it.

davidkarlsen commented 3 years ago

That's strange. What version of the operator are you running? Can you provide the CR for the runner pool?

aroemen commented 3 years ago

I'm running the latest version from the Helm charts, 2.5.10. I'm just testing locally in my k8s environment in Docker on Windows 10.

apiVersion: garo.tietoevry.com/v1alpha1
kind: GithubActionRunner
metadata:
  name: runner-pool
  namespace: github-action-runners
spec:
  minRunners: 2                # minimum running pods, required
  maxRunners: 6                # max number of pods, required
  reconciliationPeriod: 1m     # How often it will reconcile, optional, default 1m
  organization: MYORG  # the github org, required
  # repository: "theRepoName"  # if runner for repo, optional
  tokenRef:
    key: GH_TOKEN
    name: actions-runner
  podTemplateSpec:
    metadata:
      annotations:
        "prometheus.io/scrape": "true"
        "prometheus.io/port": "3903"
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                topologyKey: kubernetes.io/hostname
                labelSelector:
                  matchExpressions:
                    - key: garo.tietoevry.com/pool
                      operator: In
                      values:
                        - runner-pool
      containers:
        - name: runner
          env:
            - name: RUNNER_DEBUG
              value: "true"
            - name: DOCKER_TLS_CERTDIR
              value: /certs
            - name: DOCKER_HOST
              value: tcp://localhost:2376
            - name: DOCKER_TLS_VERIFY
              value: "1"
            - name: DOCKER_CERT_PATH
              value: /certs/client
            - name: ACTIONS_RUNNER_INPUT_LABELS
              value: linux,x64
            - name: ACTIONS_RUNNER_INPUT_RUNNERGROUP
              value: "Internal"
            - name: GH_ORG
              value: MYORG
            # if runner for repo:
            # - name: GH_REPO
            #   value: theRepoName
          envFrom:
            - secretRef:
                name: runner-pool-regtoken
          # find the fixed-in-time tags at https://quay.io/repository/evryfs/github-actions-runner?tab=tags if you want to avoid pulling on a moving tag
          # due to https://github.com/actions/runner/issues/246 the runner sw needs to be recent
          # you can subscribe to release-feeds at https://github.com/evryfs/github-actions-runner/releases.atom
          image: quay.io/evryfs/github-actions-runner:latest
          imagePullPolicy: Always
          resources: {}
          volumeMounts:
            - mountPath: /certs
              name: docker-certs
            - mountPath: /home/runner/_diag
              name: runner-diag
            - mountPath: /home/runner/_work
              name: runner-work
            # - mountPath: /home/runner/.m2
            #   name: mvn-repo
            # - mountPath: /home/runner/.m2/settings.xml
            #   name: settings-xml
        - name: docker
          env:
            - name: DOCKER_TLS_CERTDIR
              value: /certs
          image: docker:stable-dind
          imagePullPolicy: Always
          args:
            # See linked issues from: https://github.com/evryfs/github-actions-runner-operator/issues/39
            - --mtu=1430
          resources: {}
          securityContext:
            privileged: true
          volumeMounts:
            - mountPath: /var/lib/docker
              name: docker-storage
            - mountPath: /certs
              name: docker-certs
            - mountPath: /home/runner/_work
              name: runner-work
        - name: exporter
          image: quay.io/evryfs/github-actions-runner-metrics:v0.0.3
          ports:
            - containerPort: 3903
              protocol: TCP
          volumeMounts:
            - name: runner-diag
              mountPath: /_diag
              readOnly: true
      volumes:
        - emptyDir: {}
          name: runner-work
        - emptyDir: {}
          name: runner-diag
        - emptyDir: {}
          name: mvn-repo
        - emptyDir: {}
          name: docker-storage
        - emptyDir: {}
          name: docker-certs
        # - configMap:
        #     defaultMode: 420
        #     name: settings-xml
        #   name: settings-xml

davidkarlsen commented 3 years ago

I was able to reproduce it. It's an edge case that occurs when you delete the CR itself. In that case the CR is gone, so the cleanup step that handles the finalization (https://github.com/evryfs/github-actions-runner-operator/blob/master/controllers/githubactionrunner_controller.go#L116) is never reached.

aroemen commented 3 years ago

What would be another way to tear down these resources then?

duyhenryer commented 3 years ago

Hi there, I have the same issue here

NAME                    READY   STATUS        RESTARTS   AGE
runner-pool-pod-7qhqc   0/3     Terminating   0          4d6h
runner-pool-pod-d96bw   0/3     Terminating   0          4h38m
runner-pool-pod-w278v   0/3     Terminating   0          4h38m
runner-pool-pod-xbmww   0/3     Terminating   0          4h47m

I can't remove them. Thank you.

gabriellemadden commented 3 years ago

@aroemen @duyhenryer I was able to delete them by removing the finalizers field. Patch the finalizers list to be null:

kubectl patch pod <POD_NAME> -n <NAMESPACE> -p '{"metadata":{"finalizers":null}}'
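
When several pods are stuck, the same patch can be applied in a loop. The following is a minimal sketch, not something the operator ships: `remove_pod_finalizers` is a hypothetical helper name, and it only wraps the one-liner above. As noted earlier in this thread, removing the finalizer by hand lets the pod go away but does not remove the runner from the list of self-hosted runners on GitHub.

```shell
# Sketch only: remove_pod_finalizers is a hypothetical helper, not part of the operator.
# Usage: remove_pod_finalizers <namespace> <pod> [<pod> ...]
remove_pod_finalizers() {
  if [ "$#" -lt 2 ]; then
    echo "usage: remove_pod_finalizers <namespace> <pod> [<pod> ...]" >&2
    return 1
  fi
  local namespace="$1"
  shift
  local pod
  for pod in "$@"; do
    # Same patch as the one-liner above, applied to each named pod in turn.
    kubectl patch pod "$pod" -n "$namespace" -p '{"metadata":{"finalizers":null}}'
  done
}
```

For the pods from the original report this would be called as `remove_pod_finalizers github-action-runners runner-pool-pod-fhthp runner-pool-pod-wfs62`.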

davidkarlsen commented 3 years ago

yes, and that's what the operator does after de-registering them from github - which is why I am curious what the operator logs.

aroemen commented 3 years ago

@davidkarlsen I posted the operator logs back in March. Do you need additional data?

davidkarlsen commented 3 years ago

@aroemen sorry, I commented on the wrong issue. I was thinking of https://github.com/evryfs/github-actions-runner-operator/issues/232, which was fixed recently. This one (deleting the CR) still needs to be fixed.

davidkarlsen commented 3 years ago

@aroemen #264 will solve this, as you can scale the pool to zero, then delete the CR.
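
The "scale to zero, then delete" sequence could be scripted roughly as below. This is a sketch under several assumptions that the thread does not confirm: the installed CRD accepts `minRunners: 0`, the operator scales idle runners down to `minRunners`, and runner pods carry the label `garo.tietoevry.com/pool=<pool name>` (inferred from the anti-affinity selector in the CR posted above). `teardown_runner_pool` is a hypothetical helper name.

```shell
# Sketch only: teardown_runner_pool is a hypothetical helper, not part of the operator.
# Assumptions: the CRD accepts minRunners: 0, the operator scales idle runners down
# to minRunners, and runner pods carry the label garo.tietoevry.com/pool=<pool name>.
teardown_runner_pool() {
  local namespace="$1" pool="$2" tries=0
  local kind="githubactionrunners.garo.tietoevry.com"

  # Step 1: lower the floor to zero so the operator can remove every runner.
  kubectl patch "$kind" "$pool" -n "$namespace" --type=merge \
    --patch '{"spec":{"minRunners":0}}' || return 1

  # Step 2: poll until no pods with the assumed pool label remain (roughly five minutes at most).
  while [ -n "$(kubectl get pods -n "$namespace" -l "garo.tietoevry.com/pool=$pool" -o name)" ]; do
    tries=$((tries + 1))
    if [ "$tries" -ge 30 ]; then
      echo "runner pods still present in $namespace, not deleting $pool" >&2
      return 1
    fi
    sleep 10
  done

  # Step 3: with no pods left holding the finalizer, delete the CR.
  kubectl delete "$kind" "$pool" -n "$namespace"
}
```

With the CR from this issue the call would be `teardown_runner_pool github-action-runners runner-pool`. The helper refuses to delete the CR while labelled pods remain, because deleting it early is what leaves pods stuck in "Terminating".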

zhsj commented 3 years ago

Maybe the CR should have finalizer as well.

tonywildey-valstro commented 2 years ago

I'm trying to make this work on the latest build but can't seem to get it working.

$ kubectl patch githubactionrunners.garo.tietoevry.com runner-pool --namespace actions-runner --patch '{"spec":{"minRunners":0}}' --type=merge

This results in:

The GithubActionRunner "runner-pool" is invalid: spec.minRunners: Invalid value: 0: spec.minRunners in body should be greater than or equal to 1

I suspect that either the image I'm pulling is not the latest or I'm pulling it the wrong way. I'm pulling the operator image using the published Helm charts:
helm upgrade --install --wait github-actions-runner-operator evryfs-oss/github-actions-runner-operator --namespace actions-runner-operator --set githubapp.existingSecret=github-runner-app --set githubapp.enabled=true

The runner image is this one: quay.io/evryfs/github-actions-runner:latest

What am I missing?

Thx Tony

davidkarlsen commented 2 years ago

@tonywildey-valstro you probably don't have the latest CRD: https://raw.githubusercontent.com/evryfs/github-actions-runner-operator/v0.10.0/config/crd/bases/garo.tietoevry.com_githubactionrunners.yaml

tonywildey-valstro commented 2 years ago

Ah, there we go. I installed using the Helm chart, whose CRD (https://github.com/evryfs/helm-charts/blob/master/charts/github-actions-runner-operator/crds/garo.tietoevry.com_githubactionrunners.yaml) does not have the minRunners change.

Thx Tony
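
For anyone hitting the same validation error, one way to pick up the newer schema is to apply the CRD from the v0.10.0 tag linked above directly with `kubectl apply -f`, which accepts a URL. This is a sketch rather than an officially documented upgrade path; `update_runner_crd` and `CRD_URL` are hypothetical names used only for illustration.

```shell
# Sketch only: update_runner_crd and CRD_URL are hypothetical names.
# The URL is the v0.10.0 CRD linked earlier in this thread.
CRD_URL="https://raw.githubusercontent.com/evryfs/github-actions-runner-operator/v0.10.0/config/crd/bases/garo.tietoevry.com_githubactionrunners.yaml"

update_runner_crd() {
  # Apply the CRD manifest from the given URL, defaulting to the v0.10.0 one.
  kubectl apply -f "${1:-$CRD_URL}"
}
```

After `update_runner_crd` succeeds, the `kubectl patch ... '{"spec":{"minRunners":0}}'` command from earlier in the thread can be retried.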
