evryfs / github-actions-runner-operator

K8S operator for scheduling github actions runner pods
Apache License 2.0

Multiple runner operators on different GKE clusters #454

Open kasey-weirich opened 2 years ago

kasey-weirich commented 2 years ago

We have multiple environments set up for development/staging work and I am trying to migrate our runner operator and runner pool to a new GKE cluster. Currently our development ecosystem (on GKE) is working as expected with the runner operator scheduling pods as new jobs come up.

This is all configured on the same GH org.

I am trying to migrate to a new cluster, reusing the same GitHub App by supplying its credentials as K8s secrets in the new cluster (assuming the GH App can be reused). I have installed the runner operator via Helm and supplied the GitHub App secrets in the values file. The operator installs with no issues observed in the logs.

When I install the runner pool on the new cluster, it shows ReconcileSuccess; however, the Current size is always 0.

I have tried:

The runner operator logs do not give any indication of why I am not seeing any runners; everything appears to be working:

2022-07-06T14:58:57.622Z    INFO    controller-runtime.metrics  metrics server is starting to listen    {"addr": ":8080"}
2022-07-06T14:58:57.622Z    INFO    setup   starting manager
I0706 14:58:57.622640       1 leaderelection.go:248] attempting to acquire leader lease runner-operator/4ef9cd91.tietoevry.com...
2022-07-06T14:58:57.622Z    INFO    starting metrics server {"path": "/metrics"}
I0706 14:59:13.859538       1 leaderelection.go:258] successfully acquired lease runner-operator/4ef9cd91.tietoevry.com
2022-07-06T14:59:13.859Z    DEBUG   events  Normal  {"object": {"kind":"ConfigMap","namespace":"runner-operator","name":"4ef9cd91.tietoevry.com","uid":"33346577-5c1c-4d78-82b4-79d1e191147b","apiVersion":"v1","resourceVersion":"14561"}, "reason": "LeaderElection", "message": "github-actions-runner-operator-fd84696f-2l2x2_2f916b5b-f9e3-4982-85f8-501181d79b2d became leader"}
2022-07-06T14:59:13.859Z    INFO    controller.githubactionrunner   Starting EventSource    {"reconciler group": "garo.tietoevry.com", "reconciler kind": "GithubActionRunner", "source": "kind source: /, Kind="}
2022-07-06T14:59:13.859Z    INFO    controller.githubactionrunner   Starting EventSource    {"reconciler group": "garo.tietoevry.com", "reconciler kind": "GithubActionRunner", "source": "kind source: /, Kind="}
2022-07-06T14:59:13.859Z    INFO    controller.githubactionrunner   Starting EventSource    {"reconciler group": "garo.tietoevry.com", "reconciler kind": "GithubActionRunner", "source": "kind source: /, Kind="}
2022-07-06T14:59:13.859Z    INFO    controller.githubactionrunner   Starting Controller {"reconciler group": "garo.tietoevry.com", "reconciler kind": "GithubActionRunner"}
2022-07-06T14:59:13.961Z    INFO    controller.githubactionrunner   Starting workers    {"reconciler group": "garo.tietoevry.com", "reconciler kind": "GithubActionRunner", "worker count": 1}
2022-07-06T14:59:19.414Z    INFO    controllers.GithubActionRunner  Reconciling GithubActionRunner  {"githubactionrunner": "runner-operator/runner-pool"}
2022-07-06T14:59:19.760Z    INFO    controllers.GithubActionRunner  Registration secret not found, creating {"githubactionrunner": "runner-operator/runner-pool"}
2022-07-06T14:59:19.976Z    INFO    controllers.GithubActionRunner  Pods and runner API not in sync, returning early    {"githubactionrunner": "runner-operator/runner-pool"}
2022-07-06T14:59:20.136Z    INFO    controllers.GithubActionRunner  Reconciling GithubActionRunner  {"githubactionrunner": "runner-operator/runner-pool"}
2022-07-06T14:59:20.325Z    INFO    controllers.GithubActionRunner  Pods and runner API not in sync, returning early    {"githubactionrunner": "runner-operator/runner-pool"}
2022-07-06T14:59:50.136Z    INFO    controllers.GithubActionRunner  Reconciling GithubActionRunner  {"githubactionrunner": "runner-operator/runner-pool"}
2022-07-06T14:59:50.316Z    INFO    controllers.GithubActionRunner  Pods and runner API not in sync, returning early    {"githubactionrunner": "runner-operator/runner-pool"}
2022-07-06T15:00:20.367Z    INFO    controllers.GithubActionRunner  Reconciling GithubActionRunner  {"githubactionrunner": "runner-operator/runner-pool"}
2022-07-06T15:00:20.585Z    INFO    controllers.GithubActionRunner  Pods and runner API not in sync, returning early    {"githubactionrunner": "runner-operator/runner-pool"}
2022-07-06T15:00:50.604Z    INFO    controllers.GithubActionRunner  Reconciling GithubActionRunner  {"githubactionrunner": "runner-operator/runner-pool"}
2022-07-06T15:00:50.863Z    INFO    controllers.GithubActionRunner  Pods and runner API not in sync, returning early    {"githubactionrunner": "runner-operator/runner-pool"}

Thank you for any pointers you can provide.

kasey-weirich commented 2 years ago

I believe I finally figured this out. In my runner spec YAML file, I changed the name of the runner pool (previously runner-pool), and I am now seeing runner pods in my new cluster.

apiVersion: garo.tietoevry.com/v1alpha1
kind: GithubActionRunner
metadata:
  name: runner-pool-test-01
  namespace: runner-operator
spec:
  minRunners: 2
  maxRunners: 20
  organization: myOrgo
  reconciliationPeriod: 30s
  podTemplateSpec:
    metadata:
      annotations:
        "prometheus.io/scrape": "true"
        "prometheus.io/port": "3903"
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                topologyKey: kubernetes.io/hostname
                labelSelector:
                  matchExpressions:
                    - key: garo.tietoevry.com/pool
                      operator: In
                      values:
                        - runner-pool-test-01

Even if the operator is on a completely different cluster, using the same runner pool name in the runner spec results in zero runner pods getting created.
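To illustrate the workaround, here is a minimal sketch of giving each cluster its own pool name while sharing one GH org. The names runner-pool-dev and runner-pool-staging are hypothetical, and the fragments are not complete manifests (podTemplateSpec is left out for brevity). The explanation in the comments is my assumption, not something confirmed by the maintainers: the repeated "Pods and runner API not in sync" log line would be consistent with the operator matching runners registered in the org by pool name, so a pool with the same name elsewhere gets counted against this one.

```yaml
# Cluster A (e.g. development): hypothetical pool name
apiVersion: garo.tietoevry.com/v1alpha1
kind: GithubActionRunner
metadata:
  name: runner-pool-dev          # unique within the GH org (assumption: name is matched org-wide)
  namespace: runner-operator
spec:
  minRunners: 2
  maxRunners: 20
  organization: myOrgo
  reconciliationPeriod: 30s
  # podTemplateSpec omitted; any garo.tietoevry.com/pool selector value
  # should use the same name as metadata.name (runner-pool-dev here)
---
# Cluster B (e.g. staging): same org and GH App, different pool name
apiVersion: garo.tietoevry.com/v1alpha1
kind: GithubActionRunner
metadata:
  name: runner-pool-staging      # must differ from the pool name used on cluster A
  namespace: runner-operator
spec:
  minRunners: 2
  maxRunners: 20
  organization: myOrgo
  reconciliationPeriod: 30s
  # podTemplateSpec omitted; selector value would be runner-pool-staging
```

Each document would be applied only to its own cluster; they are shown together here just for comparison.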

Would it be useful to update the README with a note about using the operator and runner pools across multiple clusters, lifecycles, or environments?