evryfs / github-actions-runner-operator

K8S operator for scheduling github actions runner pods
Apache License 2.0

Runner deployment Help #166

Closed Anusha-Kolli closed 3 years ago

Anusha-Kolli commented 3 years ago


I need help with runner deployment. I deployed the operator, but when I try to deploy runner pods, the pods crash-loop with the error: RUNNER_TOKEN variable is needed.

I already created a secret with PAT.


davidkarlsen commented 3 years ago

It needs to be referenced like this: https://github.com/evryfs/github-actions-runner-operator/blob/master/config/samples/garo_v1alpha1_githubactionrunner.yaml#L56

Anusha-Kolli commented 3 years ago

Here is my CRD (attached as test.pdf). I have referenced the secret the same way as above, but I am still getting that error.

Now it says: Error: secret "runner-pool-regtoken" not found

What should be the value in the secret "runner-pool-regtoken"?

davidkarlsen commented 3 years ago

@Anusha-Kolli That secret will be created automatically (and refreshed) by the controller. It's a registration token for the runner. Change the reference in the crd to be named runner-pool-regtoken where you have:

    - secretRef:
        name: actions-runner   # <-- here

like in the reference, and you should be good to go, provided you run a recent version. Also, the lifecycle element should not be needed.
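
For illustration, the change described above could look like the fragment below. It assumes the CR is named runner-pool (so the operator-managed secret would be runner-pool-regtoken) and that actions-runner is the secret holding the PAT. The surrounding podTemplateSpec path follows the linked sample and should be checked against the operator version in use.

```yaml
# Fragment of a GithubActionRunner CR; unrelated fields are omitted.
spec:
  podTemplateSpec:
    spec:
      containers:
        - name: runner
          envFrom:
            - secretRef:
                # Before: name: actions-runner (assumed here to be the PAT secret)
                # After: the registration-token secret that the operator
                # creates and refreshes on its own
                name: runner-pool-regtoken
```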

What version of the controller do you run?

Anusha-Kolli commented 3 years ago

@davidkarlsen I am running the latest version. Referencing runner-pool-regtoken didn't work, but I saw that the operator had created a runner-regtoken secret; after referencing that in the CRD, I am able to deploy.

But the pod is crash looping. I have the runner, docker, exporter logs.

```
ldd: ./bin/System.Security.Cryptography.Native.OpenSsl.so: No such file or directory
ldd: ./bin/System.IO.Compression.Native.so: No such file or directory

|        ____ _ _   _   _       _          _        _   _                      |
|       / ___(_) |_| | | |_   _| |__      / \   ___| |_(_) ___  _ __  ___      |
|      | |  _| | __| |_| | | | | '_ \    / _ \ / __| __| |/ _ \| '_ \/ __|     |
|      | |_| | | |_|  _  | |_| | |_) |  / ___ \ (__| |_| | (_) | | | \__ \     |
|       \____|_|\__|_| |_|\__,_|_.__/  /_/   \_\___|\__|_|\___/|_| |_|___/     |
|                                                                              |
|                       Self-hosted runner registration                        |
|                                                                              |

# Authentication

√ Connected to GitHub

# Runner Registration

√ Runner successfully added
√ Runner connection is good

# Runner settings

√ Settings Saved.

√ Connected to GitHub

2021-01-21 04:07:12Z: Listening for Jobs
```

**But there is a problem in the docker container logs:**

```
kubectl logs -f runner-pod-c9kr8 -c docker --namespace actions-runner
Must define RUNNER_TOKEN variable
```

**exporter logs:**   

```
kubectl logs -f runner-pod-c9kr8 -c exporter --namespace actions-runner
I0121 04:07:07.473507       1 main.go:103] mtail version v3.0.0-rc36 git revision 7825f115dd3ed9f623377821c0351d1eb7aa3a5a go version go1.14.4 go arch amd64 go os linux
I0121 04:07:07.473763       1 main.go:104] Commandline: ["mtail" "-logtostderr" "-logs" "/_diag/*" "-progs" "/progs"]
I0121 04:07:07.474447       1 log_watcher.go:249] No abspath in watched list, added new one for /progs
I0121 04:07:07.474761       1 loader.go:229] Loaded program jobmetrics.mtail
I0121 04:07:07.474812       1 log_watcher.go:249] No abspath in watched list, added new one for /_diag
I0121 04:07:07.474888       1 log_watcher.go:249] No abspath in watched list, added new one for /_diag/Runner_20210121-040707-utc.log
I0121 04:07:07.474931       1 log_watcher.go:254] Found this processor in watched list
I0121 04:07:07.474983       1 log_watcher.go:254] Found this processor in watched list
I0121 04:07:07.475023       1 tail.go:315] Tailing /_diag/Runner_20210121-040707-utc.log
I0121 04:07:07.475114       1 store.go:136] Starting metric store expiry loop every 1h0m0s
I0121 04:07:07.475364       1 mtail.go:341] Listening on [::]:3903
I0121 04:07:07.475519       1 tail.go:461] Starting log handle expiry loop every 1h0m0s
I0121 04:07:10.728620       1 log_watcher.go:254] Found this processor in watched list
I0121 04:07:10.728705       1 log_watcher.go:249] No abspath in watched list, added new one for /_diag/Runner_20210121-040710-utc.log
I0121 04:07:10.728728       1 tail.go:315] Tailing /_diag/Runner_20210121-040710-utc.log
```

Not sure how to proceed from here.

Anusha-Kolli commented 3 years ago

Thank you so much, Felix, for the new image.

I tried using this image and my runner pod is still crash-looping.

The runner container logs look fine and I am able to see the runner in the GitHub repo, but for the docker container I am getting this:

```
kubectl logs runner-pod-pfxpf -c docker --namespace actions-runner

|        ____ _ _   _   _       _          _        _   _                      |
|       / ___(_) |_| | | |_   _| |__      / \   ___| |_(_) ___  _ __  ___      |
|      | |  _| | __| |_| | | | | '_ \    / _ \ / __| __| |/ _ \| '_ \/ __|     |
|      | |_| | | |_|  _  | |_| | |_) |  / ___ \ (__| |_| | (_) | | | \__ \     |
|       \____|_|\__|_| |_|\__,_|_.__/  /_/   \_\___|\__|_|\___/|_| |_|___/     |
|                                                                              |
|                       Self-hosted runner registration                        |
|                                                                              |

# Authentication

Http response code: NotFound from 'POST https://api.github.com/actions/runner-registration'
{"message":"Not Found","documentation_url":"https://docs.github.com/rest"}
Response status code does not indicate success: 404 (Not Found).
```

Please help me to proceed from here.

davidkarlsen commented 3 years ago

I think the runner image is the problem. If you follow the example docs, use this one: https://quay.io/repository/evryfs/github-actions-runner?tab=tags which is tailored to suit the operator.

davidkarlsen commented 3 years ago

did that work out for you?

yaron-idan commented 3 years ago

Hey @davidkarlsen, just chiming in here to report that we are seeing the same error as the one @Anusha-Kolli last reported.
In our case the operator worked well when we first deployed it a few months ago (using the image you've suggested in the previous comment) and broke some time ago. I'm not entirely sure why.
I also can't find any documentation about the api endpoint returning the error in Github's API reference, and since it returns a 404 error I have a slight suspicion something changed in Github's Actions API and the client should be updated. WDYT?

davidkarlsen commented 3 years ago

@yaron-idan please run latest operator version and latest runner image and it should work fine.

yaron-idan commented 3 years ago

Thanks @davidkarlsen, I'm actually running the latest runner image and the latest operator image, and the same error is still happening. GitHub support suggested this is an authentication error, so I'm trying to understand if something is wrong with the PAT I'm supplying to the runner, but I can't find a way to figure this out.

Here are some details of the versions I'm running, in case I've missed something:

- runner image: latest (digest: @sha256:92a71e96865f4066cca8e08a7ea0ef2f5216bf164848f41d3348c8090fe3d5c9)
- operator version: v0.8.3
- chart version: 2.5.4

Any idea how can I further troubleshoot this issue?

davidkarlsen commented 3 years ago

Those look correct, thanks. The PAT you use in https://github.com/evryfs/github-actions-runner-operator/blob/master/config/samples/garo_v1alpha1_githubactionrunner.yaml#L17 needs to have sufficient permissions/scopes to register a runner: https://docs.github.com/en/rest/reference/actions#self-hosted-runners.

Is the runner repo-scoped or org-scoped? Note that org needs to be defined in the CR in both cases.

Also, what do the operator logs look like?

Anusha-Kolli commented 3 years ago

> did that work out for you?

@davidkarlsen yes

davidkarlsen commented 3 years ago

how about you @yaron-idan ?

yaron-idan commented 3 years ago

Still having issues. I've produced a PAT with all required permissions and I'm still hitting the same error. The org is specified in the operator spec like so:

    - name: runner
      env:
        - name: GH_ORG
          value: {{ $githubOrg }}

The $githubOrg variable is assigned a value from our values.yaml file earlier in the template file.

The operator throws these errors when scaling up another runner -

2021-01-28T13:29:22.230Z    INFO    controllers.GithubActionRunner  Reconciling GithubActionRunner  {"githubactionrunner": "github-runners/kubernetes-500m-cores-1024mi"}
2021-01-28T13:29:22.721Z    INFO    controllers.GithubActionRunner  Scaling up  {"githubactionrunner": "github-runners/kubernetes-500m-cores-1024mi", "numInstances": 1}
2021-01-28T13:29:22.721Z    INFO    controllers.GithubActionRunner  Registration token expired, updating    {"githubactionrunner": "github-runners/kubernetes-500m-cores-1024mi"}
2021-01-28T13:29:29.417Z    INFO    controllers.GithubActionRunner  Creating a new Pod  {"githubactionrunner": "github-runners/kubernetes-500m-cores-1024mi", "Pod.Namespace": "github-runners", "Pod.Name": "kubernetes-500m-cores-1024mi-pod-wmf2k", "result": "created"}
2021-01-28T13:29:29.417Z    DEBUG   controller-runtime.manager.events   Normal  {"object": {"kind":"GithubActionRunner","namespace":"github-runners","name":"kubernetes-500m-cores-1024mi","uid":"c09508c2-0c62-4d1b-ab5f-602fc404450e","apiVersion":"garo.tietoevry.com/v1alpha1","resourceVersion":"483312812"}, "reason": "Scaling", "message": "Created pod github-runners/kubernetes-500m-cores-1024mi-pod-wmf2k"}
2021-01-28T13:29:29.427Z    DEBUG   controller-runtime.manager.events   Warning {"object": {"kind":"GithubActionRunner","namespace":"github-runners","name":"kubernetes-500m-cores-1024mi","uid":"c09508c2-0c62-4d1b-ab5f-602fc404450e","apiVersion":"garo.tietoevry.com/v1alpha1","resourceVersion":"483312812"}, "reason": "ProcessingError", "message": "Operation cannot be fulfilled on githubactionrunners.garo.tietoevry.com \"kubernetes-500m-cores-1024mi\": the object has been modified; please apply your changes to the latest version and try again"}
2021-01-28T13:29:29.431Z    ERROR   util.api    unable to update status {"error": "Operation cannot be fulfilled on githubactionrunners.garo.tietoevry.com \"kubernetes-500m-cores-1024mi\": the object has been modified; please apply your changes to the latest version and try again"}
2021-01-28T13:29:29.432Z    ERROR   controller-runtime.manager.controller.githubactionrunner    Reconciler error    {"reconciler group": "garo.tietoevry.com", "reconciler kind": "GithubActionRunner", "name": "kubernetes-500m-cores-1024mi", "namespace": "github-runners", "error": "Operation cannot be fulfilled on githubactionrunners.garo.tietoevry.com \"kubernetes-500m-cores-1024mi\": the object has been modified; please apply your changes to the latest version and try again"}
2021-01-28T13:29:29.432Z    INFO    controllers.GithubActionRunner  Reconciling GithubActionRunner  {"githubactionrunner": "github-runners/kubernetes-500m-cores-1024mi"}
2021-01-28T13:29:29.828Z    INFO    controllers.GithubActionRunner  Pods and runner API not in sync, returning early    {"githubactionrunner": "github-runners/kubernetes-500m-cores-1024mi"}

Any idea what's the issue here, @davidkarlsen?

Thanks for all these swift responses! We really appreciate you devoting time to our issues, and the product you've written. Can't wait for it to work properly again.

davidkarlsen commented 3 years ago

@yaron-idan those "unable to update status" errors are not relevant and can be ignored.

So the operator will:

  1. create/update a token in the namespace where the CR is defined called <NameOfRunner>-regtoken, this token needs to be funneled into the pod like here: https://github.com/evryfs/github-actions-runner-operator/blob/master/config/samples/garo_v1alpha1_githubactionrunner.yaml#L56
  2. that in turn will be picked up by the scheduled runner pod: https://github.com/evryfs/github-actions-runner/blob/master/entrypoint.sh#L4

For the operator to be able to create/refresh that token, it is important that you have defined your PAT in the secret referenced here: https://github.com/evryfs/github-actions-runner-operator/blob/master/config/samples/garo_v1alpha1_githubactionrunner.yaml#L17

For org-wide runners, that's all that is needed.

If, however, you have a repo-scoped runner, you have to set the repo (with the same value) in both of these places:

  1. https://github.com/evryfs/github-actions-runner-operator/blob/master/config/samples/garo_v1alpha1_githubactionrunner.yaml#L16 and
  2. https://github.com/evryfs/github-actions-runner-operator/blob/master/config/samples/garo_v1alpha1_githubactionrunner.yaml#L54

Full logs from the operator and/or runners will tell us what is going on (you can link to some gists).
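
Put together, the pieces described above might look like the following sketch. The names runner-pool, someorg and somerepo are placeholders. The `repository` field and the `GH_REPO` variable are assumptions based on the linked sample and should be verified there. Required fields not discussed in this thread are omitted.

```yaml
apiVersion: garo.tietoevry.com/v1alpha1
kind: GithubActionRunner
metadata:
  name: runner-pool            # placeholder; yields the secret runner-pool-regtoken
  namespace: actions-runner
spec:
  organization: someorg        # placeholder; set for org- and repo-scoped runners alike
  repository: somerepo         # assumed field name; only for repo-scoped runners
  tokenRef:                    # secret holding the PAT the operator uses
    name: actions-runner
    key: GH_TOKEN
  podTemplateSpec:
    spec:
      containers:
        - name: runner
          env:
            - name: GH_ORG
              value: someorg
            - name: GH_REPO    # assumed variable name; same value as spec.repository
              value: somerepo
          envFrom:
            - secretRef:
                # <NameOfRunner>-regtoken, created and refreshed by the operator
                name: runner-pool-regtoken
```

For an org-wide runner, the two repo-related entries would simply be left out.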

Hope this helps!

Anusha-Kolli commented 3 years ago

@davidkarlsen, I am able to deploy the GH operator and runner locally on docker-desktop. I am deploying the runner CRD using a helm chart, with volume claims instead of empty directories, and with subpaths. It worked fine locally. But when I try to deploy in my org's Kubernetes cluster, the docker and exporter containers run fine while the runner container is crash-looping with the error below.

I am using the ubuntu20-20201210.0-2.276.1 image.

ldd: ./bin/libSystem.Security.Cryptography.Native.OpenSsl.so: No such file or directory
ldd: ./bin/libSystem.IO.Compression.Native.so: No such file or directory
Unhandled exception. System.UnauthorizedAccessException: Access to the path '/home/runner/_diag/Runner_20210128-164015-utc.log' is denied.
---> System.IO.IOException: Permission denied
--- End of inner exception stack trace ---
at Interop.ThrowExceptionForIoErrno(ErrorInfo errorInfo, String path, Boolean isDirectory, Func`2 errorRewriter)
at Microsoft.Win32.SafeHandles.SafeFileHandle.Open(String path, OpenFlags flags, Int32 mode)
at System.IO.FileStream.OpenHandle(FileMode mode, FileShare share, FileOptions options)
at System.IO.FileStream..ctor(String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, FileOptions options)
at System.IO.FileStream..ctor(String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize)
at GitHub.Runner.Common.HostTraceListener.CreatePageLogWriter()
at GitHub.Runner.Common.HostTraceListener..ctor(String logFileDirectory, String logFilePrefix, Int32 pageSizeLimit, Int32 retentionDays)
at GitHub.Runner.Common.HostContext..ctor(String hostType, String logFile)
at GitHub.Runner.Listener.Program.Main(String[] args)
./config.sh: line 81: 30 Aborted (core dumped) ./bin/Runner.Listener configure "$@" 

Any help would be appreciated. Thanks.

yaron-idan commented 3 years ago

> 1. create/update a token in the namespace where the CR is defined called <NameOfRunner>-regtoken, this token needs to be funneled into the pod like here: https://github.com/evryfs/github-actions-runner-operator/blob/master/config/samples/garo_v1alpha1_githubactionrunner.yaml#L56

That was it, It's working again!!!

Can't thank you enough for having the patience to troubleshoot and follow up on this with me.

davidkarlsen commented 3 years ago

Great to hear 🎉 As for the .NET error: that was reported and fixed in a recent runner version from GitHub: https://github.com/actions/runner/releases. This runner software is included in my image: https://github.com/evryfs/github-actions-runner/blob/master/Dockerfile#L3. Therefore I recommend running the tagged versions: https://quay.io/repository/evryfs/github-actions-runner?tab=tags. If going YOLO with master/latest, be sure to set imagePullPolicy to Always.
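
A minimal sketch of that advice, written as a fragment of the runner container spec. The `<tag>` placeholder stands for one of the tags listed on the Quay page and is deliberately left unfilled. `imagePullPolicy` is the standard Kubernetes container field.

```yaml
containers:
  - name: runner
    # Preferred: pin a tagged release of the runner image.
    image: quay.io/evryfs/github-actions-runner:<tag>
    # Alternative when tracking a moving tag such as latest:
    # force a fresh pull on every pod start.
    # image: quay.io/evryfs/github-actions-runner:latest
    # imagePullPolicy: Always
```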

Happy building! ;-)

Sandeepb-nextcar commented 2 years ago

Hi @davidkarlsen @Anusha-Kolli, I'm seeing the same error: pods are crashing with "Must define RUNNER_TOKEN variable".

I created a secret: kubectl create secret generic github-runner-app --from-literal=GITHUB_APP_INTEGRATION_ID= --from-file=GITHUB_APP_PRIVATE_KEY=.pem -n namespace github-actions-runner-operator

I referenced it like this:

    envFrom:
      - secretRef:
          name: github-runner-app

I'm not using a PAT and want to use the GitHub App method of authentication.

I believe this is for PAT:

    tokenRef:
      key: GH_TOKEN
      name: actions-runner

Using a PAT worked fine, but testing with the GitHub App, by creating secrets and referencing them from envFrom:, is failing.

Where do I define the RUNNER_TOKEN variable?

Appreciate any kind of help here

deeco commented 2 years ago

> Hi @davidkarlsen @Anusha-Kolli I'm seeing the same error Pods are crashing with error "Must define RUNNER_TOKEN variable"
>
> I created secret kubectl create secret generic github-runner-app --from-literal=GITHUB_APP_INTEGRATION_ID= --from-file=GITHUB_APP_PRIVATE_KEY=.pem -n namespace github-actions-runner-operator
>
> referenced this envFrom: - secretRef: name: github-runner-app
>
> I'm not using PAT and want to use GitHub App Method of authentication.
>
> I believe this is for PAT. tokenRef: key: GH_TOKEN name: actions-runner
>
> using PAT worked fine but when testing with GitHub App by creating secrets and ref from envFrom: is failing
>
> where do i define RUNNER_TOKEN variable?
>
> Appreciate any kind of help here

I am also facing this exact issue. I do not want to use PATs, as they are tied to a user and people leave; I cannot use service accounts or long-lived tokens either.

davidkarlsen commented 2 years ago

To use GitHub App authentication, the config needs to be passed to the operator itself: https://github.com/evryfs/helm-charts/blob/master/charts/github-actions-runner-operator/values.yaml#L70 and https://github.com/evryfs/helm-charts/blob/master/charts/github-actions-runner-operator/templates/deployment.yaml#L34
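
As a rough illustration of what "passed to the operator itself" could mean: the sketch below assumes the operator reads GITHUB_APP_INTEGRATION_ID and GITHUB_APP_PRIVATE_KEY from its own environment, and that the github-runner-app secret from the question exists in the operator's namespace. The container name is hypothetical, and the linked Helm chart may wire this up through its own values instead, so treat this as a sketch and not as the chart's actual template.

```yaml
# Fragment of the operator's own Deployment (not the runner CR).
spec:
  template:
    spec:
      containers:
        - name: operator                 # hypothetical container name
          envFrom:
            - secretRef:
                # Secret carrying GITHUB_APP_INTEGRATION_ID and
                # GITHUB_APP_PRIVATE_KEY, as created in the question above
                name: github-runner-app
```

The point is that the secret is attached to the operator, which then obtains registration tokens on behalf of the runner pods; the runner pods keep referencing only the operator-managed regtoken secret.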
