Open pathcl opened 5 months ago
It used to have MySQL, but it became somewhat difficult to maintain, so we kind of dropped MySQL support.
But if you're trying to use it in k8s, I encourage you to use the garm-operator. The operator pretty much treats GARM as stateless and syncs the sqlite DB using the info it has stored in etcd.
The current push to move some things from the config to the DB is being done in order to eventually have GARM scale-out. So scaling out GARM is on the TODO list and we're working towards that, but even in the current state, it handles a large amount of runners with ease.
Great! Thanks for the quick reply. I tried the k8s operator, but my understanding was that it also requires a GARM instance outside the cluster, or at least reachable from it. Is this correct?
You can have GARM run inside k8s without a problem. Have a look here:
https://github.com/mercedes-benz/garm-provider-k8s/blob/main/DEVELOPMENT.md
The instructions use tilt to bootstrap a local development environment along with garm, the operator and the k8s provider. You can use that as a starting point and expand to other providers you may need.
We need to add some proper docs in one place that gives a nice walk-through for the various cases.
Thanks for sharing that! Would you say that's the only thing needed, and enough to get started? I can improve the docs once I get familiar with it.
That should bring you up and running with a fully functional GARM on k8s + operator. I usually run it stand-alone, but I did manage to get it running using that guide.
@bavarianbidi may be able to chime in with more details. His wonderful team develops the k8s integration (operator and provider)
Are you using any specific commit? I can't get garm deployed.
garm-provider-k8s $ make tilt-up
hack/scripts/kind-with-registry.sh
No kind clusters found.
Creating cluster "garm" ...
 ✓ Ensuring node image (kindest/node:v1.28.7) 🖼
 ✓ Preparing nodes 📦
 ✓ Writing configuration 📜
 ✓ Starting control-plane 🕹️
 ✓ Installing CNI 🔌
 ✓ Installing StorageClass 💾
Set kubectl context to "kind-garm"
You can now use your cluster with:
kubectl cluster-info --context kind-garm
Thanks for using kind! 🙂
configmap/local-registry-hosting created
tilt up
Tilt started on http://localhost:10350/
v0.33.16, built 2024-06-07
(space) to open the browser
(s) to stream logs (--stream=true)
(t) to open legacy terminal mode (--legacy=true)
(ctrl-c) to exit
garm-provider-k8s $ git remote -v
origin git@github.com:mercedes-benz/garm-provider-k8s (fetch)
origin git@github.com:mercedes-benz/garm-provider-k8s (push)
garm-provider-k8s $ git rev-parse HEAD
b45a9889943b80d5d6e8222ab6c22a5f59e02157
garm-provider-k8s $ kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-5dd5756b68-72csd 1/1 Running 0 10s
kube-system coredns-5dd5756b68-mmlqp 1/1 Running 0 10s
kube-system etcd-garm-control-plane 1/1 Running 0 25s
kube-system kindnet-5m6t5 1/1 Running 0 11s
kube-system kube-apiserver-garm-control-plane 1/1 Running 0 27s
kube-system kube-controller-manager-garm-control-plane 1/1 Running 0 25s
kube-system kube-proxy-r5ffl 1/1 Running 0 11s
kube-system kube-scheduler-garm-control-plane 1/1 Running 0 25s
local-path-storage local-path-provisioner-7577fdbbfb-9q9ks 1/1 Running 0 10s
Garm should be deployed according to step 3) in https://github.com/mercedes-benz/garm-provider-k8s/blob/main/DEVELOPMENT.md#getting-started
I used main, but you may need to add description = "garm credentials" here:
But other than that, I just installed docker, kubectl, tilt and go, and went through the steps.
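For reference, the credentials section of the GARM config would then look roughly like this (the name and token below are placeholders, not values from this setup):

```toml
[[github]]
  name = "github_pat"
  description = "garm credentials"
  # a classic PAT with the scopes GARM needs
  oauth2_token = "<your-pat>"
```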
you can also edit the existing config map:
kubectl -n garm-server edit configmap garm-configuration
and add it. Then remove the failing containers.
At the end you should have something like:
root@garm-deleteme:~# kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
cert-manager cert-manager-5bd57786d4-jmwdj 1/1 Running 0 58m
cert-manager cert-manager-cainjector-57657d5754-89fwt 1/1 Running 0 58m
cert-manager cert-manager-webhook-7d9f8748d4-npk9b 1/1 Running 0 58m
garm-operator-system garm-operator-controller-manager-69fbd5c478-ctlqt 1/1 Running 0 47m
garm-server garm-server-5b84b7f66-r7mxp 1/1 Running 0 48m
kube-system coredns-5dd5756b68-g8k87 1/1 Running 0 58m
kube-system coredns-5dd5756b68-wzxwj 1/1 Running 0 58m
kube-system etcd-garm-control-plane 1/1 Running 0 59m
kube-system kindnet-7r7mh 1/1 Running 0 58m
kube-system kube-apiserver-garm-control-plane 1/1 Running 0 59m
kube-system kube-controller-manager-garm-control-plane 1/1 Running 0 59m
kube-system kube-proxy-jz67s 1/1 Running 0 58m
kube-system kube-scheduler-garm-control-plane 1/1 Running 0 59m
local-path-storage local-path-provisioner-7577fdbbfb-9bpx4 1/1 Running 0 58m
Thanks for the tip about the configmap. I was able to fix that part. But now I'm getting a different error:
$ kubectl get pool -n garm-operator-system -o yaml
status:
  id: ""
  lastSyncError: referenced GitHubScopeRef Organization/my-org-here not ready yet
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
I created a PAT (classic token) but I'm not sure what's going on. I followed this: https://github.com/mercedes-benz/garm-operator/blob/main/DEVELOPMENT.md#%EF%B8%8F-bootstrap-garm-server-with-garm-provider-k8s-for-local-development. Did you use a GitHub App for authentication?
I used PAT auth. Make sure that the PAT you're using has access to the org/repo/enterprise you're creating and that you enabled the required scopes when creating the PAT. See:
https://github.com/cloudbase/garm/blob/main/doc/github_credentials.md
ahh. I think I know what's happening. The operator is not yet updated to take into account the recent changes to GARM regarding the URLs. Try adding:
webhook_url = "http://garm-server.garm-server.svc:9997/webhooks"
here:
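That line belongs in the [default] section of the GARM config, next to the callback and metadata URLs. A sketch, assuming the in-cluster service name used elsewhere in this thread (keep whatever callback/metadata values your config already has):

```toml
[default]
callback_url = "http://garm-server.garm-server.svc:9997/api/v1/callbacks"
metadata_url = "http://garm-server.garm-server.svc:9997/api/v1/metadata"
webhook_url = "http://garm-server.garm-server.svc:9997/webhooks"
```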
If you can connect to the garm server using garm-cli, you can also update it with garm-cli controller update.
I think it would be best if you switch garm to v0.1.4. The main branch has a bunch of updates and the operator has not caught up yet. You can set v0.1.4 here:
also, to get webhooks from GitHub, you'll most likely need an ingress controller and a cluster IP set on the GARM server. Then you'll need to add your webhook in GitHub to point to your GARM webhook URL.
See: https://github.com/cloudbase/garm/blob/v0.1.4/doc/webhooks.md
That's right, but would I need a webhook to have a pool of runners working? I'm not sure.
BTW, thanks to your help it worked! I'm noticing these are configured as ephemeral runners by default:
garm-provider-k8s $ kubectl get pods -n runner
NAME READY STATUS RESTARTS AGE
garm-d7appm3tvuks 0/1 Completed 0 5m28s
garm-evctriwqbovn 0/1 Completed 0 5m28s
Do you think we could have GitHub App support? It looks like it's already there on the garm-server side, but we're missing some bits between the release of v0.1.5 and garm-provider-k8s.
You don't need webhooks for pools to work, but you do need them to know when to spin up a runner and when to delete it. Otherwise you'll have huge delays between when a job is started and when a runner is spun up.
Github app support will probably be added once 0.1.5 is released, depending on how much time the nice folks from mercedes-benz have.
GARM only spins up ephemeral runners. No persistent runners.
In any case, those spawned runners didn't run anything. Log:
An error occurred: Not configured. Run config.(sh/cmd) to configure the runner.
Runner listener exit with terminated error, stop the service, no retry needed.
Exiting runner...
They did register to github.com, but they were not able to run any workflow.
I think the summerwind image used by default can't handle JIT configs. You need to either disable JIT or build and use the "upstream" image.
The upstream image:
https://github.com/mercedes-benz/garm-provider-k8s/tree/main/runner/upstream
To disable JIT, add:
disable_jit_config = true
In the provider section of the config:
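A sketch of what that section might look like; the executable and config paths below are examples, not the ones shipped by the dev environment:

```toml
[[provider]]
  name = "kubernetes_external"
  provider_type = "external"
  description = "kubernetes provider"
  disable_jit_config = true
  [provider.external]
    provider_executable = "/opt/garm/providers/garm-provider-k8s"
    config_file = "/etc/garm/provider-config.yaml"
```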
Context for the image:
@pathcl you will most likely need to apply this patch as well:
https://github.com/mercedes-benz/garm-provider-k8s/pull/52
to build:
cd garm-provider-k8s/runner/upstream
docker build -t localhost:5000/runner-default:latest .
docker push localhost:5000/runner-default:latest
Then just apply the new image:
kubectl -n garm-operator-system patch image runner-default --type=merge --patch '{"spec": { "tag": "localhost:5000/runner-default:latest"}}'
And you should be fine with both JIT and registration token.
Thanks! It worked, and now I can see idle runners. However, I don't see jobs being picked up. I used runs-on: [self-hosted, Linux, kubernetes]
for the labels. This is my pool definition:
apiVersion: garm-operator.mercedes-benz.com/v1alpha1
kind: Pool
metadata:
  labels:
    app.kubernetes.io/instance: pool-sample
    app.kubernetes.io/name: pool
    app.kubernetes.io/part-of: garm-operator
  name: k8s-pool
  namespace: garm-operator-system
spec:
  githubScopeRef:
    apiGroup: garm-operator.mercedes-benz.com
    kind: Organization
    name: labs
  enabled: true
  extraSpecs: "{}"
  flavor: medium
  githubRunnerGroup: ""
  imageName: runner-default
  maxRunners: 4
  minIdleRunners: 2
  osArch: amd64
  osType: linux
  providerName: kubernetes_external # this is the name defined in your garm server
  runnerBootstrapTimeout: 20
  runnerPrefix: ""
  tags:
    - linux
    - kubernetes
---
Did you have to change anything else?
Try targeting just linux or kubernetes (or both) as tags in your workflows. Don't target self-hosted and Linux (capital letter).
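For example, a minimal workflow targeting that pool might look like this (repo contents and job name are placeholders):

```yaml
name: ci
on: push
jobs:
  build:
    # match the pool tags, not self-hosted/Linux
    runs-on: [linux, kubernetes]
    steps:
      - run: echo "hello from a GARM runner"
```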
FYI, until you set up the webhook endpoint, GARM won't be able to autoscale. You'll still get some cleanup/min-idle-runners. But it will be only when GARM consolidates instead of reacting right away.
I was finally able to run a workflow! Thanks so much. Do we have docs for configuring the webhook endpoint? At this point I only see two things in my setup.
I'm expecting these runners to be ephemeral, but it seems idle runners are not being recreated once they've been used. Shouldn't we always have some runners waiting for jobs?
GARM doesn't know that a runner has finished running a job if webhooks don't work. Runners will eventually be reaped by the consolidation loop, which looks in GitHub and locally and kills used runners. Then the same consolidation loop will create missing runners based on min-idle-runners.
If you set up your webhooks, this will happen automatically, right away.
There are 2 ways to set up webhooks:
In both cases, your webhook endpoint must be accessible by GitHub.
You can access the GARM API directly by running the following steps:
Get the GARM admin password:
grep 'garm-password=' ~/garm-provider-k8s/hack/local-development/kubernetes/garm-operator-all.yaml | sed 's/.*=//g'
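As a side note, that pipeline just strips everything up to the last '='. A self-contained sketch against a hypothetical file (the real path is the one above):

```shell
# hypothetical stand-in for garm-operator-all.yaml
cat > /tmp/garm-demo.yaml <<'EOF'
        - --garm-password=supersecret
EOF
# same extraction as above: drop everything up to the last '='
grep 'garm-password=' /tmp/garm-demo.yaml | sed 's/.*=//g'
# -> supersecret
```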
Exec into the garm-server pod
kubectl -n garm-server exec -it garm-server-5b84b7f66-rxxxp sh
Replace the pod name with your own. Then, log into the GARM server using the GARM CLI:
garm-cli profile add --name garm --password <your_garm_password> --url http://garm-server.garm-server.svc:9997/ --username admin
Then you can view info about your controller, install webhooks, etc.:
garm-cli controller-info show
Make sure that the Controller Webhook URL is accessible by GitHub. If you're on v0.1.4, you will need to edit the config map to set the webhook_url. You will most likely need an ingress controller and to expose that URL to the internet via a reverse proxy or port forwarding.
If your webhook URL is already accessible by GitHub and your PAT allows webhook management, you can run:
garm-cli org webhook install <org_id>
There is an explanation about the URLs here: https://github.com/cloudbase/garm/blob/main/doc/using_garm.md#controller-operations
If you're using kind, you'll most likely want to expose the service using a NodePort or LoadBalancer type service. Then set up something like ngrok to create a tunnel to the node IP/port. If you're using a production k8s cluster with a proper load balancer, once you expose the deployment, you'll most likely want to use the external IP/port as a base URL for all 3:
* callback_url - needs to be accessible by runners (regardless of provider)
* metadata_url - needs to be accessible by runners (regardless of provider)
* webhook_url - needs to be accessible by GitHub
This will allow you to use the same GARM instance with multiple providers like Azure, GCP, OpenStack, OCI, etc.
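A rough sketch of the kind + NodePort + ngrok route (service name and namespace are the ones from this thread; ngrok must be installed separately and these commands assume a running cluster):

```shell
# expose the garm-server service on a node port
kubectl -n garm-server patch svc garm-server -p '{"spec": {"type": "NodePort"}}'
# find the node IP and the assigned port
kubectl get nodes -o wide
kubectl -n garm-server get svc garm-server
# tunnel the webhook endpoint to the internet (replace IP/port with your own)
ngrok http <node-ip>:<node-port>
```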
Thanks for the detailed explanation! By any chance, have you tried the garm k8s operator with a runner image in a private registry? I'm trying to figure out whether the Image CRD needs imagePullSecrets.
I have not tried, but I see there is an issue open here:
https://github.com/mercedes-benz/garm-provider-k8s/issues/6
You might try to add a comment there with your use case.
sorry, didn't follow the entire conversation here :see_no_evil:
- we use ngrok for integration tests in the garm repo
- stay on v0.1.4 for now (https://github.com/mercedes-benz/garm-provider-k8s/pull/57)

@pathcl are there any other questions open regarding the garm-operator in combination with garm?
Dear folks,
I'm reading through the garm codebase and already spotted that there's support for MySQL. Is that enough to configure garm as a highly available control plane? My use case is on top of k8s.