frbk opened this issue 3 years ago.
@frbk Hey!
The registration-only pod does not inherit labels from spec:template, which causes it to be stuck in limbo. I was able to apply those labels using Argo CD.
This is working as intended, but it might be affecting your use case, as Fargate requires your pods to have certain labels so that it can discover which pods should be deployed onto it. Perhaps we need to fix how actions-runner-controller creates a registration-only runner pod so that it doesn't rely on empty labels. Or perhaps you can wait for GitHub to add some API and system changes so that we can scale from/to zero without having a registration-only runner. https://github.com/actions-runner-controller/actions-runner-controller/issues/470#issuecomment-841428853
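Fargate decides whether to run a pod by matching it against the selectors of a Fargate profile: the pod must be in the selector's namespace and carry every label the selector lists. A registration-only pod with no labels therefore can't land on Fargate if the profile's selector requires labels. The Go sketch below models that rule with plain maps. The fargateSelector type and matches method are hypothetical names. The fargate: "true" selector label is an assumption borrowed from the configs in this thread.

```go
package main

import "fmt"

// fargateSelector is a hypothetical, simplified model of one selector in an
// EKS Fargate profile: a namespace plus an optional set of required labels.
type fargateSelector struct {
	Namespace string
	Labels    map[string]string
}

// matches reports whether a pod in podNamespace carrying podLabels would be
// picked up by this selector: the namespace must be equal, and every label in
// the selector must be present on the pod with the same value. Extra pod
// labels are ignored.
func (s fargateSelector) matches(podNamespace string, podLabels map[string]string) bool {
	if s.Namespace != podNamespace {
		return false
	}
	for key, want := range s.Labels {
		got, ok := podLabels[key] // reading a nil map is safe and yields ok == false
		if !ok || got != want {
			return false
		}
	}
	return true
}

func main() {
	// Assumed profile selector; use whatever labels your own profile requires.
	sel := fargateSelector{Namespace: "github", Labels: map[string]string{"fargate": "true"}}

	runnerPod := map[string]string{"fargate": "true"}
	var registrationOnlyPod map[string]string // labels were cleared to nil

	fmt.Println(sel.matches("github", runnerPod))           // true
	fmt.Println(sel.matches("github", registrationOnlyPod)) // false
}
```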
When both the runner and registration-only pods come up, they seem to crash with this error:
The error says that you're trying to deploy it as a GitHub App and that the private key you've provided is invalid. Check the content of the K8s secret that contains the private key.
And most importantly, does Fargate support deploying privileged containers today? In a standard setup, your runner pods and containers need to be privileged to work, especially for docker-in-docker (dind). I think there are ways to run dind without privileges, but you'd need to set privileged: false on your runner spec and figure out the other settings to make it work on Fargate.
Hey @mumoshu. Thanks for the reply. Is privileged: false part of the Helm chart? Also, I am reusing the same token that works when I don't use Fargate. I deployed two types of runners, a Fargate one and a normal one that just runs on machines, and the normal one worked fine with that token, but I will investigate. Sadly, Fargate doesn't support privileged containers. For my use case I don't need it, because I am trying to run a bunch of RSpec tests in the runner with some services, and I was planning on adding those services as sidecars.
I kind of assumed that I could replicate what GitLab CI does.
@frbk Ah, gotcha! Then it should theoretically work if you set dockerEnabled: false. See:
https://github.com/actions-runner-controller/actions-runner-controller/blob/dc5f90025cdf5382d8d1b347483dacf0f3d3757b/api/v1alpha1/runner_types.go#L100-L101
But the issue with the empty private key would still be a blocker. BTW, to be extra clear: which pod showed the "Error: Client creation failed. authentication failed:" log? actions-runner-controller, or a runner pod?
privileged: false part of the helm chart?
Nope. It's computed from the runner spec you provide: https://github.com/actions-runner-controller/actions-runner-controller/blob/dc5f90025cdf5382d8d1b347483dacf0f3d3757b/controllers/runner_controller.go#L705
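Since Fargate does not run privileged containers, one quick pre-flight check is to look at the pod spec a runner would produce and list any container that explicitly requests privileged mode. Here is a sketch in Go. The structs are hypothetical and minimal; they only mirror the shape of the Kubernetes securityContext.privileged field and are not the real k8s.io/api types. The docker sidecar name is also hypothetical.

```go
package main

import "fmt"

// securityContext and container are hypothetical minimal stand-ins that mirror
// the shape of the Kubernetes fields; they are not the k8s.io/api types.
type securityContext struct {
	Privileged *bool
}

type container struct {
	Name            string
	SecurityContext *securityContext
}

// privilegedContainers returns the names of the containers that explicitly
// request privileged mode. A nil securityContext or nil Privileged field is
// treated as non-privileged, which matches the Kubernetes default.
func privilegedContainers(cs []container) []string {
	var names []string
	for _, c := range cs {
		sc := c.SecurityContext
		if sc != nil && sc.Privileged != nil && *sc.Privileged {
			names = append(names, c.Name)
		}
	}
	return names
}

func main() {
	yes := true
	pod := []container{
		{Name: "runner"},
		{Name: "docker", SecurityContext: &securityContext{Privileged: &yes}}, // hypothetical dind sidecar
		{Name: "redis"},
	}
	fmt.Println(privilegedContainers(pod)) // [docker]
}
```

An empty result is a necessary condition for scheduling on Fargate, not a sufficient one.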
I get this error on the runner pod. actions-runner-controller is fine. It seems that the secret is not being mounted when I use Fargate. I am going to try mounting it in the RunnerDeployment and see if that works.
I have done a bit more investigating, and these are my findings. It looks like the runner pod does not mount secrets when running on Fargate. I worked around this by mounting the secrets in the RunnerDeployment, which now looks like this:
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: 4-10-fargate
  namespace: github
spec:
  template:
    metadata:
      labels:
        fargate: "true"
        eks.amazonaws.com/fargate-profile: "github"
    spec:
      serviceAccountName: "actions-runner-controller"
      repository: <some/repo>
      labels:
        - 4-10-fargate
      resources:
        requests:
          cpu: "4.0"
          memory: "10Gi"
          ephemeral-storage: "5Gi"
      dockerEnabled: false
      image: summerwind/actions-runner-controller
      env:
        - name: GITHUB_TOKEN
          valueFrom:
            secretKeyRef:
              name: controller-manager
              key: github_token
              optional: true
        - name: GITHUB_APP_ID
          valueFrom:
            secretKeyRef:
              name: controller-manager
              key: github_app_id
              optional: true
        - name: GITHUB_APP_INSTALLATION_ID
          valueFrom:
            secretKeyRef:
              name: controller-manager
              key: github_app_installation_id
              optional: true
        - name: GITHUB_APP_PRIVATE_KEY
          value: /etc/actions-runner-controller/github_app_private_key
      volumeMounts:
        - name: controller-manager
          mountPath: "/etc/actions-runner-controller"
          readOnly: true
        - mountPath: /tmp/k8s-webhook-server/serving-certs
          name: cert
          readOnly: true
      volumes:
        - name: controller-manager
          secret:
            secretName: controller-manager
        - name: cert
          secret:
            defaultMode: 420
            secretName: webhook-server-cert
---
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: 4-10-fargate
  namespace: github
spec:
  scaleTargetRef:
    name: 4-10-fargate
  minReplicas: 0
  maxReplicas: 64
  metrics:
    - type: TotalNumberOfQueuedAndInProgressWorkflowRuns
      repositoryNames:
        - <some/repo>
However, this doesn't seem to work: the runner gets stuck on authentication, and it also looks like the runner gets converted into a manager. Here is an example of the log:
2021-06-16T15:34:37.876Z INFO controller-runtime.metrics metrics server is starting to listen {"addr": ":8080"}
2021-06-16T15:34:37.877Z INFO actions-runner-controller Initializing actions-runner-controller {"github-api-cahce-duration": "9m50s", "sync-period": "10m0s", "runner-image": "summerwind/actions-runner:latest", "docker-image": "docker:dind", "common-runnner-labels": null, "watch-namespace": ""}
2021-06-16T15:34:37.877Z INFO controller-runtime.builder Registering a mutating webhook {"GVK": "actions.summerwind.dev/v1alpha1, Kind=Runner", "path": "/mutate-actions-summerwind-dev-v1alpha1-runner"}
2021-06-16T15:34:37.877Z INFO controller-runtime.webhook registering webhook {"path": "/mutate-actions-summerwind-dev-v1alpha1-runner"}
2021-06-16T15:34:37.877Z INFO controller-runtime.builder Registering a validating webhook {"GVK": "actions.summerwind.dev/v1alpha1, Kind=Runner", "path": "/validate-actions-summerwind-dev-v1alpha1-runner"}
2021-06-16T15:34:37.877Z INFO controller-runtime.webhook registering webhook {"path": "/validate-actions-summerwind-dev-v1alpha1-runner"}
2021-06-16T15:34:37.877Z INFO controller-runtime.builder Registering a mutating webhook {"GVK": "actions.summerwind.dev/v1alpha1, Kind=RunnerDeployment", "path": "/mutate-actions-summerwind-dev-v1alpha1-runnerdeployment"}
2021-06-16T15:34:37.877Z INFO controller-runtime.webhook registering webhook {"path": "/mutate-actions-summerwind-dev-v1alpha1-runnerdeployment"}
2021-06-16T15:34:37.877Z INFO controller-runtime.builder Registering a validating webhook {"GVK": "actions.summerwind.dev/v1alpha1, Kind=RunnerDeployment", "path": "/validate-actions-summerwind-dev-v1alpha1-runnerdeployment"}
2021-06-16T15:34:37.877Z INFO controller-runtime.webhook registering webhook {"path": "/validate-actions-summerwind-dev-v1alpha1-runnerdeployment"}
2021-06-16T15:34:37.877Z INFO controller-runtime.builder Registering a mutating webhook {"GVK": "actions.summerwind.dev/v1alpha1, Kind=RunnerReplicaSet", "path": "/mutate-actions-summerwind-dev-v1alpha1-runnerreplicaset"}
2021-06-16T15:34:37.877Z INFO controller-runtime.webhook registering webhook {"path": "/mutate-actions-summerwind-dev-v1alpha1-runnerreplicaset"}
2021-06-16T15:34:37.877Z INFO controller-runtime.builder Registering a validating webhook {"GVK": "actions.summerwind.dev/v1alpha1, Kind=RunnerReplicaSet", "path": "/validate-actions-summerwind-dev-v1alpha1-runnerreplicaset"}
2021-06-16T15:34:37.877Z INFO controller-runtime.webhook registering webhook {"path": "/validate-actions-summerwind-dev-v1alpha1-runnerreplicaset"}
2021-06-16T15:34:37.877Z INFO actions-runner-controller starting manager
2021-06-16T15:34:37.877Z INFO controller-runtime.manager starting metrics server {"path": "/metrics"}
2021-06-16T15:34:37.977Z INFO controller-runtime.webhook.webhooks starting webhook server
2021-06-16T15:34:37.977Z INFO controller-runtime.controller Starting EventSource {"controller": "runnerreplicaset-controller", "source": "kind source: /, Kind="}
2021-06-16T15:34:37.978Z INFO controller-runtime.controller Starting EventSource {"controller": "runnerreplicaset-controller", "source": "kind source: /, Kind="}
2021-06-16T15:34:37.978Z INFO controller-runtime.controller Starting EventSource {"controller": "horizontalrunnerautoscaler-controller", "source": "kind source: /, Kind="}
2021-06-16T15:34:37.978Z INFO controller-runtime.controller Starting EventSource {"controller": "runner-controller", "source": "kind source: /, Kind="}
2021-06-16T15:34:37.979Z INFO controller-runtime.controller Starting EventSource {"controller": "runnerdeployment-controller", "source": "kind source: /, Kind="}
2021-06-16T15:34:37.978Z INFO controller-runtime.certwatcher Updated current TLS certificate
2021-06-16T15:34:37.979Z INFO controller-runtime.webhook serving webhook server {"host": "", "port": 9443}
2021-06-16T15:34:37.979Z INFO controller-runtime.certwatcher Starting certificate watcher
2021-06-16T15:34:38.078Z INFO controller-runtime.controller Starting Controller {"controller": "runnerreplicaset-controller"}
2021-06-16T15:34:38.078Z INFO controller-runtime.controller Starting Controller {"controller": "horizontalrunnerautoscaler-controller"}
2021-06-16T15:34:38.079Z INFO controller-runtime.controller Starting EventSource {"controller": "runner-controller", "source": "kind source: /, Kind="}
2021-06-16T15:34:38.079Z INFO controller-runtime.controller Starting EventSource {"controller": "runnerdeployment-controller", "source": "kind source: /, Kind="}
2021-06-16T15:34:38.079Z INFO controller-runtime.controller Starting Controller {"controller": "runnerdeployment-controller"}
2021-06-16T15:34:38.179Z INFO controller-runtime.controller Starting workers {"controller": "horizontalrunnerautoscaler-controller", "worker count": 1}
2021-06-16T15:34:38.179Z INFO controller-runtime.controller Starting workers {"controller": "runnerreplicaset-controller", "worker count": 1}
2021-06-16T15:34:38.179Z DEBUG actions-runner-controller.horizontalrunnerautoscaler Calculated desired replicas of 1 {"horizontalrunnerautoscaler": "github/4-10-fargate", "suggested": 1, "reserved": 0, "min": 1, "cached": 1, "max": 64}
2021-06-16T15:34:38.179Z DEBUG controller-runtime.controller Successfully Reconciled {"controller": "horizontalrunnerautoscaler-controller", "request": "github/4-10-fargate"}
2021-06-16T15:34:38.179Z DEBUG controller-runtime.controller Successfully Reconciled {"controller": "runnerreplicaset-controller", "request": "github/4-10-fargate-mfrgk"}
2021-06-16T15:34:38.279Z INFO controller-runtime.controller Starting Controller {"controller": "runner-controller"}
2021-06-16T15:34:38.279Z INFO controller-runtime.controller Starting workers {"controller": "runnerdeployment-controller", "worker count": 1}
2021-06-16T15:34:38.280Z DEBUG controller-runtime.controller Successfully Reconciled {"controller": "runnerdeployment-controller", "request": "github/4-10-fargate"}
2021-06-16T15:34:38.379Z INFO controller-runtime.controller Starting workers {"controller": "runner-controller", "worker count": 1}
2021-06-16T15:34:38.380Z INFO actions-runner-controller.runner Skipped registration check because it's deferred until 2021-06-16 15:35:29 +0000 UTC. Retrying in 49.619892818s at latest {"runner": "github/4-10-fargate-mfrgk-9sprx", "lastRegistrationCheckTime": "2021-06-16 15:34:29 +0000 UTC", "registrationCheckInterval": "1m0s"}
2021-06-16T15:35:28.125Z DEBUG controller-runtime.controller Successfully Reconciled {"controller": "runnerreplicaset-controller", "request": "github/4-10-fargate-mfrgk"}
2021-06-16T15:35:28.276Z DEBUG actions-runner-controller.runner Runner pod exists but we failed to check if runner is busy. Apparently it still needs more time. {"runner": "github/4-10-fargate-mfrgk-9sprx", "runnerName": "4-10-fargate-mfrgk-9sprx"}
2021-06-16T15:35:28.276Z DEBUG actions-runner-controller.runner Rechecking the runner registration in 1m10.468889844s {"runner": "github/4-10-fargate-mfrgk-9sprx"}
2021-06-16T15:35:28.288Z INFO actions-runner-controller.runner Skipped registration check because it's deferred until 2021-06-16 15:36:28 +0000 UTC. Retrying in 58.711814172s at latest {"runner": "github/4-10-fargate-mfrgk-9sprx", "lastRegistrationCheckTime": "2021-06-16 15:35:28 +0000 UTC", "registrationCheckInterval": "1m0s"}
2021-06-16T15:36:27.136Z DEBUG actions-runner-controller.runner Runner pod exists but we failed to check if runner is busy. Apparently it still needs more time. {"runner": "github/4-10-fargate-mfrgk-9sprx", "runnerName": "4-10-fargate-mfrgk-9sprx"}
2021-06-16T15:36:27.136Z DEBUG actions-runner-controller.runner Rechecking the runner registration in 1m10.283034151s {"runner": "github/4-10-fargate-mfrgk-9sprx"}
2021-06-16T15:36:27.139Z DEBUG controller-runtime.controller Successfully Reconciled {"controller": "runnerreplicaset-controller", "request": "github/4-10-fargate-mfrgk"}
2021-06-16T15:36:27.149Z INFO actions-runner-controller.runner Skipped registration check because it's deferred until 2021-06-16 15:37:27 +0000 UTC. Retrying in 58.850736308s at latest {"runner": "github/4-10-fargate-mfrgk-9sprx", "lastRegistrationCheckTime": "2021-06-16 15:36:27 +0000 UTC", "registrationCheckInterval": "1m0s"}
@frbk Thanks. At a glance, the image: summerwind/actions-runner-controller you've written in the RunnerDeployment spec is wrong, as you are basically saying "use this controller image to run this runner", which results in what you see. Or are you saying that Fargate is somehow setting image: summerwind/actions-runner-controller?
FYI, you can use the summerwind/actions-runner images: https://hub.docker.com/r/summerwind/actions-runner/tags?page=1&ordering=last_updated
OMG! Thanks for pointing out that I was using the wrong image. I am going to update it and redeploy. Will update you shortly!
@frbk Thanks for confirming! To be extra sure, let me point out that you should omit env vars like GITHUB_TOKEN. The necessary env vars are configured by the controller, so you don't need to set them yourself. Please share your latest RunnerDeployment YAML and I can verify whether it's good or bad!
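As a rough lint for the advice above, the sketch below flags env vars in a runner spec that were copied over from the controller's own manifest. The list of names comes from the earlier RunnerDeployment in this thread. Treating exactly these four as controller-only is an assumption, not an official ARC rule, and unneededEnv is a hypothetical helper.

```go
package main

import "fmt"

// controllerOnlyEnv holds env var names that were copied from the controller's
// own manifest into the RunnerDeployment earlier in this thread. Treating them
// as "never set on a runner" is an assumption based on the advice to omit env
// like GITHUB_TOKEN, not an official ARC list.
var controllerOnlyEnv = map[string]bool{
	"GITHUB_TOKEN":               true,
	"GITHUB_APP_ID":              true,
	"GITHUB_APP_INSTALLATION_ID": true,
	"GITHUB_APP_PRIVATE_KEY":     true,
}

// unneededEnv returns, in input order, the env var names from a runner spec
// that should be removed because the controller handles authentication itself.
func unneededEnv(envNames []string) []string {
	var out []string
	for _, name := range envNames {
		if controllerOnlyEnv[name] {
			out = append(out, name)
		}
	}
	return out
}

func main() {
	// EXTRA_TEST_FLAG is a hypothetical runner-specific variable that should be kept.
	runnerEnv := []string{"GITHUB_TOKEN", "EXTRA_TEST_FLAG", "GITHUB_APP_PRIVATE_KEY"}
	fmt.Println(unneededEnv(runnerEnv)) // [GITHUB_TOKEN GITHUB_APP_PRIVATE_KEY]
}
```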
@mumoshu Here is an updated config, which seems to work on Fargate:
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: 4-10-fargate
  namespace: github
spec:
  template:
    metadata:
      labels:
        fargate: "true"
        eks.amazonaws.com/fargate-profile: "github"
    spec:
      repository: <some/repo>
      labels:
        - 4-10-fargate
      resources:
        requests:
          cpu: "4.0"
          memory: "10Gi"
          ephemeral-storage: "5Gi"
      dockerEnabled: false
      image: summerwind/actions-runner
      sidecarContainers:
        - name: mysql
          image: mysql:latest
          env:
            - name: MYSQL_USER
              value: root
            - name: MYSQL_ALLOW_EMPTY_PASSWORD
              value: "true"
        - name: elasticsearch
          image: elasticsearch:latest
        - name: redis
          image: redis:latest
---
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: 4-10-fargate
  namespace: github
spec:
  scaleTargetRef:
    name: 4-10-fargate
  minReplicas: 0
  maxReplicas: 64
  metrics:
    - type: TotalNumberOfQueuedAndInProgressWorkflowRuns
      repositoryNames:
        - <some/repo>
I only had to manually update the config for the registration-only pod to include the labels, as I mentioned before.
@frbk Awesome! Thanks a lot for sharing your experience!
I was thinking about this a bit, and this could possibly be automated by just removing the following line from the actions-runner-controller code: runnerForScaleFromToZero.ObjectMeta.Labels = nil
It would be great if you could try removing that line, building and pushing a custom image by running DOCKER_USER=$YOUR_DOCKERHUB_ACCOUNT_NAME make docker-build docker-push, and redeploying your controller to see if it resolves your issue 🙏
FYI, you can find the definitions for the docker-build and docker-push targets at https://github.com/actions-runner-controller/actions-runner-controller/blob/f2e2060ff8cbba6ab18e898e240ddf4afd65eb27/Makefile#L120-L122 and https://github.com/actions-runner-controller/actions-runner-controller/blob/f2e2060ff8cbba6ab18e898e240ddf4afd65eb27/Makefile#L137-L139.
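To see why deleting that single line (runnerForScaleFromToZero.ObjectMeta.Labels = nil) matters, here is a simplified Go model of deriving the registration-only pod's metadata from the runner template. It is hypothetical: the objectMeta type and copyMeta function are illustrative, not ARC's actual code. With the line present, the copy loses every template label, including the ones a Fargate profile selects on. Without it, the labels survive. The copy is deep so that edits to the derived object never leak back into the template.

```go
package main

import "fmt"

// objectMeta is a hypothetical stand-in for the metadata of a pod template.
type objectMeta struct {
	Name   string
	Labels map[string]string
}

// copyMeta returns a deep copy of m. When clearLabels is true it models the
// effect of `runnerForScaleFromToZero.ObjectMeta.Labels = nil`, i.e. the
// derived object ends up with no labels at all.
func copyMeta(m objectMeta, clearLabels bool) objectMeta {
	out := objectMeta{Name: m.Name}
	if clearLabels || m.Labels == nil {
		return out
	}
	out.Labels = make(map[string]string, len(m.Labels))
	for k, v := range m.Labels {
		out.Labels[k] = v
	}
	return out
}

func main() {
	tmpl := objectMeta{Name: "4-10-fargate", Labels: map[string]string{"fargate": "true"}}

	withLine := copyMeta(tmpl, true)     // the line is present: labels are dropped
	withoutLine := copyMeta(tmpl, false) // the line is removed: labels are inherited

	fmt.Println(withLine.Labels["fargate"] == "true")    // false
	fmt.Println(withoutLine.Labels["fargate"] == "true") // true
}
```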
Thanks for the info! Will give this a shot.
@mumoshu Tried your suggestion, and removing runnerForScaleFromToZero.ObjectMeta.Labels = nil seemed to work! 🎉
@frbk Awesome! Just to be sure, did scaling both to and from zero work, and do the replica numbers shown in kubectl get runnerdeployment seem correct?
Looks like it, @mumoshu. Example with nothing running on CI:
NAME           DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
4-10-fargate   0         0         0            0           7m27s
After executing one job on CI:
NAME           DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
4-10-fargate   1         1         1            0           18m
For testing purposes, I set maxReplicas to 1.
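The replica counts above are consistent with a simple rule. The Go sketch below is a simplified model, assuming that the desired replica count is the number of queued plus in-progress workflow runs, clamped to the HorizontalRunnerAutoscaler's [minReplicas, maxReplicas] range. ARC's actual calculation takes more inputs (its log also reports "reserved" and "cached" values). The desiredReplicas function is a hypothetical name.

```go
package main

import "fmt"

// desiredReplicas is a hypothetical, simplified model of a
// TotalNumberOfQueuedAndInProgressWorkflowRuns-style scaling rule: one runner
// per queued or in-progress workflow run, clamped to [minReplicas, maxReplicas].
func desiredReplicas(queued, inProgress, minReplicas, maxReplicas int) int {
	n := queued + inProgress
	if n < minReplicas {
		n = minReplicas
	}
	if n > maxReplicas {
		n = maxReplicas
	}
	return n
}

func main() {
	fmt.Println(desiredReplicas(0, 0, 0, 1))  // 0: idle CI scales to zero
	fmt.Println(desiredReplicas(0, 1, 0, 1))  // 1: one run in progress
	fmt.Println(desiredReplicas(5, 13, 0, 1)) // 1: capped by maxReplicas
}
```

With minReplicas: 0 the idle case reaches zero, and with maxReplicas set to 1, as in the test above, the count never exceeds one, however much work is queued.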
Just finished running a pipeline with 18 jobs in it, and it was able to scale up and down with no issues 🎉
@frbk Thanks a lot for confirming! Let me add this to our documentation with a big "thanks to @frbk" note, and also apply the patch https://github.com/actions-runner-controller/actions-runner-controller/issues/631#issuecomment-862959111 to our main branch so that you no longer need to use the fork just for the one-line change.
As this is an open-source, open-development project, I would also very much appreciate it if you could submit a pull request for either (or even both) of the changes yourself!
I'm going to open a PR covering everything we talked about in this issue. I was gathering some info for the documentation.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
@mumoshu Tried your suggestions and removing runnerForScaleFromToZero.ObjectMeta.Labels = nil seemed to work!
This code change to runnerForScaleFromToZero probably isn't needed anymore. We no longer create registration-only runners for scaling from/to zero in recent versions of ARC.
@frbk Hey! How have your Fargate-based runners been working since then?
Hey @mumoshu. I have since moved to another company, but they were working fine as long as you didn't need docker-in-docker. I will try to provide a bit more info later this week; I just need to go over my old notes. Also, I see you changed the implementation for scaling from zero. I will try it over the weekend and let you know.
Give me a couple more days. I had to set up a test EKS cluster, and it took a bit longer than I expected. I will post an update after I try out the latest version of the controller.
I didn't forget about this. My schedule is a bit all over the place at the moment. 😭
@frbk Thanks! I'm looking forward to your report ☺️
Hi @frbk @mumoshu
I'm looking to run the runner on Fargate as well. Is there anything I should be aware of? Is removing runnerForScaleFromToZero.ObjectMeta.Labels = nil still needed?
@NoamGoren Honestly, I have never tried it myself, so I'm afraid I have nothing to share with you! What I can say, FWIW, is that ARC does not rely on registration-only runners anymore, so there's a chance that it works without any modifications now.
Hi @mumoshu, Fargate still does not support privileged containers. Is there a way around this to make Docker work? If Docker is disabled, the use cases are very limited.
I thought privileged containers were only necessary when using the docker sidecar? Aren't there many DinD implementations that do not require privileged mode?
@mumoshu We are trying to use ARC with Fargate, and I've come up with a very simple hello-world deployment config that works, but it requires dockerEnabled: false in order to run, and I'm not clear on what exactly that entails. What is the Docker-in-Docker implementation in ARC, and why is it important? Will I be able to run Docker on my pods at all with this configuration? Here's the config:
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: gg-test-org-deployment-0
spec:
  replicas: 1
  template:
    metadata:
      labels:
        provider: fargate
    spec:
      organization: gg-test-org
      dockerEnabled: false
Without dind, you cannot use service containers, container-based actions, or container-based steps in GitHub Actions! However, dind requires privileged containers, which are not available on Fargate. Have you already tried Kubernetes container mode in ARC? Perhaps it has a better chance of success, although I have never tried it with Fargate.
After setting dockerEnabled: false, we can use ARC on AWS EKS Fargate. Although the runner can do less without privileged permissions, it works well for simple jobs such as syncing objects between AWS partitions.
I have been trying to get runners deployed on Fargate and wasn't able to find any info. So far I have encountered a couple of issues:
The registration-only pod does not inherit labels from spec:template, which causes it to be stuck in limbo. I was able to apply those labels using Argo CD.
Here is an example of my config for Fargate:
Please let me know if you have any suggestions.