Open kanakaraju17 opened 4 months ago
You need to define those in your pod template, after declaring the pod template YAML in the scale set runner values.yaml. (Terraform below, btw.)
Hey @jonathan-fileread, is there a way to configure this in the default values.yaml file provided with the gha-runner-scale-set charts?
@kanakaraju17 Hey Kanaka, unfortunately not. You need to create a separate pod template in order to define the workflow pod, as the values.yaml only defines the runner pod settings.
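For illustration only, a minimal sketch of such a separate template, in the spirit of the hook-extension examples later in this thread (the CPU value is a placeholder, not taken from any real setup):

# Illustrative sketch: a pod template fragment that the container hook applies to the
# workflow ("$job") pod, kept separate from the runner pod settings in values.yaml.
spec:
  containers:
    - name: "$job"        # targets the job container created for the workflow pod
      resources:
        requests:
          cpu: "500m"     # placeholder value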
@jonathan-fileread, any idea why the file is not getting mounted in the runner pods? I'm using the following configuration and encountering the error below:
## template is the PodSpec for each runner Pod
## For reference: https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/#PodSpec
template:
  # with containerMode.type=kubernetes, we will populate the template.spec with following pod spec
  template:
    spec:
      securityContext:
        fsGroup: 123
      containers:
        - name: runner
          image: ghcr.io/actions/actions-runner:latest
          command: ["/home/runner/run.sh"]
          env:
            - name: ACTIONS_RUNNER_CONTAINER_HOOKS
              value: /home/runner/pod-templates/default.yml
            - name: ACTIONS_RUNNER_POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: ACTIONS_RUNNER_REQUIRE_JOB_CONTAINER
              value: "false"
          volumeMounts:
            - name: work
              mountPath: /home/runner/_work
            - name: pod-templates
              mountPath: /home/runner/pod-templates
              readOnly: true
      volumes:
        - name: work
          ephemeral:
            volumeClaimTemplate:
              spec:
                accessModes: [ "ReadWriteOnce" ]
                storageClassName: "gp3"
                resources:
                  requests:
                    storage: 1Gi
        - name: pod-templates
          configMap:
            name: runner-pod-template
ConfigMap Configuration
apiVersion: v1
kind: ConfigMap
metadata:
  name: runner-pod-template
data:
  default.yml: |
    apiVersion: v1
    kind: PodTemplate
    metadata:
      name: runner-pod-template
    spec:
      containers:
        - name: "$job"
          resources:
            limits:
              cpu: "3000m"
            requests:
              cpu: "3000m"
The pods fail and end up with the below error:
Error: Error: ENOENT: no such file or directory, open '/home/runner/pod-templates/default.yml'
Error: Process completed with exit code 1.
Have you tried recreating it in your environment? Have you come across this error before? It seems to be a mounting issue where the file is not found.
@kanakaraju17 You can follow the official guide which worked for me at least :)
In your case that would be something like:
ConfigMap:
apiVersion: v1
kind: ConfigMap
metadata:
  name: hook-extension
data:
  content: |
    spec:
      containers:
        - name: "$job"
          resources:
            limits:
              cpu: "3000m"
            requests:
              cpu: "3000m"
Usage:
template:
  spec:
    containers:
      - name: runner
        ...
        env:
          ...
          - name: ACTIONS_RUNNER_CONTAINER_HOOK_TEMPLATE
            value: /home/runner/pod-template/content
        volumeMounts:
          ...
          - name: pod-template
            mountPath: /home/runner/pod-template
            readOnly: true
    volumes:
      ...
      - name: pod-template
        configMap:
          name: hook-extension
Hey @georgblumenschein, deploying the gha-runner-scale-set with the env variables below added doesn't seem to take effect.
template:
  template:
    spec:
      containers:
        - name: runner
          image: ghcr.io/actions/actions-runner:latest
          command: ["/home/runner/run.sh"]
          env:
            - name: ACTIONS_RUNNER_CONTAINER_HOOKS
              value: /home/runner/k8s/index.js
            - name: ACTIONS_RUNNER_CONTAINER_HOOK_TEMPLATE
              value: /home/runner/pod-template/content
            - name: ACTIONS_RUNNER_POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: ACTIONS_RUNNER_REQUIRE_JOB_CONTAINER
              value: "true"
Additional ENV Variable Added:
- name: ACTIONS_RUNNER_CONTAINER_HOOK_TEMPLATE
  value: /home/runner/pod-template/content
The workflow pods should include the ACTIONS_RUNNER_CONTAINER_HOOK_TEMPLATE environment variable and volume mount, but neither appears when describing the pods; the output is currently missing this variable.
Expected Result:
The ACTIONS_RUNNER_CONTAINER_HOOK_TEMPLATE environment variable and the volume mounts should be present in the workflow pods.
Below is the values.yaml template used to append the environment variable:
template:
  template:
    spec:
      containers:
        - name: runner
          image: ghcr.io/actions/actions-runner:latest
          command: ["/home/runner/run.sh"]
          env:
            - name: ACTIONS_RUNNER_CONTAINER_HOOKS
              value: /home/runner/k8s/index.js
            - name: ACTIONS_RUNNER_CONTAINER_HOOK_TEMPLATE
              value: /home/runner/pod-template/content
            - name: ACTIONS_RUNNER_POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: ACTIONS_RUNNER_REQUIRE_JOB_CONTAINER
              value: "true"
          volumeMounts:
            - name: work
              mountPath: /home/runner/_work
            - name: pod-template
              mountPath: /home/runner/pod-template
              readOnly: true
      volumes:
        - name: work
          ephemeral:
            volumeClaimTemplate:
              spec:
                accessModes: [ "ReadWriteOnce" ]
                storageClassName: "local-path"
                resources:
                  requests:
                    storage: 1Gi
        - name: pod-template
          configMap:
            name: hook-extension
Problem: The pods should have the ConfigMap volume mounted and the specified environment variables added. However, this is not happening as expected.
Current Output:
Describing the AutoscalingRunnerSet doesn't show the added env variables either.
Name:         arc-runner-kubernetes-ci-arm-large
Namespace:    arc-runners-kubernetes-arm
Labels:       actions.github.com/organization=curefit
              actions.github.com/scale-set-name=arc-runner-kubernetes-ci-arm-large
              actions.github.com/scale-set-namespace=arc-runners-kubernetes-arm
              app.kubernetes.io/component=autoscaling-runner-set
              app.kubernetes.io/instance=arc-runner-kubernetes-ci-arm-large
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=arc-runner-kubernetes-ci-arm-large
              app.kubernetes.io/part-of=gha-rs
              app.kubernetes.io/version=0.9.3
              helm.sh/chart=gha-rs-0.9.3
Annotations:  actions.github.com/cleanup-kubernetes-mode-role-binding-name: arc-runner-kubernetes-ci-arm-large-gha-rs-kube-mode
              actions.github.com/cleanup-kubernetes-mode-role-name: arc-runner-kubernetes-ci-arm-large-gha-rs-kube-mode
              actions.github.com/cleanup-kubernetes-mode-service-account-name: arc-runner-kubernetes-ci-arm-large-gha-rs-kube-mode
              actions.github.com/cleanup-manager-role-binding: arc-runner-kubernetes-ci-arm-large-gha-rs-manager
              actions.github.com/cleanup-manager-role-name: arc-runner-kubernetes-ci-arm-large-gha-rs-manager
              actions.github.com/runner-group-name: arc-runner-kubernetes-ci-arm-large
              actions.github.com/runner-scale-set-name: arc-runner-kubernetes-ci-arm-large
              actions.github.com/values-hash: 8b5caae634d958cc7d295b3166c151d036c7896d2b6165bf908a6a24aec5320
              meta.helm.sh/release-name: arc-runner-set-kubernetes-arm-large
              meta.helm.sh/release-namespace: arc-runners-kubernetes-arm
              runner-scale-set-id: 76
API Version:  actions.github.com/v1alpha1
Kind:         AutoscalingRunnerSet
Metadata:
  Creation Timestamp:  2024-07-16T09:49:56Z
  Finalizers:
    autoscalingrunnerset.actions.github.com/finalizer
  Generation:        1
  Resource Version:  577760766
  UID:               165f74f7-875c-4b8f-a214-96948ec38467
Spec:
  Github Config Secret:  github-token
  Github Config URL:     https://github.com/curefit
  Listener Template:
    Spec:
      Containers:
        Name:  listener
        Resources:
          Limits:
            Cpu:     500m
            Memory:  500Mi
          Requests:
            Cpu:     250m
            Memory:  250Mi
      Node Selector:
        Purpose:  github-actions
      Tolerations:
        Effect:    NoSchedule
        Key:       purpose
        Operator:  Equal
        Value:     github-actions
  Min Runners:            2
  Runner Group:           arc-runner-kubernetes-ci-arm-large
  Runner Scale Set Name:  arc-runner-kubernetes-ci-arm-large
  Template:
    Spec:
      Containers:
        Command:
          /home/runner/run.sh
        Env:
          Name:   ACTIONS_RUNNER_REQUIRE_JOB_CONTAINER
          Value:  false
          Name:   ACTIONS_RUNNER_CONTAINER_HOOKS
          Value:  /home/runner/k8s/index.js
          Name:   ACTIONS_RUNNER_POD_NAME
          Value From:
            Field Ref:
              Field Path:  metadata.name
        Image:  ghcr.io/actions/actions-runner:latest
        Name:   runner
        Volume Mounts:
          Mount Path:  /home/runner/_work
          Name:        work
      Node Selector:
        Purpose:  github-actions
      Restart Policy:  Never
      Security Context:
        Fs Group:  1001
      Service Account Name:  arc-runner-kubernetes-ci-arm-large-gha-rs-kube-mode
      Tolerations:
        Effect:    NoSchedule
        Key:       purpose
        Operator:  Equal
        Value:     github-actions
      Volumes:
        Ephemeral:
          Volume Claim Template:
            Spec:
              Access Modes:
                ReadWriteOnce
              Resources:
                Requests:
                  Storage:  5Gi
              Storage Class Name:  gp3
        Name:  work
Status:
  Current Runners:            2
  Pending Ephemeral Runners:  2
Events:  <none>
Below is the ConfigMap that is being used:
apiVersion: v1
kind: ConfigMap
metadata:
  name: hook-extension
  namespace: arc-runners-kubernetes-arm
data:
  content: |
    spec:
      containers:
        - name: "$job"
          resources:
            limits:
              cpu: "3000m"
            requests:
              cpu: "3000m"
Expected behavior: the ACTIONS_RUNNER_CONTAINER_HOOK_TEMPLATE env variable and the volume mounts should be added to the pods that come up.
Hey @kanakaraju17 ,
After 2 days of trial and error I managed to get a working scenario with resource limits applied. Funny thing is we were overcomplicating it using the "hook-extensions". All we need to do is add it in the template.spec.containers[0].resources.requests/limits section.
Below is a snippet of the values to pass into Helm (although I am using a HelmRelease with FluxCD, the principle still applies):
values:
  containerMode:
    type: "kubernetes"
    kubernetesModeWorkVolumeClaim:
      accessModes: ["ReadWriteOnce"]
      storageClassName: "standard"
      resources:
        requests:
          storage: 10Gi
  githubConfigSecret: gh-secret
  githubConfigUrl: "https://github.com/<Organisation>"
  runnerGroup: "k8s-nonprod"
  runnerScaleSetName: "self-hosted-k8s" # used as a runner label
  minRunners: 1
  maxRunners: 10
  template:
    spec:
      securityContext:
        fsGroup: 1001
      imagePullSecrets:
        - name: cr-secret
      containers:
        - name: runner
          image: ghcr.io/actions/actions-runner:latest
          command: ["/home/runner/run.sh"]
          resources:
            limits:
              cpu: "2000m"
              memory: "5Gi"
            requests:
              cpu: "200m"
              memory: "512Mi"
I have confirmed that this has been working for me, with some CodeQL workflows even failing due to "insufficient RAM" lol.
Hope it helps.
@marcomarques-bt, I assume the above configuration applies only to the runner pods and not to the pods where the workflow actually runs, i.e. the workflow pods.
Refer to the image below: the configuration works for the first pod but not for the second pod, where the actual job runs.
It seems that, similar to the issue mentioned earlier, tolerations cannot be configured either.
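If the runner-container-hooks version in use honours spec-level fields from the hook extension (an assumption worth verifying, not something confirmed in this thread), the hook-extension ConfigMap above could in principle carry the tolerations for the workflow pod as well. A sketch, reusing the toleration values already applied to the runner pods:

# Hedged sketch: whether tolerations/nodeSelector from the hook extension are applied
# to the workflow pod depends on the runner-container-hooks version; verify before relying on it.
apiVersion: v1
kind: ConfigMap
metadata:
  name: hook-extension
  namespace: arc-runners-kubernetes-arm
data:
  content: |
    spec:
      tolerations:
        - key: purpose
          operator: Equal
          value: github-actions
          effect: NoSchedule
      containers:
        - name: "$job"
          resources:
            limits:
              cpu: "3000m"
            requests:
              cpu: "3000m"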
Checks
Controller Version
0.9.2
Deployment Method
Helm
Checks
To Reproduce
Describe the bug
The pods whose names end with "workflow" (the workflow pods) should have the specified CPU and memory resource requests and limits when they are created.
Describe the expected behavior
The workflow pod that is created during the pipeline execution should have specific CPU and memory limits and requests set. However, it is not starting with the specified resources and limits.
Additionally, an extra pod is created when the pipeline runs, alongside the existing runner pods. We need to understand the purpose of the existing runner pod if a new pod is also being started. Details of the extra pod are in the screenshot below.
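For context, a minimal workflow of the kind that produces such a "-workflow" pod in kubernetes container mode might look like the sketch below; the workflow name, image, and step are placeholders, and only the runs-on label is taken from the scale set name used above:

# Illustrative sketch only: in kubernetes container mode the runner pod picks up the job,
# and the container hook starts a second "-workflow" pod that runs the job container.
name: ci
on: push
jobs:
  build:
    runs-on: arc-runner-kubernetes-ci-arm-large   # matches the runner scale set name
    container:
      image: ubuntu:22.04                         # placeholder job container image
    steps:
      - run: echo "running inside the workflow pod"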
Additional Context
Controller Logs
Runner Pod Logs