Open omniproc opened 8 months ago
k8s hooks are technically already usable with Gitea Actions (technically, meaning there is no documentation; the docker compose examples use dind + docker hooks). See this third-party runner adapter: https://gitea.com/gitea/awesome-gitea/pulls/149
Actions-Runner-Controller would require emulation of a bigger set of the internal GitHub Actions API.
I actually find it interesting to reverse engineer that product too, but I've never dealt with k8s myself.
_act_runner with its act backend doesn't support container hooks or k8s for the time being_
Interesting. I wasn't aware you could change the runner implementation just like that. I'll definitely look into it. However, given what you said about DinD still being a requirement, I don't think it will change much (we already have our runners on K8s with DinD using an adapted version of gitea/act-runner for k8s, but as mentioned, this comes with many headaches).
The goal IMHO would be to be able to start workflows on k8s directly. Possible implementations:
Option one (every job is its own pod) seems like the most promising option in my opinion.
However given what you said about DinD still being a requirement I don't think it will change much
I meant that I haven't created any k8s mode examples / actually tried it yet. Sorry for the confusion here.
The docker container hooks only allow dind for k8s, while the k8s hooks should use the kubernetes api for container management. I still need to look into getting a test setup running.
I can imagine
_Well, not using act_runner has limitations when you try to use Gitea Actions Extensions (features not present in GitHub Actions)_
I think option 1 is more likely to happen than option 2. Job scheduling is based on jobs, not on workflows.
k8s hooks work for me using these files on minikube (arm64):
actions-runner-k8s-gitea-sample-files.zip
With clever sharing of the runner credentials volume, you could start a lot of replicas for more parallel runners
This works without dind
Test workflow
```yaml
on: push
jobs:
  _:
    runs-on: k8s # <-- Used runner label
    container: ubuntu:latest # <-- Required, maybe the Gitea Actions adapter could insert a default
    steps:
      # Git is needed for actions/checkout to work for Gitea, the REST API is not compatible
      - run: apt update && apt install -y git
      - uses: https://github.com/actions/checkout@v3 # <-- Almost the only Gitea extension supported
      - run: ls -la
      - run: ls -la .github/workflows
```
The runner-pod-workflow is the job container pod, running directly via k8s.
Looks promising. I'll give it a shot and share my findings.
Okay, so... there seem to be some issues with the current setup. Let me share my findings:
```yaml
- name: GITEA_RUNNER_REGISTRATION_TOKEN
  valueFrom:
    secretKeyRef:
      name: secret_name
      key: secret_key
```
and creating your secret with (take care: K8s is case sensitive):
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: secret_name
type: Opaque
stringData:
  secret_key: "s3cr3t"
```
You shouldn't start pods in K8s directly but rather wrap them in a higher-level resource such as a Deployment, which lets the pod benefit from the (deployment) controller logic for updates and self-healing. I did that, so the result looks something like this:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: runner
  name: runner
spec:
  replicas: 1
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: runner
  template:
    metadata:
      labels:
        app: runner
    spec:
      restartPolicy: Always
      serviceAccountName: ci-builder
      #securityContext:
      #  runAsNonRoot: true
      #  runAsUser: 1000
      #  runAsGroup: 1000
      #  seccompProfile:
      #    type: RuntimeDefault
      volumes:
        - name: workspace
          emptyDir:
            sizeLimit: 5Gi
      containers:
        - name: runner
          image: ghcr.io/christopherhx/gitea-actions-runner:v0.0.11
          #securityContext:
          #  readOnlyRootFilesystem: true
          #  allowPrivilegeEscalation: false
          #  capabilities:
          #    drop:
          #      - ALL
          volumeMounts:
            - mountPath: /home/runner/_work
              name: workspace
          env:
            - name: ACTIONS_RUNNER_POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: ACTIONS_RUNNER_REQUIRE_JOB_CONTAINER
              value: "true"
            - name: ACTIONS_RUNNER_CONTAINER_HOOKS
              value: /home/runner/k8s/index.js
            - name: GITEA_INSTANCE_URL
              value: https://foo.bar
            - name: GITEA_RUNNER_REGISTRATION_TOKEN
              valueFrom:
                secretKeyRef:
                  name: gitea
                  key: token
            - name: GITEA_RUNNER_LABELS
              value: k8s
          resources:
            requests:
              cpu: 500m
              memory: 2Gi
            limits:
              cpu: 1000m
              memory: 8Gi
```
A few changes I made here:

- `emptyDir` can act as a temporary volume to share data between the containers of a pod and write data to a well-known location.
- `securityContext`: needed to disable it for now for troubleshooting, since it currently can't work as needed because of some issues with the current runner setup:
  - The image creates a `runner` user using its name in `USER runner`. K8s doesn't like that if `runAsNonRoot` is specified but no `runAsUser` is given in the security context and the image uses a "non-numeric" user. I'd opt for using `USER 1000` in the Dockerfile instead, which should make this easier in the future.
  - `allowPrivilegeEscalation: false` can't currently be used because start.sh makes use of sudo to create the folder layout: `sudo chown -R runner:docker /home/runner/_work` and `sudo chown -R runner:docker /data`. I think a better approach would be to just create those folders within the mounted emptyDir volume. The running user should already have all permissions there to create the folders, so no sudo would be needed, but I'm not sure what those folders are currently used for and how hardcoded those paths are.
  - `readOnlyRootFilesystem` will probably also cause issues in the future when paths other than the mounted volume are used. Again, I think the easiest way to allow for maximum container security in k8s would be to simply not use the root fs at all and do everything on the mounted volume.

So, those are simply improvement suggestions for the future. For now, as you can see, I've been trying to keep it as simple as possible, but I still run into an issue. The runner starts and registers, but when using the job you provided I run into the following error returned by the job:
[WORKER 2024-03-12 15:59:08Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin'
[WORKER 2024-03-12 15:59:08Z INFO HostContext] Well known directory 'Root': '/home/runner'
[WORKER 2024-03-12 15:59:08Z INFO HostContext] Well known config file 'Credentials': '/home/runner/.credentials'
[WORKER 2024-03-12 15:59:08Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin'
[WORKER 2024-03-12 15:59:08Z INFO HostContext] Well known directory 'Root': '/home/runner'
[WORKER 2024-03-12 15:59:08Z INFO HostContext] Well known config file 'Runner': '/home/runner/.runner'
[WORKER 2024-03-12 15:59:08Z INFO Worker] Version: 2.314.0
[WORKER 2024-03-12 15:59:08Z INFO Worker] Commit: bc79e859d7b66e8018716bc94160656f6c6948fc
[WORKER 2024-03-12 15:59:08Z INFO Worker] Culture:
[WORKER 2024-03-12 15:59:08Z INFO Worker] UI Culture:
[WORKER 2024-03-12 15:59:08Z INFO Worker] Waiting to receive the job message from the channel.
[WORKER 2024-03-12 15:59:08Z INFO ProcessChannel] Receiving message of length 6322, with hash '30564f1b4d3e28c3d9cc39d17eca1132cc026a2abeb6ab1be6736d80cf019ea9'
[WORKER 2024-03-12 15:59:08Z INFO Worker] Message received.
Newtonsoft.Json.JsonReaderException: Invalid character after parsing property name. Expected ':' but got: . Path 'ContextData.github.d[20].v.d[5].v.d[14].v.d[11].v', line 1, position 6322.
at Newtonsoft.Json.JsonTextReader.ParseProperty()
at Newtonsoft.Json.JsonTextReader.ParseObject()
at Newtonsoft.Json.Linq.JContainer.ReadContentFrom(JsonReader r, JsonLoadSettings settings)
at Newtonsoft.Json.Linq.JContainer.ReadTokenFrom(JsonReader reader, JsonLoadSettings options)
at Newtonsoft.Json.Linq.JObject.Load(JsonReader reader, JsonLoadSettings settings)
at Newtonsoft.Json.Linq.JObject.Load(JsonReader reader)
at GitHub.DistributedTask.Pipelines.ContextData.PipelineContextDataJsonConverter.ReadJson(JsonReader reader, Type objectType, Object existingValue, JsonSerializer serializer)
at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.DeserializeConvertable(JsonConverter converter, JsonReader reader, Type objectType, Object existingValue)
at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.PopulateDictionary(IDictionary dictionary, JsonReader reader, JsonDictionaryContract contract, JsonProperty containerProperty, String id)
at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.CreateObject(JsonReader reader, Type objectType, JsonContract contract, JsonProperty member, JsonContainerContract containerContract, JsonProperty containerMember, Object existingValue)
at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.CreateValueInternal(JsonReader reader, Type objectType, JsonContract contract, JsonProperty member, JsonContainerContract containerContract, JsonProperty containerMember, Object existingValue)
at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.SetPropertyValue(JsonProperty property, JsonConverter propertyConverter, JsonContainerContract containerContract, JsonProperty containerProperty, JsonReader reader, Object target)
at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.PopulateObject(Object newObject, JsonReader reader, JsonObjectContract contract, JsonProperty member, String id)
at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.CreateObject(JsonReader reader, Type objectType, JsonContract contract, JsonProperty member, JsonContainerContract containerContract, JsonProperty containerMember, Object existingValue)
at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.CreateValueInternal(JsonReader reader, Type objectType, JsonContract contract, JsonProperty member, JsonContainerContract containerContract, JsonProperty containerMember, Object existingValue)
at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.Deserialize(JsonReader reader, Type objectType, Boolean checkAdditionalContent)
at Newtonsoft.Json.JsonSerializer.DeserializeInternal(JsonReader reader, Type objectType)
at Newtonsoft.Json.JsonSerializer.Deserialize(JsonReader reader, Type objectType)
at Newtonsoft.Json.JsonConvert.DeserializeObject(String value, Type type, JsonSerializerSettings settings)
at Newtonsoft.Json.JsonConvert.DeserializeObject[T](String value, JsonSerializerSettings settings)
at GitHub.Runner.Sdk.StringUtil.ConvertFromJson[T](String value)
at GitHub.Runner.Worker.Worker.RunAsync(String pipeIn, String pipeOut)
at GitHub.Runner.Worker.Program.MainAsync(IHostContext context, String[] args)
[WORKER 2024-03-12 15:59:09Z ERR Worker] Newtonsoft.Json.JsonReaderException: Invalid character after parsing property name. Expected ':' but got: . Path 'ContextData.github.d[20].v.d[5].v.d[14].v.d[11].v', line 1, position 6322.
[WORKER 2024-03-12 15:59:09Z ERR Worker] at Newtonsoft.Json.JsonTextReader.ParseProperty()
[WORKER 2024-03-12 15:59:09Z ERR Worker] at Newtonsoft.Json.JsonTextReader.ParseObject()
[WORKER 2024-03-12 15:59:09Z ERR Worker] at Newtonsoft.Json.Linq.JContainer.ReadContentFrom(JsonReader r, JsonLoadSettings settings)
[WORKER 2024-03-12 15:59:09Z ERR Worker] at Newtonsoft.Json.Linq.JContainer.ReadTokenFrom(JsonReader reader, JsonLoadSettings options)
[WORKER 2024-03-12 15:59:09Z ERR Worker] at Newtonsoft.Json.Linq.JObject.Load(JsonReader reader, JsonLoadSettings settings)
[WORKER 2024-03-12 15:59:09Z ERR Worker] at Newtonsoft.Json.Linq.JObject.Load(JsonReader reader)
[WORKER 2024-03-12 15:59:09Z ERR Worker] at GitHub.DistributedTask.Pipelines.ContextData.PipelineContextDataJsonConverter.ReadJson(JsonReader reader, Type objectType, Object existingValue, JsonSerializer serializer)
[WORKER 2024-03-12 15:59:09Z ERR Worker] at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.DeserializeConvertable(JsonConverter converter, JsonReader reader, Type objectType, Object existingValue)
[WORKER 2024-03-12 15:59:09Z ERR Worker] at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.PopulateDictionary(IDictionary dictionary, JsonReader reader, JsonDictionaryContract contract, JsonProperty containerProperty, String id)
[WORKER 2024-03-12 15:59:09Z ERR Worker] at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.CreateObject(JsonReader reader, Type objectType, JsonContract contract, JsonProperty member, JsonContainerContract containerContract, JsonProperty containerMember, Object existingValue)
[WORKER 2024-03-12 15:59:09Z ERR Worker] at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.CreateValueInternal(JsonReader reader, Type objectType, JsonContract contract, JsonProperty member, JsonContainerContract containerContract, JsonProperty containerMember, Object existingValue)
[WORKER 2024-03-12 15:59:09Z ERR Worker] at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.SetPropertyValue(JsonProperty property, JsonConverter propertyConverter, JsonContainerContract containerContract, JsonProperty containerProperty, JsonReader reader, Object target)
[WORKER 2024-03-12 15:59:09Z ERR Worker] at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.PopulateObject(Object newObject, JsonReader reader, JsonObjectContract contract, JsonProperty member, String id)
[WORKER 2024-03-12 15:59:09Z ERR Worker] at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.CreateObject(JsonReader reader, Type objectType, JsonContract contract, JsonProperty member, JsonContainerContract containerContract, JsonProperty containerMember, Object existingValue)
[WORKER 2024-03-12 15:59:09Z ERR Worker] at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.CreateValueInternal(JsonReader reader, Type objectType, JsonContract contract, JsonProperty member, JsonContainerContract containerContract, JsonProperty containerMember, Object existingValue)
[WORKER 2024-03-12 15:59:09Z ERR Worker] at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.Deserialize(JsonReader reader, Type objectType, Boolean checkAdditionalContent)
[WORKER 2024-03-12 15:59:09Z ERR Worker] at Newtonsoft.Json.JsonSerializer.DeserializeInternal(JsonReader reader, Type objectType)
[WORKER 2024-03-12 15:59:09Z ERR Worker] at Newtonsoft.Json.JsonSerializer.Deserialize(JsonReader reader, Type objectType)
[WORKER 2024-03-12 15:59:09Z ERR Worker] at Newtonsoft.Json.JsonConvert.DeserializeObject(String value, Type type, JsonSerializerSettings settings)
[WORKER 2024-03-12 15:59:09Z ERR Worker] at Newtonsoft.Json.JsonConvert.DeserializeObject[T](String value, JsonSerializerSettings settings)
[WORKER 2024-03-12 15:59:09Z ERR Worker] at GitHub.Runner.Sdk.StringUtil.ConvertFromJson[T](String value)
[WORKER 2024-03-12 15:59:09Z ERR Worker] at GitHub.Runner.Worker.Worker.RunAsync(String pipeIn, String pipeOut)
[WORKER 2024-03-12 15:59:09Z ERR Worker] at GitHub.Runner.Worker.Program.MainAsync(IHostContext context, String[] args)
##[Error]failed to execute worker exitcode: 1
Edit: the root cause seems to be somewhere here: https://github.com/actions/runner/blob/v2.314.0/src/Runner.Worker/Program.cs#L20
In addition, I found that providing a runner config by mounting one and setting the CONFIG_FILE env var doesn't seem to work; you'll get an Error: unknown flag: --config if you try. The root cause seems to be this.
I haven't gotten this kind of error before (at least not for a year):
Receiving message of length 6322, with hash '30564f1b4d3e28c3d9cc39d17eca1132cc026a2abeb6ab1be6736d80cf019ea9' [WORKER 2024-03-12 15:59:08Z INFO Worker] Message received. Newtonsoft.Json.JsonReaderException: Invalid character after parsing property name. Expected ':' but got: . Path 'ContextData.github.d[20].v.d[5].v.d[14].v.d[11].v', line 1, position 6322.
Sounds like the message inside the container got trimmed before it reached the actions/runner.
Based on the error, the beginning was sent to the actions/runner successfully.
Maybe some data specific to your test setup causes this (even parts not in the repo are stored in the message).
I would need to add more debug logging to diagnose this.
If you add the logging I can reproduce the issue if you like. My guess is that it's maybe proxy-related, but I can't tell from the error logs.
@omniproc you made changes via the deployment file that are not compatible with the actions/runner k8s container hooks, and I have no idea if using a deployment is possible. Actions-Runner-Controller might use helm charts + the kubernetes api; not sure how they do that.
Unable to attach or mount volumes: unmounted volumes=[work], unattached volumes=[], failed to process volumes=[work]: error processing PVC default/runner-785778b969-v88f8-work: failed to fetch PVC from API server: persistentvolumeclaims "runner-785778b969-v88f8-work" not found
The workspace cannot be an emptyDir volume; as in my example files, it is required to be a PersistentVolumeClaim.
You can technically change the name of the PVC via the ACTIONS_RUNNER_CLAIM_NAME env var, but I don't know how to get a dynamically generated name of a volume. See https://github.com/actions/runner-container-hooks/blob/main/packages/k8s/README.md; if that doesn't match, it will error out.
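For a single-replica setup, a minimal sketch of wiring a statically named claim could look like the following (the claim name `runner-work` and the size are placeholders, not from this thread; with multiple replicas each pod would need its own claim, which is the hard part):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: runner-work        # placeholder name
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
```

and in the runner container's env:

```yaml
- name: ACTIONS_RUNNER_CLAIM_NAME
  value: runner-work       # must match the PVC above
```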
allowPrivilegeEscalation: false can't currently be used because start.sh makes use of sudo to create the folder layout: sudo chown -R runner:docker /home/runner/_work and sudo chown -R runner:docker /data. I think a better approach would be to just create those folders within the mounted EmptyDir volume. The running user should already have all permissions there to create the folders so no sudo would be needed but I'm not sure what those folders are currently used for and how hardcoded those paths are.
This made mkdir /data fail, and you get an error about a .runner file.
This would require an emptyDir mount:

```yaml
- mountPath: /data
  name: data
```
Maybe if I create that dir in the Dockerfile it would work without that, as long as your fs is read-write.
The nightly doesn't have sudo in the start.sh file anymore, but it can still certainly break existing non-k8s setups as of now.
If you add the logging I can reproduce the issue if you like. My guess is that's it's maybe proxy related. But can't tell from the error logs.
I found a mistake in the Python wrapper file: probably due to RAM resource constraints, os.read read less than expected and shortened the message.
I also added some asserts on the return values of the pipe communication; the env ACTIONS_RUNNER_WORKER_DEBUG would print the job message from the Python side.
Please try this nightly image: https://github.com/ChristopherHX/gitea-actions-runner/pkgs/container/gitea-actions-runner/190660665?tag=nightly (important: change to the os/arch tab and copy the full tag + sha variant; I had problems with old cached nightly images).
It should get you to the point where you omitted the persistentvolumeclaims of my example and kubernetes cannot start the job pod (also make sure to create an emptyDir mount at /data/).
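For reference, a minimal sketch of that /data emptyDir mount could look like this (the volume name `data` is a placeholder):

```yaml
spec:
  volumes:
    - name: data
      emptyDir: {}
  containers:
    - name: runner
      volumeMounts:
        - mountPath: /data
          name: data
```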
I'm now able to start the runner in a k8s namespace in DinD mode. How can I scale up the runners, by setting replicas=2 or 3?
@ChristopherHX Hi, an interesting project there!
Just a little advice here:
You can technically change the name of the pvc via ACTIONS_RUNNER_CLAIM_NAME env, but I don't know how to get a dynamically generated name of a volume.
I used a StatefulSet and its volumeClaimTemplates functionality to dynamically provision PVCs and get the PVC name into the container as an env var. You can use https://kubernetes.io/docs/tasks/inject-data-application/define-interdependent-environment-variables/ to define an env var that depends on another.
Like the following:
```yaml
volumeClaimTemplates:
  - metadata:
      name: work
    spec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 1Gi
```
and refer to it as:

```yaml
env:
  - name: ACTIONS_RUNNER_POD_NAME
    valueFrom:
      fieldRef:
        fieldPath: metadata.name
  - name: ACTIONS_RUNNER_CLAIM_NAME
    value: work-$(ACTIONS_RUNNER_POD_NAME)
```
A full working example that I tested is also available from https://github.com/traPtitech/manifest/blob/3ff7e8e6dfa3e0e4fed9a9e8ca1ad09f9b132ff1/gitea/act-runner/gitea-act-runner.yaml.
Thanks for your example, it makes manual scaling pretty straightforward and works in minikube for testing purposes even with 4 replicas.
The first time I read your response I thought work-$(ACTIONS_RUNNER_POD_NAME) looked like dark magic, as a kubernetes newbie, since it looks like an indirect resource-naming assumption.
@ChristopherHX
is it possible that the runner doesn't yet support no-proxy? With the http_proxy / https_proxy and no_proxy env vars set, I see the runner using the proxy:
[WORKER 2024-07-23 12:09:08Z INFO HostContext] Configuring anonymous proxy http://my.proxy/ for all HTTP requests.
[WORKER 2024-07-23 12:09:08Z INFO HostContext] Configuring anonymous proxy http://my.proxy/ for all HTTPS requests.
but it doesn't mention the no_proxy setting and later errors when trying to connect to itself using its pod IP (which is in the no_proxy list):
[WORKER 2024-07-23 12:09:09Z ERR GitHubActionsService] GET request to http://172.27.1.66:42791/_apis/connectionData?connectOptions=1&lastChangeId=-1&lastChangeId64=-1 failed. HTTP Status: Forbidden
I haven't gone through the limitations of actions/runner proxy support myself. They seem to ignore IP exclusions:
"We will not support IP addresses for no_proxy, only hostnames."
Not sure how my gitea runner can switch to hostnames; maybe try to reverse-DNS the IP and automatically add it to NO_PROXY?
You can simply use something like this to add it to no-proxy:
```yaml
- name: POD_IP
  valueFrom:
    fieldRef:
      fieldPath: status.podIP
- name: no_proxy
  value: localhost,127.0.0.1,.local,$(POD_IP)
```
But that won't work if they ignore IP addresses for no_proxy (and in fact, I tested it and it doesn't work). So, why does the runner try to contact itself via its external interface anyway? Why not use localhost?
Besides, you could always use the DNS service built into k8s, but that would only work if that DNS name is used by the runner instead of the IP, see https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#pods
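For reference, a rough sketch of what those pod DNS names look like (assuming the default cluster domain cluster.local; the namespace and IP here are placeholders): a pod with IP 172.27.1.66 in namespace default would be resolvable as 172-27-1-66.default.pod.cluster.local. A headless Service additionally gives StatefulSet pods stable per-pod names:

```yaml
# Hedged sketch: a headless Service (clusterIP: None) makes StatefulSet pods
# resolvable as <pod-name>.<service-name>.<namespace>.svc.cluster.local
# (all names below are placeholders, not from this thread)
apiVersion: v1
kind: Service
metadata:
  name: runner-headless
spec:
  clusterIP: None
  selector:
    app: runner
```

For this to apply to a StatefulSet, its spec.serviceName would need to point at this Service.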
Why not use localhost?
I can send a single hostname/ip + port to the actions/runner
If I send localhost, it would refer to the job pod itself rather than to the runner.
If my gitea runner adapter were part of the Gitea backend, we would have a real hostname that forwards into nested containers, like ARC has.
well, then let's use k8s DNS service. That would work.
I can send a single hostname/ip + port to the actions/runner
Can you point me to where this is done? Can this be configured?
Can you point me to where this is done? Can this be configured?
Here I set the connection address, but there are some url entries; maybe one of them is still pointing to the gitea instance without redirection: https://github.com/ChristopherHX/gitea-actions-runner/blob/main/runtime/task.go#L734-L745
So, if I understand that correctly, nektos artifactcache allows you to set outboundIP as the endpoint in StartHandler. Although the name implies an IP address, it is actually just a string and there seems to be no validation of it, besides a fallback to use the interface IP if none was provided. But it seems like gitea-actions-runner always provides the interface IP to the handler.
Because gitea-act-runner always uses an IP, and actions/runner doesn't support no_proxy for IPs as you pointed out, this means there is currently no functional proxy support.
So, what do you think about making the IP configurable via an environment variable? GITEA_CACHE_HOST or something similar?
Is it currently viable to run gitea actions on k8s or is this still very much a work in progress?
So, what do you think about making the IP configurable via an environment variable? GITEA_CACHE_HOST or something similar?
Yes, I mostly agree with this. However, I wouldn't put CACHE into the env name; more something like GITEA_ACTIONS_RUNNER_HOST, because a fake actions runtime is also implemented in the runner (it uses more than one port for tcp listeners).
Eventually, if unset, prefer hostnames over IPs, but that needs testing on my side (probably behind a feature env var).
I'll queue this into my todo list for tomorrow; I'm working on multiple projects...
@omniproc Proxy should now work in tag v0.0.13
Use the following env
```yaml
- name: GITEA_ACTIONS_RUNNER_RUNTIME_USE_DNS_NAME
  value: '1'
- name: GITEA_ACTIONS_RUNNER_RUNTIME_APPEND_NO_PROXY # this appends the dns name of the pod to no_proxy
  value: '1'
- name: http_proxy
  value: http://localhost:2939 # some random proxy address for testing, use the real one
- name: no_proxy
  value: .fritz.box,10.96.0.1 # first exclusion for gitea, second for kubernetes, adjust as needed
```
@TomTucka
Is it currently viable to run gitea actions on k8s or is this still very much a work in progress?
This pretty much depends on your requirements; the more people try to use it, the more issues can be found & fixed.
@ChristopherHX testing v0.0.13... getting closer... now TLS errors. I'm not sure from the log output if this happens due to the mentioned RFC 6066 issue (I was under the impression that DNS names are used now, so not sure why this is logged anyway) or because the CA of the proxy is missing. I'll try to mount the CA into the runner and see what happens. First I have to find out which location it looks in for trusted CAs.
Current runner version: '2.317.0'
Secret source: Actions
Runner is running behind proxy server 'http://myproxy:8080/' for all HTTP requests.
Runner is running behind proxy server 'http://myproxy:8080/' for all HTTPS requests.
Prepare workflow directory
Prepare all required actions
Getting action download info
Download action repository 'https://github.com/actions/checkout@v4' (SHA:N/A)
Complete job name: test
##[group]Run '/home/runner/k8s/index.js'
shell: /home/runner/externals/node16/bin/node {0}
##[endgroup]
(node:50) [DEP0123] DeprecationWarning: Setting the TLS ServerName to an IP address is not permitted by RFC 6066. This will be ignored in a future version.
(Use `node --trace-deprecation ...` to show where the warning was created)
##[error]Error: unable to get local issuer certificate
##[error]Process completed with exit code 1.
##[error]Executing the custom container implementation failed. Please contact your self hosted runner administrator.
##[group]Run '/home/runner/k8s/index.js'
shell: /home/runner/externals/node16/bin/node {0}
##[endgroup]
(node:61) [DEP0123] DeprecationWarning: Setting the TLS ServerName to an IP address is not permitted by RFC 6066. This will be ignored in a future version.
##[error]Error: unable to get local issuer certificate
(Use `node --trace-deprecation ...` to show where the warning was created)
##[error]Process completed with exit code 1.
##[error]Executing the custom container implementation failed. Please contact your self hosted runner administrator.
Cleaning up orphan processes
Finished
@omniproc nodejs ignores ca certs from common locations on linux and does its own thing; point the env NODE_EXTRA_CA_CERTS to your cert bundle file, including your kubernetes api cert chain.
That cert bundle needs to be mounted to the runner container.
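A minimal sketch of such a mount, assuming the bundle is stored in a ConfigMap (all names and paths here are placeholders, not from this thread):

```yaml
spec:
  volumes:
    - name: ca-bundle
      configMap:
        name: ca-bundle          # assumed ConfigMap holding ca-bundle.crt
  containers:
    - name: runner
      volumeMounts:
        - mountPath: /etc/ssl/extra
          name: ca-bundle
          readOnly: true
      env:
        - name: NODE_EXTRA_CA_CERTS
          value: /etc/ssl/extra/ca-bundle.crt
```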
I assume this undescriptive, very short error comes from kubernetes api access via https from node. I got something similarly short when I didn't add it to no_proxy and my proxy didn't even exist.
For the dind backend I wrote a provisioning script for my self-signed certs for all containers run by actions/runner; I could look into creating containers using modified k8s hooks for cert provisioning.
By default, every container you use is assumed to have the env NODE_EXTRA_CA_CERTS set and the ca folders populated if you use self-signed certs; not really practicable...
EDIT: Is your kubernetes api accessed via your proxy?
@ChristopherHX I can confirm it's working. It was two issues (as you expected):
A few UX improvement suggestions from my side. As a user, when I configure no_proxy I usually only have in mind the URLs that I know should not be proxied but must be reached by the runner. I know them because I usually configure them explicitly in my pipeline (e.g. the Git repo). What I don't know is what other endpoints the runner has to reach. Of course, on second thought it's obvious why Node tries to reach the K8s API. But since it's the runner that wants to reach it, I think it should be the runner's responsibility to set up everything it can to make this happen. So my suggestion would be:
So now that the runner starts a new pod for the workflow, I was trying to get DinD to work in it using catthehacker/ubuntu:act-22.04 as the job container image, which doesn't work since the docker socket is not available. I know that in theory it's possible, because gitea/act_runner:nightly-dind-rootless can run DinD, but that image is of course missing all the act components.
So before I start fiddling around building a hybrid of catthehacker and dind-rootless: how did you get DinD to run, @ChristopherHX?
@omniproc I might have caused confusion here; I have not set up dind in the job container yet.
I did this only for the runner (which is by default a docker cli client), outside of kubernetes.
I would expect that a custom fork of https://github.com/actions/runner-container-hooks could configure a dind installation installed in the external tools (the folder that has node20 etc. for the runner) on any job container.
in theory it's possible because gitea/act_runner:nightly-dind-rootless can run DinD
This is similar to how I did it e.g. in docker compose (docker hook mode): https://github.com/ChristopherHX/gitea-actions-runner/blob/main/examples/docker-compose-dind-rootless/docker-compose.yml
This only works if you don't use the k8s container hooks, but I'm not sure if the docker.sock bind mount works in that setup, as I didn't make use of it.
This approach has flaws: if you try to run the following, you get strange bugs
@ChristopherHX so I got a working prototype of this. Instead of using DinD, which arguably is a security nightmare more often than not (or comes with many limitations as of today when running unprivileged), I switched to buildkit, which doesn't require any privileges and can be executed in a non-root pod. So the process currently looks like this:
- The job container image is catthehacker/ubuntu:act, extended by an installation of buildkit (buildctl, actually) to maximize compatibility with many GitHub actions.
- The buildkit daemon runs as moby/buildkit:master-rootless. Currently I'm running the buildkit container as a sidecar container of the gitea-actions-runner StatefulSet, so it scales along with the StatefulSet. Other scenarios could be done as long as the buildkit container is reachable from the workflow pod.

@ChristopherHX is it possible that currently we cannot pass environment variables using the env parameter? E.g.:
```yaml
on:
  push
jobs:
  test:
    runs-on: myrunner
    container:
      image: ghcr.io/catthehacker/ubuntu:act-22.04
      env:
        FOO: BAR
```
In this case FOO will not be set.
@omniproc
@ChristopherHX is it possible that currently we can not pass environment variables using the env parameter? E.g.:
No idea; I tried the following and it passes my test (both ways to do that in a container job):
```yaml
name: Gitea Actions Demo
on: [push]
jobs:
  build-docker:
    runs-on: trap-cluster
    container:
      image: buildpack-deps:noble
      env:
        MY_IMAGE_VAR: foo
    env:
      MY_GLOBAL_VAR: foo
    steps:
      - name: Checkout
        run: |
          echo "MY_GLOBAL_VAR: $MY_GLOBAL_VAR"
          echo "MY_IMAGE_VAR: $MY_IMAGE_VAR"
```
Current runner version: '2.317.0'
Secret source: Actions
Runner is running behind proxy server 'http://localhost:2939' for all HTTP requests.
Prepare workflow directory
Prepare all required actions
Complete job name: build-docker
##[group]Run '/home/runner/k8s/index.js'
shell: /home/runner/externals/node16/bin/node {0}
##[endgroup]
Checkout
3s
##[group]Run echo "MY_GLOBAL_VAR: $MY_GLOBAL_VAR"
echo "MY_GLOBAL_VAR: $MY_GLOBAL_VAR"
echo "MY_IMAGE_VAR: $MY_IMAGE_VAR"
shell: sh -e {0}
env:
MY_GLOBAL_VAR: foo
##[endgroup]
##[group]Run '/home/runner/k8s/index.js'
shell: /home/runner/externals/node16/bin/node {0}
##[endgroup]
MY_GLOBAL_VAR: foo
MY_IMAGE_VAR: foo
Complete job
5s
##[group]Run '/home/runner/k8s/index.js'
shell: /home/runner/externals/node16/bin/node {0}
##[endgroup]
Cleaning up orphan processes
Finished
MY_GLOBAL_VAR is expected to be echoed for every step, like in my log, while MY_IMAGE_VAR is absent in ${{ env.* }}.
@ChristopherHX you're right. I expected MY_IMAGE_VAR to be available in the env scope. It is not. However, it is available in the steps when running a shell, and it is visible in the env of the pod definition. MY_GLOBAL_VAR, on the other hand, is available in the env scope within the action and can be accessed from the shell, but is not visible in the pod spec. Interesting. I didn't know the difference between those two envs before. Thanks for the clarification.
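The difference can be summarized in a short sketch (based purely on the behaviour observed in the log above, not on any documented guarantee):

```yaml
jobs:
  test:
    runs-on: myrunner
    # Job-level env: present in ${{ env.* }} and exported to every step's
    # shell, but not rendered into the job pod's spec.
    env:
      MY_GLOBAL_VAR: foo
    container:
      image: buildpack-deps:noble
      # Container-level env: rendered into the pod spec (visible with
      # kubectl describe pod) and visible to shells inside the container,
      # but absent from the ${{ env.* }} expression context.
      env:
        MY_IMAGE_VAR: foo
```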
From my experience with different tools that work with kubernetes:
I've been following this for some time now as I'd really love to switch over to Gitea Actions and move away from Jenkins as a CI/CD tool. The main thing preventing me from making a case for it is exactly this: a way to dynamically create temporary pods that live only for as long as the Jenkins job / Gitea workflow, so that resourcing can be controlled in a native Kubernetes way.
I agree with everything @querplis has said above, and can state that the Jenkins Kubernetes plugin has the same functionality: it allows podTemplate YAML files to be used for different job scripts, so that a Go service build pipeline has a Go container, a Java service pipeline has a Maven or Gradle container, etc.

@Mo0rBy
big thing that is preventing from creating a case for it is exactly this, a way to dynamically create temporary pods that only live for as long as the 'Jenkins job/Gitea workflow' so that resourcing can be controlled in a native Kubernetes way.
This already works, as documented in this GH issue.
@querplis
A controller is not really needed; it is possible to communicate directly with the kubernetes api and create pods.
It's not needed, but it's good practice to separate concerns (and arguably this design also leads to single responsibility and possibly open-closed). The job of the controller is to talk to and monitor the K8s API and, based on that, do what it must on the backend system, and/or, vice versa, wait for instructions from the external system (Gitea) and perform the required tasks in K8s.
Since all of the jobs are one-shot and we don't need to ensure any uptime for a job, we can judge success or failure from the exit codes of whatever we run in the containers. There is no need for anything more than a pod with containers in it, and containers in a pod can communicate with each other, which gives the opportunity to run, for example, a disposable postgres for tests with a very simple setup.
True, a simple pod with multiple containers might do for most cases. More complex scenarios, however, might require pre or post steps. Take a look at the ecosystems of FluxCD and ArgoCD and how they evolved from basically what you argue for into something much bigger and more complex. But I agree that for an initial implementation, having granular control over a pod, started as a k8s Job or a simple pod, is good enough. However, at that point, since it's just as easy as applying a manifest against the K8s API, why limit it? Just leave it up to the user what he wants to define in the manifest and have it applied as the Gitea Actions workflow starts. A simple label system could signal Gitea which pods to consider important for the workflow to fail (e.g. apply a label gitea-action-must-succeed to all pods that Gitea will consider relevant for the workflow to succeed).
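To make the per-job-pod idea concrete, here is a rough sketch of what such a controller could submit to the K8s API. All names (including the gitea-action-must-succeed label) are hypothetical and not part of any existing runner:

```python
# Sketch of the "every job is its own pod" idea: build a one-shot pod
# manifest for a single workflow job. All names here are hypothetical;
# a real controller would submit this dict to the Kubernetes API (e.g.
# via the official `kubernetes` Python client) and watch the pod phase.

def build_job_pod(job_id: str, image: str, command: list[str]) -> dict:
    """Return a pod manifest for a one-shot CI job.

    restartPolicy=Never means success or failure is judged purely from
    the container's exit code, as discussed above.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": f"gitea-job-{job_id}",
            # Hypothetical label a controller could watch to decide
            # whether the workflow as a whole succeeded.
            "labels": {"gitea-action-must-succeed": "true"},
        },
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {"name": "job", "image": image, "command": command},
            ],
        },
    }


manifest = build_job_pod("42", "buildpack-deps:noble", ["bash", "-c", "make test"])
print(manifest["metadata"]["name"])  # gitea-job-42
```

Letting the user supply a full manifest, as suggested above, would simply replace this builder with the user's own YAML.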
We might need an ability to define what containers we need in the pod, plus resources, commands, args, node selectors and affinities. Some tools just allow the user to write their own manifest and inject their own container into it.
Same here. Simply allow the user to supply a K8s manifest with the workflow, and the controller would apply it to k8s.
If secrets are stored in kubernetes, other pods that run in the same namespace have a chance to access them, which might be a security issue.
Secrets in Kubernetes are not designed to be "secret". They are essentially configmaps that can be protected using RBAC. Within a namespace, all pods can access them, just like any configmap. Don't want that? Create a separate namespace and use RBAC. Don't try to re-invent the wheel: doing so will only break 90% of the k8s tools you might need, since they pretty much all expect you to use Secrets the way they're meant to be used.
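For reference, restricting secret reads with RBAC is a small amount of YAML. The names below are placeholders, not from any of the manifests in this thread:

```yaml
# Placeholder names: restrict reading Secrets in the `ci` namespace
# to the runner's service account only.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: read-runner-secrets
  namespace: ci
rules:
  - apiGroups: [""]
    resources: ["secrets"]
    verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-runner-secrets
  namespace: ci
subjects:
  - kind: ServiceAccount
    name: gitea-act-runner
    namespace: ci
roleRef:
  kind: Role
  name: read-runner-secrets
  apiGroup: rbac.authorization.k8s.io
```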
Hi @omniproc , @ChristopherHX ,
I've set up my runner according to all of the examples you've provided here. However, I've come to a standstill, where I cannot clone a repository.
My runner is based on @ChristopherHX 's image: ghcr.io/christopherhx/gitea-actions-runner:v0.0.13
I've tried with every container in my workflow: ubuntu:latest, ghcr.io/omniproc/act-buildkit-runner:latest, ghcr.io/catthehacker/ubuntu:act-22.04
At this step it always fails:

- name: checkout repository
  uses: https://github.com/actions/checkout@v4
With this error:
env: '/__e/node20/bin/node': No such file or directory
How did you fix it? Were there any additional environment variables added to the workflow or the runner itself?
Thanks for everything you guys have done so far by the way!
Seems like you either did not install node or your nodejs env var is pointing to an empty dir. The image will not just have any binary you need. Either build your own image that bundles nodejs, or use one of the many install-nodejs actions available as a pre-step in your workflow.
I don't have any node dependencies so I never tested node builds.
@omniproc
Yeah my problem is that I don't have anything that uses node. I'm just trying to check out my repository. Basically with your example I cannot check out my repository.
This is my workflow
name: Gitea Actions Demo
run-name: Testing out Gitea Actions 🚀
on: [push]
jobs:
  Explore-Gitea-Actions:
    runs-on: beowulf-cluster
    container:
      image: ghcr.io/catthehacker/ubuntu:act-22.04 #ghcr.io/omniproc/act-buildkit-runner:latest #
    steps:
      - name: Check out repository code.
        uses: https://github.com/actions/checkout@v4
@omniproc
Yeah my problem is that I don't have anything that uses node. I'm just trying to check out my repository. Basically with your example I cannot check out my repository.
This is my workflow
name: Gitea Actions Demo
run-name: Testing out Gitea Actions 🚀
on: [push]
jobs:
  Explore-Gitea-Actions:
    runs-on: beowulf-cluster
    container:
      image: ghcr.io/catthehacker/ubuntu:act-22.04 #ghcr.io/omniproc/act-buildkit-runner:latest #
    steps:
      - name: Check out repository code.
        uses: https://github.com/actions/checkout@v4
You are using the github checkout action, which is a javascript action executed by nodejs. Can't tell why it is failing from the little information you provided.
@djeinstine Do plain run steps work fine?
Can't tell why it is failing from the little information you provided.
Yes, exactly. Something like the relevant parts of your kubernetes config could be helpful.
Node normally doesn't need to be installed; the node folder should be copied to the persistent volume claim by https://github.com/actions/runner-container-hooks (k8s edition) during the Setup Job step.
@omniproc @ChristopherHX
Yes I didn't post my config. I took inspiration from this post, and the full working example from @motoki317 here https://github.com/traPtitech/manifest/blob/3ff7e8e6dfa3e0e4fed9a9e8ca1ad09f9b132ff1/gitea/act-runner/gitea-act-runner.yaml
and came up with the following:
relevant part of gitea-act-runner.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: gitea-act-runner
spec:
  serviceName: gitea-act-runner
  replicas: 1
  revisionHistoryLimit: 0
  volumeClaimTemplates:
    - metadata:
        name: work
      spec:
        accessModes:
          - ReadWriteOnce
        storageClassName: "local-path"
        resources:
          requests:
            storage: 1Gi
  persistentVolumeClaimRetentionPolicy:
    whenScaled: Delete
    whenDeleted: Delete
  selector:
    matchLabels:
      app: gitea-act-runner
  template:
    metadata:
      labels:
        app: gitea-act-runner
    spec:
      serviceAccountName: gitea-act-runner
      containers:
        - name: runner
          image: ghcr.io/christopherhx/gitea-actions-runner:v0.0.13
          imagePullPolicy: Always
          env:
            - name: ACTIONS_RUNNER_POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: ACTIONS_RUNNER_CLAIM_NAME
              value: work-$(ACTIONS_RUNNER_POD_NAME)
            - name: ACTIONS_RUNNER_REQUIRE_JOB_CONTAINER
              value: "true"
            - name: ACTIONS_RUNNER_CONTAINER_HOOKS
              value: /home/runner/k8s/index.js
            - name: GITEA_INSTANCE_URL
              value: https://gitea.nas.homespace.ovh/
            - name: GITEA_RUNNER_REGISTRATION_TOKEN
              valueFrom:
                secretKeyRef:
                  name: act-runner
                  key: registration-token
            - name: GITEA_RUNNER_LABELS
              value: beowulf-cluster
            - name: GITEA_RUNNER_NAME
              value: beowulf-act-runner
          volumeMounts:
            - mountPath: /home/runner/_work
              name: work
          resources:
            requests:
              cpu: "100m"
              memory: "500Mi"
            limits:
              cpu: "1"
              memory: "2Gi"
demo workflow demo.yaml
name: Gitea Actions Demo
run-name: Testing out Gitea Actions 🚀
on: [push]
jobs:
  Explore-Gitea-Actions:
    runs-on: beowulf-cluster
    container:
      image: ghcr.io/catthehacker/ubuntu:act-22.04 #ghcr.io/omniproc/act-buildkit-runner:latest #
    steps:
      - name: Check out repository code.
        uses: https://github.com/actions/checkout@v4
output in Actions
Something is odd in your kubernetes cluster... Maybe try a different storage provider or change size limits?
I have no clue why the external files are not there; for me they are always copied back if I delete them manually.
The externals folder is intact in the image as well, otherwise the k8s hooks couldn't run at all.
storageClassName: "local-path"
This difference is not an issue for me; it works with both the default and this one.
I had to enable this provisioner in my minikube:
ubuntu@ubuntu:~$ minikube addons enable storage-provisioner-rancher
❗ storage-provisioner-rancher is a 3rd party addon and is not maintained or verified by minikube maintainers, enable at your own risk.
❗ storage-provisioner-rancher does not currently have an associated maintainer.
▪ Using image docker.io/rancher/local-path-provisioner:v0.0.22
▪ Using image docker.io/busybox:stable
🌟 The 'storage-provisioner-rancher' addon is enabled
Please add a run step before checkout that checks that /__e contains the node tool by recursively enumerating the folder.
Maybe add a sleep 1000 run step, then inspect the spawned pod's mounts for more data on this issue.
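As a sketch, such a debug job could look like this (the paths are the mount points used by the k8s hooks, as described elsewhere in this thread):

```yaml
steps:
  - name: Inspect externals mount
    # /__e is the subPath mount of the "externals" folder; it should
    # contain node16/node20 copied there by the k8s hooks.
    run: ls -lR /__e
  - name: Keep the pod alive for inspection
    # While this sleeps, inspect the spawned job pod's mounts,
    # e.g. with kubectl describe pod.
    run: sleep 1000
```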
E.g., if I look at my kubernetes, check that the externals folder has the node program inside of it and that you can execute it. The path is from my kubernetes dashboard, and I sshed onto minikube:
ls -l /opt/local-path-provisioner/pvc-b32ef2bd-682f-42d0-a8c6-d79075247505_default_work-gitea-act-runner-x1-0
total 24
drwxr-xr-x 3 docker 1000 4096 Sep 4 19:32 _PipelineMapping
drwxr-xr-x 3 docker 1000 4096 Sep 4 19:32 _actions
drwxr-xr-x 2 docker 1000 4096 Sep 4 19:33 _temp
drwxr-xr-x 2 docker 1000 4096 Sep 4 19:32 _tool
drwxr-xr-x 4 docker 1000 4096 Sep 4 19:32 externals
drwxr-xr-x 3 docker 1000 4096 Sep 4 19:32 test-actions
The k8s container hooks you are using here are unchanged ones from https://github.com/actions/runner-container-hooks/releases/tag/v0.6.1 using unchanged actions/runner 2.317.0
This is the function responsible to provide the externals that are not found: https://github.com/actions/runner-container-hooks/blob/73655d4639a62f6e4b3d70b5878bc4367c0a436e/packages/k8s/src/hooks/prepare-job.ts#L184-L193
@ChristopherHX Thank you for the tip. I had not thought about looking into mounts. I checked out my mounts and everything is correct on the node's side. I'm using Talos so I cannot ssh into the OS. I can, however, list the directories using the CLI.
Node/Storage Side Results:
I can see all of the relevant files listed in your response. The only difference is that the folder you have as test-actions is named kubernetes for me. Listing all directories in the PV with recursive depth set to 2, I can find the relevant files that the runner claims are missing:
talosctl list -d 2 /var/local-path-provisioner/pvc-d1ef7f45-9eb9-44a0-a521-9cf084c829ba_gitea_work-gitea-act-runner-0 -n 192.168.2.99
NODE NAME
192.168.2.99 .
192.168.2.99 _PipelineMapping
192.168.2.99 _PipelineMapping/lyons
192.168.2.99 _actions
192.168.2.99 _actions/https~
192.168.2.99 _temp
192.168.2.99 _tool
192.168.2.99 externals
192.168.2.99 externals/node16
192.168.2.99 externals/node16_alpine
192.168.2.99 externals/node20
192.168.2.99 externals/node20_alpine
192.168.2.99 kubernetes
192.168.2.99 kubernetes/kubernetes
Runner/Pod Side Results: So I checked the mounts and the correct mounts are indeed there, including the '/__e' directory mapped to 'externals' that the runner claims is missing. Pod Mounts:
Mounts:
/__e from work (rw,path="externals")
/__w from work (rw)
/github/home from work (rw,path="_temp/_github_home")
/github/workflow from work (rw,path="_temp/_github_workflow")
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-zkjg6 (ro)
So I did an ls two folders above the entry point (ls .. and ls ../..) of the runner, and these are my findings.
Runner:
So it seems to me the hooks are perfectly fine, but the mounts are not working properly. /__e from work (rw,path="externals") is somehow not being linked. That leads me to wonder how the setup works at all. It doesn't seem to be a cluster problem but a simple pod/image mount issue. @ChristopherHX do you have an example deployment/statefulset that you use?
My example is pretty much the same as yours, so our clusters are the biggest difference here. I'm using minikube on my arm64-based server. As omniproc and motoki317 have reported, this should actually work in other clusters as well.
example.yml (This is an export of my minikube with the local-path provider as shown in your snippet; some generated fields have been removed.)
The /__e from work (rw,path="externals") volume always mounts correctly on my end...
What I don't understand: do subPath mounts work on your cluster if you create a pod yourself with a subPath mount of a volume, like the k8s hooks do?
For me it looks like a runner-container-hooks (k8s) bug or missing functionality in your kubernetes cluster.
I read a while back that older versions of dockerd didn't support mounting a sub folder of a volume into a container, but idk if that is correct.
_I assume modifying the k8s hooks (link to the code in earlier comments) so they don't mount /__e
and instead execute an ln -s command in the setup job could work around cluster incompatibility?_ This code managing the job pod is maintained by GitHub employees and not by me.
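To rule out the cluster itself, a subPath mount can be tested in isolation with a minimal pod along these lines (names are placeholders; point claimName at any existing PVC, e.g. the runner's work claim):

```yaml
# Minimal subPath mount test (placeholder names; reuse any existing PVC).
# If the pod logs list the sub folder's contents, subPath mounts work.
apiVersion: v1
kind: Pod
metadata:
  name: subpath-test
spec:
  restartPolicy: Never
  containers:
    - name: test
      image: busybox:stable
      command: ["ls", "-lR", "/mnt"]
      volumeMounts:
        - name: work
          mountPath: /mnt
          subPath: externals   # same pattern the k8s hooks use for /__e
  volumes:
    - name: work
      persistentVolumeClaim:
        claimName: work-gitea-act-runner-0
```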
@ChristopherHX I finally got it. I switched my storage class to longhorn and it works now. I have no idea why the local path provisioner doesn't work in my cluster.
@ChristopherHX I finally got it. I switched my storage class to longhorn and it works now. I have no idea why the local path provisioner doesn't work in my cluster.
You mentioned you use Talos, which comes with some special requirements for local path provisioner. Maybe that was the issue? https://www.talos.dev/v1.7/kubernetes-guides/configuration/local-storage/
I just looked at my extra mounts section and I didn't mount the hostPath mounts. Looks like I skipped that section of the docs.
It's not needed, but it's good practice to separate concerns (and arguably this design also leads to single responsibility and possibly open-closed). The job of the controller is to talk to and monitor the K8s API and, based on that, do what it must on the backend system, and/or, vice versa, wait for instructions from the external system (Gitea) and perform the required tasks in K8s.
Starting with direct access and then moving to a controller as/if needed is a faster way to get there, since a controller adds extra complexity.
True, a simple pod with multiple containers might do for most cases. More complex scenarios, however, might require pre or post steps. Take a look at the ecosystems of FluxCD and ArgoCD and how they evolved from basically what you argue for into something much bigger and more complex. But I agree that for an initial implementation, having granular control over a pod, started as a k8s Job or a simple pod, is good enough. However, at that point, since it's just as easy as applying a manifest against the K8s API, why limit it? Just leave it up to the user what he wants to define in the manifest and have it applied as the Gitea Actions workflow starts. A simple label system could signal Gitea which pods to consider important for the workflow to fail (e.g. apply a label gitea-action-must-succeed to all pods that Gitea will consider relevant for the workflow to succeed).
It's not FluxCD and ArgoCD we should look at in this case, since they do completely different things, but Jenkins, GitLab and Drone.
Same here. Simply allow the user to supply a K8s manifest with the workflow, and the controller would apply it to k8s.
exactly!
Secrets in Kubernetes are not designed to be "secret". They are essentially configmaps that can be protected using RBAC. Within a namespace, all pods can access them, just like any configmap. Don't want that? Create a separate namespace and use RBAC. Don't try to re-invent the wheel: doing so will only break 90% of the k8s tools you might need, since they pretty much all expect you to use Secrets the way they're meant to be used.
What I was trying to say is that if there were an option to not use k8s Secrets for storing job secrets, but instead inject them from somewhere else directly into the job, that might be a better option; it would let people drastically reduce the number of namespaces they need just to isolate, in some cases, a single secret value.
What I was trying to say is that if there were an option to not use k8s Secrets for storing job secrets, but instead inject them from somewhere else directly into the job, that might be a better option; it would let people drastically reduce the number of namespaces they need just to isolate, in some cases, a single secret value.
I'd argue that if you end up with one namespace per secret, either you have only one secret per concern or your architecture should be refactored. "Injecting them from somewhere else", however, is always easy if the user has full access to the manifests that are applied. I personally view the sideloading of secrets as an anti-pattern, but you might have a different opinion on the matter.
Feature Description
The Gitea Actions release was a great first step, but it's currently missing many features of a more mature solution based on K8s runners rather than single nodes. While it's possible to have runners on K8s, this currently requires DinD, which brings its whole set of problems: security issues (privileged exec required as of today) and feature limitations (you can't use DinD to start another container to build a container image (DinDinD)). I know workarounds exist with buildx, but those are just that: workarounds.
I think the next step could be something like what actions-runner-controller is doing for GitHub Actions: basically an operator that is deployed on K8s and registers as a runner. Every job it receives is then started in its own pod rather than in the runner itself, and the runner coordinates the pods.
Related docs: