actions / actions-runner-controller

Kubernetes controller for GitHub Actions self-hosted runners
Apache License 2.0

`service` containers not working on runners with `containerMode: kubernetes` #1768

Open bquenin opened 2 years ago

bquenin commented 2 years ago

Controller Version

0.25.2

Helm Chart Version

0.20.2

CertManager Version

1.9.1

Deployment Method

Helm

cert-manager installation

Yes, I've followed https://github.com/actions-runner-controller/actions-runner-controller#installation and installed cert-manager from the official source https://cert-manager.io/docs/installation/helm/

Checks

Resource Definitions

apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerSet
metadata:
  name: k8s-runner
  namespace: actions-runner-system
spec:
  replicas: 4
  organization: devx-ibp
  containerMode: kubernetes
  serviceAccountName: runner-service-account
  selector:
    matchLabels:
      app: k8s-runner
  serviceName: k8s-runner
  template:
    metadata:
      labels:
        app: k8s-runner
  workVolumeClaimTemplate:
    storageClassName: standard
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: 1Gi
  labels:
  - k8s-runner
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: runner-role
  namespace: actions-runner-system
rules:
- apiGroups: [ "" ]
  resources: [ "pods" ]
  verbs: [ "get", "list", "create", "delete" ]
- apiGroups: [ "" ]
  resources: [ "pods/exec" ]
  verbs: [ "get", "create" ]
- apiGroups: [ "" ]
  resources: [ "pods/log" ]
  verbs: [ "get", "list", "watch" ]
- apiGroups: [ "batch" ]
  resources: [ "jobs" ]
  verbs: [ "get", "list", "create", "delete" ]
- apiGroups: [ "" ]
  resources: [ "secrets" ]
  verbs: [ "get", "list", "create", "delete" ]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: runner-role-binding
  namespace: actions-runner-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: runner-role
subjects:
- kind: ServiceAccount
  name: runner-service-account
  namespace: actions-runner-system
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: runner-service-account
  namespace: actions-runner-system

Storage Class:

Name:            standard
IsDefaultClass:  Yes
Annotations:     kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"storage.k8s.io/v1","kind":"StorageClass","metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"},"name":"standard"},"provisioner":"rancher.io/local-path","reclaimPolicy":"Delete","volumeBindingMode":"WaitForFirstConsumer"}
,storageclass.kubernetes.io/is-default-class=true
Provisioner:           rancher.io/local-path
Parameters:            <none>
AllowVolumeExpansion:  <unset>
MountOptions:          <none>
ReclaimPolicy:         Delete
VolumeBindingMode:     WaitForFirstConsumer
Events:                <none>

To Reproduce

Execute the following workflow:

name: Go

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

jobs:
  build:
    runs-on: [ self-hosted, k8s-runner ]
    services:
      redis:
        image: redis
        ports:
        - 6379/tcp
    container:
      image: golang:alpine
    steps:
    - uses: actions/checkout@v3
    - run: go build cmd/hello/main.go
    - run: ./main

Describe the bug

The 'Initialize containers' step fails:


5s
##[debug]Evaluating condition for step: 'Initialize containers'
##[debug]Evaluating: success()
##[debug]Evaluating success:
##[debug]=> true
##[debug]Result: true
##[debug]Starting: Initialize containers
##[debug]Register post job cleanup for stopping/deleting containers.
Run '/runner/k8s/index.js'
##[debug]/runner/externals/node16/bin/node /runner/k8s/index.js
##[debug]Using image 'golang:alpine' for job image
##[debug]Adding service 'redis' to pod definition
Error: Error: failed to create job pod: HttpError: HTTP request failed
Error: Process completed with exit code 1.
Error: Executing the custom container implementation failed. Please contact your self hosted runner administrator.
##[debug]System.Exception: Executing the custom container implementation failed. Please contact your self hosted runner administrator.
##[debug] ---> System.Exception: The hook script at '/runner/k8s/index.js' running command 'PrepareJob' did not execute successfully
##[debug]   at GitHub.Runner.Worker.Container.ContainerHooks.ContainerHookManager.ExecuteHookScript[T](IExecutionContext context, HookInput input, ActionRunStage stage, String prependPath)
##[debug]   --- End of inner exception stack trace ---
##[debug]   at GitHub.Runner.Worker.Container.ContainerHooks.ContainerHookManager.ExecuteHookScript[T](IExecutionContext context, HookInput input, ActionRunStage stage, String prependPath)
##[debug]   at GitHub.Runner.Worker.Container.ContainerHooks.ContainerHookManager.PrepareJobAsync(IExecutionContext context, List`1 containers)
##[debug]   at GitHub.Runner.Worker.ContainerOperationProvider.StartContainersAsync(IExecutionContext executionContext, Object data)
##[debug]   at GitHub.Runner.Worker.JobExtensionRunner.RunAsync()
##[debug]   at GitHub.Runner.Worker.StepsRunner.RunStepAsync(IStep step, CancellationToken jobCancellationToken)
##[debug]Finishing: Initialize containers

Describe the expected behavior

Hi,

I'm trying to use a service container in a job. I expected the service container to be created as an additional container in the pod executing the job, but it doesn't work. Is there anything I'm missing?


Controller Logs

https://gist.github.com/bquenin/ddbe50c71dadd6b136ab0b0b5bee6e63

Runner Pod Logs

https://gist.github.com/bquenin/ddbe50c71dadd6b136ab0b0b5bee6e63
fhammerl commented 1 year ago

I believe there might be a sanitization bug in the portmapping of containerMode: kubernetes.

Instead of 6379/tcp, does 6379 (tcp is the default) or 6379:6379/tcp work?
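For reference, the three port notations suggested above would look like this in a workflow's services block (each line is one alternative to try, not all three at once):

```yaml
services:
  redis:
    image: redis
    ports:
    - 6379            # container port only; TCP is the default protocol
    - 6379:6379       # host:container mapping
    - 6379:6379/tcp   # mapping with explicit protocol
```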

arloliu commented 1 year ago

I hit the same issue. I built a new action runner image with RUNNER_CONTAINER_HOOKS_VERSION=0.2.0. All of the following cases failed:

  1. with port setting: 6379
  2. with port setting: 6379:6379
  3. with port setting: 6379:6379/tcp
  4. without port setting
Brenner87 commented 1 year ago

Any updates here? :) Facing the same problem...

jdrinkwater-literati commented 1 year ago

While the exact error is different from the one described here, @Brenner87 and I have also been unable to use GHA sidecar containers in containerMode: kubernetes; read more here: https://github.com/actions/actions-runner-controller/discussions/2227 In our case it seems to be completely nuking the entrypoint command used to start the sidecar container, so it has nothing to do with ports.

junchaw commented 1 year ago

I got this error too, and it turned out to be related to our OPA policy that requires resources to be set on all containers.

It took me days to figure out the root cause, but it's really a tiny issue. The real problem is the error message. I updated this line to provide detailed error info:

https://github.com/actions/runner-container-hooks/blob/main/packages/k8s/src/hooks/prepare-job.ts#LL53C32-L53C42

// from
throw new Error(`failed to create job pod: ${err}`)

// to
throw new Error(`failed to create job pod: ${JSON.stringify(err)}`)

Then, instead of "HTTP Error", you'll get a log like this:

Error: Error: failed to create job pod: {"response":{"statusCode":403,"body":{"... is forbidden: failed quota: fuze-quota: must specify cpu for: job; memory for: job","reason":"Forbidden","..."statusCode":403,"name":"HttpError"}
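The difference between the two interpolations can be sketched in a stand-alone snippet. The `HttpError` class below is a hypothetical stand-in for the one thrown by the Kubernetes JavaScript client, kept only to illustrate its shape:

```javascript
// Hypothetical stand-in for the HttpError raised by the Kubernetes client:
// an Error subclass carrying the HTTP status code and response body.
class HttpError extends Error {
  constructor(message, statusCode, body) {
    super(message);
    this.name = 'HttpError';      // own enumerable property
    this.statusCode = statusCode; // own enumerable property
    this.body = body;             // own enumerable property
  }
}

const err = new HttpError('HTTP request failed', 403, {
  reason: 'Forbidden',
  message: 'must specify cpu for: job; memory for: job',
});

// Template interpolation calls toString(), which prints only name + message,
// hiding the status code and response body:
console.log(`failed to create job pod: ${err}`);
// -> failed to create job pod: HttpError: HTTP request failed

// JSON.stringify serializes enumerable own properties, so the status code and
// response body survive (note: Error's own `message` is non-enumerable):
console.log(`failed to create job pod: ${JSON.stringify(err)}`);
```

This is why the one-line patch above turns an opaque "HTTP request failed" into an actionable 403 with the admission-controller message attached.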

You may get a different error, but I'm sure you'll know how to fix it :)

I have packaged a fixed version of the Docker image here: https://hub.docker.com/r/kacifer/actions-runner. Specify the image in the controller deployment or your runner spec: kacifer/actions-runner:0.0.2 (I'm not keeping this image up to date; you can easily package your own).

stephen-tatari commented 1 year ago

@kacifer have you considered a PR against https://github.com/actions/runner-container-hooks? Seems like it'd be worth it. I just ran into this error when I tried to have the worker pod use a service account that didn't exist. It would have been handy to get the full error message here.

junchaw commented 1 year ago

@stephen-tatari yes, I could do that. Glad to know someone else has the same problem šŸ¤”

ykebede8 commented 1 year ago

Any update on the core issue here? Is it possible to run a job that creates services with containerMode: kubernetes?

thobianchi commented 1 year ago

@kacifer could you elaborate on the solution you found? I can't find any open PR to fix this.

bastianwegge commented 8 months ago

Following the error messages OP received, this seems to me like a configuration issue. Thread #3073 led me to test service containers with localhost, which works fine!

Just posting this here in case anybody lands here because of the issue title. It does not appear to be a general problem.
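A possible reading of the localhost workaround, sketched as a variant of OP's workflow: with containerMode: kubernetes the job and service containers may end up in the same pod and thus share a network namespace, so the service would be reached on localhost rather than by its hostname. The redis-cli install step and port mapping here are illustrative assumptions, not a confirmed fix:

```yaml
jobs:
  build:
    runs-on: [ self-hosted, k8s-runner ]
    services:
      redis:
        image: redis
        ports:
        - 6379:6379
    container:
      image: golang:alpine
    steps:
    # Containers in one pod share localhost, so address the service there
    # instead of via the 'redis' hostname used on Docker-based runners.
    - run: apk add --no-cache redis && redis-cli -h localhost -p 6379 ping
```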