actions / actions-runner-controller

Kubernetes controller for GitHub Actions self-hosted runners
Apache License 2.0

actions runner pods error #3370

Open sravula84 opened 7 months ago

sravula84 commented 7 months ago

Checks

Controller Version

actions-runner-controller-0.22.0

Deployment Method

Helm

To Reproduce

Install the controller.
Configure a RunnerDeployment with 10 replicas.
Configure a HorizontalRunnerAutoscaler scaling from 10 to 50 replicas.
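
The steps above correspond roughly to the following manifests. This is a minimal sketch, not the exact configuration in use: the deployment name, namespace, organization, runner label, and resource values are taken from the controller logs further down, the autoscaler name is a placeholder, and the scaling metrics or triggers are left out because they are not shown anywhere in this issue.

```yaml
# Sketch only: values come from the logs in this issue where available;
# anything marked "assumed" is illustrative.
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: github-action-small
  namespace: actions-runner-systems
spec:
  replicas: 10
  template:
    spec:
      organization: prosperllc
      labels:
        - pspr-utils-linux-np-small
      resources:
        limits:
          cpu: "2"
          memory: 4Gi
        requests:
          cpu: 500m
          memory: 2Gi
---
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: github-action-small-autoscaler   # assumed name
  namespace: actions-runner-systems
spec:
  scaleTargetRef:
    kind: RunnerDeployment
    name: github-action-small
  minReplicas: 10
  maxReplicas: 50
  # scaling metrics / triggers omitted: not shown in this issue
```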

Describe the bug

Runner pods go into an Error state and report a Docker socket error.

Describe the expected behavior

It should spin up new runners based on the number of workflows triggered; in this case the autoscaler is configured with a maximum of 50 replicas.

Additional Context

COMPUTED VALUES:
actionsMetrics:
  port: 8443
  proxy:
    enabled: true
    image:
      repository: quay.io/brancz/kube-rbac-proxy
      tag: v0.13.1
  serviceAnnotations: {}
  serviceMonitor: false
  serviceMonitorLabels: {}
actionsMetricsServer:
  affinity: {}
  enabled: false
  fullnameOverride: ""
  imagePullSecrets: []
  ingress:
    annotations: {}
    enabled: false
    hosts:
    - extraPaths: []
      host: chart-example.local
      paths: []
    ingressClassName: ""
    tls: []
  logFormat: text
  nameOverride: ""
  nodeSelector: {}
  podAnnotations: {}
  podLabels: {}
  podSecurityContext: {}
  priorityClassName: ""
  replicaCount: 1
  resources: {}
  secret:
    create: false
    enabled: false
    github_webhook_secret_token: ""
    name: actions-metrics-server
  securityContext: {}
  service:
    annotations: {}
    ports:
    - name: http
      port: 80
      protocol: TCP
      targetPort: http
    type: ClusterIP
  serviceAccount:
    annotations: {}
    create: true
    name: ""
  tolerations: []
additionalVolumeMounts: []
additionalVolumes: []
admissionWebHooks: {}
affinity: {}
authSecret:
  annotations: {}
  create: true
  enabled: true
  github_token:
  name: controller-manager
certManagerEnabled: true
defaultScaleDownDelay: 10m
dockerRegistryMirror: ""
enableLeaderElection: true
env: {}
fullnameOverride: ""
githubWebhookServer:
  affinity: {}
  enabled: false
  fullnameOverride: ""
  imagePullSecrets: []
  ingress:
    annotations: {}
    enabled: false
    hosts:
    - extraPaths: []
      host: chart-example.local
      paths: []
    ingressClassName: ""
    tls: []
  logFormat: text
  nameOverride: ""
  nodeSelector: {}
  podAnnotations: {}
  podDisruptionBudget:
    enabled: false
  podLabels: {}
  podSecurityContext: {}
  priorityClassName: ""
  replicaCount: 1
  resources: {}
  secret:
    create: false
    enabled: false
    github_webhook_secret_token: ""
    name: github-webhook-server
  securityContext: {}
  service:
    annotations: {}
    ports:
    - name: http
      port: 80
      protocol: TCP
      targetPort: http
    type: ClusterIP
  serviceAccount:
    annotations: {}
    create: true
    name: ""
  tolerations: []
  useRunnerGroupsVisibility: false
image:
  actionsRunnerImagePullSecrets: []
  actionsRunnerRepositoryAndTag: summerwind/actions-runner:latest
  dindSidecarRepositoryAndTag: docker:24.0.7-dind-alpine3.18
  pullPolicy: IfNotPresent
  repository: summerwind/actions-runner-controller
imagePullSecrets: []
labels: {}
logFormat: text
metrics:
  port: 8443
  proxy:
    enabled: true
    image:
      repository: quay.io/brancz/kube-rbac-proxy
      tag: v0.13.1
  serviceAnnotations: {}
  serviceMonitor: false
  serviceMonitorLabels: {}
nameOverride: ""
nodeSelector: {}
podAnnotations: {}
podDisruptionBudget:
  enabled: false
podLabels: {}
podSecurityContext: {}
priorityClassName: ""
rbac: {}
replicaCount: 1
resources: {}
runner:
  statusUpdateHook:
    enabled: false
scope:
  singleNamespace: false
  watchNamespace: ""
securityContext: {}
service:
  annotations: {}
  port: 443
  type: ClusterIP
serviceAccount:
  annotations: {}
  create: true
  name: ""
syncPeriod: 1m
tolerations: []
webhookPort: 9443

Controller Logs

2024-03-19T22:21:12Z    DEBUG   controller-runtime.webhook.webhooks wrote response  {"webhook": "/mutate-runner-set-pod", "code": 200, "reason": "", "UID": "33222d85-db32-4f05-b866-858ddb83a913", "allowed": true}
2024-03-19T22:21:12Z    INFO    runner  Created runner pod  {"runner": "actions-runner-systems/github-action-small-5r8nj-29dt6", "repository": ""}
2024-03-19T22:21:12Z    DEBUG   events  Created pod 'github-action-small-5r8nj-29dt6'   {"type": "Normal", "object": {"kind":"Runner","namespace":"actions-runner-systems","name":"github-action-small-5r8nj-29dt6","uid":"595fdda6-8727-496d-9a9e-ec221d8e9e5a","apiVersion":"actions.summerwind.dev/v1alpha1","resourceVersion":"67959395"}, "reason": "PodCreated"}
2024-03-19T22:21:12Z    DEBUG   runnerreplicaset    Skipped reconcilation because owner is not synced yet   {"runnerreplicaset": "actions-runner-systems/github-action-small-5r8nj", "owner": "actions-runner-systems/github-action-small-5r8nj-29dt6", "pods": [{"kind":"Pod","apiVersion":"v1","metadata":{"name":"github-action-small-5r8nj-29dt6","namespace":"actions-runner-systems","uid":"c9d9e756-7f3c-450c-b5c3-578dc6eb462e","resourceVersion":"67959403","creationTimestamp":"2024-03-19T22:21:12Z","labels":{"actions-runner":"","actions-runner-controller/inject-registration-token":"true","pod-template-hash":"749c9b7998","runner-deployment-name":"github-action-small","runner-template-hash":"78ddc6dd8"},"annotations":{"actions-runner-controller/token-expires-at":"2024-03-19T16:10:51-07:00","sync-time":"2024-03-19T22:21:11Z"},"ownerReferences":[{"apiVersion":"actions.summerwind.dev/v1alpha1","kind":"Runner","name":"github-action-small-5r8nj-29dt6","uid":"595fdda6-8727-496d-9a9e-ec221d8e9e5a","controller":true,"blockOwnerDeletion":true}],"managedFields":[{"manager":"manager","operation":"Update","apiVersion":"v1","time":"2024-03-19T22:21:12Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:annotations":{".":{},"f:sync-time":{}},"f:labels":{".":{},"f:actions-runner":{},"f:actions-runner-controller/inject-registration-token":{},"f:pod-template-hash":{},"f:runner-deployment-name":{},"f:runner-template-hash":{}},"f:ownerReferences":{".":{},"k:{\"uid\":\"595fdda6-8727-496d-9a9e-ec221d8e9e5a\"}":{}}},"f:spec":{"f:containers":{"k:{\"name\":\"docker\"}":{".":{},"f:env":{".":{},"k:{\"name\":\"DOCKER_TLS_CERTDIR\"}":{".":{},"f:name":{},"f:value":{}}},"f:image":{},"f:imagePullPolicy":{},"f:lifecycle":{".":{},"f:preStop":{".":{},"f:exec":{".":{},"f:command":{}}}},"f:name":{},"f:resources":{},"f:securityContext":{".":{},"f:privileged":{}},"f:terminationMessagePath":{},"f:terminationMessagePolicy":{},"f:volumeMounts":{".":{},"k:{\"mountPath\":\"/certs/client\"}":{".":{},"f:mountPath":{}
,"f:name":{}},"k:{\"mountPath\":\"/runner\"}":{".":{},"f:mountPath":{},"f:name":{}},"k:{\"mountPath\":\"/runner/_work\"}":{".":{},"f:mountPath":{},"f:name":{}}}},"k:{\"name\":\"runner\"}":{".":{},"f:env":{".":{},"k:{\"name\":\"DOCKERD_IN_RUNNER\"}":{".":{},"f:name":{},"f:value":{}},"k:{\"name\":\"DOCKER_CERT_PATH\"}":{".":{},"f:name":{},"f:value":{}},"k:{\"name\":\"DOCKER_ENABLED\"}":{".":{},"f:name":{},"f:value":{}},"k:{\"name\":\"DOCKER_HOST\"}":{".":{},"f:name":{},"f:value":{}},"k:{\"name\":\"DOCKER_TLS_VERIFY\"}":{".":{},"f:name":{},"f:value":{}},"k:{\"name\":\"GITHUB_ACTIONS_RUNNER_EXTRA_USER_AGENT\"}":{".":{},"f:name":{},"f:value":{}},"k:{\"name\":\"GITHUB_URL\"}":{".":{},"f:name":{},"f:value":{}},"k:{\"name\":\"RUNNER_ENTERPRISE\"}":{".":{},"f:name":{}},"k:{\"name\":\"RUNNER_EPHEMERAL\"}":{".":{},"f:name":{},"f:value":{}},"k:{\"name\":\"RUNNER_GROUP\"}":{".":{},"f:name":{}},"k:{\"name\":\"RUNNER_LABELS\"}":{".":{},"f:name":{},"f:value":{}},"k:{\"name\":\"RUNNER_NAME\"}":{".":{},"f:name":{},"f:value":{}},"k:{\"name\":\"RUNNER_ORG\"}":{".":{},"f:name":{},"f:value":{}},"k:{\"name\":\"RUNNER_REPO\"}":{".":{},"f:name":{}},"k:{\"name\":\"RUNNER_STATUS_UPDATE_HOOK\"}":{".":{},"f:name":{},"f:value":{}},"k:{\"name\":\"RUNNER_TOKEN\"}":{".":{},"f:name":{},"f:value":{}},"k:{\"name\":\"RUNNER_WORKDIR\"}":{".":{},"f:name":{},"f:value":{}}},"f:image":{},"f:imagePullPolicy":{},"f:name":{},"f:resources":{".":{},"f:limits":{".":{},"f:cpu":{},"f:memory":{}},"f:requests":{".":{},"f:cpu":{},"f:memory":{}}},"f:securityContext":{},"f:terminationMessagePath":{},"f:terminationMessagePolicy":{},"f:volumeMounts":{".":{},"k:{\"mountPath\":\"/certs/client\"}":{".":{},"f:mountPath":{},"f:name":{},"f:readOnly":{}},"k:{\"mountPath\":\"/runner\"}":{".":{},"f:mountPath":{},"f:name":{}},"k:{\"mountPath\":\"/runner/_work\"}":{".":{},"f:mountPath":{},"f:name":{}}}}},"f:dnsPolicy":{},"f:enableServiceLinks":{},"f:restartPolicy":{},"f:schedulerName":{},"f:securityContext":{},"f:terminationGracePer
iodSeconds":{},"f:volumes":{".":{},"k:{\"name\":\"certs-client\"}":{".":{},"f:emptyDir":{},"f:name":{}},"k:{\"name\":\"runner\"}":{".":{},"f:emptyDir":{},"f:name":{}},"k:{\"name\":\"work\"}":{".":{},"f:emptyDir":{},"f:name":{}}}}}}]},"spec":{"volumes":[{"name":"runner","emptyDir":{}},{"name":"work","emptyDir":{}},{"name":"certs-client","emptyDir":{}},{"name":"kube-api-access-ldpcz","projected":{"sources":[{"serviceAccountToken":{"expirationSeconds":3607,"path":"token"}},{"configMap":{"name":"kube-root-ca.crt","items":[{"key":"ca.crt","path":"ca.crt"}]}},{"downwardAPI":{"items":[{"path":"namespace","fieldRef":{"apiVersion":"v1","fieldPath":"metadata.namespace"}}]}}],"defaultMode":420}}],"containers":[{"name":"runner","image":"summerwind/actions-runner:latest","env":[{"name":"RUNNER_ORG","value":"prosperllc"},{"name":"RUNNER_REPO"},{"name":"RUNNER_ENTERPRISE"},{"name":"RUNNER_LABELS","value":"pspr-utils-linux-np-small"},{"name":"RUNNER_GROUP"},{"name":"DOCKER_ENABLED","value":"true"},{"name":"DOCKERD_IN_RUNNER","value":"false"},{"name":"GITHUB_URL","value":"https://github.com/"},{"name":"RUNNER_WORKDIR","value":"/runner/_work"},{"name":"RUNNER_EPHEMERAL","value":"true"},{"name":"RUNNER_STATUS_UPDATE_HOOK","value":"false"},{"name":"GITHUB_ACTIONS_RUNNER_EXTRA_USER_AGENT","value":"actions-runner-controller/v0.27.0"},{"name":"DOCKER_HOST","value":"tcp://localhost:2376"},{"name":"DOCKER_TLS_VERIFY","value":"1"},{"name":"DOCKER_CERT_PATH","value":"/certs/client"},{"name":"RUNNER_NAME","value":"github-action-small-5r8nj-29dt6"},{"name":"RUNNER_TOKEN","value":"AI5PDGHSSR55YFZJEKLE44DF7INXW"}],"resources":{"limits":{"cpu":"2","memory":"4Gi"},"requests":{"cpu":"500m","memory":"2Gi"}},"volumeMounts":[{"name":"runner","mountPath":"/runner"},{"name":"work","mountPath":"/runner/_work"},{"name":"certs-client","readOnly":true,"mountPath":"/certs/client"},{"name":"kube-api-access-ldpcz","readOnly":true,"mountPath":"/var/run/secrets/kubernetes.io/serviceaccount"}],"terminationMessageP
ath":"/dev/termination-log","terminationMessagePolicy":"File","imagePullPolicy":"Always","securityContext":{}},{"name":"docker","image":"docker:24.0.7-dind-alpine3.18","env":[{"name":"DOCKER_TLS_CERTDIR","value":"/certs"}],"resources":{},"volumeMounts":[{"name":"runner","mountPath":"/runner"},{"name":"certs-client","mountPath":"/certs/client"},{"name":"work","mountPath":"/runner/_work"},{"name":"kube-api-access-ldpcz","readOnly":true,"mountPath":"/var/run/secrets/kubernetes.io/serviceaccount"}],"lifecycle":{"preStop":{"exec":{"command":["/bin/sh","-c","timeout \"${RUNNER_GRACEFUL_STOP_TIMEOUT:-15}\" /bin/sh -c \"echo 'Prestop hook started'; while [ -f /runner/.runner ]; do sleep 1; done; echo 'Waiting for dockerd to start'; while ! pgrep -x dockerd; do sleep 1; done; echo 'Prestop hook stopped'\" >/proc/1/fd/1 2>&1"]}}},"terminationMessagePath":"/dev/termination-log","terminationMessagePolicy":"File","imagePullPolicy":"IfNotPresent","securityContext":{"privileged":true}}],"restartPolicy":"Never","terminationGracePeriodSeconds":30,"dnsPolicy":"ClusterFirst","serviceAccountName":"default","serviceAccount":"default","nodeName":"gke-nonprod-us-west1-default-node-poo-504a47a4-8he7","securityContext":{},"schedulerName":"default-scheduler","tolerations":[{"key":"node.kubernetes.io/not-ready","operator":"Exists","effect":"NoExecute","tolerationSeconds":300},{"key":"node.kubernetes.io/unreachable","operator":"Exists","effect":"NoExecute","tolerationSeconds":300}],"priority":0,"enableServiceLinks":true,"preemptionPolicy":"PreemptLowerPriority"},"status":{"phase":"Pending","conditions":[{"type":"PodScheduled","status":"True","lastProbeTime":null,"lastTransitionTime":"2024-03-19T22:21:12Z"}],"qosClass":"Burstable"}}]}
2024-03-19T22:21:50Z    DEBUG   runner  Runner appears to have been registered and running. {"runner": "actions-runner-systems/github-action-small-5r8nj-lhd8g", "podCreationTimestamp": "2024-03-19 22:21:11 +0000 UTC"}
2024-03-19T22:21:50Z    DEBUG   runner  Runner appears to have been registered and running. {"runner": "actions-runner-systems/github-action-small-5r8nj-flq4v", "podCreationTimestamp": "2024-03-19 22:21:12 +0000 UTC"}
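
For readability, here are the Docker-related parts of the pod spec embedded in the long log line above, rewritten as YAML. All values are copied from that log line; unrelated fields are omitted. The runner container expects TLS client certificates under /certs/client, which is an emptyDir volume shared with the docker (dind) sidecar. As far as I understand, the dind image generates those certificates under the directory given by DOCKER_TLS_CERTDIR when it starts, so the "ca.pem: no such file or directory" message in the runner pod logs would be consistent with the sidecar not having started or not having written the certificates yet.

```yaml
# Excerpt reconstructed from the controller log above; fields not related
# to Docker are omitted.
volumes:
  - name: certs-client
    emptyDir: {}
containers:
  - name: runner
    image: summerwind/actions-runner:latest
    env:
      - name: DOCKER_ENABLED
        value: "true"
      - name: DOCKERD_IN_RUNNER
        value: "false"
      - name: DOCKER_HOST
        value: tcp://localhost:2376
      - name: DOCKER_TLS_VERIFY
        value: "1"
      - name: DOCKER_CERT_PATH
        value: /certs/client
    volumeMounts:
      - name: certs-client
        mountPath: /certs/client
        readOnly: true
  - name: docker
    image: docker:24.0.7-dind-alpine3.18
    env:
      - name: DOCKER_TLS_CERTDIR
        value: /certs
    securityContext:
      privileged: true
    volumeMounts:
      - name: certs-client
        mountPath: /certs/client
```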

Runner Pod Logs

# Authentication

√ Connected to GitHub

# Runner Registration

√ Runner successfully added
√ Runner connection is good

# Runner settings

√ Settings Saved.

2024-03-19 22:13:18.165  DEBUG --- Runner successfully configured.
{
  "agentId": 88824,
  "agentName": "github-action-small-5r8nj-hv2gf",
  "poolId": 1,
  "poolName": "Default",
  "ephemeral": true,
  "serverUrl": "https://pipelinesghubeus21.actions.githubusercontent.com/tMTkzAKYleoidiHAI9FjPaHPkEkp2s7TIoUW3BW1740YmeFlFo/",
  "gitHubUrl": "https://github.com/prosperllc",
  "workFolder": "/runner/_work"
}
2024-03-19 22:13:18.174  DEBUG --- Docker enabled runner detected and Docker daemon wait is enabled
2024-03-19 22:13:18.177  DEBUG --- Waiting until Docker is available or the timeout of 120 seconds is reached
Failed to initialize: unable to resolve docker endpoint: open /certs/client/ca.pem: no such file or directory
Failed to initialize: unable to resolve docker endpoint: open /certs/client/ca.pem: no such file or directory
Failed to initialize: unable to resolve docker endpoint: open /certs/client/ca.pem: no such file or directory
Failed to initialize: unable to resolve docker endpoint: open /certs/client/ca.pem: no such file or directory
Cannot connect to the Docker daemon at tcp://localhost:2376. Is the docker daemon running?
Cannot connect to the Docker daemon at tcp://localhost:2376. Is the docker daemon running?
Cannot connect to the Docker daemon at tcp://localhost:2376. Is the docker daemon running?
Cannot connect to the Docker daemon at tcp://localhost:2376. Is the docker daemon running?
Cannot connect to the Docker daemon at tcp://localhost:2376. Is the docker daemon running?
Cannot connect to the Docker daemon at tcp://localhost:2376. Is the docker daemon running?
Cannot connect to the Docker daemon at tcp://localhost:2376. Is the docker daemon running?
Cannot connect to the Docker daemon at tcp://localhost:2376. Is the docker daemon running?
Cannot connect to the Docker daemon at tcp://localhost:2376. Is the docker daemon running?

sravula84 commented 7 months ago

We have 2 clusters configured with actions-runner-controller:

1) The first cluster (1.25.16-gke.1460000) has never had any issues.
2) The second cluster (1.27.9-gke.1092000) always runs into the issue above.

Both clusters use the same Helm chart version; the only difference is the Kubernetes version.

Is there a specific controller version or runner image we need to use, or are any other configuration changes required?

sravula84 commented 7 months ago

Any suggestions on the above case?

sravula84 commented 7 months ago

Hi Team,

Any suggestions on the above issue?

sravula84 commented 7 months ago

Hi Team, any suggestions on the above issue? I see a few users have raised similar issues.

Thanks, Sridhar