actions / actions-runner-controller

Kubernetes controller for GitHub Actions self-hosted runners
Apache License 2.0

ARM64 docker containers on AMD64 Runners cannot fetch https urls #2511

Closed ali-kafel closed 1 year ago

ali-kafel commented 1 year ago

Controller Version

0.27.2

Helm Chart Version

0.23.1

CertManager Version

1.10.0

Deployment Method

Helm

cert-manager installation

Resource Definitions

Actions Runner Controller

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "13"
    meta.helm.sh/release-name: actions-runner-controller
    meta.helm.sh/release-namespace: arc
  creationTimestamp: "2023-03-15T11:31:32Z"
  generation: 13
  labels:
    app.kubernetes.io/instance: actions-runner-controller
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: actions-runner-controller
    app.kubernetes.io/version: 0.27.2
    helm.sh/chart: actions-runner-controller-0.23.1
  name: arc
  namespace: arc
  resourceVersion: "340288275"
  uid: e15859d1-ed82-4a5b-9122-bee678bd5aa3
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/instance: actions-runner-controller
      app.kubernetes.io/name: actions-runner-controller
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      annotations:
        kubectl.kubernetes.io/restartedAt: "2023-04-12T18:57:54-04:00"
      creationTimestamp: null
      labels:
        app.kubernetes.io/instance: actions-runner-controller
        app.kubernetes.io/name: actions-runner-controller
    spec:
      containers:
      - args:
        - --metrics-addr=127.0.0.1:8080
        - --enable-leader-election
        - --leader-election-id=arc
        - --port=9443
        - --sync-period=1m
        - --default-scale-down-delay=10m
        - --docker-image=docker:dind
        - --runner-image=summerwind/actions-runner:latest
        - --watch-namespace=arc
        - --log-format=text
        command:
        - /manager
        env:
        - name: GITHUB_TOKEN
          valueFrom:
            secretKeyRef:
              key: github_token
              name: controller-manager
              optional: true
        - name: GITHUB_APP_ID
          valueFrom:
            secretKeyRef:
              key: github_app_id
              name: controller-manager
              optional: true
        - name: GITHUB_APP_INSTALLATION_ID
          valueFrom:
            secretKeyRef:
              key: github_app_installation_id
              name: controller-manager
              optional: true
        - name: GITHUB_APP_PRIVATE_KEY
          valueFrom:
            secretKeyRef:
              key: github_app_private_key
              name: controller-manager
              optional: true
        - name: GITHUB_BASICAUTH_PASSWORD
          valueFrom:
            secretKeyRef:
              key: github_basicauth_password
              name: controller-manager
              optional: true
        image: summerwind/actions-runner-controller:v0.27.2
        imagePullPolicy: IfNotPresent
        name: manager
        ports:
        - containerPort: 9443
          hostPort: 9443
          name: webhook-server
          protocol: TCP
        resources: {}
        securityContext: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/actions-runner-controller
          name: secret
          readOnly: true
        - mountPath: /tmp
          name: tmp
        - mountPath: /tmp/k8s-webhook-server/serving-certs
          name: cert
          readOnly: true
      - args:
        - --secure-listen-address=0.0.0.0:8443
        - --upstream=http://127.0.0.1:8080/
        - --logtostderr=true
        - --v=10
        image: quay.io/brancz/kube-rbac-proxy:v0.13.1
        imagePullPolicy: IfNotPresent
        name: kube-rbac-proxy
        ports:
        - containerPort: 8443
          hostPort: 8443
          name: metrics-port
          protocol: TCP
        resources: {}
        securityContext: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      hostNetwork: true
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: arc
      serviceAccountName: arc
      terminationGracePeriodSeconds: 10
      volumes:
      - name: secret
        secret:
          defaultMode: 420
          secretName: controller-manager
      - name: cert
        secret:
          defaultMode: 420
          secretName: arc-serving-cert
      - emptyDir: {}
        name: tmp
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: "2023-03-15T11:31:37Z"
    lastUpdateTime: "2023-03-15T11:31:37Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: "2023-03-15T11:31:32Z"
    lastUpdateTime: "2023-04-13T14:27:08Z"
    message: ReplicaSet "arc-7f857b959c" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  observedGeneration: 13
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1
```
Runner Pod

```yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    actions-runner/id: "738"
    kubernetes.io/psp: eks.privileged
    sync-time: "2023-04-13T14:26:49Z"
  creationTimestamp: "2023-04-13T14:26:50Z"
  finalizers:
  - actions.summerwind.dev/runner-pod
  labels:
    actions-runner: ""
    actions-runner-controller/inject-registration-token: "true"
    pod-template-hash: 7fb967d764
    runner-deployment-name: arc-2xlarge
    runner-template-hash: b444b7b88
  name: arc-2xlarge-5hf5p-8w7x9
  namespace: arc
  ownerReferences:
  - apiVersion: actions.summerwind.dev/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: Runner
    name: arc-2xlarge-5hf5p-8w7x9
    uid: f8885559-5007-4f74-9151-55dda422ffb6
  resourceVersion: "340287939"
  uid: b4100608-db3c-4b6f-b29a-687ff923e085
spec:
  containers:
  - env:
    - name: RUNNER_ORG
      value: arc
    - name: RUNNER_REPO
    - name: RUNNER_ENTERPRISE
    - name: RUNNER_LABELS
      value: 2xlarge,arc-2xlarge
    - name: RUNNER_GROUP
    - name: DOCKER_ENABLED
      value: "true"
    - name: DOCKERD_IN_RUNNER
      value: "false"
    - name: GITHUB_URL
      value: https://github.com/
    - name: RUNNER_WORKDIR
      value: /runner/_work
    - name: RUNNER_EPHEMERAL
      value: "true"
    - name: RUNNER_STATUS_UPDATE_HOOK
      value: "false"
    - name: GITHUB_ACTIONS_RUNNER_EXTRA_USER_AGENT
      value: actions-runner-controller/v0.27.1
    - name: DOCKER_HOST
      value: tcp://localhost:2376
    - name: DOCKER_TLS_VERIFY
      value: "1"
    - name: DOCKER_CERT_PATH
      value: /certs/client
    - name: RUNNER_NAME
      value: arc-2xlarge-5hf5p-8w7x9
    - name: RUNNER_TOKEN
      value: A6PX3GOGITN6QJ7RVPPZ2F3EHANYRAVPNFXHG5DBNRWGC5DJN5XF62LEZYBBTYOEWFUW443UMFWGYYLUNFXW4X3UPFYGLN2JNZ2GKZ3SMF2GS33OJFXHG5DBNRWGC5DJN5XA
    - name: AWS_STS_REGIONAL_ENDPOINTS
      value: regional
    - name: AWS_DEFAULT_REGION
      value: us-east-1
    - name: AWS_REGION
      value: us-east-1
    - name: AWS_ROLE_ARN
      value: ##########
    - name: AWS_WEB_IDENTITY_TOKEN_FILE
      value: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
    image: summerwind/actions-runner:latest
    imagePullPolicy: Always
    name: runner
    resources: {}
    securityContext: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /runner
      name: runner
    - mountPath: /runner/_work
      name: work
    - mountPath: /certs/client
      name: certs-client
      readOnly: true
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-hl9m9
      readOnly: true
    - mountPath: /var/run/secrets/eks.amazonaws.com/serviceaccount
      name: aws-iam-token
      readOnly: true
  - env:
    - name: DOCKER_GROUP_GID
      value: "1001"
    - name: DOCKER_TLS_CERTDIR
      value: /certs
    - name: AWS_STS_REGIONAL_ENDPOINTS
      value: regional
    - name: AWS_DEFAULT_REGION
      value: us-east-1
    - name: AWS_REGION
      value: us-east-1
    - name: AWS_ROLE_ARN
      value: #######
    - name: AWS_WEB_IDENTITY_TOKEN_FILE
      value: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
    image: docker:dind
    imagePullPolicy: IfNotPresent
    lifecycle:
      preStop:
        exec:
          command:
          - /bin/sh
          - -c
          - timeout "${RUNNER_GRACEFUL_STOP_TIMEOUT:-15}" /bin/sh -c "echo 'Prestop hook started'; while [ -f /runner/.runner ]; do sleep 1; done; echo 'Waiting for dockerd to start'; while ! pgrep -x dockerd; do sleep 1; done; echo 'Prestop hook stopped'" >/proc/1/fd/1 2>&1
    name: docker
    resources:
      limits:
        ephemeral-storage: 5Gi
      requests:
        ephemeral-storage: 256Mi
    securityContext:
      privileged: true
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /runner
      name: runner
    - mountPath: /certs/client
      name: certs-client
    - mountPath: /runner/_work
      name: work
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-hl9m9
      readOnly: true
    - mountPath: /var/run/secrets/eks.amazonaws.com/serviceaccount
      name: aws-iam-token
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: ip-10-14-2-227.ec2.internal
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  priorityClassName: default
  restartPolicy: Never
  schedulerName: default-scheduler
  securityContext:
    fsGroup: 1000
  serviceAccount: arc-runner
  serviceAccountName: arc-runner
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: aws-iam-token
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          audience: sts.amazonaws.com
          expirationSeconds: 86400
          path: token
  - emptyDir: {}
    name: runner
  - emptyDir: {}
    name: work
  - emptyDir: {}
    name: certs-client
  - name: kube-api-access-hl9m9
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
```

To Reproduce

1. Start a shell session in the runner container on the runner pod (amd64).
2. Start an arm64 Alpine or Debian container and open a shell in it: `docker run -it --rm --platform linux/arm64 alpine`
3. Fetch any `https` URL from inside the arm64 container. The process crashes with a segmentation fault (core dumped). In my case I ran `apk add curl` in an arm64 Alpine container.
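The steps above can also be run non-interactively, which makes the arm64/amd64 comparison easier to repeat. The following is only a sketch and not part of the original report: the function names `describe_exit` and `probe_platform` are hypothetical, and it assumes the crash surfaces to the caller as exit status 139 (128 + signal 11, SIGSEGV), which may not hold on every Docker/QEMU setup.

```shell
#!/bin/sh
# Sketch: run the same command in an Alpine container for a given platform
# and report whether it exited normally or was killed by a signal.

# describe_exit CODE: translate a shell exit status into a short label.
# Statuses above 128 conventionally mean "killed by signal (CODE - 128)",
# so 139 corresponds to SIGSEGV (signal 11).
describe_exit() {
    code=$1
    if [ "$code" -eq 0 ]; then
        echo "ok"
    elif [ "$code" -eq 139 ]; then
        echo "segfault"
    elif [ "$code" -gt 128 ]; then
        echo "signal $((code - 128))"
    else
        echo "failed ($code)"
    fi
}

# probe_platform PLATFORM: run `apk add curl` without an interactive shell.
probe_platform() {
    platform=$1
    docker run --rm --platform "$platform" alpine apk add curl >/dev/null 2>&1
    status=$?
    echo "$platform: $(describe_exit "$status")"
}

# Only touch Docker when explicitly asked to, e.g. `sh repro.sh run`.
if [ "${1:-}" = "run" ]; then
    probe_platform linux/arm64
    probe_platform linux/amd64
fi
```

Saved as `repro.sh` (a hypothetical filename), `sh repro.sh run` from the runner container prints one line per platform, so a difference between the emulated and native runs is visible at a glance.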

Describe the bug

The first time I saw this was when we were trying to build an arm64 container on our amd64 runners in GitHub Actions using docker buildx. The build step ran `apk update` and failed with this error.

I then opened a shell in the runner container on the runner pod and ran the following commands in an arm64 Alpine container and in an amd64 Alpine container.

runner@arc-2xlarge-5hf5p-8w7x9:/$ docker run -it --rm --platform linux/arm64 alpine
Unable to find image 'alpine:latest' locally
latest: Pulling from library/alpine
c41833b44d91: Pull complete
Digest: sha256:124c7d2707904eea7431fffe91522a01e5a861a624ee31d03372cc1d138a3126
Status: Downloaded newer image for alpine:latest
/ # apk add curl
fetch https://dl-cdn.alpinelinux.org/alpine/v3.17/main/aarch64/APKINDEX.tar.gz
Segmentation fault (core dumped)
/ # exit

runner@arc-2xlarge-5hf5p-8w7x9:/$ docker run -it --rm --platform linux/amd64 alpine
Unable to find image 'alpine:latest' locally
latest: Pulling from library/alpine
f56be85fc22e: Pull complete
Digest: sha256:124c7d2707904eea7431fffe91522a01e5a861a624ee31d03372cc1d138a3126
Status: Downloaded newer image for alpine:latest
/ # apk add curl
fetch https://dl-cdn.alpinelinux.org/alpine/v3.17/main/x86_64/APKINDEX.tar.gz
fetch https://dl-cdn.alpinelinux.org/alpine/v3.17/community/x86_64/APKINDEX.tar.gz
(1/5) Installing ca-certificates (20220614-r4)
(2/5) Installing brotli-libs (1.0.9-r9)
(3/5) Installing nghttp2-libs (1.51.0-r0)
(4/5) Installing libcurl (7.88.1-r1)
(5/5) Installing curl (7.88.1-r1)
Executing busybox-1.35.0-r29.trigger
Executing ca-certificates-20220614-r4.trigger
OK: 9 MiB in 20 packages
/ # exit

Describe the expected behavior

I expected `apk add curl` to succeed in the arm64 container on the runner instead of failing with a segmentation fault. The same crash happens when I curl any https endpoint; http endpoints are not affected. `openssl` commands crash the same way.
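To narrow the failure down to TLS, the same request can be made over both schemes on both platforms. This is a hedged sketch rather than what was actually run: it assumes a multi-arch image whose entrypoint is curl (`curlimages/curl` is used here as an assumption), `example.com` is a placeholder host, and `probe_url`, `run_probe` and `run_matrix` are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: compare http and https requests from emulated and native containers.

IMAGE="${IMAGE:-curlimages/curl}"  # assumption: multi-arch image with curl as entrypoint
HOST="${HOST:-example.com}"        # placeholder endpoint, not from the report

# probe_url SCHEME: build the URL to request for one scheme.
probe_url() {
    echo "$1://$HOST"
}

# run_probe PLATFORM SCHEME: issue one request and print the exit status.
run_probe() {
    docker run --rm --platform "$1" "$IMAGE" -sS -o /dev/null "$(probe_url "$2")"
    echo "$1 $2 exit=$?"
}

# run_matrix: try every platform/scheme combination.
run_matrix() {
    for platform in linux/arm64 linux/amd64; do
        for scheme in http https; do
            run_probe "$platform" "$scheme"
        done
    done
}

# Only touch Docker when explicitly asked to.
if [ "${1:-}" = "run" ]; then
    run_matrix
fi
```

If the behavior matches the description, only the `linux/arm64 https` combination should report a non-zero exit status.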

Whole Controller Logs

Controller Logs

```shell
Defaulted container "manager" out of: manager, kube-rbac-proxy
I0413 14:27:07.074225 1 request.go:690] Waited for 1.037396895s due to client-side throttling, not priority and fairness, request: GET:https://172.20.0.1:443/apis/elbv2.k8s.aws/v1beta1?timeout=32s
2023-04-13T14:27:07Z INFO controller-runtime.metrics Metrics server is starting to listen {"addr": "127.0.0.1:8080"}
2023-04-13T14:27:07Z INFO Initializing actions-runner-controller {"version": "v0.27.2", "default-scale-down-delay": "10m0s", "sync-period": "1m0s", "default-runner-image": "summerwind/actions-runner:latest", "default-docker-image": "docker:dind", "common-runnner-labels": null, "leader-election-enabled": true, "leader-election-id": "circle-techops-test-runner-controller", "watch-namespace": "circle-techops-test-actions"}
2023-04-13T14:27:07Z INFO controller-runtime.builder Registering a mutating webhook {"GVK": "actions.summerwind.dev/v1alpha1, Kind=Runner", "path": "/mutate-actions-summerwind-dev-v1alpha1-runner"}
2023-04-13T14:27:07Z INFO controller-runtime.webhook Registering webhook {"path": "/mutate-actions-summerwind-dev-v1alpha1-runner"}
2023-04-13T14:27:07Z INFO controller-runtime.builder Registering a validating webhook {"GVK": "actions.summerwind.dev/v1alpha1, Kind=Runner", "path": "/validate-actions-summerwind-dev-v1alpha1-runner"}
2023-04-13T14:27:07Z INFO controller-runtime.webhook Registering webhook {"path": "/validate-actions-summerwind-dev-v1alpha1-runner"}
2023-04-13T14:27:07Z INFO controller-runtime.builder Registering a mutating webhook {"GVK": "actions.summerwind.dev/v1alpha1, Kind=RunnerDeployment", "path": "/mutate-actions-summerwind-dev-v1alpha1-runnerdeployment"}
2023-04-13T14:27:07Z INFO controller-runtime.webhook Registering webhook {"path": "/mutate-actions-summerwind-dev-v1alpha1-runnerdeployment"}
2023-04-13T14:27:07Z INFO controller-runtime.builder Registering a validating webhook {"GVK": "actions.summerwind.dev/v1alpha1, Kind=RunnerDeployment", "path": "/validate-actions-summerwind-dev-v1alpha1-runnerdeployment"}
2023-04-13T14:27:07Z INFO controller-runtime.webhook Registering webhook {"path": "/validate-actions-summerwind-dev-v1alpha1-runnerdeployment"}
2023-04-13T14:27:07Z INFO controller-runtime.builder Registering a mutating webhook {"GVK": "actions.summerwind.dev/v1alpha1, Kind=RunnerReplicaSet", "path": "/mutate-actions-summerwind-dev-v1alpha1-runnerreplicaset"}
2023-04-13T14:27:07Z INFO controller-runtime.webhook Registering webhook {"path": "/mutate-actions-summerwind-dev-v1alpha1-runnerreplicaset"}
2023-04-13T14:27:07Z INFO controller-runtime.builder Registering a validating webhook {"GVK": "actions.summerwind.dev/v1alpha1, Kind=RunnerReplicaSet", "path": "/validate-actions-summerwind-dev-v1alpha1-runnerreplicaset"}
2023-04-13T14:27:07Z INFO controller-runtime.webhook Registering webhook {"path": "/validate-actions-summerwind-dev-v1alpha1-runnerreplicaset"}
2023-04-13T14:27:07Z INFO controller-runtime.webhook Registering webhook {"path": "/mutate-runner-set-pod"}
2023-04-13T14:27:07Z INFO starting manager
2023-04-13T14:27:07Z INFO controller-runtime.webhook.webhooks Starting webhook server
2023-04-13T14:27:07Z INFO Starting server {"path": "/metrics", "kind": "metrics", "addr": "127.0.0.1:8080"}
2023-04-13T14:27:07Z INFO controller-runtime.certwatcher Updated current TLS certificate
2023-04-13T14:27:07Z INFO controller-runtime.certwatcher Starting certificate watcher
2023-04-13T14:27:07Z INFO controller-runtime.webhook Serving webhook server {"host": "", "port": 9443}
I0413 14:27:07.183613 1 leaderelection.go:248] attempting to acquire leader lease circle-techops-test-actions/circle-techops-test-runner-controller...
I0413 14:27:25.783655 1 leaderelection.go:258] successfully acquired lease circle-techops-test-actions/circle-techops-test-runner-controller
2023-04-13T14:27:25Z DEBUG events ip-10-14-2-181.ec2.internal_1cf1b6f8-f069-4143-b3a2-4777dd4f58fd became leader {"type": "Normal", "object": {"kind":"Lease","namespace":"circle-techops-test-actions","name":"circle-techops-test-runner-controller","uid":"beecb36d-14de-478c-826c-2887b981ea26","apiVersion":"coordination.k8s.io/v1","resourceVersion":"340288556"}, "reason": "LeaderElection"}
2023-04-13T14:27:25Z INFO Starting EventSource {"controller": "runner-controller", "controllerGroup": "actions.summerwind.dev", "controllerKind": "Runner", "source": "kind source: *v1alpha1.Runner"}
2023-04-13T14:27:25Z INFO Starting EventSource {"controller": "runner-controller", "controllerGroup": "actions.summerwind.dev", "controllerKind": "Runner", "source": "kind source: *v1.Pod"}
2023-04-13T14:27:25Z INFO Starting Controller {"controller": "runner-controller", "controllerGroup": "actions.summerwind.dev", "controllerKind": "Runner"}
2023-04-13T14:27:25Z INFO Starting EventSource {"controller": "runnerdeployment-controller", "controllerGroup": "actions.summerwind.dev", "controllerKind": "RunnerDeployment", "source": "kind source: *v1alpha1.RunnerDeployment"}
2023-04-13T14:27:25Z INFO Starting EventSource {"controller": "runnerdeployment-controller", "controllerGroup": "actions.summerwind.dev", "controllerKind": "RunnerDeployment", "source": "kind source: *v1alpha1.RunnerReplicaSet"}
2023-04-13T14:27:25Z INFO Starting Controller {"controller": "runnerdeployment-controller", "controllerGroup": "actions.summerwind.dev", "controllerKind": "RunnerDeployment"}
2023-04-13T14:27:25Z INFO Starting EventSource {"controller": "runnerreplicaset-controller", "controllerGroup": "actions.summerwind.dev", "controllerKind": "RunnerReplicaSet", "source": "kind source: *v1alpha1.RunnerReplicaSet"}
2023-04-13T14:27:25Z INFO Starting EventSource {"controller": "runnerreplicaset-controller", "controllerGroup": "actions.summerwind.dev", "controllerKind": "RunnerReplicaSet", "source": "kind source: *v1alpha1.Runner"}
2023-04-13T14:27:25Z INFO Starting Controller {"controller": "runnerreplicaset-controller", "controllerGroup": "actions.summerwind.dev", "controllerKind": "RunnerReplicaSet"}
2023-04-13T14:27:25Z INFO Starting EventSource {"controller": "runnerset-controller", "controllerGroup": "actions.summerwind.dev", "controllerKind": "RunnerSet", "source": "kind source: *v1alpha1.RunnerSet"}
2023-04-13T14:27:25Z INFO Starting EventSource {"controller": "runnerset-controller", "controllerGroup": "actions.summerwind.dev", "controllerKind": "RunnerSet", "source": "kind source: *v1.StatefulSet"}
2023-04-13T14:27:25Z INFO Starting EventSource {"controller": "runnerpersistentvolume-controller", "controllerGroup": "", "controllerKind": "PersistentVolume", "source": "kind source: *v1.PersistentVolume"}
2023-04-13T14:27:25Z INFO Starting EventSource {"controller": "runnerpod-controller", "controllerGroup": "", "controllerKind": "Pod", "source": "kind source: *v1.Pod"}
2023-04-13T14:27:25Z INFO Starting Controller {"controller": "runnerpersistentvolume-controller", "controllerGroup": "", "controllerKind": "PersistentVolume"}
2023-04-13T14:27:25Z INFO Starting Controller {"controller": "runnerpod-controller", "controllerGroup": "", "controllerKind": "Pod"}
2023-04-13T14:27:25Z INFO Starting EventSource {"controller": "horizontalrunnerautoscaler-controller", "controllerGroup": "actions.summerwind.dev", "controllerKind": "HorizontalRunnerAutoscaler", "source": "kind source: *v1alpha1.HorizontalRunnerAutoscaler"}
2023-04-13T14:27:25Z INFO Starting Controller {"controller": "horizontalrunnerautoscaler-controller", "controllerGroup": "actions.summerwind.dev", "controllerKind": "HorizontalRunnerAutoscaler"}
2023-04-13T14:27:25Z INFO Starting EventSource {"controller": "runnerpersistentvolumeclaim-controller", "controllerGroup": "", "controllerKind": "PersistentVolumeClaim", "source": "kind source: *v1.PersistentVolumeClaim"}
2023-04-13T14:27:25Z INFO Starting Controller {"controller": "runnerset-controller", "controllerGroup": "actions.summerwind.dev", "controllerKind": "RunnerSet"}
2023-04-13T14:27:25Z INFO Starting Controller {"controller": "runnerpersistentvolumeclaim-controller", "controllerGroup": "", "controllerKind": "PersistentVolumeClaim"}
2023-04-13T14:27:25Z INFO Starting workers {"controller": "runnerpod-controller", "controllerGroup": "", "controllerKind": "Pod", "worker count": 1}
2023-04-13T14:27:25Z INFO Starting workers {"controller": "runnerset-controller", "controllerGroup": "actions.summerwind.dev", "controllerKind": "RunnerSet", "worker count": 1}
2023-04-13T14:27:25Z INFO Starting workers {"controller": "runnerdeployment-controller", "controllerGroup": "actions.summerwind.dev", "controllerKind": "RunnerDeployment", "worker count": 1}
2023-04-13T14:27:25Z INFO Starting workers {"controller": "horizontalrunnerautoscaler-controller", "controllerGroup": "actions.summerwind.dev", "controllerKind": "HorizontalRunnerAutoscaler", "worker count": 1}
2023-04-13T14:27:25Z INFO Starting workers {"controller": "runner-controller", "controllerGroup": "actions.summerwind.dev", "controllerKind": "Runner", "worker count": 1}
2023-04-13T14:27:25Z INFO Starting workers {"controller": "runnerreplicaset-controller", "controllerGroup": "actions.summerwind.dev", "controllerKind": "RunnerReplicaSet", "worker count": 1}
2023-04-13T14:27:25Z INFO Starting workers {"controller": "runnerpersistentvolume-controller", "controllerGroup": "", "controllerKind": "PersistentVolume", "worker count": 1}
2023-04-13T14:27:25Z INFO runnerdeployment The newest runnerreplicaset is 100% available. Deleting old runnerreplicasets {"runnerdeployment": "circle-techops-test-actions/circle-techops-test-runner-large", "newest_runnerreplicaset": "circle-techops-test-actions/circle-techops-test-runner-large-7gvxf", "newest_runnerreplicaset_replicas_ready": 2, "newest_runnerreplicaset_replicas_desired": 2, "old_runnerreplicasets_count": 1}
2023-04-13T14:27:25Z DEBUG runner Runner appears to have been registered and running. {"runner": "circle-techops-test-actions/circle-techops-test-runner-large-q2dpd-msjwq", "podCreationTimestamp": "2023-04-13 14:11:05 +0000 UTC"}
2023-04-13T14:27:25Z INFO Starting workers {"controller": "runnerpersistentvolumeclaim-controller", "controllerGroup": "", "controllerKind": "PersistentVolumeClaim", "worker count": 1}
2023-04-13T14:27:25Z INFO runnerdeployment The newest runnerreplicaset is 100% available. Deleting old runnerreplicasets {"runnerdeployment": "circle-techops-test-actions/circle-techops-test-runner-small", "newest_runnerreplicaset": "circle-techops-test-actions/circle-techops-test-runner-small-vbz6c", "newest_runnerreplicaset_replicas_ready": 2, "newest_runnerreplicaset_replicas_desired": 2, "old_runnerreplicasets_count": 1}
2023-04-13T14:27:25Z INFO runnerdeployment The newest runnerreplicaset is 100% available. Deleting old runnerreplicasets {"runnerdeployment": "circle-techops-test-actions/circle-techops-test-runner-large", "newest_runnerreplicaset": "circle-techops-test-actions/circle-techops-test-runner-large-7gvxf", "newest_runnerreplicaset_replicas_ready": 2, "newest_runnerreplicaset_replicas_desired": 2, "old_runnerreplicasets_count": 1}
2023-04-13T14:27:25Z DEBUG runner Runner appears to have been registered and running. {"runner": "circle-techops-test-actions/circle-techops-test-runner-small-kjklm-r6lcx", "podCreationTimestamp": "2023-04-13 14:11:06 +0000 UTC"}
2023-04-13T14:27:25Z DEBUG runner Runner appears to have been registered and running. {"runner": "circle-techops-test-actions/circle-techops-test-runner-small-kjklm-w65pz", "podCreationTimestamp": "2023-04-13 14:11:08 +0000 UTC"}
2023-04-13T14:27:25Z ERROR runnerreplicaset Failed to patch owner to have actions-runner/unregistration-complete-timestamp annotation {"runnerreplicaset": "circle-techops-test-actions/circle-techops-test-runner-small-kjklm", "owner": "circle-techops-test-actions/circle-techops-test-runner-small-kjklm-r6lcx", "error": "Operation cannot be fulfilled on runners.actions.summerwind.dev \"circle-techops-test-runner-small-kjklm-r6lcx\": the object has been modified; please apply your changes to the latest version and try again"}
github.com/actions/actions-runner-controller/controllers/actions%2esummerwind%2enet.collectPodsForOwners
    github.com/actions/actions-runner-controller/controllers/actions.summerwind.net/runner_pod_owner.go:552
github.com/actions/actions-runner-controller/controllers/actions%2esummerwind%2enet.syncRunnerPodsOwners
    github.com/actions/actions-runner-controller/controllers/actions.summerwind.net/runner_pod_owner.go:253
github.com/actions/actions-runner-controller/controllers/actions%2esummerwind%2enet.(*RunnerReplicaSetReconciler).Reconcile
    github.com/actions/actions-runner-controller/controllers/actions.summerwind.net/runnerreplicaset_controller.go:131
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
    sigs.k8s.io/controller-runtime@v0.14.4/pkg/internal/controller/controller.go:122
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
    sigs.k8s.io/controller-runtime@v0.14.4/pkg/internal/controller/controller.go:323
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
    sigs.k8s.io/controller-runtime@v0.14.4/pkg/internal/controller/controller.go:274
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
    sigs.k8s.io/controller-runtime@v0.14.4/pkg/internal/controller/controller.go:235
2023-04-13T14:27:25Z ERROR Reconciler error {"controller": "runnerreplicaset-controller", "controllerGroup": "actions.summerwind.dev", "controllerKind": "RunnerReplicaSet", "RunnerReplicaSet": {"name":"circle-techops-test-runner-small-kjklm","namespace":"circle-techops-test-actions"}, "namespace": "circle-techops-test-actions", "name": "circle-techops-test-runner-small-kjklm", "reconcileID": "e48eccd1-b681-4362-b50b-faf61df162c2", "error": "Operation cannot be fulfilled on runners.actions.summerwind.dev \"circle-techops-test-runner-small-kjklm-r6lcx\": the object has been modified; please apply your changes to the latest version and try again"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
    sigs.k8s.io/controller-runtime@v0.14.4/pkg/internal/controller/controller.go:329
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
    sigs.k8s.io/controller-runtime@v0.14.4/pkg/internal/controller/controller.go:274
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
    sigs.k8s.io/controller-runtime@v0.14.4/pkg/internal/controller/controller.go:235
2023-04-13T14:27:25Z DEBUG runner Runner appears to have been registered and running. {"runner": "circle-techops-test-actions/circle-techops-test-runner-large-q2dpd-vx2gn", "podCreationTimestamp": "2023-04-13 14:11:04 +0000 UTC"}
2023-04-13T14:27:25Z INFO runnerdeployment The newest runnerreplicaset is 100% available. Deleting old runnerreplicasets {"runnerdeployment": "circle-techops-test-actions/circle-techops-test-runner-large", "newest_runnerreplicaset": "circle-techops-test-actions/circle-techops-test-runner-large-7gvxf", "newest_runnerreplicaset_replicas_ready": 2, "newest_runnerreplicaset_replicas_desired": 2, "old_runnerreplicasets_count": 1}
2023-04-13T14:27:25Z INFO runnerdeployment Deleted runnerreplicaset {"runnerdeployment": "circle-techops-test-actions/circle-techops-test-runner-large", "runnerreplicaset": "circle-techops-test-runner-large-q2dpd"}
2023-04-13T14:27:25Z DEBUG events Deleted runnerreplicaset 'circle-techops-test-runner-large-q2dpd' {"type": "Normal", "object": {"kind":"RunnerDeployment","namespace":"circle-techops-test-actions","name":"circle-techops-test-runner-large","uid":"410b9806-8f10-48eb-b643-ee075f12b2bf","apiVersion":"actions.summerwind.dev/v1alpha1","resourceVersion":"340288044"}, "reason": "RunnerReplicaSetDeleted"}
2023-04-13T14:27:25Z INFO runner Removed finalizer {"runner": "circle-techops-test-actions/circle-techops-test-runner-small-kjklm-w65pz"}
```

Whole Runner Pod Logs

Runner Pod Logs

```shell
Defaulted container "runner" out of: runner, docker
2023-04-13 14:26:51.338 NOTICE --- Runner init started with pid 7
2023-04-13 14:26:51.343 DEBUG --- Github endpoint URL https://github.com/
2023-04-13 14:26:52.119 DEBUG --- Passing --ephemeral to config.sh to enable the ephemeral runner.
2023-04-13 14:26:52.124 DEBUG --- Configuring the runner.

--------------------------------------------------------------------------------
|        ____ _ _   _   _       _          _        _   _                      |
|       / ___(_) |_| | | |_   _| |__      / \   ___| |_(_) ___  _ __  ___      |
|      | |  _| | __| |_| | | | | '_ \    / _ \ / __| __| |/ _ \| '_ \/ __|     |
|      | |_| | | |_|  _  | |_| | |_) |  / ___ \ (__| |_| | (_) | | | \__ \     |
|       \____|_|\__|_| |_|\__,_|_.__/  /_/   \_\___|\__|_|\___/|_| |_|___/     |
|                                                                              |
|                       Self-hosted runner registration                        |
|                                                                              |
--------------------------------------------------------------------------------

# Authentication

√ Connected to GitHub

# Runner Registration

√ Runner successfully added
√ Runner connection is good

# Runner settings

√ Settings Saved.

2023-04-13 14:26:56.278 DEBUG --- Runner successfully configured.
{
  "agentId": 738,
  "agentName": "circle-techops-test-runner-2xlarge-5hf5p-8w7x9",
  "poolId": 1,
  "poolName": "Default",
  "ephemeral": true,
  "serverUrl": "https://pipelines.actions.githubusercontent.com/oNYkxkRnelTWP0h2CkmfKJRP2TzZhmT7xnwlRLZMox9hhZmuzc",
  "gitHubUrl": "https://github.com/circle-techops-test",
  "workFolder": "/runner/_work"
2023-04-13 14:26:56.296 DEBUG --- Docker enabled runner detected and Docker daemon wait is enabled
2023-04-13 14:26:56.300 DEBUG --- Waiting until Docker is available or the timeout of 120 seconds is reached
unable to resolve docker endpoint: open /certs/client/ca.pem: no such file or directory
unable to resolve docker endpoint: open /certs/client/ca.pem: no such file or directory
unable to resolve docker endpoint: open /certs/client/ca.pem: no such file or directory
unable to resolve docker endpoint: open /certs/client/ca.pem: no such file or directory
unable to resolve docker endpoint: open /certs/client/ca.pem: no such file or directory
unable to resolve docker endpoint: open /certs/client/ca.pem: no such file or directory
unable to resolve docker endpoint: open /certs/client/ca.pem: no such file or directory
Cannot connect to the Docker daemon at tcp://localhost:2376. Is the docker daemon running?
}CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
2023-04-13 14:27:06.233 NOTICE --- WARNING LATEST TAG HAS BEEN DEPRECATED. SEE GITHUB ISSUE FOR DETAILS:
2023-04-13 14:27:06.253 NOTICE --- https://github.com/actions/actions-runner-controller/issues/2056

√ Connected to GitHub

Current runner version: '2.303.0'
2023-04-13 14:27:08Z: Listening for Jobs
```

Additional Context

No response

github-actions[bot] commented 1 year ago

Hello! Thank you for filing an issue.

The maintainers will triage your issue shortly.

In the meantime, please take a look at the troubleshooting guide for bug reports.

If this is a feature request, please review our contribution guidelines.

ali-kafel commented 1 year ago

This issue was caused by Datadog Agent pods running on the same cluster. Details are in https://github.com/DataDog/datadog-agent/issues/18228

ali-kafel commented 1 year ago

This issue shows the solution: https://github.com/moby/buildkit/issues/3812