ctrox / zeropod

pod that scales down to zero
Apache License 2.0

Failed to create pod sandbox #28

Closed DragonHunter274 closed 10 hours ago

DragonHunter274 commented 2 days ago

I'm trying to use zeropod on k3s (Kubernetes v1.27.7) and I'm getting this error:

FailedCreatePodSandBox 13s (x11 over 2m20s) kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to get sandbox runtime: no runtime for "zeropod" is configured

Kubernetes Manifest

```yaml
apiVersion: v1
kind: List
items:
  - apiVersion: v1
    kind: Namespace
    metadata:
      labels:
        kubernetes.io/metadata.name: zeropod-system
      name: zeropod-system
    spec:
      finalizers:
        - kubernetes
  - apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: zeropod-node
      namespace: zeropod-system
  - apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: zeropod:pod-updater
    rules:
      - apiGroups: [""]
        resources: ["pods"]
        verbs: ["get", "update"]
  - apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: zeropod:runtimeclass-installer
    rules:
      - apiGroups: ["node.k8s.io"]
        resources: ["runtimeclasses"]
        verbs: ["create", "delete", "update"]
  - apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: zeropod:pod-updater
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: zeropod:pod-updater
    subjects:
      - kind: ServiceAccount
        name: zeropod-node
        namespace: zeropod-system
  - apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: zeropod:runtimeclass-installer
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: zeropod:runtimeclass-installer
    subjects:
      - kind: ServiceAccount
        name: zeropod-node
        namespace: zeropod-system
  - apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      labels:
        app.kubernetes.io/name: zeropod-node
      name: zeropod-node
      namespace: zeropod-system
    spec:
      revisionHistoryLimit: 10
      selector:
        matchLabels:
          app.kubernetes.io/name: zeropod-node
      template:
        metadata:
          labels:
            app.kubernetes.io/name: zeropod-node
        spec:
          containers:
            - args:
                - -metrics-addr=:8080
                - -status-labels=true
              command:
                - /zeropod-manager
              image: ghcr.io/ctrox/zeropod-manager:v0.4.1
              imagePullPolicy: IfNotPresent
              name: manager
              ports:
                - containerPort: 8080
                  name: metrics
                  protocol: TCP
              securityContext:
                capabilities:
                  add:
                    - SYS_PTRACE
                    - SYS_ADMIN
                    - NET_ADMIN
                privileged: true
              volumeMounts:
                - mountPath: /run/zeropod
                  name: zeropod-run
                - mountPath: /hostproc
                  name: hostproc
                - mountPath: /sys/fs/bpf
                  name: bpf
          initContainers:
            - args:
                - -criu-image=ghcr.io/ctrox/zeropod-criu:v3.19
              image: ghcr.io/ctrox/zeropod-installer:v0.4.1
              imagePullPolicy: IfNotPresent
              name: installer
              volumeMounts:
                - mountPath: /etc/containerd
                  name: containerd-etc
                - mountPath: /run/containerd
                  name: containerd-run
                - mountPath: /opt/zeropod
                  name: zeropod-opt
                - mountPath: /run/systemd
                  name: systemd-run
                - mountPath: /etc/criu
                  name: criu-etc
            - args:
                - mount | grep "/sys/fs/bpf type bpf" || mount -t bpf bpf /sys/fs/bpf
              command:
                - /bin/sh
                - -c
                - --
              image: alpine:3.19.1
              imagePullPolicy: IfNotPresent
              name: prepare-bpf-fs
              securityContext:
                privileged: true
              volumeMounts:
                - mountPath: /sys/fs/bpf
                  mountPropagation: Bidirectional
                  name: bpf
          nodeSelector:
            zeropod.ctrox.dev/node: "true"
          serviceAccountName: zeropod-node
          tolerations:
            - operator: Exists
          volumes:
            - hostPath:
                path: /var/lib/rancher/k3s/agent/etc/containerd/
              name: containerd-etc
            - hostPath:
                path: /run/k3s/containerd
              name: containerd-run
            - hostPath:
                path: /var/lib/rancher/k3s/agent/containerd
              name: zeropod-opt
            - hostPath:
                path: /run/zeropod
              name: zeropod-run
            - hostPath:
                path: /run/systemd
              name: systemd-run
            - hostPath:
                path: /etc/criu
              name: criu-etc
            - hostPath:
                path: /proc
                type: Directory
              name: hostproc
            - hostPath:
                path: /sys/fs/bpf
                type: Directory
              name: bpf
      updateStrategy:
        rollingUpdate:
          maxSurge: 0
          maxUnavailable: 1
        type: RollingUpdate
```

ctrox commented 2 days ago

Hi, if the runtimeclass is not installed, I suspect the installer did not finish properly. Can you check the logs of the installer?

kubectl -n zeropod-system logs -l app.kubernetes.io/name=zeropod-node -c installer

If you have no zeropod-node pod, your node might simply be missing the required label:

kubectl label node <your node> zeropod.ctrox.dev/node=true

DragonHunter274 commented 1 day ago

The weird thing is that the runtimeClass is installed. The installer log is:

2024/09/16 13:42:17 installed criu binaries
2024/09/16 13:42:17 installing runtime for containerd
2024/09/16 13:42:17 runtime already configured, no need to restart containerd
2024/09/16 13:42:17 installed runtime
2024/09/16 13:42:17 installed runtimeClass
2024/09/16 13:42:17 installer completed

kubectl get runtimeclass -A

NAME      HANDLER   AGE
zeropod   zeropod   31h

ctrox commented 15 hours ago

Oh, then the problem is somewhere else. I think containerd does not know about the zeropod runtime. This can happen if containerd got configured (as the log seems to confirm) but k3s/containerd was not restarted properly. Can you manually restart the k3s agent/server on the affected node to see if that fixes the issue?

DragonHunter274 commented 14 hours ago

I already tried that; it didn't help.

ctrox commented 14 hours ago

Can you post the containerd config?

cat /var/lib/rancher/k3s/agent/etc/containerd/config.toml
# and the template
cat /var/lib/rancher/k3s/agent/etc/containerd/config.toml.tmpl

DragonHunter274 commented 14 hours ago

Containerd config

```toml
# File generated by k3s. DO NOT EDIT. Use config.toml.tmpl instead.
version = 2

[plugins."io.containerd.internal.v1.opt"]
  path = "/var/lib/rancher/k3s/agent/containerd"

[plugins."io.containerd.grpc.v1.cri"]
  stream_server_address = "127.0.0.1"
  stream_server_port = "10010"
  enable_selinux = false
  enable_unprivileged_ports = true
  enable_unprivileged_icmp = true
  sandbox_image = "rancher/mirrored-pause:3.6"

[plugins."io.containerd.grpc.v1.cri".containerd]
  snapshotter = "overlayfs"
  disable_snapshot_annotations = true

[plugins."io.containerd.grpc.v1.cri".cni]
  bin_dir = "/var/lib/rancher/k3s/data/bf3548384eaabb3435bf08112f1b0cba1afc5add6a6f2f2372aa2906a598fd04/bin"
  conf_dir = "/var/lib/rancher/k3s/agent/etc/cni/net.d"

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
  runtime_type = "io.containerd.runc.v2"

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
  SystemdCgroup = true
```

config.toml.tmpl doesn't exist

DragonHunter274 commented 13 hours ago

I forgot to configure -runtime=k3s. I just fixed that, but it still doesn't create the .tmpl file.

DragonHunter274 commented 12 hours ago

Another restart of k3s fixed it. The nginx example runs now, but it never scales down.

DragonHunter274 commented 10 hours ago

Closing this and continuing with the other issue in #29.