fakoe opened this issue 1 year ago
Please provide logs from kubeadm.
This is the full output (including the stack trace) of the initialization:
I0413 10:43:42.502238 4864 interface.go:432] Looking for default routes with IPv4 addresses
I0413 10:43:42.502267 4864 interface.go:437] Default route transits interface "ens18"
I0413 10:43:42.502354 4864 interface.go:209] Interface ens18 is up
I0413 10:43:42.502393 4864 interface.go:257] Interface "ens18" has 2 addresses :[10.99.132.26/22 fe80::c838:56ff:fe4e:c487/64].
I0413 10:43:42.502406 4864 interface.go:224] Checking addr 10.99.132.26/22.
I0413 10:43:42.502411 4864 interface.go:231] IP found 10.99.132.26
I0413 10:43:42.502431 4864 interface.go:263] Found valid IPv4 address 10.99.132.26 for interface "ens18".
I0413 10:43:42.502439 4864 interface.go:443] Found active IP 10.99.132.26
I0413 10:43:42.502455 4864 kubelet.go:196] the value of KubeletConfiguration.cgroupDriver is empty; setting it to "systemd"
I0413 10:43:42.507288 4864 version.go:187] fetching Kubernetes version from URL: https://dl.k8s.io/release/stable-1.txt
[init] Using Kubernetes version: v1.27.0
[preflight] Running pre-flight checks
I0413 10:43:42.945684 4864 checks.go:563] validating Kubernetes and kubeadm version
I0413 10:43:42.945718 4864 checks.go:168] validating if the firewall is enabled and active
I0413 10:43:42.951622 4864 checks.go:203] validating availability of port 6443
I0413 10:43:42.951767 4864 checks.go:203] validating availability of port 10259
I0413 10:43:42.951793 4864 checks.go:203] validating availability of port 10257
I0413 10:43:42.951817 4864 checks.go:280] validating the existence of file /etc/kubernetes/manifests/kube-apiserver.yaml
I0413 10:43:42.966786 4864 checks.go:280] validating the existence of file /etc/kubernetes/manifests/kube-controller-manager.yaml
I0413 10:43:42.966806 4864 checks.go:280] validating the existence of file /etc/kubernetes/manifests/kube-scheduler.yaml
I0413 10:43:42.966813 4864 checks.go:280] validating the existence of file /etc/kubernetes/manifests/etcd.yaml
I0413 10:43:42.966822 4864 checks.go:430] validating if the connectivity type is via proxy or direct
I0413 10:43:42.966837 4864 checks.go:469] validating http connectivity to first IP address in the CIDR
I0413 10:43:42.966851 4864 checks.go:469] validating http connectivity to first IP address in the CIDR
I0413 10:43:42.966861 4864 checks.go:104] validating the container runtime
I0413 10:43:43.780194 4864 checks.go:639] validating whether swap is enabled or not
I0413 10:43:43.780262 4864 checks.go:370] validating the presence of executable crictl
I0413 10:43:43.780283 4864 checks.go:370] validating the presence of executable conntrack
I0413 10:43:43.780292 4864 checks.go:370] validating the presence of executable ip
I0413 10:43:43.780304 4864 checks.go:370] validating the presence of executable iptables
I0413 10:43:43.780317 4864 checks.go:370] validating the presence of executable mount
I0413 10:43:43.780329 4864 checks.go:370] validating the presence of executable nsenter
I0413 10:43:43.780340 4864 checks.go:370] validating the presence of executable ebtables
I0413 10:43:43.780386 4864 checks.go:370] validating the presence of executable ethtool
I0413 10:43:43.780412 4864 checks.go:370] validating the presence of executable socat
I0413 10:43:43.780453 4864 checks.go:370] validating the presence of executable tc
I0413 10:43:43.780492 4864 checks.go:370] validating the presence of executable touch
I0413 10:43:43.780506 4864 checks.go:516] running all checks
I0413 10:43:43.798457 4864 checks.go:401] checking whether the given node name is valid and reachable using net.LookupHost
I0413 10:43:43.798479 4864 checks.go:605] validating kubelet version
I0413 10:43:43.838385 4864 checks.go:130] validating if the "kubelet" service is enabled and active
I0413 10:43:43.882023 4864 checks.go:203] validating availability of port 10250
I0413 10:43:43.882083 4864 checks.go:329] validating the contents of file /proc/sys/net/bridge/bridge-nf-call-iptables
I0413 10:43:43.882131 4864 checks.go:329] validating the contents of file /proc/sys/net/ipv4/ip_forward
I0413 10:43:43.882149 4864 checks.go:203] validating availability of port 2379
I0413 10:43:43.882167 4864 checks.go:203] validating availability of port 2380
I0413 10:43:43.882183 4864 checks.go:243] validating the existence and emptiness of directory /var/lib/etcd
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
W0413 10:43:43.882420 4864 images.go:80] could not find officially supported version of etcd for Kubernetes v1.27.0, falling back to the nearest etcd version (3.5.7-0)
I0413 10:43:43.882435 4864 checks.go:828] using image pull policy: IfNotPresent
I0413 10:43:43.950959 4864 checks.go:854] pulling: registry.k8s.io/kube-apiserver:v1.27.0
I0413 10:43:51.632545 4864 checks.go:854] pulling: registry.k8s.io/kube-controller-manager:v1.27.0
I0413 10:43:58.612257 4864 checks.go:854] pulling: registry.k8s.io/kube-scheduler:v1.27.0
I0413 10:44:00.510757 4864 checks.go:854] pulling: registry.k8s.io/kube-proxy:v1.27.0
I0413 10:44:03.587274 4864 checks.go:833] failed to detect the sandbox image for local container runtime, output: time="2023-04-13T10:44:03Z" level=fatal msg="getting status of runtime: failed to template data: template: tmplExecuteRawJSON:1:9: executing \"tmplExecuteRawJSON\" at <.config.sandboxImage>: map has no entry for key \"config\""
, error: exit status 1
I0413 10:44:03.611804 4864 checks.go:854] pulling: registry.k8s.io/pause:3.9
I0413 10:44:04.728341 4864 checks.go:854] pulling: registry.k8s.io/etcd:3.5.7-0
I0413 10:44:21.136428 4864 checks.go:854] pulling: registry.k8s.io/coredns/coredns:v1.10.1
[certs] Using certificateDir folder "/etc/kubernetes/pki"
I0413 10:44:22.906828 4864 certs.go:112] creating a new certificate authority for ca
[certs] Generating "ca" certificate and key
I0413 10:44:23.120082 4864 certs.go:519] validating certificate period for ca certificate
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local kubernetes2023master] and IPs [10.96.0.1 10.99.132.26]
[certs] Generating "apiserver-kubelet-client" certificate and key
I0413 10:44:23.392790 4864 certs.go:112] creating a new certificate authority for front-proxy-ca
[certs] Generating "front-proxy-ca" certificate and key
I0413 10:44:23.541900 4864 certs.go:519] validating certificate period for front-proxy-ca certificate
[certs] Generating "front-proxy-client" certificate and key
I0413 10:44:23.599912 4864 certs.go:112] creating a new certificate authority for etcd-ca
[certs] Generating "etcd/ca" certificate and key
I0413 10:44:23.746892 4864 certs.go:519] validating certificate period for etcd/ca certificate
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [kubernetes2023master localhost] and IPs [10.99.132.26 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [kubernetes2023master localhost] and IPs [10.99.132.26 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
I0413 10:44:24.415989 4864 certs.go:78] creating new public/private key files for signing service account users
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
I0413 10:44:24.541200 4864 kubeconfig.go:103] creating kubeconfig file for admin.conf
[kubeconfig] Writing "admin.conf" kubeconfig file
I0413 10:44:24.876449 4864 kubeconfig.go:103] creating kubeconfig file for kubelet.conf
[kubeconfig] Writing "kubelet.conf" kubeconfig file
I0413 10:44:25.073252 4864 kubeconfig.go:103] creating kubeconfig file for controller-manager.conf
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
I0413 10:44:25.384124 4864 kubeconfig.go:103] creating kubeconfig file for scheduler.conf
[kubeconfig] Writing "scheduler.conf" kubeconfig file
I0413 10:44:25.732113 4864 kubelet.go:67] Stopping the kubelet
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
I0413 10:44:26.094621 4864 manifests.go:99] [control-plane] getting StaticPodSpecs
I0413 10:44:26.095001 4864 certs.go:519] validating certificate period for CA certificate
I0413 10:44:26.095068 4864 manifests.go:125] [control-plane] adding volume "ca-certs" for component "kube-apiserver"
I0413 10:44:26.095076 4864 manifests.go:125] [control-plane] adding volume "etc-ca-certificates" for component "kube-apiserver"
I0413 10:44:26.095081 4864 manifests.go:125] [control-plane] adding volume "etc-pki" for component "kube-apiserver"
I0413 10:44:26.095087 4864 manifests.go:125] [control-plane] adding volume "k8s-certs" for component "kube-apiserver"
I0413 10:44:26.095091 4864 manifests.go:125] [control-plane] adding volume "usr-local-share-ca-certificates" for component "kube-apiserver"
I0413 10:44:26.095099 4864 manifests.go:125] [control-plane] adding volume "usr-share-ca-certificates" for component "kube-apiserver"
I0413 10:44:26.109452 4864 manifests.go:154] [control-plane] wrote static Pod manifest for component "kube-apiserver" to "/etc/kubernetes/manifests/kube-apiserver.yaml"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
I0413 10:44:26.109480 4864 manifests.go:99] [control-plane] getting StaticPodSpecs
I0413 10:44:26.109708 4864 manifests.go:125] [control-plane] adding volume "ca-certs" for component "kube-controller-manager"
I0413 10:44:26.109722 4864 manifests.go:125] [control-plane] adding volume "etc-ca-certificates" for component "kube-controller-manager"
I0413 10:44:26.109730 4864 manifests.go:125] [control-plane] adding volume "etc-pki" for component "kube-controller-manager"
I0413 10:44:26.109736 4864 manifests.go:125] [control-plane] adding volume "flexvolume-dir" for component "kube-controller-manager"
I0413 10:44:26.109742 4864 manifests.go:125] [control-plane] adding volume "k8s-certs" for component "kube-controller-manager"
I0413 10:44:26.109748 4864 manifests.go:125] [control-plane] adding volume "kubeconfig" for component "kube-controller-manager"
I0413 10:44:26.109753 4864 manifests.go:125] [control-plane] adding volume "usr-local-share-ca-certificates" for component "kube-controller-manager"
I0413 10:44:26.109757 4864 manifests.go:125] [control-plane] adding volume "usr-share-ca-certificates" for component "kube-controller-manager"
I0413 10:44:26.110386 4864 manifests.go:154] [control-plane] wrote static Pod manifest for component "kube-controller-manager" to "/etc/kubernetes/manifests/kube-controller-manager.yaml"
[control-plane] Creating static Pod manifest for "kube-scheduler"
I0413 10:44:26.110401 4864 manifests.go:99] [control-plane] getting StaticPodSpecs
I0413 10:44:26.110578 4864 manifests.go:125] [control-plane] adding volume "kubeconfig" for component "kube-scheduler"
I0413 10:44:26.110925 4864 manifests.go:154] [control-plane] wrote static Pod manifest for component "kube-scheduler" to "/etc/kubernetes/manifests/kube-scheduler.yaml"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
W0413 10:44:26.111084 4864 images.go:80] could not find officially supported version of etcd for Kubernetes v1.27.0, falling back to the nearest etcd version (3.5.7-0)
I0413 10:44:26.126934 4864 local.go:65] [etcd] wrote Static Pod manifest for a local etcd member to "/etc/kubernetes/manifests/etcd.yaml"
I0413 10:44:26.126961 4864 waitcontrolplane.go:83] [wait-control-plane] Waiting for the API server to be healthy
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
Unfortunately, an error has occurred:
timed out waiting for the condition
This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'
Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI.
Here is one example how you may list all running Kubernetes containers by using crictl:
- 'crictl --runtime-endpoint unix:///var/run/cri-dockerd.sock ps -a | grep kube | grep -v pause'
Once you have found the failing container, you can inspect its logs with:
- 'crictl --runtime-endpoint unix:///var/run/cri-dockerd.sock logs CONTAINERID'
couldn't initialize a Kubernetes cluster
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/init.runWaitControlPlanePhase
cmd/kubeadm/app/cmd/phases/init/waitcontrolplane.go:108
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run.func1
cmd/kubeadm/app/cmd/phases/workflow/runner.go:259
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).visitAll
cmd/kubeadm/app/cmd/phases/workflow/runner.go:446
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run
cmd/kubeadm/app/cmd/phases/workflow/runner.go:232
k8s.io/kubernetes/cmd/kubeadm/app/cmd.newCmdInit.func1
cmd/kubeadm/app/cmd/init.go:111
github.com/spf13/cobra.(*Command).execute
vendor/github.com/spf13/cobra/command.go:916
github.com/spf13/cobra.(*Command).ExecuteC
vendor/github.com/spf13/cobra/command.go:1040
github.com/spf13/cobra.(*Command).Execute
vendor/github.com/spf13/cobra/command.go:968
k8s.io/kubernetes/cmd/kubeadm/app.Run
cmd/kubeadm/app/kubeadm.go:50
main.main
cmd/kubeadm/kubeadm.go:25
runtime.main
/usr/local/go/src/runtime/proc.go:250
runtime.goexit
/usr/local/go/src/runtime/asm_amd64.s:1598
error execution phase wait-control-plane
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run.func1
cmd/kubeadm/app/cmd/phases/workflow/runner.go:260
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).visitAll
cmd/kubeadm/app/cmd/phases/workflow/runner.go:446
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run
cmd/kubeadm/app/cmd/phases/workflow/runner.go:232
k8s.io/kubernetes/cmd/kubeadm/app/cmd.newCmdInit.func1
cmd/kubeadm/app/cmd/init.go:111
github.com/spf13/cobra.(*Command).execute
vendor/github.com/spf13/cobra/command.go:916
github.com/spf13/cobra.(*Command).ExecuteC
vendor/github.com/spf13/cobra/command.go:1040
github.com/spf13/cobra.(*Command).Execute
vendor/github.com/spf13/cobra/command.go:968
k8s.io/kubernetes/cmd/kubeadm/app.Run
cmd/kubeadm/app/kubeadm.go:50
main.main
cmd/kubeadm/kubeadm.go:25
runtime.main
/usr/local/go/src/runtime/proc.go:250
runtime.goexit
/usr/local/go/src/runtime/asm_amd64.s:1598
This is an excerpt of the kubelet's journal:
Apr 13 10:51:48 kubernetes2023master kubelet[5454]: E0413 10:51:48.150851 5454 event.go:289] Unable to write event: '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"kubernetes2023master.175578a0835c3468", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"kubernetes2023master", UID:"kubernetes2023master", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"NodeHasNoDiskPressure", Message:"Node kubernetes2023master status is now: NodeHasNoDiskPressure", Source:v1.EventSource{Component:"kubelet", Host:"kubernetes2023master"}, FirstTimestamp:time.Date(2023, time.April, 13, 10, 44, 26, 686706792, time.Local), LastTimestamp:time.Date(2023, time.April, 13, 10, 44, 26, 686706792, time.Local), Count:1, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'Post "https://10.99.132.26:6443/api/v1/namespaces/default/events": dial tcp 10.99.132.26:6443: connect: connection refused'(may retry after sleeping)
Apr 13 10:51:49 kubernetes2023master kubelet[5454]: E0413 10:51:49.312244 5454 pod_workers.go:1281] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-scheduler\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=kube-scheduler pod=kube-scheduler-kubernetes2023master_kube-system(01b93443896f05f35e58e6528727fbe2)\"" pod="kube-system/kube-scheduler-kubernetes2023master" podUID=01b93443896f05f35e58e6528727fbe2
Apr 13 10:51:49 kubernetes2023master kubelet[5454]: I0413 10:51:49.326705 5454 pod_container_deletor.go:80] "Container not found in pod's containers" containerID="629459d15997d7fc643cb6f8e57c14dfa721fe1125c2ff5a855e9cfd61c22ccb"
Apr 13 10:51:49 kubernetes2023master kubelet[5454]: I0413 10:51:49.326860 5454 pod_container_deletor.go:80] "Container not found in pod's containers" containerID="db323c83722bc9712690892034fe31c77f70dbaa990daa9283a8ce8458dbba75"
Apr 13 10:51:49 kubernetes2023master kubelet[5454]: E0413 10:51:49.370542 5454 pod_workers.go:1281] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"etcd\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=etcd pod=etcd-kubernetes2023master_kube-system(9fbcb229ba4f4d1422a77bb7e8754b2f)\"" pod="kube-system/etcd-kubernetes2023master" podUID=9fbcb229ba4f4d1422a77bb7e8754b2f
Apr 13 10:51:49 kubernetes2023master kubelet[5454]: E0413 10:51:49.383414 5454 pod_workers.go:1281] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-apiserver\" with CrashLoopBackOff: \"back-off 2m40s restarting failed container=kube-apiserver pod=kube-apiserver-kubernetes2023master_kube-system(50d99ddb3cabe509efcae42e576fd419)\"" pod="kube-system/kube-apiserver-kubernetes2023master" podUID=50d99ddb3cabe509efcae42e576fd419
Apr 13 10:51:49 kubernetes2023master kubelet[5454]: E0413 10:51:49.384218 5454 pod_workers.go:1281] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-controller-manager\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=kube-controller-manager pod=kube-controller-manager-kubernetes2023master_kube-system(bf3cd71cea8fd4cc0dfa876ed8267d72)\"" pod="kube-system/kube-controller-manager-kubernetes2023master" podUID=bf3cd71cea8fd4cc0dfa876ed8267d72
Apr 13 10:51:49 kubernetes2023master kubelet[5454]: I0413 10:51:49.397667 5454 pod_container_deletor.go:80] "Container not found in pod's containers" containerID="3d0aff0a8096208ea52a3896aa8fece15fa1ee0d1385055d325eb9757eaadb8e"
Apr 13 10:51:49 kubernetes2023master kubelet[5454]: I0413 10:51:49.397703 5454 pod_container_deletor.go:80] "Container not found in pod's containers" containerID="b11ba4391ffd87089266a5e285e2a8dbe47385942f0fdcb0c630d02daef50269"
Apr 13 10:51:49 kubernetes2023master kubelet[5454]: I0413 10:51:49.413685 5454 pod_container_deletor.go:80] "Container not found in pod's containers" containerID="a5e702a933897caa2004a80eeb29c29fbeb634dc98e9d511d04540a2bc75c20b"
Apr 13 10:51:49 kubernetes2023master kubelet[5454]: I0413 10:51:49.413705 5454 pod_container_deletor.go:80] "Container not found in pod's containers" containerID="a2504213f90d01b529caff023797915d6905e0995c483412d5a89cae65b02fed"
Apr 13 10:51:49 kubernetes2023master kubelet[5454]: I0413 10:51:49.432689 5454 pod_container_deletor.go:80] "Container not found in pod's containers" containerID="7da46526a0eb1e721778c8d9bc5f558949c3a80e18b74caa15b25e6e3ce3efa9"
Apr 13 10:51:50 kubernetes2023master kubelet[5454]: I0413 10:51:50.464176 5454 pod_container_deletor.go:80] "Container not found in pod's containers" containerID="7d247eb6bf4d14b0fb3f8dee1e542db82ef01bc1bba138a1da1dc2e0442e8eeb"
Apr 13 10:51:50 kubernetes2023master kubelet[5454]: I0413 10:51:50.762097 5454 kubelet_node_status.go:70] "Attempting to register node" node="kubernetes2023master"
Apr 13 10:51:50 kubernetes2023master kubelet[5454]: E0413 10:51:50.762393 5454 kubelet_node_status.go:92] "Unable to register node with API server" err="Post \"https://10.99.132.26:6443/api/v1/nodes\": dial tcp 10.99.132.26:6443: connect: connection refused" node="kubernetes2023master"
Apr 13 10:51:50 kubernetes2023master kubelet[5454]: E0413 10:51:50.925853 5454 pod_workers.go:1281] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"etcd\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=etcd pod=etcd-kubernetes2023master_kube-system(9fbcb229ba4f4d1422a77bb7e8754b2f)\"" pod="kube-system/etcd-kubernetes2023master" podUID=9fbcb229ba4f4d1422a77bb7e8754b2f
Apr 13 10:51:51 kubernetes2023master kubelet[5454]: E0413 10:51:51.513091 5454 pod_workers.go:1281] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-apiserver\" with CrashLoopBackOff: \"back-off 2m40s restarting failed container=kube-apiserver pod=kube-apiserver-kubernetes2023master_kube-system(50d99ddb3cabe509efcae42e576fd419)\"" pod="kube-system/kube-apiserver-kubernetes2023master" podUID=50d99ddb3cabe509efcae42e576fd419
Apr 13 10:51:51 kubernetes2023master kubelet[5454]: I0413 10:51:51.530365 5454 pod_container_deletor.go:80] "Container not found in pod's containers" containerID="db323c83722bc9712690892034fe31c77f70dbaa990daa9283a8ce8458dbba75"
Apr 13 10:51:51 kubernetes2023master kubelet[5454]: E0413 10:51:51.649786 5454 controller.go:146] "Failed to ensure lease exists, will retry" err="Get \"https://10.99.132.26:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/kubernetes2023master?timeout=10s\": dial tcp 10.99.132.26:6443: connect: connection refused" interval="7s"
Apr 13 10:51:51 kubernetes2023master kubelet[5454]: E0413 10:51:51.752916 5454 pod_workers.go:1281] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-controller-manager\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=kube-controller-manager pod=kube-controller-manager-kubernetes2023master_kube-system(bf3cd71cea8fd4cc0dfa876ed8267d72)\"" pod="kube-system/kube-controller-manager-kubernetes2023master" podUID=bf3cd71cea8fd4cc0dfa876ed8267d72
Apr 13 10:51:51 kubernetes2023master kubelet[5454]: E0413 10:51:51.757166 5454 pod_workers.go:1281] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-scheduler\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=kube-scheduler pod=kube-scheduler-kubernetes2023master_kube-system(01b93443896f05f35e58e6528727fbe2)\"" pod="kube-system/kube-scheduler-kubernetes2023master" podUID=01b93443896f05f35e58e6528727fbe2
Apr 13 10:51:51 kubernetes2023master kubelet[5454]: I0413 10:51:51.765867 5454 pod_container_deletor.go:80] "Container not found in pod's containers" containerID="b11ba4391ffd87089266a5e285e2a8dbe47385942f0fdcb0c630d02daef50269"
Apr 13 10:51:51 kubernetes2023master kubelet[5454]: I0413 10:51:51.765888 5454 pod_container_deletor.go:80] "Container not found in pod's containers" containerID="40fdb0abb3c2a90d7e7d8b928b66dec3e9a3bd1b6c5edbf13178903c604cc543"
Apr 13 10:51:51 kubernetes2023master kubelet[5454]: I0413 10:51:51.803063 5454 pod_container_deletor.go:80] "Container not found in pod's containers" containerID="5c01a66c140f5d2c094860d7a45755563c73d8c6ccb06fb4df86085972252b7a"
Apr 13 10:51:52 kubernetes2023master kubelet[5454]: I0413 10:51:52.868081 5454 pod_container_deletor.go:80] "Container not found in pod's containers" containerID="c896a3cc7073b3ea653c8050fc4f2db3af5958cdd8ea5d81a59eb88719309c53"
Apr 13 10:51:53 kubernetes2023master kubelet[5454]: E0413 10:51:53.678918 5454 pod_workers.go:1281] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-apiserver\" with CrashLoopBackOff: \"back-off 2m40s restarting failed container=kube-apiserver pod=kube-apiserver-kubernetes2023master_kube-system(50d99ddb3cabe509efcae42e576fd419)\"" pod="kube-system/kube-apiserver-kubernetes2023master" podUID=50d99ddb3cabe509efcae42e576fd419
Apr 13 10:51:53 kubernetes2023master kubelet[5454]: E0413 10:51:53.797516 5454 pod_workers.go:1281] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"etcd\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=etcd pod=etcd-kubernetes2023master_kube-system(9fbcb229ba4f4d1422a77bb7e8754b2f)\"" pod="kube-system/etcd-kubernetes2023master" podUID=9fbcb229ba4f4d1422a77bb7e8754b2f
Apr 13 10:51:54 kubernetes2023master kubelet[5454]: W0413 10:51:54.387840 5454 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Node: Get "https://10.99.132.26:6443/api/v1/nodes?fieldSelector=metadata.name%3Dkubernetes2023master&limit=500&resourceVersion=0": dial tcp 10.99.132.26:6443: connect: connection refused
Apr 13 10:51:54 kubernetes2023master kubelet[5454]: E0413 10:51:54.387909 5454 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Node: failed to list *v1.Node: Get "https://10.99.132.26:6443/api/v1/nodes?fieldSelector=metadata.name%3Dkubernetes2023master&limit=500&resourceVersion=0": dial tcp 10.99.132.26:6443: connect: connection refused
Apr 13 10:51:54 kubernetes2023master kubelet[5454]: E0413 10:51:54.786090 5454 pod_workers.go:1281] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-controller-manager\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=kube-controller-manager pod=kube-controller-manager-kubernetes2023master_kube-system(bf3cd71cea8fd4cc0dfa876ed8267d72)\"" pod="kube-system/kube-controller-manager-kubernetes2023master" podUID=bf3cd71cea8fd4cc0dfa876ed8267d72
Apr 13 10:51:54 kubernetes2023master kubelet[5454]: E0413 10:51:54.797220 5454 pod_workers.go:1281] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-scheduler\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=kube-scheduler pod=kube-scheduler-kubernetes2023master_kube-system(01b93443896f05f35e58e6528727fbe2)\"" pod="kube-system/kube-scheduler-kubernetes2023master" podUID=01b93443896f05f35e58e6528727fbe2
Apr 13 10:51:54 kubernetes2023master kubelet[5454]: E0413 10:51:54.806466 5454 pod_workers.go:1281] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"etcd\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=etcd pod=etcd-kubernetes2023master_kube-system(9fbcb229ba4f4d1422a77bb7e8754b2f)\"" pod="kube-system/etcd-kubernetes2023master" podUID=9fbcb229ba4f4d1422a77bb7e8754b2f
Apr 13 10:51:54 kubernetes2023master kubelet[5454]: E0413 10:51:54.810932 5454 pod_workers.go:1281] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-apiserver\" with CrashLoopBackOff: \"back-off 2m40s restarting failed container=kube-apiserver pod=kube-apiserver-kubernetes2023master_kube-system(50d99ddb3cabe509efcae42e576fd419)\"" pod="kube-system/kube-apiserver-kubernetes2023master" podUID=50d99ddb3cabe509efcae42e576fd419
Apr 13 10:51:54 kubernetes2023master kubelet[5454]: I0413 10:51:54.951435 5454 pod_container_deletor.go:80] "Container not found in pod's containers" containerID="6d458255ba187a8dc60b552bdbed40432116411e1e27caee72468fa76e407615"
Apr 13 10:51:54 kubernetes2023master kubelet[5454]: I0413 10:51:54.971965 5454 pod_container_deletor.go:80] "Container not found in pod's containers" containerID="5444aadb15b531aa24a1a2dcc09f12218cb3b709a8ccaee7b10c3d8ec5cead7d"
Apr 13 10:51:54 kubernetes2023master kubelet[5454]: I0413 10:51:54.997265 5454 pod_container_deletor.go:80] "Container not found in pod's containers" containerID="40fdb0abb3c2a90d7e7d8b928b66dec3e9a3bd1b6c5edbf13178903c604cc543"
Apr 13 10:51:55 kubernetes2023master kubelet[5454]: I0413 10:51:55.019924 5454 pod_container_deletor.go:80] "Container not found in pod's containers" containerID="51b0e82a2952d972ea6226a0bb8e0f2517b99b02ab12e391b66e1ac90c2d5967"
Apr 13 10:51:56 kubernetes2023master kubelet[5454]: E0413 10:51:56.537583 5454 pod_workers.go:1281] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-scheduler\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=kube-scheduler pod=kube-scheduler-kubernetes2023master_kube-system(01b93443896f05f35e58e6528727fbe2)\"" pod="kube-system/kube-scheduler-kubernetes2023master" podUID=01b93443896f05f35e58e6528727fbe2
Apr 13 10:51:56 kubernetes2023master kubelet[5454]: E0413 10:51:56.539847 5454 pod_workers.go:1281] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-apiserver\" with CrashLoopBackOff: \"back-off 2m40s restarting failed container=kube-apiserver pod=kube-apiserver-kubernetes2023master_kube-system(50d99ddb3cabe509efcae42e576fd419)\"" pod="kube-system/kube-apiserver-kubernetes2023master" podUID=50d99ddb3cabe509efcae42e576fd419
Apr 13 10:51:56 kubernetes2023master kubelet[5454]: I0413 10:51:56.552678 5454 pod_container_deletor.go:80] "Container not found in pod's containers" containerID="1ca88116a657fc08792718629e349eda0aa822ada6c29e807f7be3f5c391ad9d"
Apr 13 10:51:56 kubernetes2023master kubelet[5454]: I0413 10:51:56.552701 5454 pod_container_deletor.go:80] "Container not found in pod's containers" containerID="5199c650e3cd9e01b91274cecf680ca50950794982e3d4d5ee14d0cc09c1c7ae"
Apr 13 10:51:56 kubernetes2023master kubelet[5454]: E0413 10:51:56.716946 5454 pod_workers.go:1281] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"etcd\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=etcd pod=etcd-kubernetes2023master_kube-system(9fbcb229ba4f4d1422a77bb7e8754b2f)\"" pod="kube-system/etcd-kubernetes2023master" podUID=9fbcb229ba4f4d1422a77bb7e8754b2f
Apr 13 10:51:56 kubernetes2023master kubelet[5454]: E0413 10:51:56.759854 5454 pod_workers.go:1281] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-controller-manager\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=kube-controller-manager pod=kube-controller-manager-kubernetes2023master_kube-system(bf3cd71cea8fd4cc0dfa876ed8267d72)\"" pod="kube-system/kube-controller-manager-kubernetes2023master" podUID=bf3cd71cea8fd4cc0dfa876ed8267d72
Apr 13 10:51:56 kubernetes2023master kubelet[5454]: I0413 10:51:56.774392 5454 pod_container_deletor.go:80] "Container not found in pod's containers" containerID="a45fdbbe40ec4eed666cc834335ec3541f495c4a8d657f053f04eb687d0fc269"
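Following the hint at the end of the kubeadm output above, the next thing to check is probably the logs of the crash-looping control-plane containers through the cri-dockerd socket. A short sketch of the commands kubeadm itself suggests (CONTAINERID is a placeholder for an ID taken from the ps -a output):
sudo crictl --runtime-endpoint unix:///var/run/cri-dockerd.sock ps -a | grep kube | grep -v pause
sudo crictl --runtime-endpoint unix:///var/run/cri-dockerd.sock logs CONTAINERID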
What user are you running this as? What are the permissions on the socket versus that user's group membership?
The /var/run/cri-dockerd.sock is in group docker. The user I run as is 1000 and is in the groups: lobster adm cdrom sudo dip plugdev lxd.
I installed cri-dockerd as root and ran every other command as user 1000.
srw-rw---- 1 root docker 0 Apr 13 11:42 /var/run/cri-dockerd.sock
drwxr-x--- 5 lobster lobster 4096 Apr 13 11:44 lobster
I0413 10:44:03.587274 4864 checks.go:833] failed to detect the sandbox image for local container runtime, output: time="2023-04-13T10:44:03Z" level=fatal msg="getting status of runtime: failed to template data: template: tmplExecuteRawJSON:1:9: executing \"tmplExecuteRawJSON\" at <.config.sandboxImage>: map has no entry for key \"config\"" , error: exit status 1
This has absolutely nothing to do with this.
The user must be in the docker group or otherwise have permissions to actually manipulate containers.
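For reference, the usual way to grant that is to add the user to the docker group; a minimal sketch, assuming the docker group already exists on the host:
sudo usermod -aG docker $USER   # add the current user to the docker group
newgrp docker                   # or log out and back in so the new group membership takes effect
ls -l /var/run/cri-dockerd.sock # the socket above is root:docker with mode srw-rw----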
I run kubeadm with sudo, so I think the permissions should be fine, shouldn't they? It was misleading of me to say I run every other command as user 1000. Well I did, but I used sudo :/ Sorry, that might not have been clear. Basically you can copy-paste all commands from my initial post above. That is exactly all that I did.
A year and a half ago I created another Kubernetes cluster, back when Docker was still part of it, using almost the same procedure: I installed Docker, then Kubernetes, and initialized with "sudo kubeadm ...".
It was basically the same steps as I described at the top of the thread, minus Enable kernel modules, Load modules, Setup required sysctl params, Reload sysctl and, obviously, Install cri-dockerd.
I also used Ubuntu 20.04 back then; could using 22.04 now be a problem?
I also tried setting up the current Kubernetes cluster with containerd, and that actually works fine. The only difference in the kubeadm output, compared to running it with cri-dockerd, is the error message cuisongliu cited regarding the sandbox image.
Everything else, apart from kubeadm failing, is identical between containerd and cri-dockerd when I initialize the cluster with kubeadm.
I wonder, has any of you created a new Kubernetes cluster from scratch since the release of 1.27? I might try running Kubernetes 1.26: if that also fails, it is probably my setup, and if it runs, it probably has something to do with 1.27 somehow.
I wonder, has anyone of you created a new Kubernetes-Cluster from scratch since the new release of 1.27?
Yes.
Where does this argument come from: --pod-network-cidr 192.168.0.0/16? Does it conflict with the Docker network?
Before your cluster is operational, you also need to configure CNI. But that should fail at a later stage, like with coredns.
systemctl status kubelet
As far as I know, it sets the IP range for the CNI. I want to use Calico, so I set the range myself. For Flannel, it is required to be set to "10.244.0.0/16". But I don't even get to installing either of them, since the init itself won't finish successfully.
I tested installing cri-dockerd with Kubernetes 1.26.x and 1.25.x. Both run into the same error as 1.27.x, so I assume it must be something other than the issue cuisongliu mentioned (the sandbox image config): 1.26 and 1.25 didn't show that error during kubeadm init, yet they still fail with the same results and the same kubelet logs (see my second post).
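For completeness, the same settings can also be passed via a kubeadm config file instead of command-line flags; a minimal sketch, assuming the v1beta3 kubeadm API and the cri-dockerd socket used in this thread (the file name is arbitrary):
cat <<'EOF' > kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
nodeRegistration:
  criSocket: unix:///var/run/cri-dockerd.sock
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
networking:
  podSubnet: 192.168.0.0/16   # Calico's default pod CIDR; Flannel expects 10.244.0.0/16
EOF
sudo kubeadm init --config kubeadm-config.yaml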
Does that CIDR overlap with your normal network?
No, there are no networks that overlap.
But I just found out that if I use the pre-built binaries of cri-dockerd, kubeadm init works without problems.
That means, if you take my installation steps from the first post and replace the step Install cri-dockerd (as root) with the following steps, kubeadm initializes the cluster successfully:
VER=$(curl -s https://api.github.com/repos/Mirantis/cri-dockerd/releases/latest|grep tag_name | cut -d '"' -f 4|sed 's/v//g')
wget https://github.com/Mirantis/cri-dockerd/releases/download/v${VER}/cri-dockerd-${VER}.amd64.tgz
tar xvf cri-dockerd-${VER}.amd64.tgz
sudo mv cri-dockerd/cri-dockerd /usr/local/bin/
wget https://raw.githubusercontent.com/Mirantis/cri-dockerd/master/packaging/systemd/cri-docker.service
wget https://raw.githubusercontent.com/Mirantis/cri-dockerd/master/packaging/systemd/cri-docker.socket
sudo mv cri-docker.socket cri-docker.service /etc/systemd/system/
sudo sed -i -e 's,/usr/bin/cri-dockerd,/usr/local/bin/cri-dockerd,' /etc/systemd/system/cri-docker.service
sudo systemctl daemon-reload
sudo systemctl enable cri-docker.service
sudo systemctl enable --now cri-docker.socket
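To confirm that the service actually picks up the locally installed binary and that the socket is usable, something like the following can be run (assuming cri-dockerd prints its version with --version):
/usr/local/bin/cri-dockerd --version
systemctl status cri-docker.service cri-docker.socket
sudo crictl --runtime-endpoint unix:///var/run/cri-dockerd.sock info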
The pre-built binaries have the version cri-dockerd 0.3.1 (7e528b98).
The binaries built following the README in this repository have the version cri-dockerd 0.3.1 (HEAD).
So my problem must be caused by code between 7e528b98 and HEAD.
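If someone wants to narrow that down, a sketch of how the suspect range could be inspected or bisected in a local clone of this repository (commit range taken from the version strings above):
git clone https://github.com/Mirantis/cri-dockerd.git && cd cri-dockerd
git log --oneline 7e528b98..HEAD   # commits that differ between the two builds
git bisect start HEAD 7e528b98     # mark HEAD bad and 7e528b98 good, then rebuild and retest at each step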
I used cri-dockerd 0.3.1 (7e528b98), from cri-dockerd_0.3.1.3-0.ubuntu-focal_amd64.deb, for the vanilla testing.
Hi,
I'm relatively new to Kubernetes and the general setup. I read some guides and the official documentation of Kubernetes / Mirantis / Docker on how to install all required components for a cluster setup. I tried to get the Kubernetes control plane set up successfully, but it doesn't work with cri-dockerd. I also tried it with the latest containerd.io and it was successful, so I figured I might ask for help in this repository, since the setup works with containerd but not with cri-dockerd. I wrote down all my (exact) installation steps below. After
kubeadm init ...
I run into a timeout and get an error message (described in Actual behaviour). Can you help me fix this problem? I can provide further logs / diagnostics of my system if you tell me where to look. Thanks in advance!
Expected behaviour The initialization runs without problems and I get to "Your Kubernetes control-plane has initialized successfully!"
Actual behaviour I get the following error message:
General information about setup
Installation steps
sudo kubeadm init --pod-network-cidr 192.168.0.0/16 --cri-socket unix:///var/run/cri-dockerd.sock
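For anyone retrying after a failed attempt like the one above, the node usually needs to be cleaned up between attempts; a sketch, assuming the same cri-dockerd socket (note that kubeadm reset does not clean up CNI configuration by itself):
sudo kubeadm reset --cri-socket unix:///var/run/cri-dockerd.sock
sudo rm -rf /etc/cni/net.d
sudo kubeadm init --pod-network-cidr 192.168.0.0/16 --cri-socket unix:///var/run/cri-dockerd.sock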