SSU-DCN / podmigration-operator

MIT License
24 stars 10 forks

kubeadm init error #16

Closed: 120L020314 closed this issue 9 months ago

120L020314 commented 9 months ago

When I run `kubeadm init`, I get errors like:

```
root@server:/home/server/Downloads/tmp/zly/podmigration-operator# sudo kubeadm init --pod-network-cidr=10.244.0.0/16 --cri-socket unix:///var/run/containerd/containerd.sock
I0124 14:53:12.678735   24843 version.go:252] remote version is much newer: v1.29.1; falling back to: stable-1.19
W0124 14:53:14.622308   24843 configset.go:250] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
[init] Using Kubernetes version: v1.19.16
[preflight] Running pre-flight checks
error execution phase preflight: [preflight] Some fatal errors occurred:
	[ERROR CRI]: container runtime is not running: output: time="2024-01-24T14:53:14+08:00" level=fatal msg="validate service connection: CRI v1 runtime API is not implemented for endpoint \"unix:///var/run/containerd/containerd.sock\": rpc error: code = Unimplemented desc = unknown service runtime.v1.RuntimeService"
, error: exit status 1
[preflight] If you know what you are doing, you can make a check non-fatal with --ignore-preflight-errors=...
To see the stack trace of this error execute with --v=5 or higher
```

Can anyone teach me how to solve this problem? I have no idea. Thank you very much!
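As a first sanity check for this error, it helps to confirm that the endpoint kubeadm names actually exists and is a unix socket; if it does exist, the `Unimplemented` error usually means the containerd answering on it is too old to serve the CRI v1 API. A minimal sketch (the path comes from the error message; `check_sock` is just an illustrative helper, not part of any tool):

```shell
# Illustrative helper: report whether a path is a unix-domain socket.
check_sock() {
  [ -S "$1" ] && echo "socket: $1" || echo "missing or not a socket: $1"
}

# The endpoint named in the kubeadm error message:
check_sock /var/run/containerd/containerd.sock
```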

vutuong commented 9 months ago

@120L020314 please follow this guide: https://github.com/SSU-DCN/podmigration-operator/blob/main/init-cluster-containerd-CRIU.md If you set up certain pieces on your own in a different way, they fall outside the scope of this repository.

120L020314 commented 9 months ago

I followed the guide at https://github.com/SSU-DCN/podmigration-operator/blob/main/init-cluster-containerd-CRIU.md, but when I run `kubeadm init`, it fails:

```
root@server:/home/server/Downloads/tmp/zly/podmigration-operator# kubeadm init --kubernetes-version stable-1.19 --pod-network-cidr=10.244.0.0/16
W0124 15:29:45.025697   33937 configset.go:250] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
[init] Using Kubernetes version: v1.19.16
[preflight] Running pre-flight checks
error execution phase preflight: [preflight] Some fatal errors occurred:
	[ERROR CRI]: container runtime is not running: output: time="2024-01-24T15:29:45+08:00" level=fatal msg="validate service connection: CRI v1 runtime API is not implemented for endpoint \"unix:///run/containerd/containerd.sock\": rpc error: code = Unimplemented desc = unknown service runtime.v1.RuntimeService"
, error: exit status 1
[preflight] If you know what you are doing, you can make a check non-fatal with --ignore-preflight-errors=...
To see the stack trace of this error execute with --v=5 or higher
```

The commands I ran are:

```shell
sudo apt-get update
sudo apt-get install gcc
mkdir tmp
cd tmp/
mkdir zly
cd zly
sudo wget https://golang.org/dl/go1.15.5.linux-amd64.tar.gz
sudo tar -xzf go1.15.5.linux-amd64.tar.gz
sudo mv go /usr/local
sudo gedit $HOME/.profile
```

The contents are as follows:

```shell
export GOROOT=/usr/local/go
export GOPATH=$HOME/go
export GOBIN=$GOPATH/bin
export PATH=$GOROOT/bin:$GOBIN:$PATH
```

```shell
source $HOME/.profile
go version
sudo apt install make
wget https://github.com/containerd/containerd/releases/download/v1.3.6/containerd-1.3.6-linux-amd64.tar.gz
mkdir containerd
tar -xvf containerd-1.3.6-linux-amd64.tar.gz -C containerd
sudo mv containerd/bin/* /bin/
cd containerd/
wget https://k8s-pod-migration.obs.eu-de.otc.t-systems.com/v2/containerd
cd ..
apt install git -y
git clone https://github.com/SSU-DCN/podmigration-operator.git
cd podmigration-operator
tar -vxf binaries.tar.bz2
cd custom-binaries/
chmod +x containerd
sudo mv containerd /bin/
sudo mkdir /etc/containerd
sudo gedit /etc/containerd/config.toml
```

The contents are as follows:

```toml
[plugins]
  [plugins.cri.containerd]
    snapshotter = "overlayfs"
    [plugins.cri.containerd.default_runtime]
      runtime_type = "io.containerd.runtime.v1.linux"
      runtime_engine = "/usr/local/bin/runc"
      runtime_root = ""
```
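Both the stock containerd from the release tarball and the patched binary from `custom-binaries/` land in `/bin`, so it is easy to end up running the wrong one. A hedged sketch for checking which binary actually resolves and fingerprinting it; the scratch directory with a fake `containerd` is an assumption so the sketch can run anywhere, and on a real node you would just run the last three commands:

```shell
# Scratch stand-in for /bin with a fake containerd (assumption: on a real
# node, skip this part and inspect the binary already installed).
bindir="$(mktemp -d)"
printf '#!/bin/sh\necho containerd v1.3.6-custom\n' > "$bindir/containerd"
chmod +x "$bindir/containerd"
export PATH="$bindir:$PATH"

command -v containerd                 # which file will actually run
containerd --version                  # its self-reported version string
cksum "$(command -v containerd)"      # fingerprint to compare binaries
```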

```shell
wget https://github.com/opencontainers/runc/releases/download/v1.0.0-rc92/runc.amd64
whereis runc
sudo mv runc.amd64 runc
chmod +x runc
sudo mv runc /usr/local/bin/
sudo gedit /etc/systemd/system/containerd.service
```

The contents are as follows:

```ini
[Unit]
Description=containerd container runtime
Documentation=https://containerd.io
After=network.target

[Service]
ExecStartPre=/sbin/modprobe overlay
ExecStart=/bin/containerd
Restart=always
RestartSec=5
Delegate=yes
KillMode=process
OOMScoreAdjust=-999
LimitNOFILE=1048576
LimitNPROC=infinity
LimitCORE=infinity

[Install]
WantedBy=multi-user.target
```

```shell
sudo systemctl daemon-reload
sudo systemctl restart containerd
sudo systemctl status containerd
sudo gedit /etc/sysctl.conf
```

Add the following:

```
...
net.bridge.bridge-nf-call-iptables = 1
```

```shell
sudo -s
sudo echo '1' > /proc/sys/net/ipv4/ip_forward
exit
sudo sysctl --system
sudo modprobe overlay
sudo modprobe br_netfilter
gedit /etc/hosts
```

The contents are as follows:

```
192.168.31.47 server
192.168.31.48 agent1
192.168.31.49 agent2
```
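Hand-editing `/etc/hosts` on each of the three machines invites typos; an idempotent append only adds entries that are not already present, so it can be re-run safely. A sketch on a scratch file (on a real node, set `hosts=/etc/hosts` and run as root):

```shell
hosts="$(mktemp)"    # real node: hosts=/etc/hosts (run as root)
for entry in '192.168.31.47 server' '192.168.31.48 agent1' '192.168.31.49 agent2'; do
  # -x: whole-line match, -F: fixed string; append only if absent.
  grep -qxF "$entry" "$hosts" || printf '%s\n' "$entry" >> "$hosts"
done
cat "$hosts"
```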

```shell
cd ..
cd ..
apt install curl
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add
sudo apt-add-repository "deb http://apt.kubernetes.io/ kubernetes-xenial main"
sudo apt-get install kubeadm=1.19.0-00 kubelet=1.19.0-00 kubectl=1.19.0-00 -y
whereis kubeadm
whereis kubelet
git clone https://github.com/vutuong/kubernetes.git
cd podmigration-operator/custom-binaries
chmod +x kubeadm kubelet
sudo mv kubeadm kubelet /usr/bin/
sudo systemctl daemon-reload
sudo systemctl restart kubelet
sudo systemctl status kubelet
sudo gedit /etc/fstab
```

```
/swapfile none swap sw 0 0
```
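The intent of this fstab edit is to comment out the swap entry so swap stays off across reboots (the kubelet refuses to run with swap enabled). The edit can be scripted; a sketch on a scratch copy with made-up entries (on a real node, point `fstab` at `/etc/fstab`, run as root, and keep a backup):

```shell
fstab="$(mktemp)"    # real node: fstab=/etc/fstab
printf '%s\n' 'UUID=abcd / ext4 defaults 0 1' '/swapfile none swap sw 0 0' > "$fstab"
# Comment out any uncommented line whose third field (fs type) is "swap":
awk '$3 == "swap" && $0 !~ /^#/ {print "#" $0; next} {print}' "$fstab" > "$fstab.new"
mv "$fstab.new" "$fstab"
cat "$fstab"         # the swap line now starts with '#'
```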

```shell
swapoff -a
sudo kubeadm init --pod-network-cidr=10.244.0.0/16 --cri-socket unix:///var/run/containerd/containerd.sock
```

My environment is Ubuntu 18.04.6. I don't know which step is wrong. Sorry to disturb you, but I am very interested in your work. Thank you for your help!

120L020314 commented 9 months ago

My containerd status looks like:

```
root@server:/home/server/Downloads/tmp/zly/podmigration-operator# systemctl status containerd
● containerd.service - containerd container runtime
   Loaded: loaded (/etc/systemd/system/containerd.service; disabled; vendor preset: enabled)
   Active: active (running) since Wed 2024-01-24 15:35:02 CST; 16s ago
     Docs: https://containerd.io
  Process: 35109 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
 Main PID: 35111 (containerd)
    Tasks: 14 (limit: 4630)
   CGroup: /system.slice/containerd.service
           └─35111 /bin/containerd

Jan 24 15:35:02 server containerd[35111]: time="2024-01-24T15:35:02.078811340+08:00" level=info msg="loading plugin \"io.containerd.grpc.v1.introspection\"..." type=io.containerd.gr
Jan 24 15:35:02 server containerd[35111]: time="2024-01-24T15:35:02.078959926+08:00" level=info msg="Start subscribing containerd event"
Jan 24 15:35:02 server containerd[35111]: time="2024-01-24T15:35:02.079008127+08:00" level=info msg="Start recovering state"
Jan 24 15:35:02 server containerd[35111]: time="2024-01-24T15:35:02.079056682+08:00" level=info msg="Start event monitor"
Jan 24 15:35:02 server containerd[35111]: time="2024-01-24T15:35:02.079063528+08:00" level=info msg="Start snapshots syncer"
Jan 24 15:35:02 server containerd[35111]: time="2024-01-24T15:35:02.079068185+08:00" level=info msg="Start cni network conf syncer"
Jan 24 15:35:02 server containerd[35111]: time="2024-01-24T15:35:02.079072115+08:00" level=info msg="Start streaming server"
Jan 24 15:35:02 server containerd[35111]: time="2024-01-24T15:35:02.079928061+08:00" level=info msg=serving... address=/run/containerd/containerd.sock.ttrpc
Jan 24 15:35:02 server containerd[35111]: time="2024-01-24T15:35:02.080019996+08:00" level=info msg=serving... address=/run/containerd/containerd.sock
Jan 24 15:35:02 server containerd[35111]: time="2024-01-24T15:35:02.080032902+08:00" level=info msg="containerd successfully booted in 0.017574s"
```

and my kubelet status looks like:

```
root@server:/home/server/Downloads/tmp/zly/podmigration-operator# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: activating (auto-restart) (Result: exit-code) since Wed 2024-01-24 15:36:15 CST; 3s ago
     Docs: https://kubernetes.io/docs/home/
  Process: 35409 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS (code=exited, status=255)
 Main PID: 35409 (code=exited, status=255)
```

Maybe this is because I haven't initialized my cluster successfully.

120L020314 commented 9 months ago

Sorry, and thank you for your answer. I succeeded when I built it myself on my machine. The problem was that containerd was not installed correctly.

vutuong commented 9 months ago

> Sorry, and thank you for your answer. I succeeded when I built it myself on my machine. The problem was that containerd was not installed correctly.

Thank you for your interest. If you succeeded, please give the repo a star for my fame =))) . And please help to close this issue.

120L020314 commented 9 months ago

OK, thank you. I am trying to continue your work on my Ubuntu machine. I may encounter more questions; please help me. Thank you very much!

120L020314 commented 9 months ago

```
root@server:/home/server/Downloads/tmp/zly# kubectl get pod -n kube-system
NAME                             READY   STATUS    RESTARTS   AGE
coredns-856dbd57b4-4btrn         0/1     Pending   0          44m
coredns-856dbd57b4-lqxvw         0/1     Pending   0          44m
etcd-server                      1/1     Running   0          44m
kube-apiserver-server            1/1     Running   0          44m
kube-controller-manager-server   1/1     Running   0          44m
kube-proxy-g5brg                 1/1     Running   0          44m
kube-proxy-kqkbj                 1/1     Running   0          23s
kube-scheduler-server            1/1     Running   0          44m
root@server:/home/server/Downloads/tmp/zly# kubectl describe pod coredns-856dbd57b4-4btrn -n kube-system
Name:                 coredns-856dbd57b4-4btrn
Namespace:            kube-system
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Node:                 <none>
Labels:               k8s-app=kube-dns
                      pod-template-hash=856dbd57b4
Annotations:          <none>
Status:               Pending
IP:
IPs:                  <none>
Controlled By:        ReplicaSet/coredns-856dbd57b4
Containers:
  coredns:
    Image:       k8s.gcr.io/coredns:1.6.7
    Ports:       53/UDP, 53/TCP, 9153/TCP
    Host Ports:  0/UDP, 0/TCP, 0/TCP
    Args:
      -conf
      /etc/coredns/Corefile
    Limits:
      memory:  170Mi
    Requests:
      cpu:        100m
      memory:     70Mi
    Liveness:     http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
    Readiness:    http-get http://:8181/ready delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /etc/coredns from config-volume (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from coredns-token-475rr (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      coredns
    Optional:  false
  coredns-token-475rr:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  coredns-token-475rr
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  kubernetes.io/os=linux
Tolerations:     CriticalAddonsOnly op=Exists
                 node-role.kubernetes.io/master:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age                  From               Message
  ----     ------            ----                 ----               -------
  Warning  FailedScheduling  2m4s (x32 over 45m)  default-scheduler  0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
  Warning  FailedScheduling  54s                  default-scheduler  0/2 nodes are available: 2 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
```

Sorry, but my CoreDNS pods cannot be scheduled. Maybe I need the flannel CNI plugin? @vutuong Thank you very much; I am sorry to disturb you.

vutuong commented 9 months ago

@120L020314 please check your nodes and their taint information with:

```shell
k get nodes
k describe node your_node_name
```

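A `node.kubernetes.io/not-ready` taint on every node typically means no CNI plugin is running yet. Checking the `Taints:` line of `kubectl describe node` output can be scripted; a sketch against a hard-coded sample shaped like real `describe` output (the sample text is an assumption, not output from this cluster):

```shell
# Sample shaped like `kubectl describe node` output (assumption):
describe_out='Name:               agent1
Taints:             node.kubernetes.io/not-ready:NoSchedule'

# Print the taint value from the Taints: line:
echo "$describe_out" | awk '/^Taints:/ {print $2}'
# -> node.kubernetes.io/not-ready:NoSchedule
```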
120L020314 commented 9 months ago

@vutuong Sorry to disturb you, but when I run `kubectl checkpoint simple /var/lib/kubelet/migration/simple`, I find it cannot make a checkpoint, and I don't know how to solve this. `simple` is a pod running on work1, and the kubelet log on that node looks like this:

```
Jan 24 21:49:09 agent1 kubelet[111046]: I0124 21:49:09.692462  111046 kubelet.go:1505] Checkpoint the firstime running pod to use for other scale without booting from scratch: %+vsimple
Jan 24 21:49:09 agent1 kubelet[111046]: E0124 21:49:09.692913  111046 remote_runtime.go:289] CheckpointContainer "5fab4d089320a38aa93aed6b865b306d5764ca1643ea82446e3d0097e05cb584" from runtime service failed: rpc error: code = Unimplemented desc = unknown method CheckpointContainer for service runtime.v1alpha2.RuntimeService
Jan 24 21:49:09 agent1 kubelet[111046]: I0124 21:49:09.693279  111046 kuberuntime_manager.go:841] Should we migrate?Runningfalse
Jan 24 21:49:36 agent1 kubelet[111046]: I0124 21:49:36.691280  111046 kuberuntime_manager.go:841] Should we migrate?Runningfalse
Jan 24 21:50:15 agent1 kubelet[111046]: I0124 21:50:15.691442  111046 kuberuntime_manager.go:841] Should we migrate?Runningfalse
Jan 24 21:50:19 agent1 kubelet[111046]: I0124 21:50:19.691534  111046 kuberuntime_manager.go:841] Should we migrate?Runningfalse
```

Please help me, thank you very much!
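The decisive line is the `Unimplemented` error: the kubelet is calling `CheckpointContainer` on the `runtime.v1alpha2` CRI service, and the containerd build serving the socket does not implement that method, which again points at a stock containerd rather than the repo's patched binary being in use. A throwaway sketch for pulling the missing method and service out of such a log line:

```shell
# The tail of the kubelet error line from the log above:
line='rpc error: code = Unimplemented desc = unknown method CheckpointContainer for service runtime.v1alpha2.RuntimeService'

# Extract "method@service" to see exactly what the runtime lacks:
echo "$line" | sed -n 's/.*unknown method \([A-Za-z]*\) for service \([A-Za-z0-9.]*\).*/\1@\2/p'
# -> CheckpointContainer@runtime.v1alpha2.RuntimeService
```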

120L020314 commented 9 months ago

My virtual machine runs Ubuntu 18.04.6. When I run `criu check --all`, it shows:

```
root@agent1:/var/lib/kubelet/migration# criu check --all
Warn  (criu/cr-check.c:1230): clone3() with set_tid not supported
Error (criu/cr-check.c:1272): Time namespaces are not supported
Looks good but some kernel features are missing which, depending on your process tree, may cause dump or restore failure.
```

I do not know how to fix it. @vutuong

120L020314 commented 9 months ago

Sorry, I have solved all my problems! Your work is very meaningful to me, and I learned a lot. Most of the problems were caused by containerd; I just replaced the containerd binary in `/bin/`, and now it works normally.