@120L020314 please follow the guide here: https://github.com/SSU-DCN/podmigration-operator/blob/main/init-cluster-containerd-CRIU.md If you do some steps your own way, they fall outside the scope of this repository.
The steps I followed were:
sudo apt-get update
sudo apt-get install gcc
mkdir tmp
cd tmp/
mkdir zly
cd zly
sudo wget https://golang.org/dl/go1.15.5.linux-amd64.tar.gz
sudo tar -xzf go1.15.5.linux-amd64.tar.gz
sudo mv go /usr/local
sudo gedit $HOME/.profile
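The gedit step presumably appends the Go paths to $HOME/.profile; a typical pair of lines for a tarball installed under /usr/local (an assumption based on the standard Go setup, not quoted from the guide) is:

export GOPATH=$HOME/go
export PATH=$PATH:/usr/local/go/bin:$GOPATH/bin

followed by source $HOME/.profile to apply it in the current shell.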
The content of the containerd service file is as follows:

[Unit]
Description=containerd container runtime
Documentation=https://containerd.io
After=network.target

[Service]
ExecStartPre=/sbin/modprobe overlay
ExecStart=/bin/containerd
Restart=always
RestartSec=5
Delegate=yes
KillMode=process
OOMScoreAdjust=-999
LimitNOFILE=1048576
LimitNPROC=infinity
LimitCORE=infinity
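Assuming this unit is saved as /etc/systemd/system/containerd.service (the path that appears in the status output later in this thread), it would be registered and started with:

sudo systemctl daemon-reload
sudo systemctl enable --now containerd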
cd ..
cd ..
apt install curl
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add
sudo apt-add-repository "deb http://apt.kubernetes.io/ kubernetes-xenial main"
sudo apt-get install kubeadm=1.19.0-00 kubelet=1.19.0-00 kubectl=1.19.0-00 -y
whereis kubeadm
whereis kubelet
git clone https://github.com/vutuong/kubernetes.git
cd podmigration-operator/custom-binaries
chmod +x kubeadm kubelet
sudo mv kubeadm kubelet /usr/bin/
sudo systemctl daemon-reload
sudo systemctl restart kubelet
sudo systemctl status kubelet
sudo gedit /etc/fstab
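One caveat with this step: a later apt upgrade can pull newer kubeadm/kubelet packages and overwrite the custom binaries moved into /usr/bin. A common guard (a sketch, not part of the original steps) is to hold the pinned packages:

sudo apt-mark hold kubeadm kubelet kubectl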
swapoff -a
sudo kubeadm init --pod-network-cidr=10.244.0.0/16 --cri-socket unix:///var/run/containerd/containerd.sock

My environment is Ubuntu 18.04.6. I don't know which step is wrong. Sorry to disturb you, but I am very interested in your work. Thank you for your help!
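A note on the swap step: swapoff -a only disables swap until the next reboot, which is presumably why /etc/fstab is edited above. A non-interactive equivalent of that edit (a sketch) is:

sudo swapoff -a
sudo sed -i '/ swap / s/^/#/' /etc/fstab   # comment out swap entries so they stay off after reboot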
My containerd status looks like this:

root@server:/home/server/Downloads/tmp/zly/podmigration-operator# systemctl status containerd
● containerd.service - containerd container runtime
   Loaded: loaded (/etc/systemd/system/containerd.service; disabled; vendor preset: enabled)
   Active: active (running) since Wed 2024-01-24 15:35:02 CST; 16s ago
     Docs: https://containerd.io
  Process: 35109 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
 Main PID: 35111 (containerd)
    Tasks: 14 (limit: 4630)
   CGroup: /system.slice/containerd.service
           └─35111 /bin/containerd
Jan 24 15:35:02 server containerd[35111]: time="2024-01-24T15:35:02.078811340+08:00" level=info msg="loading plugin \"io.containerd.grpc.v1.introspection\"..." type=io.containerd.gr
Jan 24 15:35:02 server containerd[35111]: time="2024-01-24T15:35:02.078959926+08:00" level=info msg="Start subscribing containerd event"
Jan 24 15:35:02 server containerd[35111]: time="2024-01-24T15:35:02.079008127+08:00" level=info msg="Start recovering state"
Jan 24 15:35:02 server containerd[35111]: time="2024-01-24T15:35:02.079056682+08:00" level=info msg="Start event monitor"
Jan 24 15:35:02 server containerd[35111]: time="2024-01-24T15:35:02.079063528+08:00" level=info msg="Start snapshots syncer"
Jan 24 15:35:02 server containerd[35111]: time="2024-01-24T15:35:02.079068185+08:00" level=info msg="Start cni network conf syncer"
Jan 24 15:35:02 server containerd[35111]: time="2024-01-24T15:35:02.079072115+08:00" level=info msg="Start streaming server"
Jan 24 15:35:02 server containerd[35111]: time="2024-01-24T15:35:02.079928061+08:00" level=info msg=serving... address=/run/containerd/containerd.sock.ttrpc
Jan 24 15:35:02 server containerd[35111]: time="2024-01-24T15:35:02.080019996+08:00" level=info msg=serving... address=/run/containerd/containerd.sock
Jan 24 15:35:02 server containerd[35111]: time="2024-01-24T15:35:02.080032902+08:00" level=info msg="containerd successfully booted in 0.017574s"

And my kubelet status looks like this:

root@server:/home/server/Downloads/tmp/zly/podmigration-operator# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: activating (auto-restart) (Result: exit-code) since Wed 2024-01-24 15:36:15 CST; 3s ago
     Docs: https://kubernetes.io/docs/home/
  Process: 35409 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS (code=exited, status=255)
 Main PID: 35409 (code=exited, status=255)

Maybe this is because I haven't initialized my cluster successfully.
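A kubelet stuck in activating (auto-restart) with status=255 restarts in a loop until kubeadm init has written its configuration; the concrete exit reason is in the journal, e.g.:

sudo journalctl -u kubelet -n 50 --no-pager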
Sorry, and thank you for your answer. I succeeded when I built it myself on my machine; the problem was that containerd was not installed correctly.
Thank you for your interest. If you succeeded, please give me a star for my fame =))) and please help to close this issue.
OK, thank you. I am trying to continue your work on my Ubuntu machine. I may encounter more questions; please help me. Thank you very much!
root@server:/home/server/Downloads/tmp/zly# kubectl get pod -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-856dbd57b4-4btrn 0/1 Pending 0 44m
coredns-856dbd57b4-lqxvw 0/1 Pending 0 44m
etcd-server 1/1 Running 0 44m
kube-apiserver-server 1/1 Running 0 44m
kube-controller-manager-server 1/1 Running 0 44m
kube-proxy-g5brg 1/1 Running 0 44m
kube-proxy-kqkbj 1/1 Running 0 23s
kube-scheduler-server 1/1 Running 0 44m
root@server:/home/server/Downloads/tmp/zly# kubectl describe pod coredns-856dbd57b4-4btrn -n kube-system
Name: coredns-856dbd57b4-4btrn
Namespace: kube-system
Priority: 2000000000
Priority Class Name: system-cluster-critical
Node:
IPs:
Warning  FailedScheduling  2m4s (x32 over 45m)  default-scheduler  0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
Warning  FailedScheduling  54s                  default-scheduler  0/2 nodes are available: 2 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.

Sorry, but my CoreDNS pods cannot be scheduled. Maybe I need the flannel CNI plugin? @vutuong Thank you very much; I am sorry to disturb you.
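CoreDNS pods staying Pending behind the node.kubernetes.io/not-ready taint is the usual symptom of a missing CNI plugin, so installing flannel should indeed help. A sketch (this manifest path is the one historically used for flannel with the 10.244.0.0/16 pod CIDR; check the flannel repository for the current URL):

kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml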
@120L020314 please check your node status and taint information with:
k get nodes
k describe node your_node_name
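For example, assuming k is an alias for kubectl and the worker node is named agent1 as in the logs below, the taints can be pulled out of the describe output with:

k describe node agent1 | grep -i taints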
@vutuong Sorry to disturb you, but when I run kubectl checkpoint simple /var/lib/kubelet/migration/simple, I find I cannot make a checkpoint, and I don't know how to solve this. simple is a pod running on work1, and the kubelet log on that node looks like this:

Jan 24 21:49:09 agent1 kubelet[111046]: I0124 21:49:09.692462  111046 kubelet.go:1505] Checkpoint the firstime running pod to use for other scale without booting from scratch: %+vsimple
Jan 24 21:49:09 agent1 kubelet[111046]: E0124 21:49:09.692913  111046 remote_runtime.go:289] CheckpointContainer "5fab4d089320a38aa93aed6b865b306d5764ca1643ea82446e3d0097e05cb584" from runtime service failed: rpc error: code = Unimplemented desc = unknown method CheckpointContainer for service runtime.v1alpha2.RuntimeService
Jan 24 21:49:09 agent1 kubelet[111046]: I0124 21:49:09.693279  111046 kuberuntime_manager.go:841] Should we migrate?Runningfalse
Jan 24 21:49:36 agent1 kubelet[111046]: I0124 21:49:36.691280  111046 kuberuntime_manager.go:841] Should we migrate?Runningfalse
Jan 24 21:50:15 agent1 kubelet[111046]: I0124 21:50:15.691442  111046 kuberuntime_manager.go:841] Should we migrate?Runningfalse
Jan 24 21:50:19 agent1 kubelet[111046]: I0124 21:50:19.691534  111046 kuberuntime_manager.go:841] Should we migrate?Runningfalse

Please help me, thank you very much!
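The "unknown method CheckpointContainer for service runtime.v1alpha2.RuntimeService" error indicates the containerd answering on the CRI socket is a stock build: CheckpointContainer is not part of the standard runtime.v1alpha2 API and only exists in the custom containerd from the guide. A quick way to check which binary is actually running (a sketch):

systemctl status containerd | grep ExecStart   # path of the binary the service starts
which containerd                               # what is first on PATH
containerd --version                           # a stock build reports an upstream release tag

This matches the resolution elsewhere in the thread: replacing the containerd binary in /bin/ fixed it.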
My virtual machine is Ubuntu 18.04.6. When I run criu check --all, it shows:

root@agent1:/var/lib/kubelet/migration# criu check --all
Warn  (criu/cr-check.c:1230): clone3() with set_tid not supported
Error (criu/cr-check.c:1272): Time namespaces are not supported
Looks good but some kernel features are missing which, depending on your process tree, may cause dump or restore failure.

I do not know how to fix it. @vutuong
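Those two criu check messages are kernel-version related: clone3() with set_tid needs roughly Linux 5.5+ and time namespaces 5.6+, both newer than what Ubuntu 18.04 ships by default, and CRIU only treats them as potential problems for processes that actually use those features. Checking the running kernel and CRIU build (a sketch):

uname -r
criu --version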
Sorry, I have solved all my questions! Your work is so meaningful for me; I learned a lot. Most of my problems were caused by containerd. I just replaced the containerd binary in /bin/ and it works normally.
When I run kubeadm init, I get errors like this:

root@server:/home/server/Downloads/tmp/zly/podmigration-operator# sudo kubeadm init --pod-network-cidr=10.244.0.0/16 --cri-socket unix:///var/run/containerd/containerd.sock
I0124 14:53:12.678735   24843 version.go:252] remote version is much newer: v1.29.1; falling back to: stable-1.19
W0124 14:53:14.622308   24843 configset.go:250] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
[init] Using Kubernetes version: v1.19.16
[preflight] Running pre-flight checks
error execution phase preflight: [preflight] Some fatal errors occurred:
	[ERROR CRI]: container runtime is not running: output: time="2024-01-24T14:53:14+08:00" level=fatal msg="validate service connection: CRI v1 runtime API is not implemented for endpoint \"unix:///var/run/containerd/containerd.sock\": rpc error: code = Unimplemented desc = unknown service runtime.v1.RuntimeService", error: exit status 1
[preflight] If you know what you are doing, you can make a check non-fatal with
	--ignore-preflight-errors=...
To see the stack trace of this error execute with --v=5 or higher

Can anyone teach me how to solve this problem? I have no idea. Thank you very much!
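This preflight failure usually means a version mismatch at the CRI boundary: the validator expects the newer runtime.v1 API, while a containerd from the 1.19-era guide only implements runtime.v1alpha2. One way to probe what the socket actually serves, assuming crictl is installed (a sketch):

sudo crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock version

If that also reports Unimplemented, the practical fixes are a CRI validator matching the 1.19 toolchain or, as the poster confirmed elsewhere in the thread, installing the containerd build from the guide correctly.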