checkpoint-restore / criu

Checkpoint/Restore tool
criu.org

Error while dumping calico-node container "mnt: FS mnt ./sys/fs/bpf dev 0x75 root / unsupported id 691" #2264

Open tu1h opened 10 months ago

tu1h commented 10 months ago

Description: I ran Kubernetes on Debian 12 and tried to checkpoint the calico-node pod using the Checkpoint API, but I got the error shown below:

```
(00.111810) mnt: autodetected external mount /sys/ for ./sys(811)
(00.111819) mnt: Inspecting sharing on 693 shared_id 0 master_id 0 (@./run/calico/cgroup)
(00.111820) mnt: Inspecting sharing on 691 shared_id 0 master_id 0 (@./sys/fs/bpf)
(00.111822) mnt: Inspecting sharing on 828 shared_id 0 master_id 0 (@./run/secrets/kubernetes.io/serviceaccount)
(00.111824) mnt: Inspecting sharing on 827 shared_id 0 master_id 0 (@./var/log/calico/cni)
(00.111826) mnt: Inspecting sharing on 826 shared_id 0 master_id 0 (@./host/etc/cni/net.d)
(00.111827) mnt: Inspecting sharing on 825 shared_id 0 master_id 0 (@./run/nodeagent)
(00.111829) mnt: Inspecting sharing on 824 shared_id 0 master_id 0 (@./var/lib/calico)
(00.111830) mnt: Inspecting sharing on 823 shared_id 0 master_id 0 (@./run/calico)
(00.111832) mnt: Inspecting sharing on 822 shared_id 0 master_id 0 (@./dev/termination-log)
(00.111833) mnt: Inspecting sharing on 821 shared_id 0 master_id 0 (@./etc/hosts)
(00.111835) mnt: Inspecting sharing on 820 shared_id 0 master_id 0 (@./run/xtables.lock)
(00.111836) mnt: Inspecting sharing on 819 shared_id 0 master_id 0 (@./usr/lib/modules)
(00.111838) mnt: Inspecting sharing on 818 shared_id 0 master_id 5 (@./run/.containerenv)
(00.111841) mnt: Detected external slavery for 818 via 818
(00.111842) mnt: Inspecting sharing on 815 shared_id 0 master_id 5 (@./etc/hostname)
(00.111844) mnt: Detected external slavery for 815 via 815
(00.111846) mnt: Inspecting sharing on 814 shared_id 0 master_id 5 (@./etc/resolv.conf)
(00.111847) mnt: Detected external slavery for 814 via 814
(00.111849) mnt: Inspecting sharing on 813 shared_id 0 master_id 231 (@./dev/shm)
(00.111851) mnt: Detected external slavery for 813 via 813
(00.111852) mnt: Inspecting sharing on 812 shared_id 0 master_id 0 (@./sys/fs/cgroup)
(00.111854) mnt: Inspecting sharing on 811 shared_id 0 master_id 0 (@./sys)
(00.111856) mnt: Inspecting sharing on 810 shared_id 0 master_id 0 (@./dev/mqueue)
(00.111857) mnt: Inspecting sharing on 809 shared_id 0 master_id 0 (@./dev/pts)
(00.111858) mnt: Inspecting sharing on 808 shared_id 0 master_id 0 (@./dev)
(00.111860) mnt: Inspecting sharing on 807 shared_id 0 master_id 0 (@./proc)
(00.111862) mnt: Inspecting sharing on 806 shared_id 0 master_id 0 (@./)
(00.111864) Error (criu/mount.c:724): mnt: FS mnt ./sys/fs/bpf dev 0x75 root / unsupported id 691
(00.111882) Unlock network
(00.111915) Unfreezing tasks into 1
(00.111918)     Unseizing 1942543 into 1
```
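
For readers skimming a long dump.log, the failing line can be pulled out mechanically. The following is a minimal sketch that assumes error lines follow the `(timestamp) Error (file:line): message` shape seen in the excerpt above; the function name `find_errors` and the regular expression are illustrative and not part of CRIU.

```python
import re

# Assumed shape of a CRIU error line, based only on the excerpt above:
# (00.111864) Error (criu/mount.c:724): mnt: FS mnt ./sys/fs/bpf ...
ERROR_RE = re.compile(
    r"^\((?P<ts>\d+\.\d+)\)\s+Error \((?P<src>[^:)]+):(?P<line>\d+)\):\s*(?P<msg>.*)$"
)


def find_errors(log_text):
    """Return one dict per line that matches the assumed error format."""
    errors = []
    for raw in log_text.splitlines():
        match = ERROR_RE.match(raw.strip())
        if match:
            errors.append(
                {
                    "timestamp": match.group("ts"),
                    "source": match.group("src"),
                    "line": int(match.group("line")),
                    "message": match.group("msg"),
                }
            )
    return errors


if __name__ == "__main__":
    # Two lines copied from the log excerpt in this issue.
    sample = (
        "(00.111862) mnt: Inspecting sharing on 806 shared_id 0 master_id 0 (@./)\n"
        "(00.111864) Error (criu/mount.c:724): mnt: FS mnt ./sys/fs/bpf "
        "dev 0x75 root / unsupported id 691\n"
    )
    for err in find_errors(sample):
        print(err["source"], err["line"], err["message"])
```

Lines that do not match the pattern (the `mnt: Inspecting sharing` entries, for example) are simply skipped, so the same function can be run over the full attached log.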

System Info:

```
$ cat /etc/debian_version
12.1
$ uname -r
6.1.0-9-amd64
$ kubectl get no -owide
NAME     STATUS   ROLES           AGE   VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                         KERNEL-VERSION   CONTAINER-RUNTIME
debian   Ready    control-plane   86m   v1.28.2   172.30.41.4   <none>        Debian GNU/Linux 12 (bookworm)   6.1.0-9-amd64    cri-o://1.28.1
```

CRIU logs and information:

CRIU full dump/restore logs:

[dump.log](https://github.com/checkpoint-restore/criu/files/12657343/dump.log)

Output of criu --version:

``` Version: 3.17.1 ```

Output of criu check --all:

``` Looks good. ```

Additional environment details:

adrianreber commented 10 months ago

CRIU probably cannot handle mounts of type bpf, and that is why it fails. When I run containers in Kubernetes/CRI-O, I do not have /sys/fs/bpf mounted in the container. I have it mounted on my host. I am not sure why your container mounts /sys/fs/bpf.

tu1h commented 10 months ago

> CRIU probably cannot handle mounts of type bpf, and that is why it fails. When I run containers in Kubernetes/CRI-O, I do not have /sys/fs/bpf mounted in the container. I have it mounted on my host. I am not sure why your container mounts /sys/fs/bpf.

Thanks for your reply. That container (calico-node) provides networking for the Kubernetes cluster, and it uses some eBPF functionality.

adrianreber commented 10 months ago

> That container (calico-node) provides networking for the Kubernetes cluster, and it uses some eBPF functionality.

CRIU cannot handle that. I am not aware of anyone working on it. But today it cannot work.
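
To check ahead of time whether a container has a bpf mount like the one in the log above, the process's mount table can be inspected. The following is a minimal sketch that assumes the standard `/proc/<pid>/mountinfo` layout (mount ID, parent ID, major:minor, root, mount point, options, optional fields, a `-` separator, then filesystem type). The helper names and the sample line are hypothetical; the sample is not taken from the reporter's system.

```python
def parse_mountinfo(text):
    """Parse mountinfo-formatted text into a list of small dicts."""
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        try:
            # The "-" separator follows a variable number of optional
            # fields, which themselves start at index 6.
            sep = fields.index("-", 6)
        except ValueError:
            continue  # not a well-formed mountinfo line
        if len(fields) < sep + 2:
            continue
        mounts.append(
            {
                "id": int(fields[0]),
                "mount_point": fields[4],
                "fstype": fields[sep + 1],
            }
        )
    return mounts


def find_mounts_of_type(text, fstype="bpf"):
    """Return the mounts whose filesystem type equals fstype."""
    return [m for m in parse_mountinfo(text) if m["fstype"] == fstype]


if __name__ == "__main__":
    # Hypothetical sample line, shaped like a bpf mount at /sys/fs/bpf.
    sample = (
        "691 806 0:117 / /sys/fs/bpf rw,nosuid,nodev,noexec,relatime "
        "- bpf bpf rw,mode=700\n"
    )
    for mount in find_mounts_of_type(sample):
        print(mount["id"], mount["mount_point"], mount["fstype"])
```

On a real node, the text would come from reading `/proc/<pid>/mountinfo` for the container's init process. Any entry this returns is a mount that, per the comments in this thread, CRIU 3.17.1 is expected to reject during dump.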

github-actions[bot] commented 9 months ago

A friendly reminder that this issue had no activity for 30 days.