checkpoint-restore / criu

Checkpoint/Restore tool
criu.org
Other
2.77k stars 561 forks source link

Error (criu/eventpoll.c:419): epoll: Can't add event on 0x00000a: Operation not permitted #2296

Open wdjiang opened 8 months ago

wdjiang commented 8 months ago

restore fail error log : (00.000000) Will dump/restore TCP connections (00.000021) Warn (criu/log.c:203): The early log isn't empty (00.000029) Version: 3.12 (gitid 0) (00.000033) Running on host1 Linux 3.10.0-1160.99.1.el7.x86_64 #1 SMP Wed Sep 13 14:19:20 UTC 2023 x86_64 (00.000048) Loaded kdat cache from /run/criu/criu.kdat (00.000058) rlimit: RLIMIT_NOFILE unlimited for self (00.000190) cpu: x86_family 6 x86_vendor_id GenuineIntel x86_model_id Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz (00.000199) cpu: fpu: xfeatures_mask 0x5 xsave_size 832 xsave_size_max 832 xsaves_size 832 (00.000209) cpu: fpu: x87 floating point registers xstate_offsets 0 / 0 xstate_sizes 160 / 160
(00.000212) cpu: fpu: AVX registers xstate_offsets 576 / 576 xstate_sizes 256 / 256
(00.000214) cpu: fpu:1 fxsr:1 xsave:1 xsaveopt:1 xsavec:1 xgetbv1:1 xsaves:1 (00.000322) kernel pid_max=131072 (00.000324) Reading image tree (00.000346) Add mnt ns 5 pid 67180 (00.000350) Add net ns 2 pid 67180 (00.000359) pstree pid_max=67181 (00.000364) Migrating process tree (SID 65444->57447) (00.000369) Will restore in 0 namespaces (00.000371) NS mask to use 0 (00.000426) Collecting 37/54 (flags 2) (00.000452) Collected [usr/bin/bash] ID 0x1 (00.000456) Collected [usr/lib64/ld-2.17.so] ID 0x2 (00.000459) Collected [usr/lib64/gconv/gconv-modules.cache] ID 0x3 (00.000460) Collected [usr/lib64/libtinfo.so.5.9] ID 0x4 (00.000462) Collected [usr/lib64/libdl-2.17.so] ID 0x5 (00.000464) Collected [usr/lib64/libc-2.17.so] ID 0x6 (00.000466) Collected [usr/lib/locale/locale-archive] ID 0x7 (00.000470) Collected [dev/null] ID 0x8 (00.000473) Collected [opt/quickpool-2.0/chkpnt/224/1698912464.224.out] ID 0x9 (00.000484) epoll: Collected eventpoll: id 0x00000a flags 0x02 (00.000489) Collected [opt/quickpool-2.0/log/jmchild.host1.log] ID 0xb (00.000505) Collected [opt/quickpool-2.0/chkpnt/224/1698912464.224] ID 0xe (00.000524) Collected [opt/quickpool-2.0/log] ID 0xf (00.000528) Collected [.] ID 0x10 (00.000530) Collected [opt/test] ID 0x11 (00.000534) - ... done (00.000535) Collecting 43/59 (flags 0) (00.000539) No remap-fpath.img image (00.000540)- ... done (00.000621) cg: Preparing cgroups yard (cgroups restore mode 0x4) (00.001004) cg: Opening .criu.cgyard.9TPTxo as cg yard (00.001030) cg: Making controller dir .criu.cgyard.9TPTxo/memory (memory) (00.002520) cg: Determined cgroup dir memory/quickpool/job.224 already exist (00.002535) cg: Skip restoring properties on cgroup dir memory/quickpool/job.224 (00.002556) cg: Making controller dir .criu.cgyard.9TPTxo/cpuset (cpuset) (00.002872) cg: Determined cgroup dir cpuset/ already exist (00.002879) cg: Skip restoring properties on cgroup dir cpuset/ (00.002892) cg: Making controller dir .criu.cgyard.9TPTxo/net_prio,net_cls (net_prio,net_cls) (00.003070) cg: Determined cgroup dir net_prio,net_cls/ already exist (00.003078) cg: Skip restoring properties on cgroup dir net_prio,net_cls/ (00.003092) cg: Making controller dir .criu.cgyard.9TPTxo/freezer (freezer) (00.003311) cg: Determined cgroup dir freezer/ already exist (00.003317) cg: Skip restoring properties on cgroup dir freezer/ (00.003330) cg: Making controller dir .criu.cgyard.9TPTxo/pids (pids) (00.003529) cg: Determined cgroup dir pids/user.slice already exist (00.003534) cg: Skip restoring properties on cgroup dir pids/user.slice (00.003548) cg: Making controller dir .criu.cgyard.9TPTxo/perf_event (perf_event) (00.003695) cg: Determined cgroup dir perf_event/ already exist (00.003701) cg: Skip restoring properties on cgroup dir perf_event/ (00.003713) cg: Making controller dir .criu.cgyard.9TPTxo/blkio (blkio) (00.003900) cg: Determined cgroup dir blkio/ already exist (00.003910) cg: Skip restoring properties on cgroup dir blkio/ (00.003923) cg: Making controller dir .criu.cgyard.9TPTxo/hugetlb (hugetlb) (00.004073) cg: Determined cgroup dir hugetlb/ already exist (00.004077) cg: Skip restoring properties on cgroup dir hugetlb/ (00.004089) cg: Making controller dir .criu.cgyard.9TPTxo/devices (devices) (00.004803) cg: Determined cgroup dir devices/user.slice already exist (00.004819) cg: Skip restoring properties on cgroup dir devices/user.slice (00.004835) cg: Making controller dir .criu.cgyard.9TPTxo/cpuacct,cpu (cpuacct,cpu) (00.005085) cg: Determined cgroup dir cpuacct,cpu/quickpool/job.224 already exist (00.005096) cg: Skip restoring properties on cgroup dir cpuacct,cpu/quickpool/job.224 (00.005111) cg: Making controller dir .criu.cgyard.9TPTxo/systemd (none,name=systemd) (00.005324) cg: Determined cgroup dir systemd/user.slice/user-0.slice/session-297.scope already exist (00.005330) cg: Skip restoring properties on cgroup dir systemd/user.slice/user-0.slice/session-297.scope (00.005365) Running pre-restore scripts (00.005461) No mountpoints-5.img image (00.005469) mnt: Reading mountpoint images (id 5 pid 67180) (00.005480) No netns-2.img image (00.005575) Forking task with 67180 pid (flags 0x0) (00.005700) PID: real 67180 virt 67180 (00.006402) 67180: cg: Move into 2 (00.006444) 67180: cg: -> blkio///tasks (00.006470) 67180: cg:-> cpuacct,cpu//quickpool/job.224/tasks (00.006495) 67180: cg: -> cpuset///tasks (00.006506) 67180: cg:-> devices//user.slice/tasks (00.006516) 67180: cg: -> freezer///tasks (00.006525) 67180: cg:-> hugetlb///tasks (00.006535) 67180: cg: -> memory//quickpool/job.224/tasks (00.006549) 67180: cg:-> systemd//user.slice/user-0.slice/session-297.scope/tasks (00.006563) 67180: cg: -> net_prio,net_cls///tasks (00.006573) 67180: cg:-> perf_event///tasks (00.006582) 67180: cg: -> pids//user.slice/tasks (00.006593) 67180: Calling restore_sid() for init (00.006614) 67180: Collecting 41/37 (flags 2) (00.006630) 67180: No tty-info.img image (00.006646) 67180:- ... done (00.006649) 67180: Collecting 42/51 (flags 0) (00.006655) 67180: No tty-data.img image (00.006661) 67180: - ... done (00.006663) 67180: Restoring namespaces 67180 flags 0x0 (00.006677) 67180: Preparing info about shared resources (00.006733) 67180: Collecting 45/38 (flags 0) (00.006741) 67180: No filelocks.img image (00.006744) 67180:- ... done (00.006746) 67180: Collecting 39/27 (flags 0) (00.006750) 67180: No pipes-data.img image (00.006753) 67180: - ... done (00.006755) 67180: Collecting 40/27 (flags 0) (00.006759) 67180: No fifo-data.img image (00.006761) 67180:- ... done (00.006763) 67180: Collecting 38/60 (flags 0) (00.006767) 67180: No sk-queues.img image (00.006769) 67180: - ... done (00.006831) 67180: Found 29 VMAs in image (00.006846) 67180: vma 0x400000 0x4de000 (00.006849) 67180: vma 0x6dd000 0x6de000 (00.006852) 67180: vma 0x6de000 0x6e7000 (00.006854) 67180: vma 0x6e7000 0x6ed000 (00.006856) 67180: vma 0xa00000 0xa21000 (00.006858) 67180: vma 0x2b152b3ea000 0x2b152b40c000 (00.006861) 67180: vma 0x2b152b40c000 0x2b152b40d000 (00.006863) 67180: vma 0x2b152b40d000 0x2b152b414000 (00.006866) 67180: vma 0x2b152b422000 0x2b152b426000 (00.006868) 67180: vma 0x2b152b60b000 0x2b152b60c000 (00.006871) 67180: vma 0x2b152b60c000 0x2b152b60d000 (00.006873) 67180: vma 0x2b152b60d000 0x2b152b60e000 (00.006875) 67180: vma 0x2b152b60e000 0x2b152b633000 (00.006878) 67180: vma 0x2b152b633000 0x2b152b833000 (00.006880) 67180: vma 0x2b152b833000 0x2b152b837000 (00.006882) 67180: vma 0x2b152b837000 0x2b152b838000 (00.006885) 67180: vma 0x2b152b838000 0x2b152b83a000 (00.006887) 67180: vma 0x2b152b83a000 0x2b152ba3a000 (00.006889) 67180: vma 0x2b152ba3a000 0x2b152ba3b000 (00.006892) 67180: vma 0x2b152ba3b000 0x2b152ba3c000 (00.006894) 67180: vma 0x2b152ba3c000 0x2b152bc00000 (00.006897) 67180: vma 0x2b152bc00000 0x2b152bdff000 (00.006899) 67180: vma 0x2b152bdff000 0x2b152be03000 (00.006901) 67180: vma 0x2b152be03000 0x2b152be05000 (00.006904) 67180: vma 0x2b152be05000 0x2b152be0a000 (00.006906) 67180: vma 0x2b152be0a000 0x2b153234d000 (00.006909) 67180: vma 0x7ffd2fc1f000 0x7ffd2fc40000 (00.006911) 67180: vma 0x7ffd2fc79000 0x7ffd2fc7b000 (00.006913) 67180: vma 0xffffffffff600000 0xffffffffff601000 (00.006930) 67180: Collect fdinfo pid=67180 fd=0 id=0x8 (00.006939) 67180: Collect fdinfo pid=67180 fd=1 id=0x9 (00.006949) 67180: Collect fdinfo pid=67180 fd=2 id=0x9 (00.006952) 67180: Collect fdinfo pid=67180 fd=3 id=0xa (00.006961) 67180: Collect fdinfo pid=67180 fd=4 id=0xb (00.006966) 67180: Collect fdinfo pid=67180 fd=6 id=0x8 (00.006969) 67180: Collect fdinfo pid=67180 fd=7 id=0x9 (00.006972) 67180: Collect fdinfo pid=67180 fd=10 id=0xc (00.006975) 67180: Collect fdinfo pid=67180 fd=12 id=0xd (00.006978) 67180: Collect fdinfo pid=67180 fd=255 id=0xe (00.007028) 67180: Found 17 VMAs in image (00.007032) 67180: vma 0x400000 0x401000 (00.007035) 67180: vma 0x600000 0x601000 (00.007037) 67180: vma 0x601000 0x602000 (00.007040) 67180: vma 0x2aefa0b25000 0x2aefa0b47000 (00.007042) 67180: vma 0x2aefa0b47000 0x2aefa0b49000 (00.007044) 67180: vma 0x2aefa0b5d000 0x2aefa0b60000 (00.007046) 67180: vma 0x2aefa0d46000 0x2aefa0d47000 (00.007049) 67180: vma 0x2aefa0d47000 0x2aefa0d48000 (00.007051) 67180: vma 0x2aefa0d48000 0x2aefa0d49000 (00.007053) 67180: vma 0x2aefa0d49000 0x2aefa0f0d000 (00.007056) 67180: vma 0x2aefa0f0d000 0x2aefa110c000 (00.007058) 67180: vma 0x2aefa110c000 0x2aefa1110000 (00.007060) 67180: vma 0x2aefa1110000 0x2aefa1112000 (00.007063) 67180: vma 0x2aefa1112000 0x2aefa1117000 (00.007065) 67180: vma 0x7ffeb222f000 0x7ffeb2250000 (00.007067) 67180: vma 0x7ffeb22a8000 0x7ffeb22aa000 (00.007070) 67180: vma 0xffffffffff600000 0xffffffffff601000 (00.007078) 67180: Collect fdinfo pid=67181 fd=0 id=0x8 (00.007082) 67180: Collect fdinfo pid=67181 fd=1 id=0x9 (00.007085) 67180: Collect fdinfo pid=67181 fd=2 id=0x9 (00.007088) 67180: Collect fdinfo pid=67181 fd=3 id=0xa (00.007091) 67180: Collect fdinfo pid=67181 fd=4 id=0xb (00.007093) 67180: Collect fdinfo pid=67181 fd=6 id=0x8 (00.007096) 67180: Collect fdinfo pid=67181 fd=7 id=0x9 (00.007099) 67180: Collect fdinfo pid=67181 fd=10 id=0xc (00.007102) 67180: Collect fdinfo pid=67181 fd=12 id=0xd (00.007171) 67180: skqueue: Preparing SCMs (00.007180) 67180: unix: ghost: Resolving addresses (00.007185) 67180: File descs: (00.007188) 67180:- type 1 ID 0x1 (00.007190) 67180: - type 1 ID 0x2 (00.007192) 67180:- type 1 ID 0x3 (00.007194) 67180: - type 1 ID 0x4 (00.007196) 67180:- type 1 ID 0x5 (00.007199) 67180: - type 1 ID 0x6 (00.007201) 67180:- type 1 ID 0x7 (00.007203) 67180: - type 1 ID 0x8 (00.007205) 67180:- FD 0 pid 67180 (00.007207) 67180: - FD 6 pid 67180 (00.007209) 67180:- FD 0 pid 67181 (00.007211) 67180: - FD 6 pid 67181 (00.007213) 67180:- type 1 ID 0x9 (00.007215) 67180: - FD 1 pid 67180 (00.007217) 67180:- FD 2 pid 67180 (00.007219) 67180: - FD 7 pid 67180 (00.007221) 67180:- FD 1 pid 67181 (00.007223) 67180: - FD 2 pid 67181 (00.007225) 67180:- FD 7 pid 67181 (00.007227) 67180: - type 7 ID 0xa (00.007229) 67180:- FD 3 pid 67180 (00.007231) 67180: - FD 3 pid 67181 (00.007233) 67180:- type 1 ID 0xb (00.007235) 67180: - FD 4 pid 67180 (00.007237) 67180:- FD 4 pid 67181 (00.007238) 67180: - type 4 ID 0xc (00.007240) 67180:- FD 10 pid 67180 (00.007243) 67180: - FD 10 pid 67181 (00.007244) 67180:- type 4 ID 0xd (00.007246) 67180: - FD 12 pid 67180 (00.007248) 67180:- FD 12 pid 67181 (00.007250) 67180: - type 1 ID 0xe (00.007252) 67180:- FD 255 pid 67180 (00.007254) 67180: - type 1 ID 0xf (00.007256) 67180:- type 1 ID 0x10 (00.007258) 67180: `- type 1 ID 0x11 (00.007447) 67180: Opened local page read 1 (parent 0) (00.007466) 67180: Enqueue page-read (00.007469) 67180: Enqueue page-read (00.007472) 67180: Enqueue page-read (00.007474) 67180: Enqueue page-read (00.007476) 67180: Enqueue page-read (00.007478) 67180: Enqueue page-read (00.007480) 67180: Enqueue page-read (00.007483) 67180: Enqueue page-read (00.007485) 67180: Enqueue page-read (00.007487) 67180: Enqueue page-read (00.007489) 67180: Enqueue page-read (00.007491) 67180: Enqueue page-read (00.007493) 67180: Enqueue page-read (00.007495) 67180: Enqueue page-read (00.007503) 67180: Enqueue page-read (00.007506) 67180: Enqueue page-read (00.007508) 67180: Enqueue page-read (00.007510) 67180: Enqueue page-read (00.007512) 67180: Enqueue page-read (00.007514) 67180: Enqueue page-read (00.007516) 67180: Enqueue page-read (00.007518) 67180: Enqueue page-read (00.007520) 67180: Enqueue page-read (00.007522) 67180: Enqueue page-read (00.007531) 67180: nr_restored_pages: 53 (00.007534) 67180: nr_shared_pages: 0 (00.007536) 67180: nr_dropped_pages: 0 (00.007538) 67180: nr_lazy: 0 (00.007552) 67180: Shrunk premap area to 0x7f0e1e4b9000(0) (00.007559) 67180: Restore on-core sigactions for 67180 (00.007657) 67180: Restoring children in alien sessions: (00.007663) 67180: Restoring children in our session: (00.007758) 67180: Forking task with 67181 pid (flags 0x0) (00.008011) 67180: Restoring 67180 to 67180 pgid (00.008037) 67180: will call setpgid, mine pgid is 67185 (00.008022) 67180: PID: real 67181 virt 67181 (00.008061) 67181: cg: Cgroups 2 inherited from parent (00.008159) 67181: Opened local page read 2 (parent 0) (00.008176) 67181: Enqueue page-read (00.008182) 67181: Enqueue page-read (00.008183) 67181: Enqueue page-read (00.008185) 67181: Enqueue page-read (00.008186) 67181: Enqueue page-read (00.008187) 67181: Enqueue page-read (00.008188) 67181: Enqueue page-read (00.008190) 67181: Enqueue page-read (00.008191) 67181: Enqueue page-read (00.008192) 67181: Enqueue page-read (00.008194) 67181: Enqueue page-read (00.008195) 67181: Enqueue page-read (00.008196) 67181: Enqueue page-read (00.008197) 67181: Enqueue page-read (00.008199) 67181: Enqueue page-read (00.008200) 67181: Enqueue page-read (00.008201) 67181: Enqueue page-read (00.008207) 67181: nr_restored_pages: 25 (00.008209) 67181: nr_shared_pages: 0 (00.008210) 67181: nr_dropped_pages: 0 (00.008211) 67181: nr_lazy: 0 (00.008218) 67181: Shrunk premap area to 0x7f0e24f36000(0) (00.008222) 67181: Restore on-core sigactions for 67181 (00.008269) 67181: Restoring children in alien sessions: (00.008273) 67181: Restoring children in our session: (00.008285) 67181: Restoring 67181 to 67180 pgid (00.008295) 67181: will call setpgid, mine pgid is 67185 (00.008403) 67180: Restoring resources (00.008427) 67180: Opening fdinfo-s (00.008553) 67180: Create fd for 0 (00.008561) 67180: Going to dup 0 into 6 (00.008566) 67180: Send fd 0 to /crtools-fd-67181 (00.008584) 67180: Send fd 0 to /crtools-fd-67181 (00.008602) 67180: Create fd for 1 (00.008605) 67180: Going to dup 1 into 2 (00.008608) 67180: Going to dup 1 into 7 (00.008612) 67180: Send fd 1 to /crtools-fd-67181 (00.008617) 67180: Send fd 1 to /crtools-fd-67181 (00.008622) 67180: Send fd 1 to /crtools-fd-67181 (00.008626) 67180: Receive fd for 2 (00.008632) 67180: epoll: Restore eventpoll: id 0x00000a flags 0x02 (00.008645) 67180: Create fd for 3 (00.008648) 67180: Send fd 3 to /crtools-fd-67181 (00.008662) 67180: Create fd for 4 (00.008666) 67180: Send fd 4 to /crtools-fd-67181 (00.008671) 67180: Receive fd for 6 (00.008674) 67180: Receive fd for 7 (00.008691) 67180: inet: Restore: family AF_INET type SOCK_STREAM proto IPPROTO_TCP port 55156 state TCP_ESTABLISHED src_addr 192.168.66.66 (00.008754) 67180: tcp: Restoring TCP connection (00.008768) 67180: tcp: Restoring TCP connection id c ino bd542 (00.008801) 67180: Debug: Setting 1 queue seq to 4221799396 (00.008810) 67180: Debug: Setting 2 queue seq to 3466898985 (00.008827) 67180: Debug: Restoring TCP options (00.008830) 67180: Debug: Will turn SAK on (00.008833) 67180: Debug: Will set snd_wscale to 7 (00.008835) 67180: Debug: Will set rcv_wscale to 7 (00.008837) 67180: Debug: Will turn timestamps on (00.008839) 67180: Debug: Will set mss clamp to 65495 (00.008853) 67180: sockets: 5 restore sndbuf 2626560 rcv buf 1061296 (00.008862) 67180: sockets: restore priority 0 for socket (00.008871) 67180: sockets: restore rcvlowat 1 for socket (00.008874) 67180: sockets: restore mark 0 for socket (00.008889) 67180: Create fd for 10 (00.008892) 67180: Send fd 10 to /crtools-fd-67181 (00.008902) 67180: inet: Restore: family AF_INET type SOCK_STREAM proto IPPROTO_TCP port 44194 state TCP_ESTABLISHED src_addr 192.168.66.66 (00.008928) 67180: tcp: Restoring TCP connection (00.008933) 67180: tcp: Restoring TCP connection id d ino bd545 (00.008953) 67180: Debug: Setting 1 queue seq to 4062037418 (00.008958) 67180: Debug: Setting 2 queue seq to 1685262397 (00.008967) 67180: Debug: Restoring TCP options (00.008970) 67180: Debug: Will turn SAK on (00.008972) 67180: Debug: Will set snd_wscale to 7 (00.008974) 67180: Debug: Will set rcv_wscale to 7 (00.008977) 67180: Debug: Will turn timestamps on (00.008979) 67180: Debug: Will set mss clamp to 65495 (00.008991) 67180: sockets: 5 restore sndbuf 2626560 rcv buf 1061296 (00.008997) 67180: sockets: restore priority 0 for socket (00.009001) 67180: sockets: restore rcvlowat 1 for socket (00.009004) 67180: sockets: restore mark 0 for socket (00.009012) 67180: Create fd for 12 (00.009015) 67180: Send fd 12 to /crtools-fd-67181 (00.009033) 67180: Create fd for 255 (00.009037) 67180: epoll: Restore eventpoll-tfd: id 0x00000a tfd 7 events 0x80000019 data 0x00000000000007 (00.009082) 67180: Error (criu/eventpoll.c:419): epoll: Can't add event on 0x00000a: Operation not permitted (00.009094) 67180: Error (criu/files.c:1196): Unable to open fd=3 id=0xa (00.030360) Error (criu/cr-restore.c:1417): 67180 killed by signal 9: Killed (00.030445) Error (criu/cr-restore.c:2293): Restoring FAILED.

Please check this Thank you!

adrianreber commented 8 months ago

You are using CRIU 3.12 on RHEL 7. CRIU was never supported on RHEL 7, just a tech preview. Please try something newer. Newer OS and newer version of CRIU.

wdjiang commented 8 months ago

I followed the same steps to test using CRIU3.16, and the results were the same, [root@host1 chkpnt]# cat /etc/redhat-release CentOS Linux release 7.9.2009 (Core) [root@host1 chkpnt]# criu -V Version: 3.16 error log: [Uploading chkpntlog.txt…]()

wdjiang commented 8 months ago

(00.000000) Will dump/restore TCP connections (00.000026) Version: 3.16 (gitid 0) (00.000036) Running on host1 Linux 3.10.0-1160.99.1.el7.x86_64 #1 SMP Wed Sep 13 14:19:20 UTC 2023 x86_64 (00.000052) Loaded kdat cache from /run/criu/criu.kdat (00.000060) rlimit: RLIMIT_NOFILE unlimited for self (00.000215) cpu: x86_family 6 x86_vendor_id GenuineIntel x86_model_id Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz (00.000226) cpu: fpu: xfeatures_mask 0x5 xsave_size 832 xsave_size_max 832 xsaves_size 832 (00.000236) cpu: fpu: x87 floating point registers xstate_offsets 0 / 0 xstate_sizes 160 / 160
(00.000239) cpu: fpu: AVX registers xstate_offsets 576 / 576 xstate_sizes 256 / 256
(00.000241) cpu: fpu:1 fxsr:1 xsave:1 xsaveopt:1 xsavec:1 xgetbv1:1 xsaves:1 (00.000333) kernel pid_max=131072 (00.000337) Reading image tree (00.000359) Add mnt ns 6 pid 98657 (00.000363) Add net ns 2 pid 98657 (00.000364) Add pid ns 1 pid 98657 (00.000367) pstree pid_max=98657 (00.000371) Migrating process tree (SID 65444->98272) (00.000375) Migrating process tree (GID 98656->98669) (00.000377) Will restore in 0 namespaces (00.000378) NS mask to use 0 (00.000414) Collecting 51/56 (flags 3) (00.000418) No memfd.img image (00.000421) - ... done (00.000426) Collecting 40/54 (flags 2) (00.000449) Collected [opt/test] ID 0x1 (00.000454) Collected [usr/lib64/ld-2.17.so] ID 0x2 (00.000456) Collected [usr/lib64/libc-2.17.so] ID 0x3 (00.000459) Collected [dev/null] ID 0x4 (00.000461) Collected [opt/quickpool-2.0/chkpnt/226/1698972950.226.out] ID 0x5 (00.000469) epoll: Collected eventpoll: id 0x000006 flags 0x02 (00.000477) Collected [opt/quickpool-2.0/log/jmchild.host1.log] ID 0x7 (00.000492) Collected [opt/quickpool-2.0/log] ID 0xa (00.000494) Collected [.] ID 0xb (00.000497)- ... done (00.000500) Collecting 46/68 (flags 0) (00.000503) No remap-fpath.img image (00.000504) - ... done (00.000557) No apparmor.img image (00.000616) cg: Preparing cgroups yard (cgroups restore mode 0x4) (00.000825) cg: Opening .criu.cgyard.cgWNC2 as cg yard (00.000841) cg: Making controller dir .criu.cgyard.cgWNC2/memory (memory) (00.001103) cg: Determined cgroup dir memory/quickpool/job.226 already exist (00.001111) cg: Skip restoring properties on cgroup dir memory/quickpool/job.226 (00.001127) cg: Making controller dir .criu.cgyard.cgWNC2/cpuset (cpuset) (00.001387) cg: Determined cgroup dir cpuset/ already exist (00.001393) cg: Skip restoring properties on cgroup dir cpuset/ (00.001406) cg: Making controller dir .criu.cgyard.cgWNC2/net_prio,net_cls (net_prio,net_cls) (00.001625) cg: Determined cgroup dir net_prio,net_cls/ already exist (00.001630) cg: Skip restoring properties on cgroup dir net_prio,net_cls/ (00.001642) cg: Making controller dir .criu.cgyard.cgWNC2/freezer (freezer) (00.001851) cg: Determined cgroup dir freezer/ already exist (00.001857) cg: Skip restoring properties on cgroup dir freezer/ (00.002380) cg: Making controller dir .criu.cgyard.cgWNC2/pids (pids) (00.002564) cg: Determined cgroup dir pids/user.slice already exist (00.002570) cg: Skip restoring properties on cgroup dir pids/user.slice (00.002583) cg: Making controller dir .criu.cgyard.cgWNC2/perf_event (perf_event) (00.002703) cg: Determined cgroup dir perf_event/ already exist (00.002707) cg: Skip restoring properties on cgroup dir perf_event/ (00.002719) cg: Making controller dir .criu.cgyard.cgWNC2/blkio (blkio) (00.002923) cg: Determined cgroup dir blkio/ already exist (00.002928) cg: Skip restoring properties on cgroup dir blkio/ (00.002942) cg: Making controller dir .criu.cgyard.cgWNC2/hugetlb (hugetlb) (00.003063) cg: Determined cgroup dir hugetlb/ already exist (00.003068) cg: Skip restoring properties on cgroup dir hugetlb/ (00.003079) cg: Making controller dir .criu.cgyard.cgWNC2/devices (devices) (00.003287) cg: Determined cgroup dir devices/user.slice already exist (00.003297) cg: Skip restoring properties on cgroup dir devices/user.slice (00.003311) cg: Making controller dir .criu.cgyard.cgWNC2/cpuacct,cpu (cpuacct,cpu) (00.003561) cg: Determined cgroup dir cpuacct,cpu/quickpool/job.226 already exist (00.003575) cg: Skip restoring properties on cgroup dir cpuacct,cpu/quickpool/job.226 (00.003590) cg: Making controller dir .criu.cgyard.cgWNC2/systemd (none,name=systemd) (00.003853) cg: Determined cgroup dir systemd/user.slice/user-0.slice/session-297.scope already exist (00.003859) cg: Skip restoring properties on cgroup dir systemd/user.slice/user-0.slice/session-297.scope (00.003909) Running pre-restore scripts (00.003992) No pidns-1.img image (00.004140) Forking task with 98657 pid (flags 0x0) (00.004289) PID: real 98657 virt 98657 (00.005067) 98657: cg: Move into 2 (00.005093) 98657: cg:-> blkio///cgroup.procs (00.005221) 98657: cg: -> cpuacct,cpu//quickpool/job.226/cgroup.procs (00.005263) 98657: cg:-> cpuset///cgroup.procs (00.005274) 98657: cg: -> devices//user.slice/cgroup.procs (00.005285) 98657: cg:-> freezer///cgroup.procs (00.005293) 98657: cg: -> hugetlb///cgroup.procs (00.005300) 98657: cg:-> memory//quickpool/job.226/cgroup.procs (00.005323) 98657: cg: -> systemd//user.slice/user-0.slice/session-297.scope/cgroup.procs (00.005338) 98657: cg:-> net_prio,net_cls///cgroup.procs (00.005347) 98657: cg: -> perf_event///cgroup.procs (00.005357) 98657: cg:-> pids//user.slice/cgroup.procs (00.005368) 98657: Calling restore_sid() for init (00.005390) 98657: Collecting 44/37 (flags 2) (00.005406) 98657: No tty-info.img image (00.005422) 98657: - ... done (00.005424) 98657: Collecting 45/51 (flags 0) (00.005429) 98657: No tty-data.img image (00.005433) 98657:- ... done (00.005435) 98657: Restoring namespaces 98657 flags 0x0 (00.005447) 98657: Preparing info about shared resources (00.005518) 98657: Collecting 48/38 (flags 0) (00.005524) 98657: No filelocks.img image (00.005527) 98657: - ... done (00.005529) 98657: Collecting 42/27 (flags 0) (00.005533) 98657: No pipes-data.img image (00.005535) 98657:- ... done (00.005537) 98657: Collecting 43/27 (flags 0) (00.005540) 98657: No fifo-data.img image (00.005542) 98657: - ... done (00.005544) 98657: Collecting 41/69 (flags 0) (00.005548) 98657: No sk-queues.img image (00.005550) 98657:- ... done (00.005597) 98657: Found 17 VMAs in image (00.005614) 98657: vma 0x400000 0x401000 (00.005618) 98657: vma 0x600000 0x601000 (00.005620) 98657: vma 0x601000 0x602000 (00.005622) 98657: vma 0x2b683357b000 0x2b683359d000 (00.005624) 98657: vma 0x2b683359d000 0x2b683359f000 (00.005626) 98657: vma 0x2b68335b3000 0x2b68335b6000 (00.005628) 98657: vma 0x2b683379c000 0x2b683379d000 (00.005630) 98657: vma 0x2b683379d000 0x2b683379e000 (00.005632) 98657: vma 0x2b683379e000 0x2b683379f000 (00.005634) 98657: vma 0x2b683379f000 0x2b6833963000 (00.005636) 98657: vma 0x2b6833963000 0x2b6833b62000 (00.005638) 98657: vma 0x2b6833b62000 0x2b6833b66000 (00.005640) 98657: vma 0x2b6833b66000 0x2b6833b68000 (00.005642) 98657: vma 0x2b6833b68000 0x2b6833b6d000 (00.005644) 98657: vma 0x7ffea2ff7000 0x7ffea3018000 (00.005646) 98657: vma 0x7ffea31bf000 0x7ffea31c1000 (00.005648) 98657: vma 0xffffffffff600000 0xffffffffff601000 (00.005664) 98657: Collect fdinfo pid=98657 fd=0 id=0x4 (00.005672) 98657: Collect fdinfo pid=98657 fd=1 id=0x5 (00.005675) 98657: Collect fdinfo pid=98657 fd=2 id=0x5 (00.005678) 98657: Collect fdinfo pid=98657 fd=3 id=0x6 (00.005680) 98657: Collect fdinfo pid=98657 fd=4 id=0x7 (00.005682) 98657: Collect fdinfo pid=98657 fd=6 id=0x4 (00.005684) 98657: Collect fdinfo pid=98657 fd=7 id=0x5 (00.005687) 98657: Collect fdinfo pid=98657 fd=10 id=0x8 (00.005689) 98657: Collect fdinfo pid=98657 fd=12 id=0x9 (00.005773) 98657: skqueue: Preparing SCMs (00.005784) 98657: unix: ghost: Resolving addresses (00.005789) 98657: File descs: (00.005791) 98657: - type 1 ID 0x1 (00.005792) 98657:- type 1 ID 0x2 (00.005794) 98657: - type 1 ID 0x3 (00.005795) 98657:- type 1 ID 0x4 (00.005797) 98657: - FD 0 pid 98657 (00.005799) 98657:- FD 6 pid 98657 (00.005801) 98657: - type 1 ID 0x5 (00.005811) 98657:- FD 1 pid 98657 (00.005814) 98657: - FD 2 pid 98657 (00.005816) 98657:- FD 7 pid 98657 (00.005818) 98657: - type 7 ID 0x6 (00.005819) 98657:- FD 3 pid 98657 (00.005821) 98657: - type 1 ID 0x7 (00.005823) 98657:- FD 4 pid 98657 (00.005824) 98657: - type 4 ID 0x8 (00.005826) 98657:- FD 10 pid 98657 (00.005828) 98657: - type 4 ID 0x9 (00.005830) 98657:- FD 12 pid 98657 (00.005832) 98657: - type 1 ID 0xa (00.005834) 98657:- type 1 ID 0xb (00.005994) 98657: No parent images directory provided (00.006023) 98657: Opened local page read 1 (parent 0) (00.006036) 98657: Enqueue page-read (00.006046) 98657: Enqueue page-read (00.006048) 98657: Enqueue page-read (00.006050) 98657: Enqueue page-read (00.006051) 98657: Enqueue page-read (00.006053) 98657: Enqueue page-read (00.006055) 98657: Enqueue page-read (00.006056) 98657: Enqueue page-read (00.006058) 98657: Enqueue page-read (00.006059) 98657: Enqueue page-read (00.006061) 98657: Enqueue page-read (00.006063) 98657: Enqueue page-read (00.006064) 98657: Enqueue page-read (00.006066) 98657: Enqueue page-read (00.006073) 98657: nr_restored_pages: 24 (00.006076) 98657: nr_shared_pages: 0 (00.006077) 98657: nr_dropped_pages: 0 (00.006079) 98657: nr_lazy: 0 (00.006090) 98657: Shrunk premap area to 0x7f18358f9000(0) (00.006095) 98657: Restore on-core sigactions for 98657 (00.006206) 98657: Restoring children in alien sessions: (00.006211) 98657: Restoring children in our session: (00.006227) 98657: Restoring 98657 to 98669 pgid (00.006238) 98657: Restoring resources (00.006246) 98657: Opening fdinfo-s (00.006263) 98657: Create fd for 0 (00.006265) 98657: Going to dup 0 into 6 (00.006275) 98657: Create fd for 1 (00.006276) 98657: Going to dup 1 into 2 (00.006278) 98657: Going to dup 1 into 7 (00.006279) 98657: Receive fd for 2 (00.006282) 98657: epoll: Restore eventpoll: id 0x000006 flags 0x02 (00.006290) 98657: Create fd for 3 (00.006294) 98657: Create fd for 4 (00.006296) 98657: Receive fd for 6 (00.006297) 98657: Receive fd for 7 (00.006309) 98657: inet: Restore: family AF_INET type SOCK_STREAM proto IPPROTO_TCP port 47198 state TCP_ESTABLISHED src_addr 192.168.66.66 (00.006341) 98657: tcp: Restoring TCP connection (00.006350) 98657: tcp: Restoring TCP connection id 8 ino cd407 (00.006374) 98657: Debug: Setting 1 queue seq to 439636293 (00.006379) 98657: Debug: Setting 2 queue seq to 1854530543 (00.006392) 98657: Debug: Restoring TCP options (00.006393) 98657: Debug: Will turn SAK on (00.006394) 98657: Debug: Will set snd_wscale to 7 (00.006395) 98657: Debug: Will set rcv_wscale to 7 (00.006396) 98657: Debug: Will turn timestamps on (00.006397) 98657: Debug: Will set mss clamp to 65495 (00.006406) 98657: sockets: 5 restore sndbuf 2626560 rcv buf 1061296 (00.006409) 98657: sockets: restore priority 0 for socket (00.006410) 98657: sockets: restore rcvlowat 1 for socket (00.006412) 98657: sockets: restore mark 0 for socket (00.006413) 98657: sockets: set so_linger for socket (00.006415) 98657: sockets: set keepcnt for socket (00.006417) 98657: sockets: set keepidle for socket (00.006418) 98657: sockets: set keepintvl for socket (00.006421) 98657: Create fd for 10 (00.006423) 98657: inet: Restore: family AF_INET type SOCK_STREAM proto IPPROTO_TCP port 47328 state TCP_ESTABLISHED src_addr 192.168.66.66 (00.006437) 98657: tcp: Restoring TCP connection (00.006439) 98657: tcp: Restoring TCP connection id 9 ino cd40a (00.006448) 98657: Debug: Setting 1 queue seq to 3296282753 (00.006451) 98657: Debug: Setting 2 queue seq to 992184728 (00.006455) 98657: Debug: Restoring TCP options (00.006456) 98657: Debug: Will turn SAK on (00.006458) 98657: Debug: Will set snd_wscale to 7 (00.006459) 98657: Debug: Will set rcv_wscale to 7 (00.006461) 98657: Debug: Will turn timestamps on (00.006462) 98657: Debug: Will set mss clamp to 65495 (00.006469) 98657: sockets: 5 restore sndbuf 2626560 rcv buf 1061296 (00.006499) 98657: sockets: restore priority 0 for socket (00.006501) 98657: sockets: restore rcvlowat 1 for socket (00.006504) 98657: sockets: restore mark 0 for socket (00.006506) 98657: sockets: set keepcnt for socket (00.006510) 98657: sockets: set keepidle for socket (00.006513) 98657: sockets: set keepintvl for socket (00.006517) 98657: Create fd for 12 (00.006519) 98657: epoll: Restore eventpoll-tfd: id 0x000006 tfd 7 events 0x80000019 data 0x00000000000007 (00.006550) 98657: Error (criu/eventpoll.c:417): epoll: Can't add event on 0x000006: Operation not permitted (00.006557) 98657: Error (criu/files.c:1200): Unable to open fd=3 id=0x6 (00.006759) Error (criu/cr-restore.c:2447): Restoring FAILED. (00.034538) Error (criu/cr-restore.c:1480): 98657 killed by signal 9: Killed

adrianreber commented 8 months ago

As mentioned before, the CentOS 7 kernel is really old and might be missing some features CRIU needs.

wdjiang commented 8 months ago

Can I use CENTOS 8 to debug this feature?Which kernel version fully supports CRIU functionality?Can we find any suspicion from this log?

adrianreber commented 8 months ago

Can I use CENTOS 8 to debug this feature?

If you try something different please directly try CentOS 9.

Which kernel version fully supports CRIU functionality?

Difficult questions to answer. It depends on the feature you need. It could also be you encounter a situation where CRIU does not support something you need or where CRIU has a bug.

Can we find any suspicion from this log?

Sure:

(00.006550) 98657: Error (criu/eventpoll.c:417): epoll: Can't add event on 0x000006: Operation not permitted

That is your problem. I am not directly familiar with this CRIU feature, but if you prefer CentOS/RHEL distributions then using the latest release of 9 would be a good starting point. If you then still see an error then it is either not implemented or a bug. The problem with CentOS 7 is, that we know that some features are missing from that kernel so it does not make much sense to debug on that kernel.

wdjiang commented 8 months ago

Thanks for your reply, I will use CENTOS 8 &9 to debug CRIU。What is the possible cause of this error?

wdjiang commented 8 months ago

log.txt When I was debugging using higher versions of CRIU (3.18) and CENTOS (8), I encountered the problem that I could not dump process

Snorch commented 8 months ago

log.txt When I was debugging using higher versions of CRIU (3.18) and CENTOS (8), I encountered the problem that I could not dump process

You still have a problem with eventpoll:

(00.010021) Dumping opened files (pid: 4265)
(00.010023) ----------------------------------------
...
(00.010537) 4265 fdinfo 3: pos:                0 flags:                2/0
(00.010569) epoll: find_tfd_bsearch: pid 4265 efd 3 tfd 6 toff 0
(00.010573) epoll: find_tfd_bsearch: no match pid 4265 efd 3 tfd 6 toff 0
(00.010575) epoll: Queueing eventpoll: id 0x00000a flags 0x02
...
(00.010628) 4265 fdinfo 6: pos:                0 flags:           102001/0
...
(00.034799) ----------------------------------------
(00.034814) epoll: kid_lookup_epoll: no match pid 4265 efd 3 tfd 6 toff 0
(00.034826) Error (criu/cr-dump.c:1674): Dump eventpoll (pid: 4265) failed with -1

Epoll support is sadly not full in CRIU, in your case CRIU can't find target fd of epoll in current process fds, having target fd in another process is not supported (see also https://github.com/checkpoint-restore/criu/issues/1366) having it closed is likely also a problem for current algorithm. I don't see any easy way to fix it except someone diving in current epoll algorithm and supporting all the unsupported cases.

Possible W/A: kill the process (4265) which has an unsupported epoll before migration.

wdjiang commented 8 months ago

Thank you very much for your reply.

github-actions[bot] commented 7 months ago

A friendly reminder that this issue had no activity for 30 days.