alexfrolov opened this issue 5 months ago.
Looking at the error message, I would say this is an LXC bug. I have run into similar errors with runc/crun.
CRIU expects the file system to be set up in such a way that it can restore all mount points. The destination directory for /dev in the container does not exist, and it needs to be created by LXC before calling CRIU.
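The requirement above can be illustrated with a minimal C sketch. It assumes the container's root directory is already reachable at some host path `root`; `ensure_mount_target()` is a hypothetical helper, not an LXC or CRIU function, and LXC's real fix may look quite different. It creates `<root>/<target>` (for example `dev`) and any missing parent directories, much like `mkdir -p`, before CRIU would be invoked:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Hypothetical helper (not an LXC/CRIU API): make sure the mount target
 * "<root>/<target>" exists as a directory, creating missing parents.
 * The root itself must already exist; it is never created here.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int ensure_mount_target(const char *root, const char *target)
{
    char path[PATH_MAX];
    struct stat st;
    int n;

    assert(root != NULL && target != NULL);
    n = snprintf(path, sizeof(path), "%s/%s", root, target);
    if (n < 0 || (size_t)n >= sizeof(path)) {
        errno = ENAMETOOLONG;
        return -1;
    }

    /* Create every intermediate component below the root, one at a time. */
    for (char *p = path + strlen(root) + 1; *p != '\0'; p++) {
        if (*p != '/')
            continue;
        *p = '\0';
        if (mkdir(path, 0755) != 0 && errno != EEXIST)
            return -1;
        *p = '/';
    }
    /* Create the final component (no-op if it already exists). */
    if (mkdir(path, 0755) != 0 && errno != EEXIST)
        return -1;

    /* EEXIST alone is not enough: the mount target must be a directory. */
    if (stat(path, &st) != 0)
        return -1;
    if (!S_ISDIR(st.st_mode)) {
        errno = ENOTDIR;
        return -1;
    }
    return 0;
}
```

Calling it as `ensure_mount_target(root, "dev")` is idempotent: an existing directory is accepted, while an existing non-directory at that path is reported as `ENOTDIR` instead of being silently treated as a valid mount point.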
I can't reproduce this error with a self-compiled CRIU v3.19 (I see a slightly different problem, where CRIU is not able to kill cgroupd):
Environment:
6.2.0-31-generic
22.04.1-Ubuntu
lxc 1:6.0.0+main~20240626-1908-0ubuntu1~jammy
Steps:
lxc-create --name=u2 --template=download -- --dist ubuntu --release xenial --arch amd64
sudo tee -a /var/lib/lxc/u2/config << EOF
# workarounds for CRIU
lxc.console.path = none
lxc.tty.max = 0
lxc.cgroup.devices.deny = c 5:1 rwm
EOF
lxc-start u2
lxc-checkpoint -s -n u2 -D /tmp/u2 -v
lxc-checkpoint -r -n u2 -D /tmp/u2 -v
<hangs>
In gdb, we can see CRIU waiting for cgroupd to exit:
#1 0x00007fbfac0ea3ab in __GI___waitpid (pid=<optimized out>, stat_loc=stat_loc@entry=0x0, options=options@entry=0) at ./posix/waitpid.c:38
#2 0x00005586bc7b9ebc in stop_cgroupd () at criu/cgroup.c:2052
#3 0x00005586bc7c7606 in restore_root_task (init=0x7fbfac66c058) at criu/cr-restore.c:2401
#4 0x00005586bc7c8abd in cr_restore_tasks () at criu/cr-restore.c:2652
#5 0x00005586bc79e75b in main (argc=<optimized out>, argv=0x7ffcc630b908, envp=<optimized out>) at criu/crtools.c:308
#2 0x00005586bc7b9ebc in stop_cgroupd () at criu/cgroup.c:2052
2052 waitpid(cgroupd_pid, NULL, 0);
(gdb) p cgroupd_pid
$1 = 70472
cgroupd stack:
#0 __recvmsg_syscall (flags=0, msg=0x7ffc3cccade0, fd=8) at ../sysdeps/unix/sysv/linux/recvmsg.c:27
#1 __libc_recvmsg (fd=fd@entry=8, msg=msg@entry=0x7ffc3cccade0, flags=flags@entry=0) at ../sysdeps/unix/sysv/linux/recvmsg.c:41
#2 0x000055caa09649e1 in cgroupd (sk=8) at criu/cgroup.c:1968
#3 0x000055caa09a9b91 in start_unix_cred_daemon (pid=pid@entry=0x55caa0a98118 <cgroupd_pid>, daemon_func=daemon_func@entry=0x55caa0964930 <cgroupd>) at criu/namespaces.c:1489
#4 0x000055caa09673cf in prepare_cgroup_thread_sfd () at criu/cgroup.c:2064
#5 prepare_cgroup () at criu/cgroup.c:2242
#6 0x000055caa0975a9b in cr_restore_tasks () at criu/cr-restore.c:2643
#7 0x000055caa094b75b in main (argc=<optimized out>, argv=0x7ffc3cccc148, envp=<optimized out>) at criu/crtools.c:308
So if I just kill cgroupd (which is blocked waiting for a command on a Unix socket), everything else works:
:~# kill -9 70472
:~# tail /tmp/u2/restore.log
(405.72011) 70473 was stopped
(405.72014) 70473 was trapped
(405.72014) 70473 (native) is going to execute the syscall 11, required is 11
(405.72017) 70473 was stopped
(405.72017) Run late stage hook from criu master for external devices
(405.72018) restore late stage hook for external plugin failed
(405.72018) Running pre-resume scripts
(405.72018) Restore finished successfully. Tasks resumed.
(405.72018) Writing stats
(405.72023) Running post-resume scripts
:~# lxc-ls -f
NAME STATE   AUTOSTART GROUPS IPV4 IPV6                                  UNPRIVILEGED
u2   RUNNING 0         -      -    fc11:4514:1919:810:216:3eff:fe39:55a9 false
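The hang in the backtraces above is the familiar pattern of a parent blocked in `waitpid()` on a child that is itself blocked in `recvmsg()`. The self-contained C sketch below uses a toy daemon, not CRIU's cgroupd, and is not necessarily what CRIU's actual fix does; `toy_daemon()` and `start_and_stop_daemon()` are hypothetical names. It shows one standard way out: `recvmsg()` on a Unix socket returns 0 once every copy of the peer's end has been closed, so closing that end before `waitpid()` lets the daemon exit on its own.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Toy stand-in for a request daemon (not CRIU's cgroupd): serve requests
 * on a Unix socket until recvmsg() reports EOF by returning 0.
 */
static void toy_daemon(int sk)
{
    char buf[64];

    assert(sk >= 0);
    for (;;) {
        struct iovec iov = { .iov_base = buf, .iov_len = sizeof(buf) };
        struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1 };
        ssize_t n = recvmsg(sk, &msg, 0);

        if (n == 0)
            _exit(0);   /* peer end closed everywhere: shut down cleanly */
        if (n < 0)
            _exit(1);
        /* ... handle the request in buf ... */
    }
}

/*
 * Start the toy daemon, then stop it. The parent closes its end of the
 * socket before waitpid(); if it kept that end open, the daemon would stay
 * blocked in recvmsg() and waitpid() would never return, the same pattern
 * as in the backtraces above.
 * Returns the daemon's exit status, or -1 on error.
 */
static int start_and_stop_daemon(void)
{
    int sv[2];
    int status;
    pid_t pid;

    if (socketpair(AF_UNIX, SOCK_SEQPACKET, 0, sv) != 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (pid == 0) {
        close(sv[0]);           /* child keeps only its own end */
        toy_daemon(sv[1]);
        _exit(1);               /* not reached */
    }
    close(sv[1]);

    /* ... send requests over sv[0] here ... */

    /* Signal "no more requests" by closing our end, then reap the child. */
    close(sv[0]);
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Note that EOF only arrives when all copies of the descriptor are closed, so any other process that inherited the parent's end of the socket across `fork()` would keep the daemon blocked just the same.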
A fix for the cgroupd problem is here: https://github.com/checkpoint-restore/criu/pull/2427
A friendly reminder that this issue had no activity for 30 days.
Description
CRIU fails to restore an LXC container.
Steps to reproduce the issue:
or directly running CRIU from shell:
Describe the results you received:
The restore process fails with:
Describe the results you expected:
The successful completion of restore operation.
Additional information you deem important (e.g. issue happens only occasionally):
The same issue happens with CRIU built from the master branch.
CRIU logs and information:
dump.log restore.log