checkpoint-restore / criu

Checkpoint/Restore tool
criu.org

CRIU dump failed in docker container - runc did not terminate successfully #2368

Open uetaaam opened 3 months ago

uetaaam commented 3 months ago

I am trying to create a checkpoint of my Docker container using the command: docker checkpoint create container chekcpoint1 and I am getting this error: Error response from daemon: Cannot checkpoint container container: runc did not terminate successfully: exit status 1: criu failed: type NOTIFY errno 0 path=

CRIU logs and information:

(00.112878) irmap: Refresh stat for /etc/ssl/private
(00.112891) irmap: Refilling /etc/ssl/private dir
(00.112905) irmap: Scanning /var/spool hint
(00.112908) irmap: Refresh stat for /var/spool
(00.112914) irmap: Refilling /var/spool dir
(00.112927) irmap: Refresh stat for /var/spool/mail
(00.112932) irmap: Scanning /var/log hint
(00.112934) irmap: Refresh stat for /var/log
(00.112939) irmap: Refilling /var/log dir
(00.112962) irmap: Refresh stat for /var/log/apt
(00.112967) irmap: Refilling /var/log/apt dir
(00.112986) irmap: Refresh stat for /var/log/apt/eipp.log.xz
(00.112991) irmap: Refresh stat for /var/log/apt/history.log
(00.112997) irmap: Refresh stat for /var/log/apt/term.log
(00.113002) irmap: Refresh stat for /var/log/btmp
(00.113007) irmap: Refresh stat for /var/log/faillog
(00.113011) irmap: Refresh stat for /var/log/lastlog
(00.113016) irmap: Refresh stat for /var/log/wtmp
(00.113021) irmap: Refresh stat for /var/log/dpkg.log
(00.113026) irmap: Scanning /usr/share/dbus-1/system-services hint
(00.113028) irmap: Refresh stat for /usr/share/dbus-1/system-services
(00.113039) Error (criu/irmap.c:104): irmap: Can't stat /usr/share/dbus-1/system-services: No such file or directory
(00.113045) irmap: Scanning /var/lib/polkit-1/localauthority hint
(00.113048) irmap: Refresh stat for /var/lib/polkit-1/localauthority
(00.113052) Error (criu/irmap.c:104): irmap: Can't stat /var/lib/polkit-1/localauthority: No such file or directory
(00.113055) irmap: Scanning /usr/share/polkit-1/actions hint
(00.113057) irmap: Refresh stat for /usr/share/polkit-1/actions
(00.113062) irmap: Refilling /usr/share/polkit-1/actions dir
(00.113076) irmap: Refresh stat for /usr/share/polkit-1/actions/org.dpkg.pkexec.update-alternatives.policy
(00.113082) irmap: Scanning /lib/udev hint
(00.113085) irmap: Refresh stat for /lib/udev
(00.113090) irmap: Refilling /lib/udev dir
(00.113102) irmap: Refresh stat for /lib/udev/rules.d
(00.113108) irmap: Refilling /lib/udev/rules.d dir
(00.113120) irmap: Refresh stat for /lib/udev/rules.d/96-e2scrub.rules
(00.113126) irmap: Scanning /. hint
(00.113128) irmap: Refresh stat for /.
(00.113132) irmap: Scanning /no-such-path hint
(00.113134) irmap: Refresh stat for /no-such-path
(00.113138) Error (criu/irmap.c:104): irmap: Can't stat /no-such-path: No such file or directory
(00.113141) Error (criu/fsnotify.c:284): fsnotify: Can't dump that handle
(00.113179) ----------------------------------------
(00.113205) Error (criu/cr-dump.c:1669): Dump files (pid: 1850966) failed with -1
(00.113214) Waiting for 1850966 to trap
(00.113274) Daemon 1850966 exited trapping
(00.113290) Sent msg to daemon 3 0 0
pie: 1: __fetched msg: 3 0 0
pie: 1: 1: new_sp=0x7fe7cfff3848 ip 0x7fe97a181ad8
(00.113459) 1850966 was trapped
(00.113487) 1850966 was trapped
(00.113493) 1850966 (native) is going to execute the syscall 15, required is 15
(00.113541) 1850966 was stopped
(00.113859) net: Unlock network
(00.113866) Running network-unlock scripts
(00.113869) RPC
(00.138992) Unfreezing tasks into 1
(00.139030) Unseizing 1850966 into 1
(00.139579) Error (criu/cr-dump.c:2093): Dumping FAILED.

How can I solve this?

adrianreber commented 3 months ago

I think there was recently a discussion that inotify does not work on overlayfs with default options. So either you have to change the mount options or use a container that does not use inotify.
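
To check whether a containerized process actually holds inotify watches before attempting a dump, one option is to read its fdinfo entries under /proc. The following is a minimal sketch, not something taken from CRIU itself: the function name find_inotify_fds and the proc_root parameter are made up for illustration, and it assumes the kernel exposes one line starting with "inotify " per watch in /proc/<pid>/fdinfo/<fd>, as described in proc(5). An inotify descriptor with zero watches would not be reported.

```python
import os


def find_inotify_fds(pid, proc_root="/proc"):
    """Return {fd: watch_count} for descriptors of `pid` that hold inotify watches.

    `proc_root` is a hypothetical parameter that only exists so the function
    can be pointed at a fake directory tree for testing.
    """
    fdinfo_dir = os.path.join(proc_root, str(pid), "fdinfo")
    result = {}
    for name in os.listdir(fdinfo_dir):
        if not name.isdigit():
            continue  # fdinfo entries are named after numeric descriptors
        try:
            with open(os.path.join(fdinfo_dir, name)) as f:
                # Each watch is assumed to appear as one "inotify wd:..." line.
                watches = sum(1 for line in f if line.startswith("inotify "))
        except OSError:
            continue  # the descriptor was closed while we were scanning
        if watches:
            result[int(name)] = watches
    return result
```

Calling find_inotify_fds with the host PID of the container's main process (for example the PID shown in the CRIU log) usually needs root. An empty result suggests the fsnotify error comes from somewhere else.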

uetaaam commented 3 months ago

I've changed the storage driver to vfs and now I am getting another error:

(00.975275) 0x7fe3c118f000-0x7fe3c1190000 (4K) prot 0x1 flags 0x2 fdflags 0 st 0x41 off 0 reg fp shmid: 0x5b
(00.975279) 0x7fe3c1190000-0x7fe3c11b0000 (128K) prot 0x5 flags 0x2 fdflags 0 st 0x41 off 0x1000 reg fp shmid: 0x5b
(00.975283) 0x7fe3c11b0000-0x7fe3c11b8000 (32K) prot 0x1 flags 0x2 fdflags 0 st 0x41 off 0x21000 reg fp shmid: 0x5b
(00.975287) 0x7fe3c11b8000-0x7fe3c11b9000 (4K) prot 0x3 flags 0x22 fdflags 0 st 0x201 off 0 reg ap shmid: 0
(00.975291) 0x7fe3c11b9000-0x7fe3c11ba000 (4K) prot 0x1 flags 0x2 fdflags 0 st 0x41 off 0x29000 reg fp shmid: 0x5b
(00.975296) 0x7fe3c11ba000-0x7fe3c11bb000 (4K) prot 0x3 flags 0x2 fdflags 0 st 0x41 off 0x2a000 reg fp shmid: 0x5b
(00.975301) 0x7fe3c11bb000-0x7fe3c11bc000 (4K) prot 0x3 flags 0x22 fdflags 0 st 0x201 off 0 reg ap shmid: 0
(00.975306) 0x7fff5062f000-0x7fff50650000 (132K) prot 0x3 flags 0x122 fdflags 0 st 0x201 off 0 reg ap shmid: 0
(00.975310) 0x7fff50763000-0x7fff50767000 (16K) prot 0x1 flags 0x22 fdflags 0 st 0x1201 off 0 reg vvar ap shmid: 0
(00.975313) 0x7fff50767000-0x7fff50769000 (8K) prot 0x5 flags 0x22 fdflags 0 st 0x209 off 0 reg vdso ap shmid: 0
(00.975316) 0xffffffffff600000-0xffffffffff601000 (4K) prot 0x4 flags 0x22 fdflags 0 st 0x204 off 0 vsys ap shmid: 0
(00.975320) Obtaining task auvx ...
(00.976075) Dumping path for -3 fd via self 19 [/app/AutoDispatcherServer]
(00.976122) Dumping path for -3 fd via self 19 [/]
(00.976132) Dumping task cwd id 0xcc root id 0xcd
(00.976321) mnt: Dumping mountpoints
(00.976332) mnt: 549: 46:/ @ ./sys/devices/virtual/powercap
(00.984918) mnt: 548: 45:/ @ ./sys/firmware
(00.991427) mnt: 547: 44:/ @ ./proc/scsi
(00.997778) mnt: 546: 3a:/null @ ./proc/timer_list
(00.997794) mnt: 545: 3a:/null @ ./proc/keys
(00.997799) mnt: 544: 3a:/null @ ./proc/kcore
(00.997803) mnt: 543: 43:/ @ ./proc/acpi
(01.003814) mnt: 542: 39:/sysrq-trigger @ ./proc/sysrq-trigger
(01.003828) mnt: 541: 39:/sys @ ./proc/sys
(01.003832) mnt: 540: 39:/irq @ ./proc/irq
(01.003836) mnt: 539: 39:/fs @ ./proc/fs
(01.003839) mnt: 538: 39:/bus @ ./proc/bus
(01.003842) mnt: 666: 10300001:/var/lib/docker/containers/bde2afe8c23fab10c828a4cbd6db9a20fa96f870be7b3c4437f104fd97a7ae41/hosts @ ./etc/hosts
(01.003849) mnt: 665: 10300001:/var/lib/docker/containers/bde2afe8c23fab10c828a4cbd6db9a20fa96f870be7b3c4437f104fd97a7ae41/hostname @ ./etc/hostname
(01.003853) mnt: 663: 10300001:/var/lib/docker/containers/bde2afe8c23fab10c828a4cbd6db9a20fa96f870be7b3c4437f104fd97a7ae41/resolv.conf @ ./etc/resolv.conf
(01.003858) mnt: 661: 41:/ @ ./dev/shm
(01.009696) mnt: 658: 36:/ @ ./dev/mqueue
(01.009793) mnt: 657: 19:/ @ ./sys/fs/cgroup
(01.009799) mnt: 655: 3d:/ @ ./sys
(01.009803) mnt: 652: 3b:/ @ ./dev/pts
(01.009806) mnt: 651: 3a:/ @ ./dev
(01.009809) mnt: Mount is not fully visible ./dev(651)
(01.009864) mnt: mount has children ./dev(651)
(01.017304) mnt: 650: 39:/ @ ./proc
(01.017319) mnt: 649: 10300001:/var/lib/docker/vfs/dir/ea77f4f8b0d6c5e5dd18a3f8397d4568dbbe0e9fc369bf0784803b4605d10406 @ ./
(01.017352) Dumping file-locks
(01.017356) Error (criu/file-lock.c:110): Some file locks are hold by dumping tasks! You can try --file-locks to dump them.
(01.017453) net: Unlock network
(01.017603) Running network-unlock scripts
(01.017609) RPC
(01.040770) Unfreezing tasks into 1
(01.040803) Unseizing 390283 into 1
(01.041144) Error (criu/cr-dump.c:2093): Dumping FAILED.
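
In dump logs like the ones above, the useful entries are the few Error lines buried among hundreds of informational ones. A small filter can pull them out. This is only a sketch under assumptions: extract_errors and ERROR_RE are illustrative names, and the pattern is inferred from the entries quoted in this thread ("(timestamp) Error (file:line): message"), not from any documented CRIU log format. It is written to work whether the log has one entry per line or has been pasted as a single run of text.

```python
import re

# Pattern inferred from the log entries quoted in this thread (an assumption,
# not a documented format). A message ends at the next "(timestamp)" or at
# the end of the line, whichever comes first.
ERROR_RE = re.compile(
    r"\((?P<ts>\d+\.\d+)\) Error \((?P<src>[^():]+):(?P<line>\d+)\): "
    r"(?P<msg>.*?)(?=\s*\(\d+\.\d+\)|$)",
    re.MULTILINE,
)


def extract_errors(log_text):
    """Return (timestamp, source_file, line_number, message) for each Error entry."""
    return [
        (m["ts"], m["src"], int(m["line"]), m["msg"].strip())
        for m in ERROR_RE.finditer(log_text)
    ]
```

Applied to the two excerpts in this issue, the entries that matter are the ones from criu/fsnotify.c and criu/file-lock.c. The irmap "Can't stat" errors just before the fsnotify one come from CRIU scanning its path hints while trying to resolve the watched inode.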

adrianreber commented 3 months ago

Try to enable support to dump file-locks as described in the error message. With Podman you can do something like podman container checkpoint --file-locks. Not sure if this works in Docker.

You could also do echo "file-locks" >> /etc/criu/runc.conf to try to make it work in Docker.
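
Running the echo command more than once appends the option each time. A repeat-safe variant could look like the sketch below. The helper name ensure_criu_option is made up for illustration; the path /etc/criu/runc.conf and the option name file-locks come from the comment above, and the one-option-per-line layout without leading dashes is assumed from the echo command itself.

```python
import os


def ensure_criu_option(conf_path, option="file-locks"):
    """Append `option` on its own line unless it is already present.

    Returns True if the file was modified, False if nothing had to change.
    """
    text = ""
    if os.path.exists(conf_path):
        with open(conf_path) as f:
            text = f.read()
    if any(line.strip() == option for line in text.splitlines()):
        return False  # already configured, leave the file untouched
    parent = os.path.dirname(conf_path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    with open(conf_path, "a") as f:
        if text and not text.endswith("\n"):
            f.write("\n")  # avoid gluing the option onto an unterminated last line
        f.write(option + "\n")
    return True
```

On a real host this would be called as ensure_criu_option("/etc/criu/runc.conf") with root privileges. Whether Docker's checkpoint path then picks up the option is still untested here, as noted above.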

adrianreber commented 3 months ago

@uetaaam Can this be closed?

github-actions[bot] commented 2 months ago

A friendly reminder that this issue had no activity for 30 days.