checkpoint-restore / criu

Checkpoint/Restore tool
criu.org

Question: Any chance to restore in an unprivileged container? #2322

Closed wata727 closed 9 months ago

wata727 commented 9 months ago

Description

I'm looking into CRIU as a way to speed up a slow-starting process that runs in a container. From several existing issues, I understand that --privileged is required to invoke criu restore inside a container, but that requirement cannot be met on widely used serverless platforms such as AWS Fargate and Google Cloud Run.

For a trivial process that does not need its PID or network configuration restored, is it possible to restore it inside an unprivileged container?

As far as I can tell, this constraint cannot be overcome, because the prctl(2) call with PR_SET_MM below ultimately requires CAP_SYS_RESOURCE. Is this understanding correct? https://github.com/checkpoint-restore/criu/blob/v3.19/criu/pie/restorer.c#L1922

If this understanding is correct, are there any plans to change CRIU or the Linux kernel in the future to allow restores from inside an unprivileged container?

adrianreber commented 9 months ago

To run CRIU as non-root, we introduced CAP_CHECKPOINT_RESTORE. That is the minimum you need for really simple processes.
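A quick way to see whether a container actually grants this capability is to decode the CapEff mask from /proc/self/status. The sketch below assumes the bit numbers from <linux/capability.h> (CAP_SYS_RESOURCE = 24, CAP_CHECKPOINT_RESTORE = 40, the latter available since Linux 5.9); the helper names parse_cap_mask and effective_caps are hypothetical and not part of CRIU.

```python
from __future__ import annotations

# Capability bit numbers from <linux/capability.h>.
CAP_SYS_RESOURCE = 24
CAP_CHECKPOINT_RESTORE = 40  # introduced in Linux 5.9


def parse_cap_mask(hex_mask: str) -> set[int]:
    """Convert a hex mask such as "0000010000000000" into the set of bit numbers."""
    value = int(hex_mask, 16)
    return {bit for bit in range(value.bit_length()) if (value >> bit) & 1}


def effective_caps(status_path: str = "/proc/self/status") -> set[int]:
    """Return the capability bits listed on the CapEff line of a /proc status file."""
    with open(status_path) as f:
        for line in f:
            if line.startswith("CapEff:"):
                return parse_cap_mask(line.split()[1])
    raise RuntimeError(f"no CapEff line in {status_path}")


if __name__ == "__main__":
    caps = effective_caps()
    print("CAP_CHECKPOINT_RESTORE:", CAP_CHECKPOINT_RESTORE in caps)
    print("CAP_SYS_RESOURCE:", CAP_SYS_RESOURCE in caps)
```

Inside a container started with --cap-add CHECKPOINT_RESTORE, the first line would be expected to print True; CAP_SYS_RESOURCE is not in Docker's default capability set, so the second line would typically print False.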

wata727 commented 9 months ago

Thank you for pointing this out. From the following commits, I understand that with CAP_CHECKPOINT_RESTORE you can call prctl_set_mm_map without CAP_SYS_RESOURCE:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ebd6de6812387a2db9a52842cfbe004da1dd3be8
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=74858abbb1032222f922487fd1a24513bbed80f9
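The PR_SET_MM_MAP path is the one these commits touch. A cheap way to check whether the running kernel exposes it at all (it is only built with CONFIG_CHECKPOINT_RESTORE) is to ask for the struct size with PR_SET_MM_MAP_SIZE, which, as far as I can tell from kernel/sys.c, performs no capability check. This is a sketch using ctypes against the C library's prctl(); the constants come from <linux/prctl.h>, and the helper name prctl_mm_map_size is hypothetical. It says nothing about whether the later PR_SET_MM_MAP call will pass its own permission checks.

```python
import ctypes

# Constants from <linux/prctl.h>.
PR_SET_MM = 35
PR_SET_MM_MAP = 14       # installs a whole mm layout in one call (used by CRIU's restorer)
PR_SET_MM_MAP_SIZE = 15  # asks the kernel for sizeof(struct prctl_mm_map)


def prctl_mm_map_size():
    """Return the kernel's sizeof(struct prctl_mm_map), or None if the probe fails."""
    try:
        libc = ctypes.CDLL(None, use_errno=True)
        prctl = libc.prctl
    except (OSError, AttributeError, TypeError):
        return None  # not a Linux libc, or prctl() is not exported
    prctl.restype = ctypes.c_int
    size = ctypes.c_uint(0)
    # prctl() is variadic and reads unsigned long arguments, so pass them explicitly.
    rc = prctl(ctypes.c_int(PR_SET_MM), ctypes.c_ulong(PR_SET_MM_MAP_SIZE),
               ctypes.c_ulong(ctypes.addressof(size)),
               ctypes.c_ulong(0), ctypes.c_ulong(0))
    if rc != 0:
        # Typically EINVAL when the kernel lacks CONFIG_CHECKPOINT_RESTORE,
        # or EPERM if a seccomp filter blocks the call.
        return None
    return size.value


if __name__ == "__main__":
    print("sizeof(struct prctl_mm_map) =", prctl_mm_map_size())
```

On x86_64 the reported size should be 104 bytes (eleven 64-bit fields, one pointer, two 32-bit fields); None means the interface is missing or the call was filtered.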

In Docker, it seems that you can restore inside a container started with the following command:

docker run --rm --cap-add CHECKPOINT_RESTORE --security-opt systempaths=unconfined --security-opt apparmor=unconfined -it ubuntu:22.04 /bin/bash
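Docker's default configuration mounts several /proc paths (for example /proc/sys) read-only and masks others; --security-opt systempaths=unconfined turns that off. One way to see what is still read-only inside a running container is to scan /proc/self/mountinfo, whose fifth and sixth fields are the mount point and the per-mount options (see proc(5)). This is a sketch only; the helper name readonly_mounts is hypothetical, and it does not tell you which of these paths CRIU actually needs.

```python
def readonly_mounts(mountinfo_lines, prefix="/proc"):
    """Return mount points at or below `prefix` whose per-mount options include "ro".

    A mountinfo line looks like:
      36 35 98:0 /mnt1 /mnt2 rw,noatime master:1 - ext3 /dev/root rw,errors=continue
    Field 5 is the mount point and field 6 holds the per-mount options.
    """
    found = []
    for line in mountinfo_lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip empty or malformed lines
        mount_point, options = fields[4], fields[5].split(",")
        under_prefix = mount_point == prefix or mount_point.startswith(prefix + "/")
        if under_prefix and "ro" in options:
            found.append(mount_point)
    return found


if __name__ == "__main__":
    with open("/proc/self/mountinfo") as f:
        for mount_point in readonly_mounts(f):
            print("read-only:", mount_point)
```

Run once in a default container and once with systempaths=unconfined; the difference between the two outputs shows which /proc paths that option unlocks.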

https://github.com/checkpoint-restore/criu/pull/2311 seems to be necessary as well, because restoring the TTY fails with CRIU v3.19.

In my environment the restore still failed, but I think that is probably a separate problem. I'm sharing the debug log just in case.

CRIU full dump/restore logs:

```
# ../criu/criu restore --unprivileged -j -v
(00.000000) CRIU run id = 0xf000016d0000184e
(00.000012) Version: 3.18 (gitid v3.18-190-g50aa6da65)
(00.000018) Running on 6ced68a4c2b0 Linux 6.2.0-1018-azure #18~22.04.1-Ubuntu SMP Tue Nov 21 19:25:02 UTC 2023 x86_64
(00.000027) Warn (criu/kerndat.c:1153): $XDG_RUNTIME_DIR not set. Cannot find location for kerndat file
(00.000064) sockets: Probing sock diag modules
(00.000143) sockets: Done probing
(00.000192) Pagemap provides flags only
(00.000253) Found anon-shmem device at 1
(00.000286) Hugetlb size 2 Mb is supported but cannot get dev's number
(00.000311) Hugetlb size 1024 Mb is supported but cannot get dev's number
(00.000328) Reset 6222's dirty tracking
(00.000421) ... done
(00.000470) Dirty track supported on kernel
(00.000483) Zero page detection failed, optimization turns off.
(00.007829) Warn (criu/kerndat.c:1153): $XDG_RUNTIME_DIR not set. Cannot find location for kerndat file
(00.007970) Reading image tree
(00.008025) Add mnt ns 6 pid 3570
(00.008043) Add net ns 2 pid 3570
(00.008059) Add pid ns 1 pid 3570
(00.008081) Will restore in 0 namespaces
(00.008120) sockets: Unable to set SO_SNDBUFFORCE/SO_RCVBUFFORCE, falling back to SO_SNDBUF/SO_RCVBUF
(00.008149) Collecting 51/56 (flags 3)
(00.008171) No memfd.img image
(00.008194) Collecting 40/54 (flags 2)
(00.008225) Collected [usr/bin/ruby3.0] ID 0x1
(00.008249) Collected [usr/lib/x86_64-linux-gnu/ruby/3.0.0/monitor.so] ID 0x2
(00.008268) Collected [usr/lib/x86_64-linux-gnu/ruby/3.0.0/enc/trans/transdb.so] ID 0x3
(00.008284) Collected [usr/lib/x86_64-linux-gnu/ruby/3.0.0/enc/encdb.so] ID 0x4
(00.008303) Collected [usr/lib/x86_64-linux-gnu/libm.so.6] ID 0x5
(00.008320) Collected [usr/lib/x86_64-linux-gnu/libcrypt.so.1.1.0] ID 0x6
(00.008345) Collected [usr/lib/x86_64-linux-gnu/libgmp.so.10.4.1] ID 0x7
(00.008362) Collected [usr/lib/x86_64-linux-gnu/libz.so.1.2.11] ID 0x8
(00.008382) Collected [usr/lib/x86_64-linux-gnu/libc.so.6] ID 0x9
(00.008395) Collected [usr/lib/x86_64-linux-gnu/libruby-3.0.so.3.0.2] ID 0xa
(00.008405) Collected [usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2] ID 0xb
(00.008414) Collected [dev/pts/0] ID 0xd
(00.008433) eventfd: Collected : id 0x00000e flags 0x802 counter 0000000000000000
(00.008452) eventfd: Collected : id 0x00000f flags 0x802 counter 0000000000000000
(00.008472) Collected [tmp/criu/work2] ID 0x10
(00.008482) Collected [.] ID 0x11
(00.008494) Collecting 46/68 (flags 0)
(00.008507) No remap-fpath.img image
(00.008538) No apparmor.img image
(00.008552) No cgroup.img image
(00.008574) No pidns-1.img image
(00.008656) Forking task with 3570 pid (flags 0x0)
(00.009081) 3570: cg: Cgroup namespace inherited from parent
(00.009096) 3570: cg: Cgroups 1 inherited from parent
(00.009116) 3570: Calling restore_sid() for init
(00.009143) 3570: Collecting 44/37 (flags 2)
(00.009216) 3570: tty: Collected tty ID 0xc (pts)
(00.009251) 3570: Collecting 45/51 (flags 0)
(00.009271) 3570: No tty-data.img image
(00.009769) 3570: Restoring namespaces 3570 flags 0x0
(00.009811) 3570: Preparing info about shared resources
(00.009846) 3570: Collecting 48/38 (flags 0)
(00.009859) 3570: No filelocks.img image
(00.009872) 3570: Collecting 42/27 (flags 0)
(00.009885) 3570: No pipes-data.img image
(00.009892) 3570: Collecting 43/27 (flags 0)
(00.009902) 3570: No fifo-data.img image
(00.009909) 3570: Collecting 41/69 (flags 0)
(00.009918) 3570: No sk-queues.img image
(00.010006) 3570: vma 0x56058e867000 0x56058e868000
(00.010018) 3570: vma 0x56058e868000 0x56058e869000
(00.010023) 3570: vma 0x56058e869000 0x56058e86a000
(00.010028) 3570: vma 0x56058e86a000 0x56058e86b000
(00.010036) 3570: vma 0x56058e86b000 0x56058e86c000
(00.010040) 3570: vma 0x56058ec57000 0x56058efb6000
(00.010047) 3570: vma 0x7fc9427fc000 0x7fc9427fd000
(00.010056) 3570: vma 0x7fc9427fd000 0x7fc9427fe000
(00.010065) 3570: vma 0x7fc9427fe000 0x7fc9427ff000
(00.010073) 3570: vma 0x7fc9427ff000 0x7fc942800000
(00.010082) 3570: vma 0x7fc942800000 0x7fc942801000
(00.010095) 3570: vma 0x7fc942801000 0x7fc942802000
(00.010104) 3570: vma 0x7fc942802000 0x7fc942803000
(00.010108) 3570: vma 0x7fc942803000 0x7fc942804000
(00.010111) 3570: vma 0x7fc942804000 0x7fc942805000
(00.010114) 3570: vma 0x7fc942805000 0x7fc942806000
(00.010123) 3570: vma 0x7fc942806000 0x7fc942807000
(00.010132) 3570: vma 0x7fc942807000 0x7fc942808000
(00.010140) 3570: vma 0x7fc942808000 0x7fc942809000
(00.010149) 3570: vma 0x7fc942809000 0x7fc94280a000
(00.010183) 3570: vma 0x7fc94280a000 0x7fc94280b000
(00.010187) 3570: vma 0x7fc94280b000 0x7fc94280c000
(00.010192) 3570: vma 0x7fc94280c000 0x7fc9428ad000
(00.010197) 3570: vma 0x7fc9428ad000 0x7fc9428ae000
(00.010205) 3570: vma 0x7fc9428ae000 0x7fc94294f000
(00.010214) 3570: vma 0x7fc94294f000 0x7fc942950000
(00.010223) 3570: vma 0x7fc942950000 0x7fc9429f1000
(00.010231) 3570: vma 0x7fc9429f1000 0x7fc9429f2000
(00.010239) 3570: vma 0x7fc9429f2000 0x7fc942a93000
(00.010248) 3570: vma 0x7fc942a93000 0x7fc942a94000
(00.010257) 3570: vma 0x7fc942a94000 0x7fc942b35000
(00.010265) 3570: vma 0x7fc942b35000 0x7fc942b36000
(00.010273) 3570: vma 0x7fc942b36000 0x7fc942bd7000
(00.010278) 3570: vma 0x7fc942bd7000 0x7fc942bd8000
(00.010283) 3570: vma 0x7fc942bd8000 0x7fc942c79000
(00.010291) 3570: vma 0x7fc942c79000 0x7fc942c7a000
(00.010296) 3570: vma 0x7fc942c7a000 0x7fc942d1b000
(00.010303) 3570: vma 0x7fc942d1b000 0x7fc942d1c000
(00.010312) 3570: vma 0x7fc942d1c000 0x7fc942dbd000
(00.010319) 3570: vma 0x7fc942dbd000 0x7fc942dbe000
(00.010326) 3570: vma 0x7fc942dbe000 0x7fc942e5f000
(00.010331) 3570: vma 0x7fc942e5f000 0x7fc942e60000
(00.010340) 3570: vma 0x7fc942e60000 0x7fc942f01000
(00.010347) 3570: vma 0x7fc942f01000 0x7fc942f02000
(00.010354) 3570: vma 0x7fc942f02000 0x7fc942fa3000
(00.010361) 3570: vma 0x7fc942fa3000 0x7fc942fa4000
(00.010374) 3570: vma 0x7fc942fa4000 0x7fc943045000
(00.010381) 3570: vma 0x7fc943045000 0x7fc943046000
(00.010388) 3570: vma 0x7fc943046000 0x7fc9430e7000
(00.010395) 3570: vma 0x7fc9430e7000 0x7fc9430e8000
(00.010402) 3570: vma 0x7fc9430e8000 0x7fc943189000
(00.010409) 3570: vma 0x7fc943189000 0x7fc94318a000
(00.010416) 3570: vma 0x7fc94318a000 0x7fc94322b000
(00.010423) 3570: vma 0x7fc94322b000 0x7fc94322c000
(00.010430) 3570: vma 0x7fc94322c000 0x7fc9432cd000
(00.010438) 3570: vma 0x7fc9432cd000 0x7fc9432ce000
(00.010446) 3570: vma 0x7fc9432ce000 0x7fc94336f000
(00.010454) 3570: vma 0x7fc94336f000 0x7fc943370000
(00.010462) 3570: vma 0x7fc943370000 0x7fc943411000
(00.010471) 3570: vma 0x7fc943411000 0x7fc943412000
(00.010479) 3570: vma 0x7fc943412000 0x7fc9434b3000
(00.010486) 3570: vma 0x7fc9434b3000 0x7fc9434b4000
(00.010493) 3570: vma 0x7fc9434b4000 0x7fc943555000
(00.010500) 3570: vma 0x7fc943555000 0x7fc943556000
(00.010512) 3570: vma 0x7fc943556000 0x7fc9435f7000
(00.010519) 3570: vma 0x7fc9435f7000 0x7fc9435f8000
(00.010523) 3570: vma 0x7fc9435f8000 0x7fc943699000
(00.010531) 3570: vma 0x7fc943699000 0x7fc94369a000
(00.010539) 3570: vma 0x7fc94369a000 0x7fc94373b000
(00.010547) 3570: vma 0x7fc94373b000 0x7fc94373c000
(00.010555) 3570: vma 0x7fc94373c000 0x7fc9437dd000
(00.010560) 3570: vma 0x7fc9437dd000 0x7fc9437de000
(00.010567) 3570: vma 0x7fc9437de000 0x7fc94387f000
(00.010575) 3570: vma 0x7fc94387f000 0x7fc943880000
(00.010583) 3570: vma 0x7fc943880000 0x7fc943921000
(00.010592) 3570: vma 0x7fc943921000 0x7fc943922000
(00.010616) 3570: vma 0x7fc943922000 0x7fc9439c3000
(00.010621) 3570: vma 0x7fc9439c3000 0x7fc9439c4000
(00.010628) 3570: vma 0x7fc9439c4000 0x7fc943a65000
(00.010636) 3570: vma 0x7fc943a65000 0x7fc943a66000
(00.010644) 3570: vma 0x7fc943a66000 0x7fc943b07000
(00.010649) 3570: vma 0x7fc943b07000 0x7fc943b08000
(00.010656) 3570: vma 0x7fc943b08000 0x7fc943ba9000
(00.010665) 3570: vma 0x7fc943ba9000 0x7fc943baa000
(00.010673) 3570: vma 0x7fc943baa000 0x7fc945dbb000
(00.010678) 3570: vma 0x7fc945dbb000 0x7fc945dc9000
(00.010688) 3570: vma 0x7fc945dc9000 0x7fc945e45000
(00.010696) 3570: vma 0x7fc945e45000 0x7fc945ea0000
(00.010704) 3570: vma 0x7fc945ea0000 0x7fc945ea1000
(00.010712) 3570: vma 0x7fc945ea1000 0x7fc945ea2000
(00.010736) 3570: vma 0x7fc945ea2000 0x7fc945ea4000
(00.010739) 3570: vma 0x7fc945ea4000 0x7fc945eb8000
(00.010742) 3570: vma 0x7fc945eb8000 0x7fc945ed1000
(00.010745) 3570: vma 0x7fc945ed1000 0x7fc945ed2000
(00.010747) 3570: vma 0x7fc945ed2000 0x7fc945ed3000
(00.010750) 3570: vma 0x7fc945ed3000 0x7fc945ed4000
(00.010753) 3570: vma 0x7fc945ed4000 0x7fc945edc000
(00.010755) 3570: vma 0x7fc945edc000 0x7fc945ee6000
(00.010758) 3570: vma 0x7fc945ee6000 0x7fc945f45000
(00.010761) 3570: vma 0x7fc945f45000 0x7fc945f5c000
(00.010764) 3570: vma 0x7fc945f5c000 0x7fc945f5d000
(00.010766) 3570: vma 0x7fc945f5d000 0x7fc945f5e000
(00.010769) 3570: vma 0x7fc945f5e000 0x7fc945f60000
(00.010775) 3570: vma 0x7fc945f60000 0x7fc945f71000
(00.010780) 3570: vma 0x7fc945f71000 0x7fc945f77000
(00.010783) 3570: vma 0x7fc945f77000 0x7fc945f78000
(00.010786) 3570: vma 0x7fc945f78000 0x7fc945f79000
(00.010792) 3570: vma 0x7fc945f79000 0x7fc945f7a000
(00.010795) 3570: vma 0x7fc945f7a000 0x7fc945fa2000
(00.010798) 3570: vma 0x7fc945fa2000 0x7fc946137000
(00.010801) 3570: vma 0x7fc946137000 0x7fc94618f000
(00.010810) 3570: vma 0x7fc94618f000 0x7fc946193000
(00.010815) 3570: vma 0x7fc946193000 0x7fc946195000
(00.010822) 3570: vma 0x7fc946195000 0x7fc9461a2000
(00.010830) 3570: vma 0x7fc9461a2000 0x7fc9461cb000
(00.010838) 3570: vma 0x7fc9461cb000 0x7fc9463f9000
(00.010847) 3570: vma 0x7fc9463f9000 0x7fc9464ff000
(00.010855) 3570: vma 0x7fc9464ff000 0x7fc946506000
(00.010863) 3570: vma 0x7fc946506000 0x7fc946507000
(00.010871) 3570: vma 0x7fc946507000 0x7fc946517000
(00.010879) 3570: vma 0x7fc94651a000 0x7fc94651c000
(00.010888) 3570: vma 0x7fc94651c000 0x7fc94651e000
(00.010893) 3570: vma 0x7fc94651e000 0x7fc946548000
(00.010900) 3570: vma 0x7fc946548000 0x7fc946553000
(00.010905) 3570: vma 0x7fc946554000 0x7fc946556000
(00.010912) 3570: vma 0x7fc946556000 0x7fc946558000
(00.010921) 3570: vma 0x7ffdf9303000 0x7ffdf9b02000
(00.010925) 3570: vma 0x7ffdf9be7000 0x7ffdf9beb000
(00.010936) 3570: vma 0x7ffdf9beb000 0x7ffdf9bed000
(00.010943) 3570: vma 0xffffffffff600000 0xffffffffff601000
(00.010962) 3570: Collect fdinfo pid=3570 fd=0 id=0xc
(00.010973) 3570: Collect fdinfo pid=3570 fd=1 id=0xc
(00.010976) 3570: Collect fdinfo pid=3570 fd=2 id=0xc
(00.010985) 3570: Collect fdinfo pid=3570 fd=3 id=0xe
(00.010994) 3570: Collect fdinfo pid=3570 fd=4 id=0xf
(00.011075) 3570: skqueue: Preparing SCMs
(00.011086) 3570: tty: Inherit terminal for id 0xc
(00.011092) 3570: tty: head driver pts id 0xc index 0 (master 0 sid 1 pgrp 3570 inherit 1)
(00.011098) 3570: File descs:
(00.011107) 3570: `- type 1 ID 0x1
(00.011110) 3570: `- type 1 ID 0x2
(00.011113) 3570: `- type 1 ID 0x3
(00.011117) 3570: `- type 1 ID 0x4
(00.011125) 3570: `- type 1 ID 0x5
(00.011128) 3570: `- type 1 ID 0x6
(00.011130) 3570: `- type 1 ID 0x7
(00.011133) 3570: `- type 1 ID 0x8
(00.011135) 3570: `- type 1 ID 0x9
(00.011142) 3570: `- type 1 ID 0xa
(00.011150) 3570: `- type 1 ID 0xb
(00.011153) 3570: `- type 11 ID 0xc
(00.011160) 3570: `- FD 0 pid 3570
(00.011163) 3570: `- FD 1 pid 3570
(00.011170) 3570: `- FD 2 pid 3570
(00.011173) 3570: `- type 1 ID 0xd
(00.011180) 3570: `- type 6 ID 0xe
(00.011184) 3570: `- FD 3 pid 3570
(00.011191) 3570: `- type 6 ID 0xf
(00.011198) 3570: `- FD 4 pid 3570
(00.011205) 3570: `- type 1 ID 0x10
(00.011210) 3570: `- type 1 ID 0x11
(00.011760) 3570: nr_restored_pages: 4059
(00.011773) 3570: nr_shared_pages: 0
(00.011777) 3570: nr_dropped_pages: 0
(00.011781) 3570: nr_lazy: 0
(00.011800) 3570: Shrunk premap area to 0x7ff125da5000(0)
(00.011810) 3570: Restore on-core sigactions for 3570
(00.011874) 3570: Restoring children in alien sessions:
(00.011884) 3570: Restoring children in our session:
(00.011902) 3570: Restoring 3570 to 3570 pgid
(00.011912) 3570: will call setpgid, mine pgid is 6222
(00.011923) 3570: Restoring resources
(00.011942) 3570: Opening fdinfo-s
(00.011947) 3570: tty: open driver pts id 0xc index 0 (master 0 sid 1 pgrp 3570 inherit 1)
(00.011965) 3570: tty: Migrated slave peer 0xc -> to fd 0
(00.011990) 3570: Create fd for 0
(00.011999) 3570: Going to dup 0 into 1
(00.012009) 3570: Going to dup 0 into 2
(00.012016) 3570: Receive fd for 1
(00.012024) 3570: Receive fd for 2
(00.012040) 3570: Create fd for 3
(00.012053) 3570: Create fd for 4
(00.012061) 3570: Opening 0x0056058e867000-0x0056058e868000 0000000000000000 (41) vma
(00.012108) 3570: Opening 0x0056058e868000-0x0056058e869000 0x00000000001000 (41) vma
(00.012118) 3570: Opening 0x0056058e869000-0x0056058e86a000 0x00000000002000 (41) vma
(00.012128) 3570: Opening 0x0056058e86a000-0x0056058e86b000 0x00000000002000 (41) vma
(00.012137) 3570: Opening 0x0056058e86b000-0x0056058e86c000 0x00000000003000 (41) vma
(00.012145) 3570: Opening 0x007fc9427fc000-0x007fc9427fd000 0000000000000000 (20000041) vma
(00.012190) 3570: Opening 0x007fc9427fd000-0x007fc9427fe000 0x00000000001000 (20000041) vma
(00.012198) 3570: Opening 0x007fc9427fe000-0x007fc9427ff000 0x00000000002000 (41) vma
(00.012206) 3570: Opening 0x007fc9427ff000-0x007fc942800000 0x00000000002000 (41) vma
(00.012214) 3570: Opening 0x007fc942800000-0x007fc942801000 0x00000000003000 (41) vma
(00.012222) 3570: Opening 0x007fc942801000-0x007fc942802000 0000000000000000 (20000041) vma
(00.012264) 3570: Opening 0x007fc942802000-0x007fc942803000 0x00000000001000 (20000041) vma
(00.012273) 3570: Opening 0x007fc942803000-0x007fc942804000 0x00000000002000 (41) vma
(00.012282) 3570: Opening 0x007fc942804000-0x007fc942805000 0x00000000002000 (41) vma
(00.012290) 3570: Opening 0x007fc942805000-0x007fc942806000 0x00000000003000 (41) vma
(00.012298) 3570: Opening 0x007fc942806000-0x007fc942807000 0000000000000000 (20000041) vma
(00.012340) 3570: Opening 0x007fc942807000-0x007fc942808000 0x00000000001000 (20000041) vma
(00.012345) 3570: Opening 0x007fc942808000-0x007fc942809000 0x00000000002000 (41) vma
(00.012355) 3570: Opening 0x007fc942809000-0x007fc94280a000 0x00000000002000 (41) vma
(00.012365) 3570: Opening 0x007fc94280a000-0x007fc94280b000 0x00000000003000 (41) vma
(00.012369) 3570: Opening 0x007fc945dbb000-0x007fc945dc9000 0000000000000000 (20000041) vma
(00.012417) 3570: Opening 0x007fc945dc9000-0x007fc945e45000 0x0000000000e000 (20000041) vma
(00.012424) 3570: Opening 0x007fc945e45000-0x007fc945ea0000 0x0000000008a000 (41) vma
(00.012434) 3570: Opening 0x007fc945ea0000-0x007fc945ea1000 0x000000000e4000 (41) vma
(00.012443) 3570: Opening 0x007fc945ea1000-0x007fc945ea2000 0x000000000e5000 (41) vma
(00.012450) 3570: Opening 0x007fc945ea2000-0x007fc945ea4000 0000000000000000 (20000041) vma
(00.012494) 3570: Opening 0x007fc945ea4000-0x007fc945eb8000 0x00000000002000 (20000041) vma
(00.012503) 3570: Opening 0x007fc945eb8000-0x007fc945ed1000 0x00000000016000 (20000041) vma
(00.012511) 3570: Opening 0x007fc945ed1000-0x007fc945ed2000 0x0000000002f000 (41) vma
(00.012518) 3570: Opening 0x007fc945ed2000-0x007fc945ed3000 0x0000000002f000 (41) vma
(00.012527) 3570: Opening 0x007fc945ed3000-0x007fc945ed4000 0x00000000030000 (41) vma
(00.012535) 3570: Opening 0x007fc945edc000-0x007fc945ee6000 0000000000000000 (20000041) vma
(00.012576) 3570: Opening 0x007fc945ee6000-0x007fc945f45000 0x0000000000a000 (20000041) vma
(00.012585) 3570: Opening 0x007fc945f45000-0x007fc945f5c000 0x00000000069000 (41) vma
(00.012592) 3570: Opening 0x007fc945f5c000-0x007fc945f5d000 0x0000000007f000 (41) vma
(00.012600) 3570: Opening 0x007fc945f5d000-0x007fc945f5e000 0x00000000080000 (41) vma
(00.012607) 3570: Opening 0x007fc945f5e000-0x007fc945f60000 0000000000000000 (20000041) vma
(00.012649) 3570: Opening 0x007fc945f60000-0x007fc945f71000 0x00000000002000 (20000041) vma
(00.012660) 3570: Opening 0x007fc945f71000-0x007fc945f77000 0x00000000013000 (20000041) vma
(00.012669) 3570: Opening 0x007fc945f77000-0x007fc945f78000 0x00000000019000 (41) vma
(00.012677) 3570: Opening 0x007fc945f78000-0x007fc945f79000 0x00000000019000 (41) vma
(00.012685) 3570: Opening 0x007fc945f79000-0x007fc945f7a000 0x0000000001a000 (41) vma
(00.012693) 3570: Opening 0x007fc945f7a000-0x007fc945fa2000 0000000000000000 (20000041) vma
(00.012738) 3570: Opening 0x007fc945fa2000-0x007fc946137000 0x00000000028000 (20000041) vma
(00.012750) 3570: Opening 0x007fc946137000-0x007fc94618f000 0x000000001bd000 (41) vma
(00.012754) 3570: Opening 0x007fc94618f000-0x007fc946193000 0x00000000214000 (41) vma
(00.012757) 3570: Opening 0x007fc946193000-0x007fc946195000 0x00000000218000 (41) vma
(00.012763) 3570: Opening 0x007fc9461a2000-0x007fc9461cb000 0000000000000000 (20000041) vma
(00.012823) 3570: Opening 0x007fc9461cb000-0x007fc9463f9000 0x00000000029000 (20000041) vma
(00.012836) 3570: Opening 0x007fc9463f9000-0x007fc9464ff000 0x00000000257000 (41) vma
(00.012841) 3570: Opening 0x007fc9464ff000-0x007fc946506000 0x0000000035c000 (41) vma
(00.012849) 3570: Opening 0x007fc946506000-0x007fc946507000 0x00000000363000 (41) vma
(00.012857) 3570: Opening 0x007fc94651c000-0x007fc94651e000 0000000000000000 (20000041) vma
(00.012901) 3570: Opening 0x007fc94651e000-0x007fc946548000 0x00000000002000 (20000041) vma
(00.012910) 3570: Opening 0x007fc946548000-0x007fc946553000 0x0000000002c000 (20000041) vma
(00.012919) 3570: Opening 0x007fc946554000-0x007fc946556000 0x00000000037000 (41) vma
(00.012927) 3570: Opening 0x007fc946556000-0x007fc946558000 0x00000000039000 (41) vma
(00.012987) 3570: `- render 1023 iovs (0x56058e868000:4096...)
(00.013041) 3570: `- render 61 iovs (0x7fc945ae0000:4096...)
(00.013051) 3570: Restore via sigreturn
(00.013290) 3570: 1 threads require 144K of memory
(00.013304) 3570: Found bootstrap VMA hint at: 0x10000 (needs ~168K)
(00.013420) 3570: Thread 0 stack 0x1c080 rt_sigframe 0x24080
(00.013496) 3570: Restoring umask to 22
(00.013529) 3570: task_args: 0x29000 task_args->pid: 3570 task_args->nr_threads: 1 task_args->clone_restore_fn: 0x11fc0 task_args->thread_args: 0x29580
(00.013546) pie: 3570: Switched to the restorer 3570
(00.013993) pie: 3570: vdso: Using gettimeofday() on vdso at 0x38bd0
(00.179057) pie: 3570: vdso: Runtime vdso/vvar matches dumpee, remap inplace
(00.179123) pie: 3570: vdso: Using gettimeofday() on vdso at 0x7ffdf9bebbd0
(00.179217) pie: 3570: Restoring scheduler params 0.0.0
(00.179249) pie: 3570: 3570: Restored
(00.179297) net: Unlock network
(00.179359) pie: 3570: Error (criu/pie/restorer.c:343): Unable to restore capabilities: -1
(00.194825) pie: 3570: Error (criu/pie/restorer.c:2168): BUG at criu/pie/restorer.c:2168
(00.194905) Error (compel/src/lib/infect.c:1612): Task 3570 is in unexpected state: b7f
(00.195061) Error (compel/src/lib/infect.c:1618): Task stopped with 11: Segmentation fault
(00.195117) Error (criu/cr-restore.c:2469): Can't stop all tasks on rt_sigreturn
(00.195151) Error (criu/cr-restore.c:2530): Killing processes because of failure on restore.
The Network was unlocked so some data or a connection may have been lost.
(00.195270) Error (criu/cr-restore.c:2557): Restoring FAILED.
```

Output of `criu --version`:

```
# ../criu/criu --version
Version: 3.18
GitID: v3.18-190-g50aa6da65
```

Output of `criu check --all`:

```
# ../criu/criu check --all
Error (criu/tun.c:85): tun: Unable to create tun: No such file or directory
Warn (criu/sk-unix.c:224): unix: Unable to open a socket file: Operation not permitted
Error (criu/net.c:3770): net: Unable create a network namespace: Operation not permitted
Warn (criu/net.c:3826): net: NSID isn't reported for network links
Warn (criu/net.c:3486): net: Unable to get socket network namespace
Warn (criu/kerndat.c:1593): CRIU was built without libnftables support
Error (criu/kerndat.c:1063): Fail to mount tmfps to /tmp/.criu.move_mount_set_group.vCRKnu: Operation not permitted
Error (criu/kerndat.c:1722): kerndat_has_move_mount_set_group failed when initializing kerndat.
Error (criu/crtools.c:263): Could not initialize kernel features detection.
```

rst0git commented 9 months ago

@wata727 As mentioned in CRIU's man page (man 8 criu), several capabilities besides CAP_CHECKPOINT_RESTORE are currently required for checkpoint/restore. Which ones are needed depends on the CRIU features used during checkpoint/restore (e.g., CAP_NET_ADMIN, CAP_SYS_CHROOT, CAP_SETUID / CAP_SETGID, CAP_SYS_RESOURCE). The following README lists some of the capabilities that might be required: https://github.com/checkpoint-restore/criu/blob/criu-dev/test/javaTests/README.md

wata727 commented 9 months ago

Thanks @rst0git. I understand that additional capabilities are required for a complete restore, but I wanted to know whether there is a way to restore a trivial process (e.g., a simple loop) in an unprivileged container.

My use case only involves work that does not directly touch kernel state, such as loading and parsing files and loading Ruby classes. If this works, it would let me speed up startup while still using Docker.

After digging deeper into the cause of the restore failure, I found that it was failing in capset(2): https://github.com/checkpoint-restore/criu/blob/50aa6da65adbeea50152753b86fb02f22bb88a22/criu/pie/restorer.c#L341

The dumped process had been running in a privileged container, so the restorer tried to set all of its capabilities, which caused the error. When I instead dumped a process running in an unprivileged container with criu dump -t <pid> --unprivileged -j and restored it, everything worked perfectly fine.
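This failure can be anticipated before restoring: capset(2) will not add a capability to the permitted set that the calling thread does not already hold, so any bit present in the dumped process's permitted mask but absent from the restoring container's mask is expected to make this step fail. A minimal sketch, assuming both masks are available as integers in the CapPrm format of /proc/<pid>/status (extracting the dumped mask from the image files is left out); the function name caps_missing_for_restore is hypothetical and not part of CRIU, and it only models the permitted-set rule, not the full capability restore.

```python
def caps_missing_for_restore(dumped_permitted: int, current_permitted: int) -> list:
    """List capability bits the dumped task held that the restoring task lacks.

    capset(2) lets a thread keep or drop permitted capabilities but never add
    new ones, so a non-empty result suggests the capability-restore step fails.
    """
    missing = dumped_permitted & ~current_permitted
    return [bit for bit in range(missing.bit_length()) if (missing >> bit) & 1]


if __name__ == "__main__":
    # Assumed masks: a privileged container holding caps 0..40, versus Docker's
    # default set (0xa80425fb) plus CAP_CHECKPOINT_RESTORE (bit 40).
    privileged = (1 << 41) - 1
    unprivileged = 0xA80425FB | (1 << 40)
    print(caps_missing_for_restore(privileged, unprivileged))
```

Dumping with --unprivileged from an unprivileged container, as described above, makes the two masks match, so the list comes out empty.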

My conclusion is that if vendors such as AWS and Google support CAP_CHECKPOINT_RESTORE in containers in the future, CRIU may become usable there to achieve faster startups. Thank you both for answering my questions.