Closed dleviminzi closed 6 days ago
@dleviminzi It might be worth reading the following article, which provides an example and describes how to use the `cuda-checkpoint` tool with CRIU:
https://developer.nvidia.com/blog/checkpointing-cuda-applications-with-criu/
https://github.com/NVIDIA/cuda-checkpoint?tab=readme-ov-file#example
The CUDA plugin integrates this functionality and automates some of these steps. Note that to use the CUDA plugin for CRIU, you need to make sure that the `cuda-checkpoint` tool is installed in `PATH`. Since this is an optional feature, the plugin is provided as a separate package in Fedora:

    sudo dnf install criu-cuda-plugin
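A quick way to confirm the `PATH` requirement above is to look the tool up the same way a shell would. This is only a minimal sketch: `find_tool` is a hypothetical helper name, and it checks the `PATH` of whichever user runs it. Since `criu dump` is usually run with `sudo`, which on many systems resets `PATH`, it may be worth running the check under `sudo` as well.

```python
import shutil


def find_tool(name="cuda-checkpoint"):
    """Return the absolute path of `name` if it is found on PATH, else None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_tool()
    if path is None:
        print("cuda-checkpoint was not found in PATH")
    else:
        print(f"cuda-checkpoint found at {path}")
```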
Hi @rst0git, thank you for the response! I read through the `cuda-checkpoint` repo earlier and had tried using it directly before realizing that you had introduced this plugin. I've installed the plugin and `cuda-checkpoint` is in my `PATH`. It seems that the plugin is being used, but nonetheless my checkpoint fails. I am attempting to checkpoint the following simple Flask server:

    import torch
    from flask import Flask, jsonify

    app = Flask(__name__)

    # Global GPU state
    gpu_counter = torch.tensor([0])


    def init_gpu_state():
        global gpu_counter
        gpu_counter = torch.tensor([100], device="cuda")
        print(f"Initialized counter on {gpu_counter.device}")


    @app.route("/increment", methods=["POST"])
    def increment():
        global gpu_counter
        gpu_counter += 1
        return jsonify({"value": int(gpu_counter.item())})


    @app.route("/value", methods=["GET"])
    def get_value():
        return jsonify({"value": int(gpu_counter.item())})


    if __name__ == "__main__":
        init_gpu_state()
        app.run(host="0.0.0.0", port=8000)

I run the following to create the dump:

    sudo criu dump -t 384314 -D demo -v4 -o "out.log"

The dump does not seem to succeed, though the dump folder is not empty. Here are the logs. Not sure why the format is getting messed up, but you can read them more clearly by entering the edit mode of this comment.
I tried removing the CUDA plugin and following the `cuda-checkpoint` instructions. Running `cuda-checkpoint --toggle --pid 410602` had the expected effect and the PID was no longer reported by `nvidia-smi`. Then I tried to create the dump and this was the result:

    sudo criu dump --shell-job --images-dir demo --tree 410602
    Warn (compel/arch/x86/src/lib/infect.c:418): Will restore 410602 with interrupted system call
    Error (criu/files-ext.c:94): Can't dump file 15 of that type [20666] (chr 195:255)
    Error (criu/cr-dump.c:1681): Dump files (pid: 410602) failed with -1
    Error (criu/cr-dump.c:2111): Dumping FAILED.

My guess would be that this is what is happening when the plugin is used as well, but I'm not sure.
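For anyone trying to read the `files-ext.c` error above, here is a rough sketch of how that line might be decoded. It rests on two assumptions that are not confirmed in this thread: that the bracketed number is `st_mode` printed in octal, and that character-device major `195` belongs to the NVIDIA driver with minor `255` being `/dev/nvidiactl` (my reading of the Linux device number list). `describe` and the constants are hypothetical names used only for illustration.

```python
import re
import stat

# Assumption: "[20666]" is st_mode in octal and "chr MAJOR:MINOR" is the
# device number of a character device.
ERROR_RE = re.compile(
    r"Can't dump file (\d+) of that type \[(\d+)\] \(chr (\d+):(\d+)\)"
)

# Assumption: NVIDIA character devices use major 195; minor 255 is the
# control device. Other minors are not mapped here.
NVIDIA_MAJOR = 195
NVIDIA_CTL_MINOR = 255


def describe(line):
    """Decode fd, file type, permissions and a device guess from the error."""
    m = ERROR_RE.search(line)
    if m is None:
        return None
    fd = int(m.group(1))
    mode = int(m.group(2), 8)  # interpret the bracketed value as octal
    major, minor = int(m.group(3)), int(m.group(4))
    if major == NVIDIA_MAJOR and minor == NVIDIA_CTL_MINOR:
        device = "/dev/nvidiactl"
    elif major == NVIDIA_MAJOR:
        device = f"NVIDIA device (minor {minor})"
    else:
        device = "unknown"
    return {
        "fd": fd,
        "is_char_device": stat.S_ISCHR(mode),
        "permissions": oct(stat.S_IMODE(mode)),
        "device": device,
    }


if __name__ == "__main__":
    line = "Error (criu/files-ext.c:94): Can't dump file 15 of that type [20666] (chr 195:255)"
    print(describe(line))
```

If that reading is right, fd 15 would be a still-open handle to the NVIDIA control device, which would suggest the process keeps driver file descriptors open even after the toggle.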
@dleviminzi This problem has been reported in https://github.com/NVIDIA/cuda-checkpoint/issues/4:
Single-process pytorch support is planned to be released in early 2025!
Ahh, thank you!
Description
I'm guessing I'm doing something wrong or missing some documentation. In the 4.0 release notes, I saw the line "CUDA plugin: Introduced a plugin to support checkpointing and restoring NVIDIA CUDA applications", which I wanted to try out. I have CRIU 4.0 installed on Fedora and I'm on NVIDIA driver version 560. I'm not sure if I need to pass a flag to get CRIU to use the CUDA plugin.
Steps to reproduce the issue:
Describe the results you received:
Describe the results you expected: A successful dump.
Additional information you deem important (e.g. issue happens only occasionally):
CRIU logs and information:
CRIU full dump/restore logs:
``` (00.000000) CRIU run id = 0xeffffffc0001ccf5 (00.000005) Version: 4.0 (gitid 0) (00.000007) Running on nobara Linux 6.11.3-200.fsync.fc40.x86_64 #1 SMP PREEMPT_DYNAMIC TKG Fri Oct 11 08:23:43 UTC 2024 x86_64 (00.000018) Loaded kdat cache from /run/criu/criu.kdat (00.000033) Hugetlb size 2 Mb is supported but cannot get dev's number (00.000043) Hugetlb size 1024 Mb is supported but cannot get dev's number (00.000344) cpu: x86_family 25 x86_vendor_id AuthenticAMD x86_model_id AMD Ryzen 7 7800X3D 8-Core Processor (00.000351) cpu: fpu: xfeatures_mask 0x2e5 xsave_size 2440 xsave_size_max 2440 xsaves_size 2456 (00.000354) cpu: fpu: x87 floating point registers xstate_offsets 0 / 0 xstate_sizes 160 / 160 (00.000359) cpu: fpu: AVX registers xstate_offsets 576 / 576 xstate_sizes 256 / 256 (00.000361) cpu: fpu: AVX-512 opmask xstate_offsets 832 / 832 xstate_sizes 64 / 64 (00.000363) cpu: fpu: AVX-512 Hi256 xstate_offsets 896 / 896 xstate_sizes 512 / 512 (00.000364) cpu: fpu: AVX-512 ZMM_Hi256 xstate_offsets 1408 / 1408 xstate_sizes 1024 / 1024 (00.000365) cpu: fpu: Protection Keys User registers xstate_offsets 2432 / 2432 xstate_sizes 8 / 8 (00.000375) ======================================== (00.000385) Dumping processes (pid: 78925 comm: pt_main_thread) (00.000387) ======================================== (00.000389) rlimit: RLIMIT_NOFILE unlimited for self (00.000395) Running pre-dump scripts (00.000417) irmap: Searching irmap cache in work dir (00.000427) No irmap-cache image (00.000429) irmap: Searching irmap cache in parent (00.000433) No parent images directory provided (00.000435) irmap: No irmap cache (00.000436) cpu: fpu:1 fxsr:1 xsave:1 xsaveopt:1 xsavec:1 xgetbv1:1 xsaves:1 (00.000545) cg-prop: Parsing controller "cpu" (00.000549) cg-prop: Strategy "replace" (00.000551) cg-prop: Property "cpu.shares" (00.000553) cg-prop: Property "cpu.cfs_period_us" (00.000555) cg-prop: Property "cpu.cfs_quota_us" (00.000557) cg-prop: Property "cpu.rt_period_us" (00.000559) 
cg-prop: Property "cpu.rt_runtime_us" (00.000560) cg-prop: Parsing controller "memory" (00.000562) cg-prop: Strategy "replace" (00.000563) cg-prop: Property "memory.limit_in_bytes" (00.000566) cg-prop: Property "memory.memsw.limit_in_bytes" (00.000568) cg-prop: Property "memory.swappiness" (00.000570) cg-prop: Property "memory.soft_limit_in_bytes" (00.000572) cg-prop: Property "memory.move_charge_at_immigrate" (00.000575) cg-prop: Property "memory.oom_control" (00.000576) cg-prop: Property "memory.use_hierarchy" (00.000577) cg-prop: Property "memory.kmem.limit_in_bytes" (00.000581) cg-prop: Property "memory.kmem.tcp.limit_in_bytes" (00.000584) cg-prop: Parsing controller "cpuset" (00.000586) cg-prop: Strategy "replace" (00.000588) cg-prop: Property "cpuset.cpus" (00.000589) cg-prop: Property "cpuset.mems" (00.000593) cg-prop: Property "cpuset.memory_migrate" (00.000594) cg-prop: Property "cpuset.cpu_exclusive" (00.000596) cg-prop: Property "cpuset.mem_exclusive" (00.000598) cg-prop: Property "cpuset.mem_hardwall" (00.000600) cg-prop: Property "cpuset.memory_spread_page" (00.000602) cg-prop: Property "cpuset.memory_spread_slab" (00.000604) cg-prop: Property "cpuset.sched_load_balance" (00.000606) cg-prop: Property "cpuset.sched_relax_domain_level" (00.000608) cg-prop: Parsing controller "blkio" (00.000610) cg-prop: Strategy "replace" (00.000611) cg-prop: Property "blkio.weight" (00.000613) cg-prop: Parsing controller "freezer" (00.000614) cg-prop: Strategy "replace" (00.000616) cg-prop: Parsing controller "perf_event" (00.000617) cg-prop: Strategy "replace" (00.000619) cg-prop: Parsing controller "net_cls" (00.000621) cg-prop: Strategy "replace" (00.000622) cg-prop: Property "net_cls.classid" (00.000623) cg-prop: Parsing controller "net_prio" (00.000625) cg-prop: Strategy "replace" (00.000627) cg-prop: Property "net_prio.ifpriomap" (00.000628) cg-prop: Parsing controller "pids" (00.000630) cg-prop: Strategy "replace" (00.000632) cg-prop: Property "pids.max" 
(00.000634) cg-prop: Parsing controller "devices" (00.000635) cg-prop: Strategy "replace" (00.000637) cg-prop: Property "devices.list" (00.000653) Preparing image inventory (version 1) (00.000668) Add pid ns 1 pid 118005 (00.000674) Add net ns 2 pid 118005 (00.000680) Add ipc ns 3 pid 118005 (00.000686) Add uts ns 4 pid 118005 (00.000691) Add time ns 5 pid 118005 (00.000702) Add mnt ns 6 pid 118005 (00.000707) Add user ns 7 pid 118005 (00.000713) Add cgroup ns 8 pid 118005 (00.000715) cg: Dumping cgroups for thread 118005 (00.000727) cg: `- New css ID 1 (00.000728) cg: `- [] -> [/user.slice/user-1000.slice/user@1000.service/tmux-spawn-edbf1a6b-3527-4c3f-b291-6700d55c1385.scope] [0] (00.000730) cg: Set 1 is criu one (00.000738) Detected cgroup V1 freezer (00.000772) Seized task 78925, state 1 (00.000774) seccomp: Collected tid_real 78925 mode 0 (00.000795) Seizing 78925's 78948 thread (00.000827) seccomp: Collected tid_real 78948 mode 0 (00.000829) Seizing 78925's 78949 thread (00.000856) seccomp: Collected tid_real 78949 mode 0 (00.000858) Seizing 78925's 78955 thread (00.000884) seccomp: Collected tid_real 78955 mode 0 (00.000885) Seizing 78925's 78956 thread (00.000910) seccomp: Collected tid_real 78956 mode 0 (00.000912) Seizing 78925's 78957 thread (00.000937) seccomp: Collected tid_real 78957 mode 0 (00.000939) Seizing 78925's 78958 thread (00.000971) seccomp: Collected tid_real 78958 mode 0 (00.000973) Seizing 78925's 78959 thread (00.000998) seccomp: Collected tid_real 78959 mode 0 (00.001000) Seizing 78925's 78960 thread (00.001026) seccomp: Collected tid_real 78960 mode 0 (00.001039) Collected (3 attempts, 0 in_progress) (00.001077) Collected (4 attempts, 0 in_progress) (00.001092) Collected 78925 in 1 state (00.001123) net: Lock network (00.001168) type btrfs source /dev/nvme1n1p3 mnt_id 66 s_dev 0x20 /@ @ ./ flags 0x300000 options compress=zstd:1,ssd,discard=async,space_cache=v2,subvolid=256,subvol=/@ (00.001175) type devtmpfs source devtmpfs mnt_id 35 
s_dev 0x6 / @ ./dev flags 0x1100002 options size=4096k,nr_inodes=12228966,mode=755,inode64 (00.001178) type tmpfs source tmpfs mnt_id 36 s_dev 0x18 / @ ./dev/shm flags 0x1100006 options inode64 (00.001183) type devpts source devpts mnt_id 37 s_dev 0x19 / @ ./dev/pts flags 0x30000a options gid=5,mode=620,ptmxmode=000 (00.001192) type sysfs source sysfs mnt_id 38 s_dev 0x17 / @ ./sys flags 0x30000e options (00.001195) type securityfs source securityfs mnt_id 39 s_dev 0x7 / @ ./sys/kernel/security flags 0x30000e options (00.001200) type cgroup2 source cgroup2 mnt_id 40 s_dev 0x1b / @ ./sys/fs/cgroup flags 0x30000e options nsdelegate,memory_recursiveprot (00.001203) type pstore source pstore mnt_id 41 s_dev 0x1c / @ ./sys/fs/pstore flags 0x30000e options (00.001207) type efivarfs source efivarfs mnt_id 42 s_dev 0x1d / @ ./sys/firmware/efi/efivars flags 0x30000e options (00.001210) type bpf source bpf mnt_id 43 s_dev 0x1e / @ ./sys/fs/bpf flags 0x30000e options mode=700 (00.001231) type configfs source configfs mnt_id 44 s_dev 0x1f / @ ./sys/kernel/config flags 0x30000e options (00.001234) type proc source proc mnt_id 45 s_dev 0x16 / @ ./proc flags 0x30000e options (00.001239) type tmpfs source tmpfs mnt_id 46 s_dev 0x1a / @ ./run flags 0x1100006 options size=19586988k,nr_inodes=819200,mode=755,inode64 (00.001243) type autofs source systemd-1 mnt_id 24 s_dev 0x24 / @ ./proc/sys/fs/binfmt_misc flags 0x300000 options fd=37,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=11588 (00.001247) type mqueue source mqueue mnt_id 25 s_dev 0x15 / @ ./dev/mqueue flags 0x30000e options (00.001250) type debugfs source debugfs mnt_id 27 s_dev 0x8 / @ ./sys/kernel/debug flags 0x30000e options (00.001253) type hugetlbfs source hugetlbfs mnt_id 28 s_dev 0x25 / @ ./dev/hugepages flags 0x300006 options pagesize=2M (00.001258) type tracefs source tracefs mnt_id 31 s_dev 0xd / @ ./sys/kernel/tracing flags 0x30000e options (00.001260) skipping fs mounted at /sys/kernel/tracing 
(00.001264) type fusectl source fusectl mnt_id 32 s_dev 0x26 / @ ./sys/fs/fuse/connections flags 0x30000e options (00.001288) type btrfs source /dev/nvme1n1p3 mnt_id 47 s_dev 0x20 /@home @ ./home flags 0x300000 options compress=zstd:1,ssd,discard=async,space_cache=v2,subvolid=257,subvol=/@home (00.001291) type tmpfs source tmpfs mnt_id 50 s_dev 0x2a / @ ./tmp flags 0x100400 options inode64 (00.001294) type ext4 source /dev/nvme1n1p2 mnt_id 53 s_dev 0x10300007 / @ ./boot flags 0x300000 options (00.001298) type vfat source /dev/nvme1n1p1 mnt_id 56 s_dev 0x10300006 / @ ./boot/efi flags 0x300000 options fmask=0077,dmask=0077,codepage=437,iocharset=ascii,shortname=mixed,errors=remount-ro (00.001301) type binfmt_misc source binfmt_misc mnt_id 189 s_dev 0x2f / @ ./proc/sys/fs/binfmt_misc flags 0x30000e options (00.001305) type tmpfs source tmpfs mnt_id 29 s_dev 0x45 / @ ./run/user/982 flags 0x300006 options size=9793492k,nr_inodes=2448373,mode=700,uid=982,gid=980,inode64 (00.001309) type tmpfs source tmpfs mnt_id 766 s_dev 0x49 / @ ./run/user/1000 flags 0x300006 options size=9793492k,nr_inodes=2448373,mode=700,uid=1000,gid=1000,inode64 (00.001322) type overlay source overlay mnt_id 830 s_dev 0x46 / @ ./var/lib/docker/overlay2/959ba2d3aa57439dc785df67b98421b2e21be78d9c429c6b0f77e3c962647472/merged flags 0x300000 options lowerdir=/var/lib/docker/overlay2/l/SNSRKQRSBFT7PFQYCEVZRIVB2N:/var/lib/docker/overlay2/l/XHXB2JHQQENP7HRHT3ALOZ4K2R:/var/lib/docker/overlay2/l/ETMBHKKWYDJBXZI3TIQGTW2PIX:/var/lib/docker/overlay2/l/O7RA2RBA6KID6P4B2Z4OOCJCYV:/var/lib/docker/overlay2/l/MDNBXK2E2PMRCRE7W3XPDGBTCK:/var/lib/docker/overlay2/l/YYRH2V2CM3GCH57OREDTTYT5Q3,upperdir=/var/lib/docker/overlay2/959ba2d3aa57439dc785df67b98421b2e21be78d9c429c6b0f77e3c962647472/diff,workdir=/var/lib/docker/overlay2/959ba2d3aa57439dc785df67b98421b2e21be78d9c429c6b0f77e3c962647472/work (00.001343) type overlay source overlay mnt_id 913 s_dev 0x48 / @ 
./var/lib/docker/overlay2/df49f38dfbd116ba366730a5962f83d837fc67e183017dc6ce7a7aa661079d82/merged flags 0x300000 options lowerdir=/var/lib/docker/overlay2/l/THSCTYBYGN5YVNXT2F652JPJAG:/var/lib/docker/overlay2/l/MIM5M7VVZLODXXWNVKF637N57N:/var/lib/docker/overlay2/l/227JTTG3RWYCYWV4NNXO2QRKOR:/var/lib/docker/overlay2/l/AWVXGVUUVHTT2T3HJPES7GPFFK:/var/lib/docker/overlay2/l/KG6FNFU6UMPZWMJ6FYKA4UKME6:/var/lib/docker/overlay2/l/UMFL7SSXVFUIELWEROAOWSKYRF:/var/lib/docker/overlay2/l/ROYSGGSOQPIJX2MQJKD3JXTPTS:/var/lib/docker/overlay2/l/6YO75MENNQR7LBQQXFOQ32YXJS,upperdir=/var/lib/docker/overlay2/df49f38dfbd116ba366730a5962f83d837fc67e183017dc6ce7a7aa661079d82/diff,workdir=/var/lib/docker/overlay2/df49f38dfbd116ba366730a5962f83d837fc67e183017dc6ce7a7aa661079d82/work (00.001356) type overlay source overlay mnt_id 914 s_dev 0x47 / @ ./var/lib/docker/overlay2/a101d55daacec2cfae2cb52a61f942f22d405071f49168cf22566281685d7a53/merged flags 0x300000 options lowerdir=/var/lib/docker/overlay2/l/YHHVOVGYOOZD5DNYEJU46XQPO3:/var/lib/docker/overlay2/l/VFS32GPVOS2OHLUZOIIQMADFZM:/var/lib/docker/overlay2/l/ZPEGJWBD3DCNGNTY74KRNRDXNB:/var/lib/docker/overlay2/l/RUOS47HJBYCR3PVDKRKZCBEO6Q:/var/lib/docker/overlay2/l/H5FOXX3JI6DO4AZDM2ICKGTHRV:/var/lib/docker/overlay2/l/NWCPR3LID2PBL5ZKXHHFY7M2RN:/var/lib/docker/overlay2/l/L27WKD7W6NX4MKLGHCW6ZXECKR:/var/lib/docker/overlay2/l/O6732P2NC6ZHGD4GDK7GHFJWSS:/var/lib/docker/overlay2/l/OTBTM32RHPTZ7YUVS3YMC67CMS:/var/lib/docker/overlay2/l/4V5A4TJ44D7CKAVLVF4NHCASHY:/var/lib/docker/overlay2/l/6V274XMYFFO4ZWWGUSR6GNATPS:/var/lib/docker/overlay2/l/YYJFOPYPHVC3KQDNO6MKV7WI5C:/var/lib/docker/overlay2/l/URZIG66PTBLZPPTB4KCRKUCZZS:/var/lib/docker/overlay2/l/LSI52KRAJFBHQ6UMS6IH5L3NM5,upperdir=/var/lib/docker/overlay2/a101d55daacec2cfae2cb52a61f942f22d405071f49168cf22566281685d7a53/diff,workdir=/var/lib/docker/overlay2/a101d55daacec2cfae2cb52a61f942f22d405071f49168cf22566281685d7a53/work (00.001363) type nsfs source nsfs mnt_id 1081 s_dev 0x4 
net:[4026533227] @ ./run/docker/netns/7c3ab68df6f4 flags 0x1100000 options (00.001368) type nsfs source nsfs mnt_id 1104 s_dev 0x4 net:[4026533340] @ ./run/docker/netns/3c4d73991bca flags 0x1100000 options (00.001371) type nsfs source nsfs mnt_id 1029 s_dev 0x4 net:[4026533112] @ ./run/docker/netns/da5fecd57c33 flags 0x1100000 options (00.001375) mnt: Building mountpoints tree (00.001376) mnt: Building plain mount tree (00.001379) mnt: Working on 1029->46 (00.001381) mnt: Working on 1104->46 (00.001382) mnt: Working on 1081->46 (00.001388) mnt: Working on 914->66 (00.001390) mnt: Working on 913->66 (00.001394) mnt: Working on 830->66 (00.001396) mnt: Working on 766->46 (00.001397) mnt: Working on 29->46 (00.001399) mnt: Working on 189->24 (00.001401) mnt: Working on 56->53 (00.001403) mnt: Working on 53->66 (00.001418) mnt: Working on 50->66 (00.001422) mnt: Working on 47->66 (00.001423) mnt: Working on 32->38 (00.001425) mnt: Working on 28->35 (00.001427) mnt: Working on 27->38 (00.001429) mnt: Working on 25->35 (00.001432) mnt: Working on 24->45 (00.001434) mnt: Working on 46->66 (00.001436) mnt: Working on 45->66 (00.001437) mnt: Working on 44->38 (00.001439) mnt: Working on 43->38 (00.001441) mnt: Working on 42->38 (00.001443) mnt: Working on 41->38 (00.001444) mnt: Working on 40->38 (00.001446) mnt: Working on 39->38 (00.001447) mnt: Working on 38->66 (00.001449) mnt: Working on 37->35 (00.001451) mnt: Working on 36->35 (00.001452) mnt: Working on 35->66 (00.001455) mnt: Working on 66->1 (00.001456) mnt: Resorting children of 66 in mount order (00.001460) mnt: Resorting children of 914 in mount order (00.001461) mnt: Resorting children of 913 in mount order (00.001462) mnt: Resorting children of 830 in mount order (00.001463) mnt: Resorting children of 53 in mount order (00.001465) mnt: Resorting children of 56 in mount order (00.001468) mnt: Resorting children of 50 in mount order (00.001470) mnt: Resorting children of 47 in mount order (00.001471) mnt: 
Resorting children of 46 in mount order (00.001473) mnt: Resorting children of 1029 in mount order (00.001474) mnt: Resorting children of 1104 in mount order (00.001476) mnt: Resorting children of 1081 in mount order (00.001477) mnt: Resorting children of 766 in mount order (00.001479) mnt: Resorting children of 29 in mount order (00.001481) mnt: Resorting children of 45 in mount order (00.001482) mnt: Resorting children of 24 in mount order (00.001483) mnt: Resorting children of 189 in mount order (00.001484) mnt: Resorting children of 38 in mount order (00.001486) mnt: Resorting children of 32 in mount order (00.001488) mnt: Resorting children of 42 in mount order (00.001489) mnt: Resorting children of 27 in mount order (00.001491) mnt: Resorting children of 44 in mount order (00.001494) mnt: Resorting children of 43 in mount order (00.001496) mnt: Resorting children of 41 in mount order (00.001498) mnt: Resorting children of 40 in mount order (00.001499) mnt: Resorting children of 39 in mount order (00.001502) mnt: Resorting children of 35 in mount order (00.001504) mnt: Resorting children of 28 in mount order (00.001505) mnt: Resorting children of 25 in mount order (00.001506) mnt: Resorting children of 37 in mount order (00.001509) mnt: Resorting children of 36 in mount order (00.001511) mnt: Done: (00.001513) mnt: [./](66->1) (00.001515) mnt: [./var/lib/docker/overlay2/a101d55daacec2cfae2cb52a61f942f22d405071f49168cf22566281685d7a53/merged](914->66) (00.001516) mnt: <-- (00.001518) mnt: [./var/lib/docker/overlay2/df49f38dfbd116ba366730a5962f83d837fc67e183017dc6ce7a7aa661079d82/merged](913->66) (00.001520) mnt: <-- (00.001521) mnt: [./var/lib/docker/overlay2/959ba2d3aa57439dc785df67b98421b2e21be78d9c429c6b0f77e3c962647472/merged](830->66) (00.001523) mnt: <-- (00.001524) mnt: [./boot](53->66) (00.001525) mnt: [./boot/efi](56->53) (00.001527) mnt: <-- (00.001528) mnt: <-- (00.001529) mnt: [./tmp](50->66) (00.001530) mnt: <-- (00.001531) mnt: [./home](47->66) 
(00.001532) mnt: <-- (00.001532) mnt: [./run](46->66) (00.001533) mnt: [./run/docker/netns/da5fecd57c33](1029->46) (00.001535) mnt: <-- (00.001536) mnt: [./run/docker/netns/3c4d73991bca](1104->46) (00.001538) mnt: <-- (00.001538) mnt: [./run/docker/netns/7c3ab68df6f4](1081->46) (00.001540) mnt: <-- (00.001540) mnt: [./run/user/1000](766->46) (00.001542) mnt: <-- (00.001543) mnt: [./run/user/982](29->46) (00.001545) mnt: <-- (00.001546) mnt: <-- (00.001547) mnt: [./proc](45->66) (00.001548) mnt: [./proc/sys/fs/binfmt_misc](24->45) (00.001549) mnt: [./proc/sys/fs/binfmt_misc](189->24) (00.001551) mnt: <-- (00.001552) mnt: <-- (00.001554) mnt: <-- (00.001555) mnt: [./sys](38->66) (00.001557) mnt: [./sys/fs/fuse/connections](32->38) (00.001558) mnt: <-- (00.001559) mnt: [./sys/firmware/efi/efivars](42->38) (00.001560) mnt: <-- (00.001561) mnt: [./sys/kernel/debug](27->38) (00.001562) mnt: <-- (00.001563) mnt: [./sys/kernel/config](44->38) (00.001564) mnt: <-- (00.001566) mnt: [./sys/fs/bpf](43->38) (00.001567) mnt: <-- (00.001568) mnt: [./sys/fs/pstore](41->38) (00.001569) mnt: <-- (00.001570) mnt: [./sys/fs/cgroup](40->38) (00.001571) mnt: <-- (00.001572) mnt: [./sys/kernel/security](39->38) (00.001574) mnt: <-- (00.001574) mnt: <-- (00.001575) mnt: [./dev](35->66) (00.001577) mnt: [./dev/hugepages](28->35) (00.001577) mnt: <-- (00.001579) mnt: [./dev/mqueue](25->35) (00.001580) mnt: <-- (00.001581) mnt: [./dev/pts](37->35) (00.001582) mnt: <-- (00.001584) mnt: [./dev/shm](36->35) (00.001585) mnt: <-- (00.001586) mnt: <-- (00.001587) mnt: <-- (00.001593) mnt: The mount 1104 is bind for 1029 (@./run/docker/netns/3c4d73991bca -> @./run/docker/netns/da5fecd57c33) (00.001595) mnt: The mount 1081 is bind for 1029 (@./run/docker/netns/7c3ab68df6f4 -> @./run/docker/netns/da5fecd57c33) (00.001599) net: Collecting netns 2/118005 (00.001745) unix: Collected: ino 382290 peer_ino 375373 family 1 type 1 state 1 name /run/containerd/ REMOVED SOME STUFF HERE BECAUSE OF GITHUB BODY 
LENGTH LIMIT (00.017868) netlink: Collect netlink sock 0x805 (00.017873) Collecting pidns 1/118005 (00.017899) No parent images directory provided (00.017999) rmrf: removing .criu.temp-aa-policy.NDUM2o (00.018057) ======================================== (00.018066) Dumping task (pid: 78925 comm: pt_main_thread) (00.018067) ======================================== (00.018068) Obtaining task stat ... (00.018085) (00.018086) Collecting mappings (pid: 78925) (00.018087) ---------------------------------------- (00.018443) Handling VMA with the following smaps entry: 200000000-200200000 ---p 00000000 00:00 0 (00.018456) Handling VMA with the following smaps entry: 200200000-200400000 rw-s 00000000 00:06 964 /dev/nvidia0 (00.018475) Error (criu/proc_parse.c:114): handle_device_vma plugin failed: No such file or directory (00.018477) Error (criu/proc_parse.c:629): Can't handle non-regular mapping on 78925's map 200200000 (00.018483) Error (criu/cr-dump.c:1570): Collect mappings (pid: 78925) failed with -1 (00.018524) net: Unlock network (00.018527) Unfreezing tasks into 1 (00.018529) Unseizing 78925 into 1 (00.018549) Error (criu/cr-dump.c:2111): Dumping FAILED. ```
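Since the log above lost its line breaks when pasted, a small script can split it back into entries and pull out the failures. This is a sketch under the assumption that every entry starts with a `(SS.UUUUUU)` timestamp, as in the paste; `split_entries` and `problems` are hypothetical helper names, and splitting on an empty match needs Python 3.7 or newer.

```python
import re

# Zero-width lookahead: split just before each "(00.018475)"-style timestamp.
ENTRY_RE = re.compile(r"(?=\(\d+\.\d{6}\))")


def split_entries(flat_log):
    """Split a flattened CRIU log back into one string per entry."""
    return [part.strip() for part in ENTRY_RE.split(flat_log) if part.strip()]


def problems(flat_log):
    """Return only the entries that CRIU marked as errors or warnings."""
    return [e for e in split_entries(flat_log) if "Error (" in e or "Warn (" in e]


if __name__ == "__main__":
    sample = (
        "(00.018456) Handling VMA with the following smaps entry: "
        "200200000-200400000 rw-s 00000000 00:06 964 /dev/nvidia0 "
        "(00.018475) Error (criu/proc_parse.c:114): handle_device_vma plugin "
        "failed: No such file or directory "
        "(00.018524) net: Unlock network"
    )
    for entry in problems(sample):
        print(entry)
```

Run against the paste above, the first error it surfaces is the `handle_device_vma plugin failed` entry for the `/dev/nvidia0` mapping, which is where the dump stops.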
Output of `criu --version`:
``` Version 4.0 ```
Output of `criu check --all`:
``` Looks good. ```
Additional environment details: