checkpoint-restore / criu

Checkpoint/Restore tool
criu.org

Error (criu/page-xfer.c:400): page-xfer: No parent image found, though parent directory is set: No such file or directory #1510

Open adrianreber opened 3 years ago

adrianreber commented 3 years ago

CI tests with multiple pre-dumps always produce the following message in the final dump:

Error (criu/page-xfer.c:400): page-xfer: No parent image found, though parent directory is set: No such file or directory

The pre-dump seems to work correctly, but CRIU still prints this error message. I am not sure whether it is a false error or whether something is not working as expected. I see @rppt as the author. Any idea whether this message is correct? Is it really an error in the pre-dumping?

avagin commented 3 years ago

Looks like a process exited between pre-dumps.

Snorch commented 3 years ago

Reproduces on my machine:

[root@fedora criu]# ./test/zdtm.py run -t zdtm/static/env00 --pre=2 --flavor=h --ignore-taint
mem_dirty_track is supported
userns is supported
=== Run 1/1 ================ zdtm/static/env00
timens isn't supported on 5.12.6-300.snorch.fc34.x86_64
========================== Run zdtm/static/env00 in h ==========================
Start test
./env00 --pidfile=env00.pid --outfile=env00.out --envname=ENV_00_TEST
Run criu pre-dump
Run criu pre-dump
Run criu dump
=[log]=> dump/zdtm/static/env00/56/3/dump.log
------------------------ grep Error ------------------------
b'(00.002466) Dumping memfd:/dev/zero contents (id 0x1, shmid: 0x12e22, size: 4096)'
b'(00.002474) page-pipe: Create page pipe for 1 segs'
b'(00.002478) page-pipe: Will grow page pipe (iov off is 0)'
b'(00.002539) No pagemap-shmem-77346.img image'
b'(00.002550) Error (criu/page-xfer.c:400): page-xfer: No parent image found, though parent directory is set: No such file or directory'
------------------------ ERROR OVER ------------------------
Run criu restore
Send the 15 signal to  56
Wait for zdtm/static/env00(56) to die for 0.100000
Removing dump/zdtm/static/env00/56
========================= Test zdtm/static/env00 PASS ==========================

Will investigate it.

Snorch commented 3 years ago

At the moment of failure, no such pagemap-shmem image exists in the previous (pre-dump) image directories:

[root@fedora criu]# find test/dump/zdtm/static/env00/56 | grep pagemap
test/dump/zdtm/static/env00/56/1/pagemap-56.img
test/dump/zdtm/static/env00/56/2/pagemap-56.img
test/dump/zdtm/static/env00/56/3/pagemap-shmem-103560.img

Though /proc/pid/maps content does not change.

Stack where we fail:

#3  0x000000000045a5ca in do_open_image (img=img@entry=0xfcc280, dfd=dfd@entry=20, type=type@entry=35, oflags=oflags@entry=0, 
    path=path@entry=0x7ffd32e22dd0 "pagemap-shmem-103560.img") at criu/image.c:455
#4  0x000000000045a8c8 in open_image_at (dfd=dfd@entry=20, type=35, flags=140725457141440, flags@entry=0) at criu/image.c:354
#5  0x000000000047d323 in open_page_read_at (dfd=dfd@entry=20, img_id=img_id@entry=103560, pr=0xfc7470, pr_flags=<optimized out>, pr_flags@entry=1) at criu/pagemap.c:820
#6  0x0000000000479541 in open_page_local_xfer (xfer=0x7ffd32e23f90, fd_type=<optimized out>, img_id=103560) at criu/page-xfer.c:398
#7  0x000000000047a01c in open_page_xfer (xfer=xfer@entry=0x7ffd32e23f90, fd_type=fd_type@entry=35, img_id=<optimized out>) at criu/page-xfer.c:430
#8  0x000000000048eeb9 in do_dump_one_shmem (fd=fd@entry=15, addr=addr@entry=0x7ff362c58000, si=si@entry=0x7ffd32e24010) at criu/shmem.c:724
#9  0x000000000048fb63 in dump_one_memfd_shmem (fd=fd@entry=15, shmid=shmid@entry=103560, size=4096) at criu/shmem.c:837
#10 0x0000000000465b1a in dump_memfd_inode (st=0x7ffd32e25228, name=0x7ffd32e241c1 "/dev/zero", inode=0xfcf490, fd=15) at criu/memfd.c:86
#11 dump_unique_memfd_inode (st=0x7ffd32e25228, name=0x7ffd32e241c1 "/dev/zero", lfd=<optimized out>) at criu/memfd.c:133
#12 dump_one_memfd (lfd=<optimized out>, id=3, p=0x7ffd32e25210) at criu/memfd.c:167
#13 0x0000000000465ce0 in dump_one_memfd_cond (lfd=<optimized out>, id=<optimized out>, parms=<optimized out>) at criu/memfd.c:187
#14 0x000000000044088d in dump_filemap (vma_area=vma_area@entry=0xfda900, fd=12) at criu/cr-dump.c:423
#15 0x0000000000483c6b in parse_smaps (pid=pid@entry=56, vma_area_list=vma_area_list@entry=0x7ffd32e26610, dump_filemap=dump_filemap@entry=0x440770 <dump_filemap>)
    at criu/proc_parse.c:824
#16 0x00000000004409a7 in collect_mappings (pid=56, vma_area_list=0x7ffd32e26610, dump_file=0x440770 <dump_filemap>) at criu/cr-dump.c:126
#17 0x0000000000441d27 in dump_one_task (parent_ie=0xfd0980, item=0xfc89c0) at criu/cr-dump.c:1269
#18 cr_dump_tasks (pid=<optimized out>) at criu/cr-dump.c:1907

Snorch commented 3 years ago

Here is a helpful improvement that makes pre-dumps easier to debug in zdtm: https://github.com/checkpoint-restore/criu/pull/1536

Snorch commented 3 years ago

In pre_dump_one_task() we call collect_mappings() with argument dump_file == NULL. This means that the lack of this (pagemap-shmem for memfd) image in parent pre-dump directories is expected.
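The effect of the NULL callback can be illustrated with a small standalone sketch. This is not CRIU code: `sketch_collect_mapping`, `sketch_dump_filemap`, and the `dump_filemap_fn` type are hypothetical names invented here, and the model assumes only what is stated above, namely that the file-backed shmem contents are dumped through the callback and that pre-dump passes NULL.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-mapping dump callback (not a CRIU type). */
typedef int (*dump_filemap_fn)(int shmid, bool *image_written);

/* Hypothetical callback: pretends to write pagemap-shmem-<shmid>.img. */
static int sketch_dump_filemap(int shmid, bool *image_written)
{
	(void)shmid;
	*image_written = true;
	return 0;
}

/*
 * Hypothetical model of collecting one memfd-backed mapping.
 * With a NULL callback (the pre-dump case) the mapping is skipped,
 * so no shmem pagemap image is produced for it.
 */
static int sketch_collect_mapping(int shmid, dump_filemap_fn dump_file,
				  bool *image_written)
{
	*image_written = false;
	if (!dump_file)
		return 0;
	return dump_file(shmid, image_written);
}
```

In this model the pre-dump passes leave no image behind, so the final dump, which does pass a callback, has nothing to find in the parent directory.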

We have four options:

1) We can rework the code so that iterative memory migration works for such file-shmem mappings and the error goes away.
2) We can suppress this error, but I don't think it is worth the effort, because we would probably need to add a new "don't print error" argument to too many functions.
3) We can convert this error ("No parent image found, though parent directory is set...") to a warning unconditionally.
4) We can leave it as is, because there is no functional problem.

I like (1) and (3). @avagin Your opinion?
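For illustration, option (3) could look roughly like the sketch below. It is a standalone approximation, not a patch against criu/page-xfer.c: the helper name `sketch_open_parent_image`, its return codes, and the log wording are all assumptions made here, and plain `openat()` and `fprintf()` stand in for CRIU's own image and logging helpers.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper, not CRIU code: try to open a parent image
 * relative to parent_dirfd. Returns an fd >= 0 on success, -1 if the
 * image is simply absent (reported as a warning only), and -2 on any
 * other failure (still reported as an error).
 */
static int sketch_open_parent_image(int parent_dirfd, const char *name)
{
	int fd = openat(parent_dirfd, name, O_RDONLY);
	int saved_errno = errno;

	if (fd >= 0)
		return fd;

	if (saved_errno == ENOENT) {
		/* A missing parent image is treated as non-fatal here. */
		fprintf(stderr, "Warn: no parent image %s, continuing without it\n", name);
		return -1;
	}

	fprintf(stderr, "Error: can't open parent image %s: %s\n",
		name, strerror(saved_errno));
	return -2;
}
```

A caller in this sketch would treat -1 as "no parent, proceed without one" and only abort on -2, which keeps real failures visible while the expected missing-image case stops showing up as an error.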
