checkpoint-restore / criu

Checkpoint/Restore tool
criu.org

VDSO errors on ppc64le on 5.11.x #1417

Open adrianreber opened 3 years ago

adrianreber commented 3 years ago

While trying to debug #1415, I updated to a newer kernel (5.11.7-100.fc32.ppc64le), and now CRIU fails completely on ppc64le:

# criu/criu check -v4
(00.000002) Version: 3.15 (gitid v3.14-340-g480605824)
(00.000012) Running on ibm-p9z-18-lp5.virt.pnr.lab.eng.rdu2.redhat.com Linux 5.11.7-100.fc32.ppc64le #1 SMP Wed Mar 17 18:39:54 UTC 2021 ppc64le
(00.000024) File /run/criu.kdat does not exist
(00.000044) sockets: Probing sock diag modules
(00.000095) sockets: Done probing
(00.003082) Pagemap is fully functional
(00.003113) Found anon-shmem device at 1
(00.003120) Reset 1201's dirty tracking
(00.003367)  ... done
(00.003455) Dirty track supported on kernel
(00.003534) Found task size of 2000000000000
(00.007536) Restoring netdev veth idx 10
(00.007789) Dumping netns links
(00.007816)     LD: Got link 1, type 772
(00.007824)     LD: Got link 10, type 1
(00.008982) vdso: Parsing at 7fffb9070000 7fffb9090000
(00.008988) Error (criu/pie-util-vdso.c:97): vdso: ELF header magic mismatch
(00.008991) Error (criu/vdso.c:634): vdso: Failed to fill self vdso symtable
(00.008997) Error (criu/kerndat.c:1152): kerndat_vdso_fill_symtable failed when initializing kerndat.
(00.009096) Adjust mmap_min_addr 0x1000 -> 0x10000
(00.009101) Found mmap_min_addr 0x10000
(00.009113) files stat: fs/nr_open 1073741816
(00.009118) Error (criu/crtools.c:213): Could not initialize kernel features detection.

The error is the same whether or not the kernel is booted with vdso=0.

I added some debug code to see why it fails and found that ehdr->e_ident differs from what CRIU expects.

CRIU expects 7F 45 4C 46 2 1 1 0 0 0 0 0 0 0 0 0, but it actually reads 53 59 53 54 45 4D 43 46 47 3A 50 50 43 36 34 0.
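For reference, here is a minimal sketch of the kind of identification check that fails here. It assumes the first bytes of the mapping are read as the ELF e_ident array. This is not CRIU's actual code in criu/pie-util-vdso.c, and the helper name vdso_ident_ok is hypothetical. The expected bytes are the ELF magic followed by ELFCLASS64, ELFDATA2LSB and EV_CURRENT:

```c
#include <assert.h>
#include <elf.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper (not CRIU code): returns true if ident looks like
 * the e_ident of a 64-bit, little-endian, current-version ELF file. */
static bool vdso_ident_ok(const unsigned char *ident)
{
	assert(ident != NULL);
	/* Bytes 0-3 must be the magic 7F 'E' 'L' 'F'. */
	if (memcmp(ident, ELFMAG, SELFMAG) != 0)
		return false;
	/* Bytes 4-6: 64-bit class, little-endian data, current version. */
	return ident[EI_CLASS] == ELFCLASS64 &&
	       ident[EI_DATA] == ELFDATA2LSB &&
	       ident[EI_VERSION] == EV_CURRENT;
}
```

If the helper is given the bytes from the log, the expected array passes. The bytes actually read (53 59 53 54 ...) fail at the memcmp(), which matches the "ELF header magic mismatch" error.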

@0x7f454c46 maybe you have an idea. This was still working on 5.8.18-100.fc31.ppc64le.

@mihalicyn just FYI: if you are planning to upgrade the Jenkins hosts, you might run into this error.

rppt commented 3 years ago
ldu4 commented 3 years ago

That may be tied to the switch of the powerpc VDSO to the generic C implementation: https://lore.kernel.org/linuxppc-dev/cover.1604426550.git.christophe.leroy@csgroup.eu/

ldu4 commented 3 years ago

BTW, 53 59 53 54 45 4D 43 46 47 3A 50 50 43 36 34 0 = "SYSTEMCFG:PPC64", and 7F 45 4C 46 2 1 1 0 0 0 0 0 0 0 0 0 = "\x7FELF" (the ELF magic) followed by ELFCLASS64, ELFDATA2LSB and EV_CURRENT.
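The following small sketch reproduces this decoding. It prints printable bytes as characters and escapes the rest as \xNN, so the leading 0x7F of the ELF magic stays visible. bytes_to_text is a hypothetical helper, and decoding stops at the first NUL byte:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: render up to n bytes (stopping at the first NUL)
 * into out, escaping non-printable bytes as \xNN. out is always
 * NUL-terminated; output that does not fit is truncated. */
static void bytes_to_text(const unsigned char *buf, size_t n,
			  char *out, size_t outsz)
{
	size_t pos = 0;

	assert(outsz > 0);
	out[0] = '\0';
	for (size_t i = 0; i < n && buf[i] != 0; i++) {
		int w = isprint(buf[i])
			? snprintf(out + pos, outsz - pos, "%c", buf[i])
			: snprintf(out + pos, outsz - pos, "\\x%02X",
				   (unsigned int)buf[i]);
		if (w < 0 || (size_t)w >= outsz - pos)
			break; /* no room left for this piece */
		pos += (size_t)w;
	}
}
```

When given the two 16-byte arrays from the log, the helper yields SYSTEMCFG:PPC64 and \x7FELF\x02\x01\x01, respectively.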

chleroy commented 3 years ago

Probably linked to commit https://github.com/linuxppc/linux/commit/511157ab641eb6bedd00d62673388e78a4f871cf, which moved the data page up front (ahead of the vDSO text), as on most other architectures.
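To see the effect of such a layout on a running system, consider the following minimal sketch. It assumes Linux with glibc's getauxval(). It compares the start of the [vdso] VMA in /proc/self/maps with the ELF header address that the kernel reports in AT_SYSINFO_EHDR. The names find_mapping and vdso_elf_offset are hypothetical. An offset of 0 means the ELF header starts the VMA, which is what CRIU's parser assumed. A non-zero offset means something, such as a data page, sits in front of it:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/auxv.h>

/* Hypothetical helper: find the first mapping in /proc/self/maps whose
 * pathname column equals name (e.g. "[vdso]"). Returns 0 on success. */
static int find_mapping(const char *name, unsigned long *start,
			unsigned long *end)
{
	char line[4096];
	FILE *f = fopen("/proc/self/maps", "r");
	int ret = -1;

	if (!f)
		return -1;
	while (fgets(line, sizeof line, f)) {
		char path[256] = "";
		/* Format: start-end perms offset dev inode [pathname] */
		if (sscanf(line, "%lx-%lx %*s %*s %*s %*s %255s",
			   start, end, path) >= 2 &&
		    strcmp(path, name) == 0) {
			ret = 0;
			break;
		}
	}
	fclose(f);
	return ret;
}

/* Offset of the vDSO ELF header from the start of the [vdso] VMA, or -1
 * if there is no vDSO or the header lies outside that VMA. */
static long vdso_elf_offset(void)
{
	unsigned long start, end, ehdr = getauxval(AT_SYSINFO_EHDR);

	if (!ehdr || find_mapping("[vdso]", &start, &end) != 0)
		return -1;
	if (ehdr < start || ehdr >= end)
		return -1;
	return (long)(ehdr - start);
}
```

The VMA in the log (7fffb9070000-7fffb9090000) spans 0x20000 bytes. With 64 KiB pages, that is two pages, which would fit a data page followed by the vDSO text.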

ldu4 commented 3 years ago

This can be handled by reading the AT_SYSINFO_EHDR entry of the auxiliary vector. I wrote a patch, ldu4/criu@e870e1434648, that makes criu check pass.
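Here is a minimal sketch of the AT_SYSINFO_EHDR approach, assuming a 64-bit Linux/glibc host. getauxval() returns the address of the vDSO ELF header directly, so no assumption about the VMA layout is needed. This sketch only illustrates the idea; it is not the code from ldu4/criu@e870e1434648, and self_vdso_ehdr is a hypothetical name:

```c
#include <assert.h>
#include <elf.h>
#include <string.h>
#include <sys/auxv.h>

/* Ask the kernel where the vDSO ELF header is instead of assuming it is
 * at the start of the [vdso] VMA. Returns NULL if the kernel provides no
 * vDSO or the header does not carry the ELF magic. */
static const Elf64_Ehdr *self_vdso_ehdr(void)
{
	unsigned long addr = getauxval(AT_SYSINFO_EHDR);
	const Elf64_Ehdr *ehdr = (const Elf64_Ehdr *)addr;

	if (addr == 0)
		return NULL;
	if (memcmp(ehdr->e_ident, ELFMAG, SELFMAG) != 0)
		return NULL;
	return ehdr;
}
```

This only covers the task's own vDSO. A dumped task's vDSO VMA, as read back from images, still has to be located some other way.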

@adrianreber could you give it a try?

0x7f454c46 commented 3 years ago

Hi guys, @chleroy @ldu4, what do you think about this kernel patch? https://github.com/0x7f454c46/linux/commit/783c7a2532d2219edbcf555cc540eab05f698d2a (untested; I'm working on it and would appreciate help with it)

0x7f454c46 commented 3 years ago

Hi @ldu4, one thing I find a bit strange about your patch is that you fixed vdso_fill_self_symtable(), but the restorer probably has to be fixed too. It reads the vDSO VMA from the images and then parses it in vdso_proxify(), where it expects the ELF header at the start of the VMA.
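The restorer works from the VMA saved in the images rather than from its own auxiliary vector. One possible fallback is therefore to search the saved VMA for the ELF header. The sketch below is purely hypothetical: neither the patch above nor CRIU's vdso_proxify() is known to do this. It checks each page-aligned offset for the ELF magic, relying on the vDSO text starting on a page boundary. find_elf_in_vma is an illustrative name:

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fallback: given a copy of a [vdso] VMA of len bytes,
 * return the offset of the first page that begins with the ELF magic,
 * or -1 if none does. page is the page size the VMA was mapped with. */
static long find_elf_in_vma(const unsigned char *vma, size_t len,
			    size_t page)
{
	assert(page >= SELFMAG); /* also rules out an endless loop on 0 */
	for (size_t off = 0; off + SELFMAG <= len; off += page)
		if (memcmp(vma + off, ELFMAG, SELFMAG) == 0)
			return (long)off;
	return -1;
}
```

For a VMA like the one in the log, page would be the kernel's page size (64 KiB on many ppc64le distribution kernels). Scanning only page boundaries avoids false matches on the magic bytes appearing inside the data page.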

ldu4 commented 3 years ago

Hi @0x7f454c46, I probably missed the other part; I did that in a hurry and didn't have time to run a C/R test. I'll investigate that. That being said, I think your kernel patch introducing the vvar mapping for PowerPC makes sense.

github-actions[bot] commented 3 years ago

A friendly reminder that this issue had no activity for 30 days.

adrianreber commented 3 years ago

Looks like the kernel changes made it into 5.13, 5.11.20 and 5.12.3.