checkpoint-restore / criu

Checkpoint/Restore tool
criu.org
Other
2.79k stars 565 forks source link

compel: Add support for ppc64le scv syscalls #2261

Closed ymanton closed 10 months ago

ymanton commented 10 months ago

Power ISA 3.0 added a new syscall instruction. Kernel 5.9 added corresponding support.

https://github.com/torvalds/linux/commit/912237ea166428edcbf3c137adf12cb987c477f2 https://github.com/torvalds/linux/commit/4e0e45b07d790253643ee05300784ab2156e2d5e https://github.com/torvalds/linux/commit/7fa95f9adaee7e5cbb195d3359741120829e488b https://github.com/torvalds/linux/commit/5665bc35c1ed917ac8fd06cb651317bb47a65b10

Add CRIU support to recognize the new instruction and kernel ABI changes to properly dump and restore threads executing in syscalls. Without this change threads executing in syscalls using the scv instruction will not be restored to re-execute the syscall, they will be restored to execute the following instruction and will return unexpected error codes (ERESTARTSYS, etc) to user code.

Problem is caught by zdtm/static/sigpending if you run it on a Power 10 in RHEL 9 or UBI 9 because a libpthreads build that will use the new syscall is present.

Output without this patch is:

$ sudo ./zdtm.py run -t zdtm/static/sigpending -f h
userns is supported
=== Run 1/1 ================ zdtm/static/sigpending
======================= Run zdtm/static/sigpending in h ========================
Start test
./sigpending --pidfile=sigpending.pid --outfile=sigpending.out
Run criu dump
=[log]=> dump/zdtm/static/sigpending/56/1/dump.log
------------------------ grep Error ------------------------
b"(00.001168) \tSeizing 56's 57 thread"
b'(00.001273) seccomp: Collected tid_real 57 mode 0'
b'(00.001298) Collected (3 attempts, 0 in_progress)'
b'(00.001325) Seized task 58, state 0'
b'(00.001333) Warn  (compel/src/lib/infect.c:133): Unable to interrupt task: 58 (Operation not permitted)'
------------------------ ERROR OVER ------------------------
Run criu restore
Send the 15 signal to  56
Wait for zdtm/static/sigpending(56) to die for 0.100000
tail: cannot open 'zdtm/static/sigpending.out' for reading: No such file or directory
==================== zdtm/static/sigpending.out.inprogress =====================
The futex facility returned an unexpected error code.

==================== zdtm/static/sigpending.out.inprogress =====================
############### Test zdtm/static/sigpending FAIL at result check ###############
Test output: ================================
The futex facility returned an unexpected error code.

 <<< ================================
##################################### FAIL #####################################

Error message comes from https://github.com/bminor/glibc/blob/a43003ebf674f7af8c4b8d6d1b682244f1a28719/nptl/futex-internal.c#L119

With the patch it passes. Also checked on a Power 9 machine that does not use the new syscalls and it still works with the patch.

The only question I can't answer definitively is whether we need to do anything to prevent double restarts. Kernel changed from clearing pt_regs.trap to setting a bit 0x10 instead to remember that syscall is going to be restarted to not rewind NIP more than once, in case multiple signals arrive at the same time.

CRIU always clears .trap before checkpointing the registers. Is that still the correct behaviour?

I don't think CRIU needs to coordinate with the kernel to prevent double restarts since on restore if a signal arrives it should not try to restart the syscall we were executing at checkpoint time, it should just return control to user space so the thread can enter the syscall naturally, so clearing .trap seems like it should do the right thing regardless of how the kernel is managing double restarts.

codecov-commenter commented 10 months ago

Codecov Report

Patch coverage: 66.66% and project coverage change: -0.25% :warning:

Comparison is base (82bfb67) 70.67% compared to head (1d9f88d) 70.42%. Report is 3 commits behind head on criu-dev.

:exclamation: Current head 1d9f88d differs from pull request most recent head 62b2ea5. Consider uploading reports for the commit 62b2ea5 to get more accurate results

Additional details and impacted files ```diff @@ Coverage Diff @@ ## criu-dev #2261 +/- ## ============================================ - Coverage 70.67% 70.42% -0.25% ============================================ Files 133 132 -1 Lines 33330 33503 +173 ============================================ + Hits 23556 23595 +39 - Misses 9774 9908 +134 ``` | [Files Changed](https://app.codecov.io/gh/checkpoint-restore/criu/pull/2261?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=checkpoint-restore) | Coverage Δ | | |---|---|---| | [criu/include/parasite.h](https://app.codecov.io/gh/checkpoint-restore/criu/pull/2261?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=checkpoint-restore#diff-Y3JpdS9pbmNsdWRlL3BhcmFzaXRlLmg=) | `100.00% <ø> (ø)` | | | [criu/kerndat.c](https://app.codecov.io/gh/checkpoint-restore/criu/pull/2261?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=checkpoint-restore#diff-Y3JpdS9rZXJuZGF0LmM=) | `57.83% <62.50%> (ø)` | | | [criu/parasite-syscall.c](https://app.codecov.io/gh/checkpoint-restore/criu/pull/2261?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=checkpoint-restore#diff-Y3JpdS9wYXJhc2l0ZS1zeXNjYWxsLmM=) | `86.06% <100.00%> (+0.04%)` | :arrow_up: | ... and [20 files with indirect coverage changes](https://app.codecov.io/gh/checkpoint-restore/criu/pull/2261/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=checkpoint-restore)

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

avagin commented 10 months ago

LGTM

@ymanton do we have a test that trigger the issue (fails without this change??

@mihalicyn could you run tests on ppc with this patch?

ymanton commented 10 months ago

@ymanton do we have a test that trigger the issue (fails without this change??

Yes, zdtm/static/sigpending will trigger it if you have a new kernel and new libpthreads (RHEL9, UBI9) and are running on a P10.

The minimum requirements are a P9, new kernel, and some hand-written assembly to call a blocking syscall via scv on a background thread.

mihalicyn commented 10 months ago

Dear @ymanton,

thanks for working on this!

The minimum requirements are a P9, new kernel, and some hand-written assembly to call a blocking syscall via scv on a background thread.

that explains why we didn't reproduce it on our test VM with:

[root@criu criu]# cat /proc/cpuinfo 
processor   : 0
cpu     : POWER8 (architected), altivec supported
clock       : 3425.000000MHz
revision    : 2.1 (pvr 004b 0201)

timebase    : 512000000
platform    : pSeries
model       : IBM pSeries (emulated by qemu)
machine     : CHRP IBM pSeries (emulated by qemu)
MMU     : Hash
[root@criu criu]# uname -a
Linux criu.novalocal 6.5.3-300.fc39.ppc64le #1 SMP Wed Sep 13 12:19:24 UTC 2023 ppc64le GNU/Linux