checkpoint-restore / criu

Checkpoint/Restore tool
criu.org
Other
2.86k stars 576 forks source link

port to riscv64 #2234

Open ancientmodern opened 1 year ago

ancientmodern commented 1 year ago

Add riscv64 support to CRIU. Hopefully solve issue #1702

Present status of zdtm tests:

################### 6 TEST(S) FAILED (TOTAL 454/SKIPPED 46) ####################
 * zdtm/static/apparmor_stacking(h)
 * zdtm/static/netns_lock_iptables(h)
 * zdtm/static/socket-tcp-closed-last-ack(uns)
 * zdtm/static/socket-tcp-nfconntrack(h)
 * zdtm/static/socket-tcp-syn-sent(uns)
 * zdtm/transition/maps007(unknown)
##################################### FAIL #####################################

@mihalicyn @felicitia for you reference

WIP: I'll enrich this pull request with more details by tomorrow. And some sections might have been affected when I removed debug info from our code :)

mihalicyn commented 1 year ago

If you did this work together with Yixue, you can add Co-authored-by to the commit messages. https://docs.github.com/en/pull-requests/committing-changes-to-your-project/creating-and-editing-commits/creating-a-commit-with-multiple-authors

Great work! I'll take a look and review this.

ancientmodern commented 1 year ago

If you did this work together with Yixue, you can add Co-authored-by to the commit messages. https://docs.github.com/en/pull-requests/committing-changes-to-your-project/creating-and-editing-commits/creating-a-commit-with-multiple-authors

Great work! I'll take a look and review this.

Thank you Alex! I will make the updates with Yixue tomorrow. If you'd like to test it locally, this branch should be functional with the kernel patch below, though with tons of messy commits and debugging stuff 😂

Patch:

---
 arch/riscv/kernel/signal.c | 71 +++++++++++++++++++++++++++++++-------
 1 file changed, 58 insertions(+), 13 deletions(-)

diff --git a/arch/riscv/kernel/signal.c b/arch/riscv/kernel/signal.c
index 9aff9d720590..01ccb949d26e 100644
--- a/arch/riscv/kernel/signal.c
+++ b/arch/riscv/kernel/signal.c
@@ -16,6 +16,7 @@

 #include <asm/ucontext.h>
 #include <asm/vdso.h>
+#include <asm/syscall.h>
 #include <asm/signal.h>
 #include <asm/signal32.h>
 #include <asm/switch_to.h>
@@ -284,35 +285,79 @@ static void handle_signal(struct ksignal *ksig, struct pt_regs *regs)

 void arch_do_signal_or_restart(struct pt_regs *regs)
 {
+   unsigned long continue_addr = 0, restart_addr = 0;
+   int retval = 0;
    struct ksignal ksig;
-
-   if (get_signal(&ksig)) {
-       /* Actually deliver the signal */
-       handle_signal(&ksig, regs);
-       return;
-   }
+   bool syscall = (regs->cause == EXC_SYSCALL);

    /* Did we come from a system call? */
-   if (regs->cause == EXC_SYSCALL) {
+   if (syscall) {
+       continue_addr = regs->epc;
+       restart_addr = continue_addr - 4;
+       retval = regs->a0;
+
        /* Avoid additional syscall restarting via ret_from_exception */
        regs->cause = -1UL;

-       /* Restart the system call - no handlers present */
-       switch (regs->a0) {
+       /*
+        * Prepare for system call restart. We do this here so that a
+        * debugger will see the already changed PC.
+        */
+       switch (retval) {
        case -ERESTARTNOHAND:
        case -ERESTARTSYS:
        case -ERESTARTNOINTR:
-                        regs->a0 = regs->orig_a0;
-           regs->epc -= 0x4;
+           regs->a0 = regs->orig_a0;
+           regs->epc = restart_addr;
            break;
        case -ERESTART_RESTARTBLOCK:
-                        regs->a0 = regs->orig_a0;
+           regs->a0 = regs->orig_a0;
            regs->a7 = __NR_restart_syscall;
-           regs->epc -= 0x4;
+           regs->epc = restart_addr;
            break;
        }
    }

+   // printk("~RISCV~ Before get_signal: a7 = %ld, a0 = %ld, orig_a0 = %ld, cause = %ld\n",
+   //        regs->a7, regs->a0, regs->orig_a0, regs->cause);
+
+   /*
+    * Get the signal to deliver. When running under ptrace, at this point
+    * the debugger may change all of our registers.
+    */
+   if (get_signal(&ksig)) {
+       /*
+        * Depending on the signal settings, we may need to revert the
+        * decision to restart the system call, but skip this if a
+        * debugger has chosen to restart at a different PC.
+        */
+       if (regs->epc == restart_addr &&
+           (retval == -ERESTARTNOHAND ||
+            retval == -ERESTART_RESTARTBLOCK ||
+            (retval == -ERESTARTSYS &&
+             !(ksig.ka.sa.sa_flags & SA_RESTART)))) {
+           syscall_set_return_value(current, regs, -EINTR, 0);
+           regs->epc = continue_addr;
+       }
+
+       /* Actually deliver the signal */
+       handle_signal(&ksig, regs);
+       return;
+   }
+
+   // printk("~RISCV~ After get_signal: a7 = %ld, a0 = %ld, orig_a0 = %ld, cause = %ld\n",
+   //        regs->a7, regs->a0, regs->orig_a0, regs->cause);
+
+   /*
+    * Handle restarting a different system call. As above, if a debugger
+    * has chosen to restart at a different PC, ignore the restart.
+    */
+   if (syscall && regs->epc == restart_addr) {
+       if (retval == -ERESTART_RESTARTBLOCK)
+           regs->a7 = __NR_restart_syscall;
+       // user_rewind_single_step(current);
+   }
+
    /*
     * If there is no signal to deliver, we just put the saved
     * sigmask back.

Copied from my gitter msg:

Sorry for this very late reply as I forgot to turn on gitter notification :( But I wanted to share an exciting update about the development of the RISC-V version of CRIU. It now only fails 7 out of 454 zdtm tests! Here are the final results of ./zdtm.py run -a --keep-going:

################### 7 TEST(S) FAILED (TOTAL 454/SKIPPED 46) ####################
 * zdtm/static/apparmor_stacking(h)
 * zdtm/static/cmdlinenv00(ns)
 * zdtm/static/netns_lock_iptables(h)
 * zdtm/static/socket-tcp-closed-last-ack(uns)
 * zdtm/static/socket-tcp-nfconntrack(h)
 * zdtm/static/socket-tcp-syn-sent(uns)
 * zdtm/transition/maps007(unknown)
##################################### FAIL #####################################

I think failed tests related to iptables and sockets might be due to some disabled kernel configs. I haven't confirmed this yet, as compiling an Ubuntu kernel locally takes quite a bit of time on my side :( I've also set up the cross-compile CI for RISC-V. As riscv64 is not an official stable arch in Debian yet (but it will be soon), I had to take a strange approach using Ubuntu ports for riscv64 cross-compiling. It now works with some dirty work in the Dockerfile. The major problem right now is a critical bug related to ptrace and syscalls in riscv kernel. You can find more details here. I'm going to finalize and send the patch (changing arch_do_signal_or_restart() to arm64 style) to LKML, but I'm not sure if this is a proper solution and how long it will take. Unfortunately, I cannot think of any workarounds (the orig_a0 approach also relies on kernel change, and it seems not reasonable according to the root cause). As it stands, without this fix, kernels <= 6.4 might only handle naive jobs, such as simple loops. I'm planning to create a pull request soon, and I'll focus on organizing the commits and changes and fixing any potential issues (including the failed tests and CI setup). I'm really looking forward to your feedback and suggestions!

codecov-commenter commented 1 year ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 70.23%. Comparing base (cb39c62) to head (5f79f37).

:exclamation: Current head 5f79f37 differs from pull request most recent head 91d3cfd. Consider uploading reports for the commit 91d3cfd to get more accurate results

Additional details and impacted files ```diff @@ Coverage Diff @@ ## criu-dev #2234 +/- ## ============================================ + Coverage 70.21% 70.23% +0.02% ============================================ Files 132 131 -1 Lines 34372 34370 -2 ============================================ + Hits 24134 24141 +7 + Misses 10238 10229 -9 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

ancientmodern commented 1 year ago

Update: The zdtm/static/cmdlinenv00 test now passes after syncing AT_VECTOR_SIZE to the correct value (64) in the riscv Linux kernel, and zdtm/static/apparmor_stacking should be fixed with #2235 :)

@mihalicyn, I have a small question about running make lint locally. I'm experiencing many more errors with flake8 on my side, even after reverting to the version used by CI (flake8 == 5.0.3). Do you have any idea what might be causing this difference? Thanks!

haorong at haorong:/criu [riscv64-upstream]$ make lint
flake8 --version
5.0.3 (mccabe: 0.7.0, pycodestyle: 2.9.1, pyflakes: 2.5.0, tryceratops: 2.3.2) CPython 3.10.4 on Linux
flake8 --config=scripts/flake8.cfg test/zdtm.py
test/zdtm.py:492:13: TRY200 Use 'raise from' to specify exception cause
test/zdtm.py:550:13: TRY003 Avoid specifying long messages outside the exception class
test/zdtm.py:590:17: TRY003 Avoid specifying long messages outside the exception class
test/zdtm.py:648:13: TRY003 Avoid specifying long messages outside the exception class
test/zdtm.py:979:13: TRY003 Avoid specifying long messages outside the exception class
test/zdtm.py:981:13: TRY003 Avoid specifying long messages outside the exception class
test/zdtm.py:983:13: TRY003 Avoid specifying long messages outside the exception class
test/zdtm.py:998:21: TRY003 Avoid specifying long messages outside the exception class
test/zdtm.py:1276:13: TRY003 Avoid specifying long messages outside the exception class
test/zdtm.py:1489:17: TRY300 Consider moving this statement to an 'else' block
test/zdtm.py:1664:17: TRY201 Simply use 'raise' without specifying exception object again
test/zdtm.py:1687:13: TRY003 Avoid specifying long messages outside the exception class
test/zdtm.py:1700:17: TRY003 Avoid specifying long messages outside the exception class
test/zdtm.py:1713:13: TRY003 Avoid specifying long messages outside the exception class
test/zdtm.py:1722:13: TRY003 Avoid specifying long messages outside the exception class
test/zdtm.py:1937:21: TRY003 Avoid specifying long messages outside the exception class
test/zdtm.py:2011:17: TRY002 Create your own exception
test/zdtm.py:2040:13: TRY002 Create your own exception
test/zdtm.py:2113:17: TRY201 Simply use 'raise' without specifying exception object again
test/zdtm.py:2230:13: TRY003 Avoid specifying long messages outside the exception class
test/zdtm.py:2363:13: TRY002 Create your own exception
test/zdtm.py:2363:13: TRY003 Avoid specifying long messages outside the exception class
test/zdtm.py:2368:13: TRY002 Create your own exception
test/zdtm.py:2368:13: TRY003 Avoid specifying long messages outside the exception class
test/zdtm.py:2374:13: TRY002 Create your own exception
test/zdtm.py:2374:13: TRY003 Avoid specifying long messages outside the exception class
test/zdtm.py:2628:9: TRY300 Consider moving this statement to an 'else' block
make: *** [Makefile:439: lint] Error 1
github-actions[bot] commented 11 months ago

A friendly reminder that this PR had no activity for 30 days.

ancientmodern commented 10 months ago

Really sorry for the delay. The past two months turned out to be more "fast-paced" than I had anticipated, and the riscv ubuntu VM I've been using for development got messed up after I installed a buggy kernel :( This led me to somewhat negatively sideline this PR and the corresponding kernel patch. Fortunately, I dug out an old VM snapshot from deep in my hard drive and successfully restored it to a dev-ready state today. I'll return to the PR as soon as my work schedule lightens up. Thanks for your patience and continued support! 🙏