checkpoint-restore / criu

Checkpoint/Restore tool
criu.org
Other
2.86k stars 576 forks source link

CRIU 3.18: unittest and zdtm test failed on both x86_64 and arm64 #2225

Open ancientmodern opened 1 year ago

ancientmodern commented 1 year ago

Description

When running make unittest and make test using criu v3.18 on an arm64 VM, both tests unfortunately failed with error I can hardly understood. The same unittest also failed when attempted on an x86 VM, albeit with identical output. The zdtm test on the x86 VM failed on a different test. The detailed information is attached below. Thanks for looking into this :)

Update: They work well when running the simple loop case here.

Steps to reproduce the issue:

  1. Clone criu source code
  2. git checkout v3.18
  3. make unittest
  4. make test or make zdtm && cd test && ./zdtm.py run -a

Describe the results you received:

unit test:

``` make[2]: Nothing to be done for 'criu/string.c'. make[2]: Nothing to be done for 'all'. Error (criu/config.c:127): Unsupported configuration file format. Please consult man page criu(8) Error (criu/config.c:238): Unsupported configuration file format. Please consult man page criu(8) Error (criu/config.c:172): Unsupported configuration file format. Please consult man page criu(8) Error (criu/config.c:238): Unsupported configuration file format. Please consult man page criu(8) OK ```

zdtm test 25 terminated on arm64:

``` === Run 25/454 ---------------- zdtm/static/deleted_unix_sock userns is supported ==================== Run zdtm/static/deleted_unix_sock in h ==================== Start test ./deleted_unix_sock --pidfile=deleted_unix_sock.pid --outfile=deleted_unix_sock.out --filename=deleted_unix_sock.test make: *** [Makefile:490: deleted_unix_sock.pid] Error 1 Test zdtm/static/deleted_unix_sock FAIL at ['make', '--no-print-directory', '-C', 'zdtm/static', 'deleted_unix_sock.pid'] Test output: ================================ 22:40:23.134: 38: ERR: fs.c:106: access check for bit X for current dir path '/root/criu/test/zdtm/static' failed for uid:18943,gid:58467, error: 13(Permission denied). Bit 'x' should be set in all path components of this directory 22:40:23.135: 38: ERR: unix.c:11: failed to get current working directory with valid permissions. 22:40:23.135: 37: ERR: test.c:320: Test exited unexpectedly with code 1 <<< ================================ ##################################### FAIL ##################################### ```

zdtm test 20 show error on arm64 (but PASS):

``` === Run 20/454 ---------------- zdtm/static/timens_nested userns is supported ===================== Run zdtm/static/timens_nested in uns ===================== Start test Test is SUID ./timens_nested --pidfile=timens_nested.pid --outfile=timens_nested.out Run criu dump =[log]=> dump/zdtm/static/timens_nested/46/1/dump.log ------------------------ grep Error ------------------------ b'(00.000777) Will take mnt namespace in the image' b'(00.000778) Add mnt ns 14 pid 46' b'(00.000780) Will take user namespace in the image' b'(00.000780) Add user ns 15 pid 46' b"(00.000806) Error (criu/namespaces.c:460): Can't dump nested time namespace for 51" b"(00.000807) Error (criu/namespaces.c:727): Can't make timens id" b'(00.000810) net: Unlock network' b'(00.000811) Running network-unlock scripts' b'iptables-restore: line 6 failed' b'(00.001692) Error (criu/util.c:643): exited, status=1' b'ip6tables-restore: line 6 failed' b'(00.002581) Error (criu/util.c:643): exited, status=1' b'(00.002587) Unfreezing tasks into 1' b'(00.002588) \tUnseizing 46 into 1' b'(00.002593) \tUnseizing 50 into 1' b'(00.002599) \tUnseizing 51 into 1' b'(00.002607) Error (criu/cr-dump.c:2093): Dumping FAILED.' ------------------------ ERROR OVER ------------------------ Send the 15 signal to 46 Wait for zdtm/static/timens_nested(46) to die for 0.100000 Removing dump/zdtm/static/timens_nested/46 ===================== Test zdtm/static/timens_nested PASS ====================== ====================== Run zdtm/static/timens_nested in h ====================== Start test Test is SUID ./timens_nested --pidfile=timens_nested.pid --outfile=timens_nested.out Run criu dump =[log]=> dump/zdtm/static/timens_nested/91/1/dump.log ------------------------ grep Error ------------------------ b'(00.000658) Collected (4 attempts, 0 in_progress)' b'(00.000661) Collected 92 in 1 state' b'(00.000667) Collected (3 attempts, 0 in_progress)' b'(00.000668) Collected 91 in 1 state' b"(00.000691) Error (criu/namespaces.c:460): Can't dump nested time namespace for 92" b"(00.000692) Error (criu/namespaces.c:727): Can't make timens id" b'(00.000694) net: Unlock network' b'(00.000695) Unfreezing tasks into 1' b'(00.000696) \tUnseizing 91 into 1' b'(00.000706) \tUnseizing 92 into 1' b'(00.000709) Error (criu/cr-dump.c:2093): Dumping FAILED.' ------------------------ ERROR OVER ------------------------ Send the 15 signal to 91 Wait for zdtm/static/timens_nested(91) to die for 0.100000 Removing dump/zdtm/static/timens_nested/91 ===================== Test zdtm/static/timens_nested PASS ====================== ===================== Run zdtm/static/timens_nested in ns ====================== Start test Test is SUID ./timens_nested --pidfile=timens_nested.pid --outfile=timens_nested.out Run criu dump =[log]=> dump/zdtm/static/timens_nested/137/1/dump.log ------------------------ grep Error ------------------------ b'(00.000769) Will take time namespace in the image' b'(00.000770) Add time ns 13 pid 137' b'(00.000773) Will take mnt namespace in the image' b'(00.000774) Add mnt ns 14 pid 137' b"(00.000800) Error (criu/namespaces.c:460): Can't dump nested time namespace for 142" b"(00.000801) Error (criu/namespaces.c:727): Can't make timens id" b'(00.000804) net: Unlock network' b'(00.000804) Running network-unlock scripts' b'iptables-restore: line 6 failed' b'(00.001686) Error (criu/util.c:643): exited, status=1' b'ip6tables-restore: line 6 failed' b'(00.002563) Error (criu/util.c:643): exited, status=1' b'(00.002569) Unfreezing tasks into 1' b'(00.002571) \tUnseizing 137 into 1' b'(00.002591) \tUnseizing 141 into 1' b'(00.002596) \tUnseizing 142 into 1' b'(00.002603) Error (criu/cr-dump.c:2093): Dumping FAILED.' ------------------------ ERROR OVER ------------------------ Send the 15 signal to 137 Wait for zdtm/static/timens_nested(137) to die for 0.100000 Removing dump/zdtm/static/timens_nested/137 ===================== Test zdtm/static/timens_nested PASS ====================== ```

zdtm test 93 terminated on x86_64:

``` === Run 93/454 ===------------- zdtm/static/socket-tcp-closed-last-ack userns is supported =============== Run zdtm/static/socket-tcp-closed-last-ack in ns =============== Start test Test is SUID ./socket-tcp-closed-last-ack --pidfile=socket-tcp-closed-last-ack.pid --outfile=socket-tcp-closed-last-ack.out State Recv-Q Send-Q Local Address:Port Peer Address:Port Process LAST-ACK 1 1 127.0.0.1:8880 127.0.0.1:60974 TIME-WAIT 0 0 127.0.0.1:60974 127.0.0.1:8880 State Recv-Q Send-Q Local Address:Port Peer Address:Port Process LAST-ACK 1 1 127.0.0.1:8880 127.0.0.1:60974 Check TCP images Running zdtm/static/socket-tcp-closed-last-ack.hook(--post-start) Running zdtm/static/socket-tcp-closed-last-ack.hook(--pre-dump) Run criu dump Running zdtm/static/socket-tcp-closed-last-ack.hook(--pre-restore) Run criu restore Running zdtm/static/socket-tcp-closed-last-ack.hook(--post-restore) Send the 15 signal to 94 Wait for zdtm/static/socket-tcp-closed-last-ack(94) to die for 0.100000 ####### Test zdtm/static/socket-tcp-closed-last-ack FAIL at result check ####### Test output: ================================ iptables: Bad rule (does a matching rule exist in that chain?). <<< ================================ Running zdtm/static/socket-tcp-closed-last-ack.hook(--clean) ```

CRIU logs and information:

Output of `criu --version`:

``` root at ubuntu-arm:~/criu/test [(v3.18)]# criu --version Version: 3.18 GitID: v3.18-42-gf018893d2 haorong at haorong:/scratch/criu-x86 [(v3.18)]$ criu --version Version: 3.18 GitID: v3.18-31-gc6ee1ba05 ```

Output of `criu check --all`:

``` root at ubuntu-arm:~/criu/test [(v3.18)]# criu check --all Warn (criu/kerndat.c:1153): Stale /run/criu.kdat file Warn (criu/cr-check.c:822): Dirty tracking is OFF. Memory snapshot will not work. Warn (criu/cr-check.c:1255): Do not have API to map vDSO - will use mremap() to restore vDSO Warn (criu/cr-check.c:1175): CRIU built without CONFIG_COMPAT - can't C/R compatible tasks Looks good but some kernel features are missing which, depending on your process tree, may cause dump or restore failure. haorong at haorong:/scratch/criu-x86 [(v3.18)]$ sudo criu check --all Warn (criu/kerndat.c:1153): Stale /run/criu.kdat file sudo: mon_handle_sigchld: waitpid: No child processes Looks good. ```

Additional environment details: arm64 kernel config: arm64-config-5.15.0-76-generic.txt

x86_64 kernel config: x86_64-config-5.19.0-43-generic.txt

rst0git commented 1 year ago
Error (criu/config.c:127): Unsupported configuration file format. Please consult man page criu(8)
Error (criu/config.c:238): Unsupported configuration file format. Please consult man page criu(8)
Error (criu/config.c:172): Unsupported configuration file format. Please consult man page criu(8)
Error (criu/config.c:238): Unsupported configuration file format. Please consult man page criu(8)
OK

These "error" messages above are expected. We are testing if CRIU would fail if an invalid configuration format is used. The message OK at the end indicates that all unit tests have passed: https://github.com/checkpoint-restore/criu/blob/a4bb3f9a063cea2aede654317c53b35b608d042e/criu/unittest/unit.c#L146

The same applies for the timens_nested ZDMT test.

b'(00.000709) Error (criu/cr-dump.c:2093): Dumping FAILED.'
------------------------ ERROR OVER ------------------------
Send the 15 signal to  91
Wait for zdtm/static/timens_nested(91) to die for 0.100000
Removing dump/zdtm/static/timens_nested/91
===================== Test zdtm/static/timens_nested PASS ======================

The crfail flag in the timens_nested.desc file is used to specify that we expect CRIU to fail with this test: https://github.com/checkpoint-restore/criu/blob/a4bb3f9a063cea2aede654317c53b35b608d042e/test/zdtm/static/timens_nested.desc#L1

ancientmodern commented 1 year ago

@rst0git Thanks Radostin! I forgot that some zdtm cases are meant to fail. Makes sense.

Any idea about the test case that actually "failed"? As I run make test under root, the permission error in arm64 seems odd to me. Also I could not quite understand the error in x86_64 :(

Snorch commented 1 year ago

access check for bit X for current dir path normally that can be fixed by putting your criu directory to /criu instead of /root/criu, e.g. note the difference

[ ~]# ls /criu -ld
drwxr-xr-x 16 root root 4096 Jun  2 05:37 /criu
[ ~]# ls /root -ld
dr-xr-x--- 11 root root 4096 Jul  5 07:47 /root
ancientmodern commented 1 year ago

access check for bit X for current dir path normally that can be fixed by putting your criu directory to /criu instead of /root/criu, e.g. note the difference

[ ~]# ls /criu -ld
drwxr-xr-x 16 root root 4096 Jun  2 05:37 /criu
[ ~]# ls /root -ld
dr-xr-x--- 11 root root 4096 Jul  5 07:47 /root

Thank you Pavel, it works! Now the arm64 version fails with a similar iptables error to x86_64 in test case 59:

=== Run 59/454 ==-------------- zdtm/static/socket-tcp-syn-sent
userns is supported
================== Run zdtm/static/socket-tcp-syn-sent in uns ==================
Start test
Test is SUID
Running zdtm/static/socket-tcp-syn-sent.hook(--post-start)
./socket-tcp-syn-sent --pidfile=socket-tcp-syn-sent.pid --outfile=socket-tcp-syn-sent.out
Running zdtm/static/socket-tcp-syn-sent.hook(--pre-dump)
State                Recv-Q             Send-Q                         Local Address:Port                            Peer Address:Port             Process
LISTEN               0                  1                                    0.0.0.0:8880                                 0.0.0.0:*
SYN-SENT             0                  1                                  127.0.0.1:59903                              127.0.0.1:8880
Run criu dump
Running zdtm/static/socket-tcp-syn-sent.hook(--pre-restore)
Run criu restore
Running zdtm/static/socket-tcp-syn-sent.hook(--post-restore)
State                Recv-Q             Send-Q                         Local Address:Port                            Peer Address:Port             Process
LISTEN               0                  1                                    0.0.0.0:8880                                 0.0.0.0:*
SYN-SENT             0                  1                                  127.0.0.1:59903                              127.0.0.1:8880
Check TCP images
Send the 15 signal to  112
Wait for zdtm/static/socket-tcp-syn-sent(112) to die for 0.100000
########## Test zdtm/static/socket-tcp-syn-sent FAIL at result check ###########
Test output: ================================
iptables: Bad rule (does a matching rule exist in that chain?).

 <<< ================================
Running zdtm/static/socket-tcp-syn-sent.hook(--clean)
##################################### FAIL #####################################
Snorch commented 1 year ago

@ancientmodern this error seems real, you can try to add some debug to confirm that iptables rule disappears:

diff --git a/test/zdtm/static/socket-tcp-syn-sent.c b/test/zdtm/static/socket-tcp-syn-sent.c
index 8265d5fb7..a8488688a 100644
--- a/test/zdtm/static/socket-tcp-syn-sent.c
+++ b/test/zdtm/static/socket-tcp-syn-sent.c
@@ -93,9 +93,13 @@ int main(int argc, char **argv)
                return -1;
        }

+       system("iptables -w -t filter -L");
+
        test_daemon();
        test_waitsig();

+       system("iptables -w -t filter -L");
+
        snprintf(cmd, sizeof(cmd), "iptables -w -t filter --protocol tcp -D INPUT --dport %d -j DROP", port);
        if (system(cmd))
                return -1;
github-actions[bot] commented 1 year ago

A friendly reminder that this issue had no activity for 30 days.