Open derekbruening opened 3 years ago
Some problems that have been observed include:
A crash in qemu:
qemu: uncaught target signal 11 (Segmentation fault) - core dumped
An assert in DR:
<Application xxx (zzz). Internal Error: DynamoRIO debug check failure: core/unix/signal.c:1469 rc == 0 IF_VMX86(|| (sig >= 63 && rc == -EINVAL))
Another assert in DR, this one using unmodified versions of everything (the above are local qemu environments; not quite sure what all the local changes are):
$ qemu-arm -L /usr/arm-linux-gnueabihf bin32/drrun -- suite/tests/bin/simple_app
<Application /home/bruening/dr/git/build_a32_dbg_tests/suite/tests/bin/simple_app (4017678) DynamoRIO usage error : Failed to read ELF interpreter headers.>
<Usage error: Failed to read ELF interpreter headers. (/home/bruening/dr/git/src/core/unix/loader.c, line 2000)
version 8.0.18663, custom build
-early_inject -emulate_brk
/home/bruening/dr/git/build_a32_dbg_tests/lib32/debug/libdynamorio.so=0x71000000>
Quick analysis of the final assert above and what happens after it:
DR needs to be passed a root dir for the dynamic loader (just like QEMU's -L). But we have a problem: we don't init DR and its options until after we map the app's interpreter. The comment says we want the app mapped, and we want dr_get_proc_address() to work. Hmm. Maybe split up the init so we only initialize heap and options first? Or maybe it would work out to do full init before mapping the interpreter.
Hardcoding a fix there to see what's next, DR initializes but then hangs trying to take over a non-app thread: a QEMU-added thread?
<Attached to 1/2 threads in application /home/bruening/dr/git/build_a32_dbg_tests/suite/tests/bin/simple_app (4167127)>
Hardcoding ignoring the other thread, we then hit a problem where the app's loader can't find libc. But code cache operation seems to work well as it builds 727 blocks:
$ qemu-arm -L /usr/arm-linux-gnueabihf bin32/drrun -rstats_to_stderr -- suite/tests/bin/simple_app
<Starting application /home/bruening/dr/git/build_a32_dbg_tests/suite/tests/bin/simple_app (4180146)>
<Initial options = -no_dynamic_options -rstats_to_stderr -code_api -stack_size 56K -signal_stack_size 32K -max_elide_jmp 0 -max_elide_call 0 -early_inject -emulate_brk -no_inline_ignored_syscalls -native_exec_default_list '' -no_native_exec_managed_code -no_indcall2direct >
<Paste into GDB to debug DynamoRIO clients:
set confirm off
add-symbol-file '/home/bruening/dr/git/build_a32_dbg_tests/lib32/debug/libdynamorio.so' 0x71024300
>
/home/bruening/dr/git/build_a32_dbg_tests/bin32/../lib32/debug/libdynamorio.so: error while loading shared libraries: libc.so.6: cannot open shared object file: No such file or directory
<Stopping application /home/bruening/dr/git/build_a32_dbg_tests/suite/tests/bin/simple_app (4180146)>
DynamoRIO statistics:
Peak threads under DynamoRIO control : 1
Threads ever created : 1
System calls, pre : 139
System calls, post : 137
Application mmaps : 1
Basic block fragments generated : 727
Peak fcache combined capacity (bytes) : 61440
Peak fcache units on live list : 2
Peak fcache units on free list : 2
Peak special heap capacity (bytes) : 4096
Peak heap units on live list : 8
Peak heap units on free list : 5
Peak stack capacity (bytes) : 147456
Peak heap capacity (bytes) : 221184
Peak total memory from OS (bytes) : 696320
Peak vmm blocks for unreachable heap : 91
Peak vmm blocks for stack : 42
Peak vmm blocks for unreachable special heap : 4
Peak vmm blocks for unreachable special mmap : 7
Peak vmm blocks for reachable heap : 1
Peak vmm blocks for cache : 19
Peak vmm blocks for reachable special mmap : 7
Peak vmm virtual memory in use (bytes) : 700416
Running a pure-static app at this point works:
$ qemu-arm -L /usr/arm-linux-gnueabihf bin32/drrun -rstats_to_stderr -- suite/tests/bin/common.allasm_arm
<Starting application /home/bruening/dr/git/build_a32_dbg_tests/suite/tests/bin/common.allasm_arm (4188999)>
<Initial options = -no_dynamic_options -rstats_to_stderr -code_api -stack_size 56K -signal_stack_size 32K -max_elide_jmp 0 -max_elide_call 0 -early_inject -emulate_brk -no_inline_ignored_syscalls -native_exec_default_list '' -no_native_exec_managed_code -no_indcall2direct >
<Paste into GDB to debug DynamoRIO clients:
set confirm off
add-symbol-file '/home/bruening/dr/git/build_a32_dbg_tests/lib32/debug/libdynamorio.so' 0x71024300
>
Hello world!
Hello world!
Hello world!
Hello world!
Hello world!
Hello world!
All done
<Stopping application /home/bruening/dr/git/build_a32_dbg_tests/suite/tests/bin/common.allasm_arm (4188999)>
DynamoRIO statistics:
Peak threads under DynamoRIO control : 1
Threads ever created : 1
System calls, pre : 2
System calls, post : 1
Basic block fragments generated : 22
Peak fcache combined capacity (bytes) : 8192
Peak fcache units on live list : 2
Peak fcache units on free list : 2
Peak special heap capacity (bytes) : 4096
Peak heap units on live list : 7
Peak heap units on free list : 5
Peak stack capacity (bytes) : 147456
Peak heap capacity (bytes) : 155648
Peak total memory from OS (bytes) : 638976
Peak vmm blocks for unreachable heap : 77
Peak vmm blocks for stack : 42
Peak vmm blocks for unreachable special heap : 4
Peak vmm blocks for unreachable special mmap : 7
Peak vmm blocks for reachable heap : 1
Peak vmm blocks for cache : 19
Peak vmm blocks for reachable special mmap : 7
Peak vmm virtual memory in use (bytes) : 643072
I have local fixes for the above issues and now have drreg-test passing under QEMU for AArch64. (Private libs need the path prefix as well, with precedence over native.) It fails for ARM but it seems to be drreg bugs or sthg.
So this is looking quite viable: it may not be much further work at all to insert qemu into the test commands and set up GA CI. The bulk of the work is going to be dealing with all the broken tests on ARM after several years of unreliable testing.
It is a little slow so we may want to pare down the longer tests but we can deal with that when we hit it.
This is all with:
$ qemu-arm --version
qemu-arm version 5.2.0 (Debian 1:5.2+dfsg-3)
The crash and signal assert listed up top are for a different version. I'll try to dig out the details there: there are layers in between.
With a local tree that auto-adds qemu to test command lines, we have:
AArch64: 34% tests passed, 156 tests failed out of 235 arm: 32% tests passed, 142 tests failed out of 210
So about 1/3 pass. A big portion of the failures are drcachesim and other tool tests where the command line setup is not yet there to insert qemu everywhere. Some failures are timeouts where QEMU is just too slow even with a 4x increase in the time allowed. There are also a bunch of what look like internal QEMU crashes: qemu: unhandled CPU exception 0x10001 - aborting
. And the rest may be DR bugs or QEMU bugs or who knows: it will take some effort.
I think the next step should be to set up GA CI under QEMU with a list of those 1/3 tests, to get some regression tests in place. Going through the rest of the tests can then be done incrementally over time.
I added instructions for running under QEMU at https://github.com/DynamoRIO/dynamorio/wiki/Test-Suite#testing-aarchxx
That link on the new site is: https://dynamorio.org/page_test_suite.html#autotoc_md263
Summarizing the status:
We are able to successfully run 1/3 of our tests under QEMU for both AArch64 and ARM and we are doing so on GA CI.
For the failing tests: as noted above, some are missing infrastructure to set up library paths. But a number are hitting what look like internal QEMU bugs, of two varieties:
unhandled CPU exception 0x10001
We may want to file these in the QEMU tracker.
- QEMU fails with
unhandled CPU exception 0x10001
This one seems to be that QEMU does not handle the WFI
opcode (?!). DR uses that in spinlocks. Passing -spinlock_count_on_SMP 0
is a workaround. We could try adding that to some tests when using QEMU to enable more tests.
With the 0x10001 WFI fix, over a dozen ARM tests not on the suite list pass locally, but fail on the GA CI. They fail mostly with asserts, most frequently a memcache.c assert as shown below. If I could easily reproduce locally I would try to fix it to get more ARM testing. For GA CI: we're on Ubuntu 20. Maybe with an Ubuntu 20 VM I can repro.
$ egrep '\*Failed|Internal E' 'arm-cross-compile/6_Run Suite.txt'
2021-06-18T21:52:47.7467537Z 21: <Application /home/runner/work/dynamorio/dynamorio/build_arm-debug-internal/lib32/debug/libdynamorio.so (13622). Internal Error: DynamoRIO debug check failure: /home/runner/work/dynamorio/dynamorio/core/unix/os.c:9368 iter->vm_start == executable_start
2021-06-18T21:52:47.7475281Z 10/83 Test #21: code_api|linux.execve-rec .......................***Failed Required regular expression not found. Regex=[^under DynamoRIO
2021-06-18T21:52:47.7490146Z <Application /home/runner/work/dynamorio/dynamorio/build_arm-debug-internal/lib32/debug/libdynamorio.so (13622). Internal Error: DynamoRIO debug check failure: /home/runner/work/dynamorio/dynamorio/core/unix/os.c:9368 iter->vm_start == executable_start
2021-06-18T21:52:47.7539416Z 16/83 Test #27: code_api|linux.prctl ............................***Failed Required regular expression not found. Regex=[^basename argv\[0\]: linux\.prctl
2021-06-18T21:54:04.7387119Z 65: <Application /home/runner/work/dynamorio/dynamorio/build_arm-debug-internal/suite/tests/bin/linux.eintr (13785). Internal Error: DynamoRIO debug check failure: /home/runner/work/dynamorio/dynamorio/core/unix/memcache.c:382 found
2021-06-18T21:54:04.7393515Z 32/83 Test #65: code_api|linux.eintr ............................***Failed Required regular expression not found. Regex=[^sending SIGURG
2021-06-18T21:54:04.7397267Z <Application /home/runner/work/dynamorio/dynamorio/build_arm-debug-internal/suite/tests/bin/linux.eintr (13785). Internal Error: DynamoRIO debug check failure: /home/runner/work/dynamorio/dynamorio/core/unix/memcache.c:382 found
2021-06-18T21:54:04.7409629Z 66: <Application /home/runner/work/dynamorio/dynamorio/build_arm-debug-internal/suite/tests/bin/linux.eintr (13791). Internal Error: DynamoRIO debug check failure: /home/runner/work/dynamorio/dynamorio/core/unix/memcache.c:382 found
2021-06-18T21:54:04.7418416Z 33/83 Test #66: code_api|linux.eintr-noinline ...................***Failed Required regular expression not found. Regex=[^sending SIGURG
2021-06-18T21:54:04.7422933Z <Application /home/runner/work/dynamorio/dynamorio/build_arm-debug-internal/suite/tests/bin/linux.eintr (13791). Internal Error: DynamoRIO debug check failure: /home/runner/work/dynamorio/dynamorio/core/unix/memcache.c:382 found
2021-06-18T21:54:04.7435522Z 67: <Application /home/runner/work/dynamorio/dynamorio/build_arm-debug-internal/suite/tests/bin/linux.sigsuspend (13799). Internal Error: DynamoRIO debug check failure: /home/runner/work/dynamorio/dynamorio/core/unix/memcache.c:382 found
2021-06-18T21:54:04.7483065Z 34/83 Test #67: code_api|linux.sigsuspend .......................***Failed Required regular expression not found. Regex=[^sending SIGUSR1
2021-06-18T21:54:04.7487322Z <Application /home/runner/work/dynamorio/dynamorio/build_arm-debug-internal/suite/tests/bin/linux.sigsuspend (13799). Internal Error: DynamoRIO debug check failure: /home/runner/work/dynamorio/dynamorio/core/unix/memcache.c:382 found
2021-06-18T21:54:20.9363858Z 68: <Application /home/runner/work/dynamorio/dynamorio/build_arm-debug-internal/suite/tests/bin/linux.signest (13806). Internal Error: DynamoRIO debug check failure: /home/runner/work/dynamorio/dynamorio/core/unix/memcache.c:382 found
2021-06-18T21:54:20.9371414Z 35/83 Test #68: code_api|linux.signest ..........................***Failed Required regular expression not found. Regex=[^sending 2 signals
2021-06-18T21:54:20.9375904Z <Application /home/runner/work/dynamorio/dynamorio/build_arm-debug-internal/suite/tests/bin/linux.signest (13806). Internal Error: DynamoRIO debug check failure: /home/runner/work/dynamorio/dynamorio/core/unix/memcache.c:382 found
2021-06-18T21:54:31.3834114Z 79: <Application /home/runner/work/dynamorio/dynamorio/build_arm-debug-internal/suite/tests/bin/pthreads.pthreads (13852). Internal Error: DynamoRIO debug check failure: /home/runner/work/dynamorio/dynamorio/core/unix/memcache.c:382 found
2021-06-18T21:54:31.3845335Z 42/83 Test #79: code_api|pthreads.pthreads ......................***Failed Required regular expression not found. Regex=[^Estimation of pi is 3\.142425985001098
2021-06-18T21:54:31.3848514Z <Application /home/runner/work/dynamorio/dynamorio/build_arm-debug-internal/suite/tests/bin/pthreads.pthreads (13852). Internal Error: DynamoRIO debug check failure: /home/runner/work/dynamorio/dynamorio/core/unix/memcache.c:382 found
2021-06-18T21:54:31.3863909Z 80: <Application /home/runner/work/dynamorio/dynamorio/build_arm-debug-internal/suite/tests/bin/pthreads.pthreads_exit (13857). Internal Error: DynamoRIO debug check failure: /home/runner/work/dynamorio/dynamorio/core/unix/memcache.c:382 found
2021-06-18T21:54:31.3875897Z 43/83 Test #80: code_api|pthreads.pthreads_exit .................***Failed Required regular expression not found. Regex=[^$
2021-06-18T21:54:31.3879396Z <Application /home/runner/work/dynamorio/dynamorio/build_arm-debug-internal/suite/tests/bin/pthreads.pthreads_exit (13857). Internal Error: DynamoRIO debug check failure: /home/runner/work/dynamorio/dynamorio/core/unix/memcache.c:382 found
2021-06-18T21:54:31.3895152Z 81: <Application /home/runner/work/dynamorio/dynamorio/build_arm-debug-internal/suite/tests/bin/pthreads.ptsig (13866). Internal Error: DynamoRIO debug check failure: /home/runner/work/dynamorio/dynamorio/core/unix/memcache.c:382 found
2021-06-18T21:54:31.3903603Z 44/83 Test #81: code_api|pthreads.ptsig .........................***Failed Required regular expression not found. Regex=[^Estimation of pi is 3\.142425985001098
2021-06-18T21:54:31.3906091Z <Application /home/runner/work/dynamorio/dynamorio/build_arm-debug-internal/suite/tests/bin/pthreads.ptsig (13866). Internal Error: DynamoRIO debug check failure: /home/runner/work/dynamorio/dynamorio/core/unix/memcache.c:382 found
2021-06-18T21:55:11.0091145Z 109: <Application /home/runner/work/dynamorio/dynamorio/build_arm-debug-internal/suite/tests/bin/client.process-id (13966). Internal Error: DynamoRIO debug check failure: /home/runner/work/dynamorio/dynamorio/core/unix/memcache.c:382 found
2021-06-18T21:55:11.0102756Z 61/83 Test #109: code_api|client.process-id ......................***Failed Required regular expression not found. Regex=[^thread exit: different process id
2021-06-18T21:55:11.0108001Z <Application /home/runner/work/dynamorio/dynamorio/build_arm-debug-internal/suite/tests/bin/client.process-id (13966). Internal Error: DynamoRIO debug check failure: /home/runner/work/dynamorio/dynamorio/core/unix/memcache.c:382 found
2021-06-18T21:55:11.0163180Z 62/83 Test #112: code_api|client.drreg-flow ......................***Failed Required regular expression not found. Regex=[^Hello, world!
2021-06-18T21:55:11.0176672Z 63/83 Test #113: code_api|client.drreg-cross .....................***Failed Required regular expression not found. Regex=[^Hello, world!
2021-06-18T21:55:29.8577245Z 123: <Application /home/runner/work/dynamorio/dynamorio/build_arm-debug-internal/suite/tests/bin/client.stolen-reg (14020). Internal Error: DynamoRIO debug check failure: /home/runner/work/dynamorio/dynamorio/core/unix/memcache.c:382 found
2021-06-18T21:55:29.8587509Z 69/83 Test #123: code_api|client.stolen-reg ......................***Failed Required regular expression not found. Regex=[^Got SIGSEGV
2021-06-18T21:55:29.8598791Z <Application /home/runner/work/dynamorio/dynamorio/build_arm-debug-internal/suite/tests/bin/client.stolen-reg (14020). Internal Error: DynamoRIO debug check failure: /home/runner/work/dynamorio/dynamorio/core/unix/memcache.c:382 found
For the memcache assert, pasting from https://github.com/DynamoRIO/dynamorio/pull/4956#issuecomment-1015852326
You can try using tmate to ssh into the runner and get more logs out: https://github.com/marketplace/actions/debugging-with-tmate
To aid in testing on ARM (and AArch64, for that matter) it would be helpful to be able to run DR under the QEMU emulator. Today that does not work out of the box. It is not clear whether there are only a handful of surface issues, or whether deeper problems will be hit in both DR and QEMU and we would need significant changes to both.