Closed egberts closed 6 years ago
What kernel version is this? Since nobody else has hit this it makes it seem like this is an unusual kernel.
I'd suggest looking at the components used to compute pend_unit_size. Is the kernel signal frame or xstate size unusual for your kernel?
$ uname -a
Linux arca 3.16.0-5-amd64 #1 SMP Debian 3.16.51-3+deb8u1 (2018-01-08) x86_64 GNU/Linux
size_t of pend_unit_size is 4. Define of AVX_ALIGNMENT is 64. Big difference.
$ cc --version
cc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516
...
cd /home/work/instrumentations/emulation/drmemory/dynamorio/debug/api/samples && \
/usr/bin/cc -DSHOW_RESULTS -DSHOW_SYMBOLS -Dsignal_EXPORTS \
-I/home/work/instrumentations/emulation/drmemory/dynamorio/debug \
-I/home/work/instrumentations/emulation/drmemory/dynamorio/debug/cmake/../include \
-I/home/work/instrumentations/emulation/drmemory/dynamorio/debug/api/samples/../../ext/include \
-fPIC -DDEBUG -DX86_64 -DLINUX -DUSE_VISIBILITY_ATTRIBUTES \
-m64 -fno-strict-aliasing -fno-stack-protector -fvisibility=internal -std=gnu99 \
-g3 -fno-omit-frame-pointer -fno-builtin-strcmp -Wall -Werror -Wwrite-strings \
-Wno-unused-but-set-variable -O2 -fno-stack-protector \
-o CMakeFiles/signal.dir/signal.c.o \
-c /home/work/instrumentations/emulation/drmemory/dynamorio/api/samples/signal.c
void
signal_thread_init(dcontext_t *dcontext, void *os_data)
{
thread_sig_info_t *info =
HEAP_TYPE_ALLOC(dcontext, thread_sig_info_t, ACCT_OTHER, PROTECTED);
size_t pend_unit_size = sizeof(sigpending_t) +
/* include alignment for xsave on xstate */
signal_frame_extra_size(true)
/* sigpending_t has xstate inside it already */
IF_LINUX(IF_X86(-sizeof(kernel_xstate_t)));
IF_LINUX(IF_X86(ASSERT(ALIGNED(pend_unit_size, AVX_ALIGNMENT))));
Looks like the stack-based pend_unit_size local variable is not 64-byte-aligned.
The seemingly workaround is to add -static option because /bin/ls was not linked with DynamoRio library as in:
$ debug/bin64/drrun -debug -nocheck -verbose -stats -mem -static /bin/ls /
INFO: targeting application: "/bin/ls"
INFO: app cmdline: "/bin/ls" "/"
INFO: configuration directory is "/home/steve/.dynamorio"
INFO: will exec /bin/ls
bin debootstrap etc initrd.img lib lost+found mnt proc run srv tmp var vmlinuz.old
boot dev home initrd.img.old lib64 media opt root sbin sys usr vmlinuz
Looks like the stack-based pend_unit_size local variable is not 64-byte-aligned.
? The local's storage alignment is irrelevant: the value it holds is what needs to match. Like I said above, "I'd suggest looking at the components used to compute pend_unit_size. Is the kernel signal frame or xstate size unusual for your kernel?"
The seemingly workaround is to add -static option because /bin/ls was not linked with DynamoRio library as in:
No, this is the reverse of what you want: you would only use -static if the target is linked with DR, which only happens if you built the target yourself and wanted a special build with DR inside it, which is rare. Using -static with a regular binary will simply run it natively without DR at all.
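On the earlier point about the local's storage: the distinction between a variable's address and the value it holds can be sketched in C as below. This is only an illustration. IS_ALIGNED is an assumed stand-in for DynamoRIO's ALIGNED() macro (taken here to be a power-of-two mask test), and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed stand-in for DynamoRIO's ALIGNED(): true when x is a multiple of
 * a power-of-two alignment. */
#define IS_ALIGNED(x, alignment) \
    ((((uintptr_t)(x)) & ((uintptr_t)(alignment) - 1)) == 0)

/* Tests the VALUE held in a size variable; this is what the assert checks. */
int
value_is_aligned(size_t value, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return IS_ALIGNED(value, alignment);
}

/* Tests the ADDRESS where a variable lives; the assert never looks at this. */
int
storage_is_aligned(const void *addr, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return IS_ALIGNED(addr, alignment);
}
```

A size of 100 stored at a 64-byte-aligned address still fails the value test, and a size of 128 passes it wherever the variable happens to be stored.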
No, I am using stock kernel from Debian. I will expand the macro next.
I did notice one blurb in your HowTo regarding the improper use of -DCMAKE_BUILD_TYPE=Debug when -DDEBUG=ON should have been used instead.
Update: It did not change this issue, assert is still happening.
Pity... printf() is a forbidden function. My GDB doesn't go into the exec() function. I'm using this .gdbinit:
add-auto-load-safe-path /home/work/instrumentations/emulation/drmemory/dynamorio/.gdbinit
set follow-fork-mode child
set detach-on-fork off
set follow-exec-mode new
catch exec
catch fork
catch vfork
catch syscall 59
set args -verbose -debug -follow_children -- /bin/ls
run
# At syscall59/exec()
break main
run
break reload_dynamorio
# We're now inside a new process
b
I got into a bit further using GDB:
(gdb) ni
reloaded_xfer () at /home/steve/work/instrumentations/emulation/drmemory/dynamorio/core/arch/x86/x86.asm:1178
1178 /* We maintain 16-byte alignment not just for MacOS but also for
(gdb) ni
1181 lea REG_XSP, [-ARG_SZ + REG_XSP]
(gdb) ni
1182 push REG_XSI
(gdb) ni
1183 push REG_XDI
(gdb) ni
1187 jmp GLOBAL_REF(unexpected_return)
(gdb) ni
Probably should have used the GDB step-into command instead...
<Starting application /bin/ls (1467)>
<Initial options = -no_dynamic_options -code_api -stack_size 56K -signal_stack_size 32K -max_elide_jmp 0 -max_elide_call 0 -early_inject -emulate_brk -no_inline_ignored_syscalls -native_exec_default_list '' -no_native_exec_managed_code -no_indcall2direct >
<Application /bin/ls (1467). Internal Error: DynamoRIO debug check failure: /home/steve/work/instrumentations/emulation/drmemory/dynamorio/core/unix/signal.c:511 ALIGNED(pend_unit_size, AVX_ALIGNMENT)
(Error occurred @0 frags)
version 7.0.17768, custom build
-no_dynamic_options -code_api -stack_size 56K -signal_stack_size 32K -max_elide_jmp 0 -max_elide_call 0 -early_inject -emulate_brk -no_inline_ignored_syscalls -native_exec_default_list '' -no_native_exec_managed_code -no_indcall2direct
0x00007fffffffcfc0 0x00000000710ca054
0x00007fffffffd210 0x00000000712a785e
0x00007fffffffd2c0 0x0000000071290fd6
0x00007fffffffd370 0x000000007104379c
0x00007fffffffd3c0 0x0000000071040d37
0x00007fffffffdbf0 0x00000000712c4f78
0x00007fffffffe630 0x0000000071276127>
[Inferior 2 (process 1467) exited with code 0377]
(gdb)
What tracing function or method can I use to output such low-level macro values?
I normally break on report_dynamorio_problem (reports the assert failure) once past the exec and from there go up the callstack and print the locals.
Mmmm... Too deep for GDB to track, or to invoke any report_dynamorio_problem():
work/instrumentations/emulation/drmemory/dynamorio$ gdb debug/bin64/drrun
Reading symbols from debug/bin64/drrun...Reading symbols from /home/work/instrumentations/emulation/drmemory/dynamorio/debug/bin64/drrun.debug...done.
done.
Catchpoint 1 (exec)
Catchpoint 2 (fork)
Catchpoint 3 (vfork)
warning: Could not load the syscall XML file `/usr/share/gdb/syscalls/amd64-linux.xml'.
warning: GDB will not be able to display syscall names nor to verify if
any provided syscall numbers are valid.
Catchpoint 4 (syscall 59)
INFO: targeting application: "/bin/ls"
INFO: app cmdline: "/bin/ls"
INFO: configuration directory is "/home/work/.dynamorio"
INFO: will exec /bin/ls
Catchpoint 4 (call to syscall 59), 0x00007ffff7af2647 in execve () at ../sysdeps/unix/syscall-template.S:84
84 ../sysdeps/unix/syscall-template.S: No such file or directory.
Breakpoint 5 at 0x55555557e18e: file /home/work/instrumentations/emulation/drmemory/dynamorio/tools/drdeploy.c, line 978.
Breakpoint 5, main (argc=6, targv=0x7fffffffe628) at /home/work/instrumentations/emulation/drmemory/dynamorio/tools/drdeploy.c:978
---Type <return> to continue, or q <return> to quit---
978 char *dr_root = NULL;
Function "reload_dynamorio" not defined.
Make breakpoint pending on future shared library load? (y or [n]) [answered N; input not from terminal]
(gdb) b report_dynamorio_problem
Breakpoint 6 at 0x55555558633a: file /home/work/instrumentations/emulation/drmemory/dynamorio/core/unix/injector.c, line 197.
(gdb) c
Continuing.
INFO: targeting application: "/bin/ls"
INFO: app cmdline: "/bin/ls"
INFO: configuration directory is "/home/steve/.dynamorio"
INFO: will exec /bin/ls
Catchpoint 4 (call to syscall 59), 0x00007ffff7af2647 in execve () at ../sysdeps/unix/syscall-template.S:84
84 ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) b report_dynamorio_problem
Note: breakpoint 6 also set at pc 0x55555558633a.
Breakpoint 7 at 0x55555558633a: file /home/work/instrumentations/emulation/drmemory/dynamorio/core/unix/injector.c, line 197.
(gdb) c
Continuing.
process 1523 is executing new program: /home/work/instrumentations/emulation/drmemory/dynamorio/debug/lib64/debug/libdynamorio.so
[New process 1523]
Thread 2.1 "libdynamorio.so" hit Catchpoint 1 (exec'd /home/work/instrumentations/emulation/drmemory/dynamorio/debug/lib64/debug/libdynamorio.so), _start () at /home/work/instrumentations/emulation/drmemory/dynamorio/core/arch/x86/x86.asm:1167
1167 mov REG_XDI, 0 /* xdi should be callee-saved but is not always: i#2641 */
(gdb) b report_dynamorio_problem
Note: breakpoints 6 and 7 also set at pc 0x55555558633a.
Note: breakpoints 6 and 7 also set at pc 0x7ffff7abf0a8.
Breakpoint 8 at 0x55555558633a: report_dynamorio_problem. (2 locations)
It would appear that the symbol table got reset to empty due to reload_dynamorio. GDB is going in blind (x86 instructions only, no source lines).
Is there a symbol I can load or is this the APP time (/bin/ld)? Or...
Is there some kind of C MACRO trick that I can use to print out the values and I can then compile it into the code?
Put the breakpoint in after the exec. The SIGILL is a convenient spot: it should be before the assert you're hitting.
E.g., ~/.gdbinit has:
set follow-fork-mode child
$ gdb --args bin64/drrun -- /bin/ls
GNU gdb (Debian 7.12-6) 7.12.0.20161007-git
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from bin64/drrun...Reading symbols from /home/bruening/dr/git/build_x64_dbg_tests/bin64/drrun.debug...done.
done.
(gdb) run
Starting program: /home/bruening/dr/git/build_x64_dbg_tests/bin64/drrun -- /bin/ls
process 168131 is executing new program: /home/bruening/dr/git/build_x64_dbg_tests/lib64/debug/libdynamorio.so
<Starting application /bin/ls (168131)>
Program received signal SIGILL, Illegal instruction.
syscall_ready () at /home/bruening/dr/git/src/core/arch/x86/x86_shared.asm:180
180 pop REG_XBX
(gdb) b report_dynamorio_problem
Breakpoint 1 at 0x7f88162ad245: file /home/bruening/dr/git/src/core/utils.c, line 2127.
(gdb) c
Continuing.
<Initial options = -no_dynamic_options -code_api -stack_size 56K -signal_stack_size 32K -max_elide_jmp 0 -max_elide_call 0 -early_inject -emulate_brk -no_inline_ignored_syscalls -native_exec_default_list '' -no_native_exec_managed_code -no_indcall2direct >
Breakpoint 1, report_dynamorio_problem (dcontext=0x0, dumpcore_flag=8, exception_addr=0x0, report_ebp=0x0,
fmt=0x7f88164e80a8 "DynamoRIO debug check failure: %s:%d %s\n(Error occurred @%d frags)") at /home/bruening/dr/git/src/core/utils.c:2127
2127 synchronize_dynamic_options();
(gdb) bt
#0 report_dynamorio_problem (dcontext=0x0, dumpcore_flag=8, exception_addr=0x0, report_ebp=0x0,
fmt=0x7f88164e80a8 "DynamoRIO debug check failure: %s:%d %s\n(Error occurred @%d frags)") at /home/bruening/dr/git/src/core/utils.c:2127
#1 0x00007f88162a81fc in internal_error (file=0x7f881653dba8 "/home/bruening/dr/git/src/core/unix/signal.c", line=512, expr=0x7f881653dc3f "false")
at /home/bruening/dr/git/src/core/utils.c:177
#2 0x00007f88164860c5 in signal_thread_init (dcontext=0x7f8802002940, os_data=0x0) at /home/bruening/dr/git/src/core/unix/signal.c:512
#3 0x00007f881646f7db in os_thread_init (dcontext=0x7f8802002940, os_data=0x0) at /home/bruening/dr/git/src/core/unix/os.c:2224
#4 0x00007f8816221802 in dynamo_thread_init (dstack_in=0x0, mc=0x0, os_data=0x0, client_thread=false) at /home/bruening/dr/git/src/core/dynamo.c:2330
#5 0x00007f881621ed9d in dynamorio_app_init () at /home/bruening/dr/git/src/core/dynamo.c:627
#6 0x00007f88164a4784 in privload_early_inject (sp=0x7ffea4f7ee50, old_libdr_base=0x0, old_libdr_size=140731666132112)
at /home/bruening/dr/git/src/core/unix/loader.c:1916
#7 0x00007f88164548a4 in reloaded_xfer () at /home/bruening/dr/git/src/core/arch/x86/x86.asm:1187
#8 0x0000000000000001 in ?? ()
#9 0x00007ffea4f81177 in ?? ()
#10 0x0000000000000000 in ?? ()
(gdb) up 2
#2 0x00007f88164860c5 in signal_thread_init (dcontext=0x7f8802002940, os_data=0x0) at /home/bruening/dr/git/src/core/unix/signal.c:512
512 IF_LINUX(IF_X86(ASSERT(false)));//NOCHECK
(gdb) info local
info = 0x7f8802071140
pend_unit_size = 1664
(gdb) p sizeof(sigpending_t)
$1 = 1344
(gdb) p sizeof(kernel_xstate_t)
$2 = 832
(gdb) p signal_frame_extra_size(true)
$3 = 1152
(gdb)
Or change the sources to do print_file(STDERR, "foo=%d\n", foo); for each.
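As for the earlier question about a C macro trick: the preprocessor's stringizing operator (#) can label each value with the expression that produced it, so every print does not need a hand-written name. The sketch below is standalone and uses snprintf only so it compiles outside DynamoRIO. FORMAT_SIZE and format_named_size are hypothetical names, and inside the DynamoRIO sources the formatted output would presumably go through the print_file call mentioned above instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Formats "<expression text>=<value>" into buf; #expr turns the macro
 * argument into a string literal at compile time. */
#define FORMAT_SIZE(buf, buflen, expr) \
    format_named_size((buf), (buflen), #expr, (size_t)(expr))

/* Hypothetical helper: writes "name=value" and returns snprintf's result. */
int
format_named_size(char *buf, size_t buflen, const char *name, size_t value)
{
    assert(buf != NULL && buflen > 0);
    return snprintf(buf, buflen, "%s=%zu", name, value);
}
```

Note the %zu conversion: sizeof yields a size_t, which is wider than int on x86_64, so a plain %d is not a reliable match for it.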
I did the print_file approach and inserted it into core/unix/signal.c:
print_file(STDERR, "sizeof(kernel_xstate_t) %d\n", sizeof(kernel_xstate_t));
print_file(STDERR, "signal_frame_extra_size(true) %d\n", signal_frame_extra_size(true));
print_file(STDERR, "sizeof(sigpending_t): %d\n", sizeof(sigpending_t));
print_file(STDERR, "pend_unit_size: %d\n", pend_unit_size);
IF_LINUX(IF_X86(ASSERT(ALIGNED(pend_unit_size, AVX_ALIGNMENT))));
And it output my debug "printf" values:
sizeof(kernel_xstate_t) 832
signal_frame_extra_size(true) 524
sizeof(sigpending_t): 1344
pend_unit_size: 1036
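The arithmetic behind these numbers can be checked with a small sketch. It mirrors the expression in signal_thread_init() quoted earlier in the thread. The function names are hypothetical, and AVX_ALIGNMENT is taken as 64 as reported above.

```c
#include <assert.h>
#include <stddef.h>

#define AVX_ALIGNMENT 64 /* value reported earlier in this thread */

/* Mirrors the expression in signal_thread_init():
 * sizeof(sigpending_t) + signal_frame_extra_size(true) - sizeof(kernel_xstate_t) */
size_t
pend_unit_size_for(size_t sigpending_size, size_t frame_extra_size, size_t xstate_size)
{
    assert(sigpending_size + frame_extra_size >= xstate_size);
    return sigpending_size + frame_extra_size - xstate_size;
}

/* Bytes past the previous 64-byte boundary; 0 means the assert would pass. */
size_t
avx_misalignment(size_t size)
{
    return size % AVX_ALIGNMENT;
}
```

With the values printed here, 1344 + 524 - 832 = 1036, which is 12 bytes past a 64-byte boundary. With the 1152 from the GDB session above, 1344 + 1152 - 832 = 1664 = 26 * 64, which is aligned. The only term that differs between the two machines is signal_frame_extra_size(true).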
Now attempting GDB approach...
Update: GDB approach does not stop on SIGILL even with GDB catch signal added.
If there's no SIGILL then that code is not being triggered: see signal_arch_init in signal_linux_x86.c. Your processor does not have AVX?
If I pretend mine doesn't have AVX I can reproduce this assert. Apparently no Travis or Appveyor or CDash automated test machines are lacking AVX.
Presumably the assert should just be relaxed for the !YMM_ENABLED case.
Since you have a processor for testing this maybe you could try out a patch like this on some apps with signals, and maybe submit a pull request?
IF_LINUX(IF_X86(ASSERT(!YMM_ENABLED() || ALIGNED(pend_unit_size, AVX_ALIGNMENT))));
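The logic of the relaxed check can be sketched as a standalone predicate. This is an illustration only. pend_unit_size_ok is a hypothetical helper, the boolean parameter stands in for whatever YMM_ENABLED() evaluates to, and AVX_ALIGNMENT is taken as 64 as reported earlier in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define AVX_ALIGNMENT 64 /* value reported earlier in this thread */

/* Hypothetical helper with the same shape as the patched assert:
 * alignment is only required when YMM (AVX) state is in use. */
bool
pend_unit_size_ok(bool ymm_enabled, size_t pend_unit_size)
{
    assert(pend_unit_size > 0);
    return !ymm_enabled || (pend_unit_size & (AVX_ALIGNMENT - 1)) == 0;
}
```

Under this predicate the 1036-byte size seen on the non-AVX machine is accepted, while the same size would still be rejected if AVX were enabled.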
Non-debug build works fine?
Non-debug build works fine?
Yes. Non-debug release build works fine.
[100%] Built target htmldocs
~/instrumentations/emulation/drmemory/dynamorio/release$ bin64/drrun -follow_children -- /bin/ls
sizeof(kernel_xstate_t) 832
signal_frame_extra_size(true) 524
sizeof(sigpending_t): 1344
pend_unit_size: 1036
api cmake cmake_install.cmake configure_temp.h CPackSourceConfig.cmake drcpusim.drrun64 include logs
bin64 CMakeCache.txt configure_defines.h core drcachesim.drrun64 event_strings.h lib64 Makefile
clients CMakeFiles configure.h CPackConfig.cmake drcov.drrun64 ext libutil tools
~/work/instrumentations/emulation/drmemory/dynamorio/release
Back to the -DDEBUG=1 build (not the release build): I rebuilt the debug build with your patch applied to core/unix/signal.c:
Since you have a processor for testing this maybe you could try out a patch like this on some apps with signals, and maybe submit a pull request?
IF_LINUX(IF_X86(ASSERT(!YMM_ENABLED() || ALIGNED(pend_unit_size, AVX_ALIGNMENT))));
With the asserting line replaced by your patch, the Debug build now works, with some unexpected output:
[100%] Built target htmldocs
~/instrumentations/emulation/drmemory/dynamorio/debug$ bin64/drrun -follow_children -- /bin/ls
<Starting application /bin/ls (2859)>
<Initial options = -no_dynamic_options -code_api -stack_size 56K -signal_stack_size 32K -max_elide_jmp 0 -max_elide_call 0 -early_inject -emulate_brk -no_inline_ignored_syscalls -native_exec_default_list '' -no_native_exec_managed_code -no_indcall2direct >
sizeof(kernel_xstate_t) 832
signal_frame_extra_size(true) 524
sizeof(sigpending_t): 1344
pend_unit_size: 1036
<Paste into GDB to debug DynamoRIO clients:
set confirm off
add-symbol-file '/home/work/instrumentations/emulation/drmemory/dynamorio/debug/lib64/debug/libdynamorio.so' 0x00007fa848934618
>
<(1+x) Handling our fault in a TRY at 0x00007fa848b7b504>
<spurious rep/repne prefix @0x00007fa8486f4208 (f2 41 ff e3): >
api cmake cmake_install.cmake configure_temp.h CPackSourceConfig.cmake drcpusim.drrun64 include logs
bin64 CMakeCache.txt configure_defines.h core drcachesim.drrun64 event_strings.h lib64 Makefile
clients CMakeFiles configure.h CPackConfig.cmake drcov.drrun64 ext libutil tools
<Stopping application /bin/ls (2859)>
~/instrumentations/emulation/drmemory/dynamorio/debug$
But it works.
Oh yea, I just discovered an avx-related variable in GDB. It was disabled for me.
(gdb) print ("%d\n", proc_avx_enabled())
$4 = 0x0
Hardware CPU is:
$ sudo cat /proc/cpuinfo
processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Core(TM)2 CPU 6400 @ 2.13GHz
stepping : 2
microcode : 0x56
cpu MHz : 2133.000
cache size : 2048 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
apicid : 1
initial apicid : 1
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov \
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall \
nx lm constant_tsc arch_perfmon pebs bts nopl aperfmperf pni \
dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm lahf_lm \
dtherm tpr_shadow
bogomips : 4253.32
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:
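The flags line above contains no avx token, which is consistent with proc_avx_enabled() returning 0. A whole-token search such as the sketch below avoids false matches (for example, treating avx2 as avx). cpu_flags_contain is a hypothetical helper, not part of DynamoRIO, and it treats the backslash continuations in the listing as separators.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Returns true when `name` appears as a whole token in `flags`, the text
 * after "flags :" in /proc/cpuinfo. Tokens are separated by whitespace or
 * by the backslash line continuations used in the listing. */
bool
cpu_flags_contain(const char *flags, const char *name)
{
    assert(flags != NULL && name != NULL);
    size_t name_len = strlen(name);
    if (name_len == 0)
        return false;
    const char *p = flags;
    while (*p != '\0') {
        /* Skip separators, then measure the next token. */
        p += strspn(p, " \t\n\\");
        size_t token_len = strcspn(p, " \t\n\\");
        if (token_len == name_len && strncmp(p, name, name_len) == 0)
            return true;
        p += token_len;
    }
    return false;
}
```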
Do we want to eliminate this log message before making a pull request?
<(1+x) Handling our fault in a TRY at 0x00007fa848b7b504>
<spurious rep/repne prefix @0x00007fa8486f4208 (f2 41 ff e3): >
found in core/unix/signal.c:4864?
if (is_safe_read_ucxt(ucxt) ||
    (!dynamo_initialized && global_try_except.try_except_state != NULL) ||
    dcontext->try_except.try_except_state != NULL) {
    /* handle our own TRY/EXCEPT */
    try_except_context_t *try_cxt;
#ifdef HAVE_MEMINFO
    /* our probe produces many of these every run */
    /* since we use for safe_*, making a _ONCE */
    SYSLOG_INTERNAL_WARNING_ONCE("(1+x) Handling our fault in a TRY at " PFX, pc);
#endif
    LOG(THREAD, LOG_ALL, level, "TRY fault at " PFX "\n", pc);
    if (TEST(DUMPCORE_TRY_EXCEPT, DYNAMO_OPTION(dumpcore_mask)))
        os_dump_core("try/except fault");
No, those 2 are unrelated to this issue. "spurious rep" is probably #1978.
The internal try fault is not uncommon when examining the app at startup.
Make a pull request then?
Yes please.
The error message is:
On Debian 9 (stretch), I git-cloned the master and built it as follows:
And got the following ASSERT error in unix/signal.c:511:
git log shows: