Open abhinav92003 opened 3 years ago
Ran the test locally and also set VERBOSE in the test. I see that there's actually a segmentation fault:
$ ./bin32/drrun -loglevel 3 -logdir logs -- suite/tests/bin/linux.signal0000
<log dir=logs/linux.signal0000.587500.00000000>
<Starting application dynamorio/suite/tests/bin/linux.signal0000 (587500)>
<Initial options = -no_dynamic_options -logdir 'logs' -loglevel 3 -code_api -stack_size 56K -signal_stack_size 32K -max_elide_jmp 0 -max_elide_call 0 -early_inject -emulate_brk -no_inline_ignored_syscalls -native_exec_default_list '' -no_native_exec_managed_code -no_indcall2direct >
...
Sending SIGUSR1
signal_handler: sig=10, retaddr=0xf7e15b3c, ucxt=0xffe745fc
Got SIGUSR1 @ 0xf7be055e
Completed handling sig=10
Segmentation fault
The log file ends with:
set next tag to 0xf7ba155e, sc->SC_XIP to 0x41663fc0
for sigreturn, set sys_param1 to 0x00000000
So the param to sigreturn should be the next tag https://github.com/DynamoRIO/dynamorio/blob/638fb106d30d84887956853f8be1ce534cfa0f25/core/dispatch.c#L2042 but for some reason it is being set to zero.
The code that's supposed to set the next tag is
https://github.com/DynamoRIO/dynamorio/blob/0dc7ef1ed506749c5813776e320916579da9c9fa/core/unix/signal.c#L6700
It is a bit confusing; the log says "set next tag to " but then prints next_pc
instead of dcontext->next_tag
. I found that next_tag
is indeed set to zero at this point, probably because xax
is also zero, but it should've been the next fragment's tag.
I found that next_tag is indeed set to zero at this point, probably because xax is also zero, but it should've been the next fragment's tag.
So this is actually not an issue. As described by https://github.com/DynamoRIO/dynamorio/blob/0dc7ef1ed506749c5813776e320916579da9c9fa/core/unix/signal.c#L6662 xax
is saved intentionally to next_tag
; the app xax
may be zero at this point, so this doesn't indicate a problem.
Back to the crash: After the app signal handler completes handling the signal, I see that handle_system_call
enters fcache at https://github.com/DynamoRIO/dynamorio/blob/638fb106d30d84887956853f8be1ce534cfa0f25/core/dispatch.c#L2084 targeting the do_syscall
gencode, the first instruction of which is an int 0x80
https://github.com/DynamoRIO/dynamorio/blob/8469b44748777decfe42508fcb86155bd871c2ba/core/arch/emit_utils_shared.c#L4779.
// do_sycall gencode
0x4ab383c0: int $0x80
0x4ab383c2: mov %eax,0x4aaea05c
0x4ab383c7: mov $0xf7ec868c,%eax
0x4ab383cc: jmp 0x4ab37fc0 // fcache_return
At the int 0x80
, we have
eax 0xad 173
so the syscall is sigreturn
.
The crash happens exactly at the sigreturn
(I had to be careful not to overstep in gdb, as gdb simply crashes without any useful information at this instr).
I'm trying to figure out why the sigreturn
crashed. Is something messing up the stack, so it has trouble restoring pre-signal state?
Using do_int_syscall
Entry into do_syscall to execute a non-ignorable system call
system call 173
rt_sigreturn()
xsp is 0xffffbf10
signal was 10 (did == param 10)
Setting app signal stack to 0x00000000-0xffffffff 2=disabled
blocked signals are now:
14 = blocked
23 = blocked
returning-to sigcontext 0xffffbfb0:
...
set next tag to 0xf7c0b55e/0x00000000, sc->SC_XIP to 0x4e1a7fc0
for sigreturn, set sys_param1 to 0x00000000
AAA enter_fcache from handle_system_call
I had a look at kernel logs in dmesg
and found the following:
$ sudo dmesg
[1228341.107006] usercopy: Kernel memory overwrite attempt detected to SLUB object 'task_struct' (offset 5248, size 2688)!
[1228341.107013] ------------[ cut here ]------------
[1228341.107014] kernel BUG at mm/usercopy.c:99!
[1228341.107019] invalid opcode: 0000 [#241] SMP PTI
...
[1228341.107026] RIP: 0010:usercopy_abort+0x74/0x76
[1228341.107028] Code: da ee 88 51 48 0f 45 d6 49 c7 c3 d4 e1 ef 88 4c 89 d1 57 48 c7 c6 11 87 ee 88 48 c7 c7 a0 e2 ef 88 49 0f 45 f3 e8 09 7d ff ff <0f> 0b 4c 89 e1 49 89 d8 44 89 ea 31 f6 48 29 c1 48 c7 c7 16 e2 ef
[1228341.107029] RSP: 0000:ffff948f0bedbd90 EFLAGS: 00010246
[1228341.107031] RAX: 0000000000000069 RBX: 0000000000000a80 RCX: 0000000000000000
[1228341.107031] RDX: 0000000000000000 RSI: ffff899240018a00 RDI: ffff899240018a00
[1228341.107032] RBP: ffff8985db4d9f00 R08: 0000000000000000 R09: ffff948f0bedbbb8
[1228341.107033] R10: ffff948f0bedbbb0 R11: ffffffff894cbb80 R12: ffff8985db4d9480
[1228341.107034] R13: 0000000000000000 R14: 00000000000000ff R15: ffff8985db4d9440
[1228341.107035] FS: 000000004a63f000(0073) GS:ffff899240000000(0063) knlGS:000000004a643b70
[1228341.107036] CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
[1228341.107036] CR2: 000000004a6b3000 CR3: 0000000140b96005 CR4: 00000000003706e0
[1228341.107037] Call Trace:
[1228341.107042] __check_heap_object+0xe0/0x110
[1228341.107045] __check_object_size+0x136/0x150
[1228341.107047] __fpu__restore_sig+0x1b9/0x650
[1228341.107051] ia32_restore_sigcontext+0x137/0x170
[1228341.107052] __do_compat_sys_rt_sigreturn+0x76/0xd0
[1228341.107055] do_int80_syscall_32+0x3e/0x50
[1228341.107058] entry_INT80_compat+0x88/0x8d
[1228341.107063] RIP: 0023:0x4a676fc0
[1228341.107064] Code: 8b 35 44 90 62 4a 8b 3d 40 90 62 4a 8b 2d 48 90 62 4a 8b 25 4c 90 62 4a ff 25 e0 92 62 4a 90 90 90 90 90 90 90 90 90 90 90 90 <89> 1d 50 90 62 4a 89 0d 58 90 62 4a 89 15 54 90 62 4a 89 35 44 90
[1228341.107065] RSP: 002b:00000000ffb5b7ac EFLAGS: 00000286
[1228341.107066] RAX: 00000000f7ea5698 RBX: 0000000000255e87 RCX: 000000000000000a
[1228341.107067] RDX: 00000000f39fa000 RSI: 00000000f7b94000 RDI: 00000000ffb5b8d4
[1228341.107067] RBP: 00000000ffb5b8f8 R08: 0000000000000000 R09: 0000000000000000
[1228341.107068] R10: 0000000000000000 R11: 0000000000000286 R12: 0000000000000000
[1228341.107069] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
The call trace refers to this ia32_restore_sigcontext
: https://elixir.bootlin.com/linux/v5.10.14/source/arch/x86/ia32/ia32_signal.c#L60, and the __check_heap_object
routine is at https://elixir.bootlin.com/linux/v5.10.14/source/mm/slub.c#L4081.
I obtained the image for my kernel with debug symbols and used it with gdb to find the exact source code locations for the above stack trace. As many routines are inlined, in some cases I had to look at the disassembly of the routine to find the exact call sites in the original code.
[1228341.107037] Call Trace: [1228341.107042] __check_heap_object+0xe0/0x110 [1228341.107045] __check_object_size+0x136/0x150 [1228341.107047] fpurestore_sig+0x1b9/0x650 [1228341.107051] ia32_restore_sigcontext+0x137/0x170 [1228341.107052] __do_compat_sys_rt_sigreturn+0x76/0xd0 [1228341.107055] do_int80_syscall_32+0x3e/0x50 [1228341.107058] entry_INT80_compat+0x88/0x8d
usercopy: Kernel memory overwrite attempt detected to SLUB object 'task_struct' (offset 5248, size 2688)
The following call @https://elixir.bootlin.com/linux/v5.10.40/source/arch/x86/kernel/fpu/signal.c#L434 goes bad.
ret = __copy_from_user(&fpu->state.fxsave, buf_fx, state_size);
The field being written is task_struct.thread.fpu.
I think the __check_heap_object
call should have returned at https://elixir.bootlin.com/linux/v5.10.46/source/mm/slub.c#L4058 for a normal access, but maybe n
is too large. n
is the state_size
param in the __copy_from_user
call above, which apparently is greater than the task_struct
object's size.
state_size
is computed based on the given frame at https://elixir.bootlin.com/linux/v5.10.40/source/arch/x86/kernel/fpu/signal.c#L320. I'm trying to check now if we're setting the xstate_size
field properly in the frame.
On a side note, I tried setting f_old->uc.uc_mcontext.fpstate
to NULL in fixup_rtframe_pointers
, and the crash goes away with that. So the issue is definitely with FP state.
Now that I know where to look, I compared the fpstate
s from the succeeding native run and the run under DR:
Under DR, using frame seen at record_pending_signal
:
(gdb) ptype kernel_fpstate_t
type = struct _kernel_fpstate_t {
unsigned long cw;
unsigned long sw;
unsigned long tag;
unsigned long ipoff;
unsigned long cssel;
unsigned long dataoff;
unsigned long datasel;
kernel_fpreg_t _st[8];
unsigned short status;
unsigned short magic;
unsigned long _fxsr_env[6];
unsigned long mxcsr;
unsigned long reserved;
kernel_fpxreg_t _fxsr_st[8];
kernel_xmmreg_t _xmm[8];
unsigned long padding1[44];
union {
unsigned long padding2[12];
kernel_fpx_sw_bytes_t sw_reserved;
};
}
(gdb) p (*(struct _kernel_fpstate_t*)frame->uc->uc_mcontext->fpstate).sw_reserved
$14 = {magic1 = 1179670611, extended_size = 2804, xstate_bv = 255, xstate_size = 2688, padding = {0, 0, 0, 0, 0, 0, 0}}
(gdb)
At native app handler:
(gdb) ptype struct _fpstate
type = struct _fpstate {
__uint32_t cw;
__uint32_t sw;
__uint32_t tag;
__uint32_t ipoff;
__uint32_t cssel;
__uint32_t dataoff;
__uint32_t datasel;
struct _fpreg _st[8];
unsigned short status;
unsigned short magic;
__uint32_t _fxsr_env[6];
__uint32_t mxcsr;
__uint32_t __glibc_reserved1;
struct _fpxreg _fxsr_st[8];
struct _xmmreg _xmm[8];
__uint32_t __glibc_reserved2[56];
}
(gdb) p (*((struct sigcontext)ucxt->uc_mcontext)->fpstate).__glibc_reserved2
$16 = {0 <repeats 32 times>, 52, 384, 384, 4, 4, 3, 1661848, 1661848, 1661848, 19, 19, 4, 1179670611, 2804, 255, 0, 2688, 0, 0, 0, 0, 0, 0, 0}
(gdb) p (*((struct sigcontext)ucxt->uc_mcontext)->fpstate).__glibc_reserved2[44]
$17 = 1179670611
(gdb) p (*((struct sigcontext)ucxt->uc_mcontext)->fpstate).__glibc_reserved2[45]
$18 = 2804
(gdb) p (*((struct sigcontext)ucxt->uc_mcontext)->fpstate).__glibc_reserved2[46]
$19 = 255
(gdb) p (*((struct sigcontext)ucxt->uc_mcontext)->fpstate).__glibc_reserved2[47]
$20 = 0
The sw_reserved
field gives details about the extended state (_fpx_sw_bytes
). The magic bytes and the extended_size
(which denotes total size of fpstate
) match. But I see that xstate_size
(the actual state size) is actually zero under the native app.
I'm looking into why xstate_size
is zero in the frame during native run.
xstate_size
is set at init time using fpu_user_xstate_size
:
https://elixir.bootlin.com/linux/v5.10.40/source/arch/x86/kernel/fpu/signal.c#L524
fpu_user_xstate_size
is in turn also set at init time: https://elixir.bootlin.com/linux/v5.10.40/source/arch/x86/kernel/fpu/xstate.c#L766
This is done using cpu_id
: https://elixir.bootlin.com/linux/v5.10.40/source/arch/x86/kernel/fpu/xstate.c#L721
I know for sure that the init doesn't fail or anything because I saw this log in dmesg
on rebooting: https://elixir.bootlin.com/linux/v5.10.40/source/arch/x86/kernel/fpu/xstate.c#L862
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x008: 'MPX bounds registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x010: 'MPX CSR'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x020: 'AVX-512 opmask'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x040: 'AVX-512 Hi256'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x080: 'AVX-512 ZMM_Hi256'
[ 0.000000] x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256
[ 0.000000] x86/fpu: xstate_offset[3]: 832, xstate_sizes[3]: 64
[ 0.000000] x86/fpu: xstate_offset[4]: 896, xstate_sizes[4]: 64
[ 0.000000] x86/fpu: xstate_offset[5]: 960, xstate_sizes[5]: 64
[ 0.000000] x86/fpu: xstate_offset[6]: 1024, xstate_sizes[6]: 512
[ 0.000000] x86/fpu: xstate_offset[7]: 1536, xstate_sizes[7]: 1024
[ 0.000000] x86/fpu: Enabled xstate features 0xff, context size is 2560 bytes, using 'compacted' format.
Note that 2560 is fpu_kernel_xstate_size
not fpu_user_xstate_size
. I think they are different because the former signifies the max accounted-for size? (https://elixir.bootlin.com/linux/v5.10.40/source/arch/x86/kernel/fpu/xstate.c#L760)
task_struct
size accounts for fpu_kernel_xstate_size
: https://elixir.bootlin.com/linux/v5.10.40/source/arch/x86/kernel/fpu/init.c#L171. But what's happening in the crash (see message: " usercopy: Kernel memory overwrite attempt detected to SLUB object 'task_struct' (offset 5248, size 2688)!") is that we're writing 2688 bytes which is more than that.
But I see that xstate_size (the actual state size) is actually zero under the native app.
It's actually not. Keeping the struct below in mind,
type = struct _kernel_fpx_sw_bytes_t { u32 magic1; u32 extended_size; u64 xstate_bv; __u32 xstate_size; u32 padding[7]; }
(gdb) p (*((struct sigcontext)ucxt->uc_mcontext)->fpstate).__glibc_reserved2 $16 = {0 <repeats 32 times>, 52, 384, 384, 4, 4, 3, 1661848, 1661848, 1661848, 19, 19, 4, 1179670611, 2804, 255, 0, 2688, 0, 0, 0, 0, 0, 0, 0}
As __glibc_reserved2
is __uint32_t
, gdb is printing it 4 bytes at a time, and it's little endian so xstate_bv
is 255 (255,0
above) and xstate_size
is 2688 -- same as the run under DR :/
The only other difference I see in the two fpstate
s is in some padding bytes: https://gist.github.com/abhinav92003/17ff1a27973a387216a2fcc5444abcc9. Checking that now.
The padding difference seems insignificant -- it stores XMM values (https://elixir.bootlin.com/linux/v5.10.40/source/arch/x86/include/uapi/asm/sigcontext.h#L128). The DR version is 0xABABABAB
which signifies just unused memory I think, so the native version might be garbage.
I also compared the frame just before the crashing sigreturn (added to above gist). That seems same too as the other two.
I might have found the issue: if _fxsr_env
is not 64 byte aligned, the kernel falls back to restoring fxregs_save
instead of xregs_save
(https://elixir.bootlin.com/linux/v5.10.40/source/arch/x86/kernel/fpu/signal.c#L335). But in this path state_size
is already modified to use fx_sw_user.xstate_size
, so it goes to write that many bytes to fxregs_save
structure in kernel.
I did check if it was aligned on gdb and it did turned out to be multiple of 0x40
, but when I tried adding back the assert at https://github.com/DynamoRIO/dynamorio/blob/f9f95d9c66b0dfe6d03428dac968ab0f4360fc85/core/unix/signal_linux_x86.c#L228, it failed. So maybe I was looking at something wrong.
<Application /usr/local/google/home/sharmaabhinav/dr/build/i5079-1_32/suite/tests/bin/linux.signal0000 (140439). Internal Error: DynamoRIO debug check failure: /usr/local/google/home/sharmaabhinav/dr/src/i5079-1/core/unix/signal_linux_x86.c:237 ALIGNED(xsave_start, AVX_ALIGNMENT)
This code path is disabled DISABLED_ISSUE_3256
due to #3256 right now.
If the kernel does fall back to a different path due to missing 64-byte alignment, it seems to me that it should also have made sure that state_size
is correct. I don't see that pattern in v5.14 code anymore: https://elixir.bootlin.com/linux/v5.14/source/arch/x86/kernel/fpu/signal.c#L322
I see that we do make space for alignment on the signal stack https://github.com/DynamoRIO/dynamorio/blob/9357d87adb3b13740d682d2524cb70486628e775/core/unix/signal.c#L3130
but we align fpstate
not fpstate._fxsr_env
: https://github.com/DynamoRIO/dynamorio/blob/9357d87adb3b13740d682d2524cb70486628e775/core/unix/signal.c#L3256
Our approach to alignment is interesting here. We reserve extra bytes and then ALIGN_FORWARD
to pick the place to start. I'll need to do something similar except for a point in the middle of the struct.
Xref the issue for upgrading Ubuntu version to 20.04 for our Github CI #4953. Creating a separate one for the following problem:
Many tests seem to be failing due to an abrupt end of logs:
https://github.com/DynamoRIO/dynamorio/pull/4981/checks?check_run_id=2954404259
All of the following tests seem to have a similar issue: