32-bit signal tests failing on Ubuntu 20.04

DynamoRIO / dynamorio

Dynamic Instrumentation Tool Platform

Other

2.67k stars 562 forks source link

32-bit signal tests failing on Ubuntu 20.04 #5079

Open abhinav92003 opened 3 years ago

abhinav92003 commented 3 years ago

Xref the issue for upgrading Ubuntu version to 20.04 for our Github CI #4953. Creating a separate one for the following problem:

Many tests seem to be failing due to an abrupt end of logs:

Test  #42: code_api|linux.signal0000 ........................................***Failed  Required regular expression not found. Regex=[^Sending SIGUSR2
2021-06-30T16:31:59.0520704Z Sending SIGUSR1
2021-06-30T16:31:59.0521168Z in signal handler
2021-06-30T16:31:59.0521616Z Got SIGUSR1
2021-06-30T16:31:59.0522067Z Sending SIGRTMAX
2021-06-30T16:31:59.0522530Z in signal handler
2021-06-30T16:31:59.0522978Z Got SIGRTMAX
2021-06-30T16:31:59.0523454Z Generating SIGSEGV
2021-06-30T16:31:59.0523928Z in signal handler
2021-06-30T16:31:59.0524373Z Got SIGSEGV
2021-06-30T16:31:59.0524760Z 250006\.749760
2021-06-30T16:31:59.0525108Z $
2021-06-30T16:31:59.0525465Z ]  0.77 sec
2021-06-30T16:31:59.0525884Z Sending SIGUSR2
2021-06-30T16:31:59.0526334Z Sending SIGUSR1
2021-06-30T16:31:59.0526796Z in signal handler
2021-06-30T16:31:59.0527239Z Got SIGUSR1
2021-06-30T16:31:59.0527508Z

https://github.com/DynamoRIO/dynamorio/pull/4981/checks?check_run_id=2954404259

All of the following tests seem to have a similar issue:

code_api|linux.signalxxxx
code_api|linux.sigplainxxx

code_api|linux.syscall_pwait 
code_api|linux.sigcontext 

code_api|linux.signal_race
code_api|linux.alarm 
code_api|linux.bad-signal-stack 
Code_api|linux.sigsuspend
Code_api|linux.signest
Code_api|client.events
Code_api|client.events_cpp
Code_api|client.cleancallsig
Code_api|client.signal
code_api|api.static_signal 
Code_api,opt_memory|client.events
Code_api,opt_speed|client.events
Code_api,thread_private|client.events
Code_api,disable_traces|client.events
Code_api,thread_private,disable_traces|client.events

abhinav92003 commented 3 years ago

Ran the test locally and also set VERBOSE in the test. I see that there's actually a segmentation fault:

$ ./bin32/drrun -loglevel 3 -logdir logs   -- suite/tests/bin/linux.signal0000
<log dir=logs/linux.signal0000.587500.00000000>
<Starting application dynamorio/suite/tests/bin/linux.signal0000 (587500)>
<Initial options = -no_dynamic_options -logdir 'logs' -loglevel 3 -code_api -stack_size 56K -signal_stack_size 32K -max_elide_jmp 0 -max_elide_call 0 -early_inject -emulate_brk -no_inline_ignored_syscalls -native_exec_default_list '' -no_native_exec_managed_code -no_indcall2direct >
...
Sending SIGUSR1
signal_handler: sig=10, retaddr=0xf7e15b3c, ucxt=0xffe745fc
Got SIGUSR1 @ 0xf7be055e
Completed handling sig=10
Segmentation fault

The log file ends with:

set next tag to 0xf7ba155e, sc->SC_XIP to 0x41663fc0
for sigreturn, set sys_param1 to 0x00000000

So the param to sigreturn should be the next tag https://github.com/DynamoRIO/dynamorio/blob/638fb106d30d84887956853f8be1ce534cfa0f25/core/dispatch.c#L2042 but for some reason it is being set to zero.

The code that's supposed to set the next tag is https://github.com/DynamoRIO/dynamorio/blob/0dc7ef1ed506749c5813776e320916579da9c9fa/core/unix/signal.c#L6700 It is a bit confusing; the log says "set next tag to " but then prints next_pc instead of dcontext->next_tag. I found that next_tag is indeed set to zero at this point, probably because xax is also zero, but it should've been the next fragment's tag.

abhinav92003 commented 3 years ago

I found that next_tag is indeed set to zero at this point, probably because xax is also zero, but it should've been the next fragment's tag.

So this is actually not an issue. As described by https://github.com/DynamoRIO/dynamorio/blob/0dc7ef1ed506749c5813776e320916579da9c9fa/core/unix/signal.c#L6662 xax is saved intentionally to next_tag; the app xax may be zero at this point, so this doesn't indicate a problem.

Back to the crash: After the app signal handler completes handling the signal, I see that handle_system_call enters fcache at https://github.com/DynamoRIO/dynamorio/blob/638fb106d30d84887956853f8be1ce534cfa0f25/core/dispatch.c#L2084 targeting the do_syscall gencode, the first instruction of which is an int 0x80 https://github.com/DynamoRIO/dynamorio/blob/8469b44748777decfe42508fcb86155bd871c2ba/core/arch/emit_utils_shared.c#L4779.

// do_sycall gencode
   0x4ab383c0:  int    $0x80
   0x4ab383c2:  mov    %eax,0x4aaea05c
   0x4ab383c7:  mov    $0xf7ec868c,%eax
   0x4ab383cc:  jmp    0x4ab37fc0 // fcache_return

At the int 0x80, we have

eax            0xad                173

so the syscall is sigreturn.

The crash happens exactly at the sigreturn (I had to be careful not to overstep in gdb, as gdb simply crashes without any useful information at this instr).

I'm trying to figure out why the sigreturn crashed. Is something messing up the stack, so it has trouble restoring pre-signal state?

Using do_int_syscall
Entry into do_syscall to execute a non-ignorable system call
system call 173
rt_sigreturn()
    xsp is 0xffffbf10
    signal was 10 (did == param 10)
Setting app signal stack to 0x00000000-0xffffffff 2=disabled
blocked signals are now:
    14 = blocked
    23 = blocked
returning-to sigcontext 0xffffbfb0:
...
set next tag to 0xf7c0b55e/0x00000000, sc->SC_XIP to 0x4e1a7fc0
for sigreturn, set sys_param1 to 0x00000000
AAA enter_fcache from handle_system_call

abhinav92003 commented 3 years ago

I had a look at kernel logs in dmesg and found the following:

$ sudo dmesg
[1228341.107006] usercopy: Kernel memory overwrite attempt detected to SLUB object 'task_struct' (offset 5248, size 2688)!
[1228341.107013] ------------[ cut here ]------------
[1228341.107014] kernel BUG at mm/usercopy.c:99!
[1228341.107019] invalid opcode: 0000 [#241] SMP PTI
...
[1228341.107026] RIP: 0010:usercopy_abort+0x74/0x76
[1228341.107028] Code: da ee 88 51 48 0f 45 d6 49 c7 c3 d4 e1 ef 88 4c 89 d1 57 48 c7 c6 11 87 ee 88 48 c7 c7 a0 e2 ef 88 49 0f 45 f3 e8 09 7d ff ff <0f> 0b 4c 89 e1 49 89 d8 44 89 ea 31 f6 48 29 c1 48 c7 c7 16 e2 ef
[1228341.107029] RSP: 0000:ffff948f0bedbd90 EFLAGS: 00010246
[1228341.107031] RAX: 0000000000000069 RBX: 0000000000000a80 RCX: 0000000000000000
[1228341.107031] RDX: 0000000000000000 RSI: ffff899240018a00 RDI: ffff899240018a00
[1228341.107032] RBP: ffff8985db4d9f00 R08: 0000000000000000 R09: ffff948f0bedbbb8
[1228341.107033] R10: ffff948f0bedbbb0 R11: ffffffff894cbb80 R12: ffff8985db4d9480
[1228341.107034] R13: 0000000000000000 R14: 00000000000000ff R15: ffff8985db4d9440
[1228341.107035] FS:  000000004a63f000(0073) GS:ffff899240000000(0063) knlGS:000000004a643b70
[1228341.107036] CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
[1228341.107036] CR2: 000000004a6b3000 CR3: 0000000140b96005 CR4: 00000000003706e0
[1228341.107037] Call Trace:
[1228341.107042]  __check_heap_object+0xe0/0x110
[1228341.107045]  __check_object_size+0x136/0x150
[1228341.107047]  __fpu__restore_sig+0x1b9/0x650
[1228341.107051]  ia32_restore_sigcontext+0x137/0x170
[1228341.107052]  __do_compat_sys_rt_sigreturn+0x76/0xd0
[1228341.107055]  do_int80_syscall_32+0x3e/0x50
[1228341.107058]  entry_INT80_compat+0x88/0x8d
[1228341.107063] RIP: 0023:0x4a676fc0
[1228341.107064] Code: 8b 35 44 90 62 4a 8b 3d 40 90 62 4a 8b 2d 48 90 62 4a 8b 25 4c 90 62 4a ff 25 e0 92 62 4a 90 90 90 90 90 90 90 90 90 90 90 90 <89> 1d 50 90 62 4a 89 0d 58 90 62 4a 89 15 54 90 62 4a 89 35 44 90
[1228341.107065] RSP: 002b:00000000ffb5b7ac EFLAGS: 00000286
[1228341.107066] RAX: 00000000f7ea5698 RBX: 0000000000255e87 RCX: 000000000000000a
[1228341.107067] RDX: 00000000f39fa000 RSI: 00000000f7b94000 RDI: 00000000ffb5b8d4
[1228341.107067] RBP: 00000000ffb5b8f8 R08: 0000000000000000 R09: 0000000000000000
[1228341.107068] R10: 0000000000000000 R11: 0000000000000286 R12: 0000000000000000
[1228341.107069] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000

The call trace refers to this ia32_restore_sigcontext: https://elixir.bootlin.com/linux/v5.10.14/source/arch/x86/ia32/ia32_signal.c#L60, and the __check_heap_object routine is at https://elixir.bootlin.com/linux/v5.10.14/source/mm/slub.c#L4081.

abhinav92003 commented 3 years ago

I obtained the image for my kernel with debug symbols and used it with gdb to find the exact source code locations for the above stack trace. As many routines are inlined, in some cases I had to look at the disassembly of the routine to find the exact call sites in the original code.

[1228341.107037] Call Trace: [1228341.107042] __check_heap_object+0xe0/0x110 [1228341.107045] __check_object_size+0x136/0x150 [1228341.107047] fpurestore_sig+0x1b9/0x650 [1228341.107051] ia32_restore_sigcontext+0x137/0x170 [1228341.107052] __do_compat_sys_rt_sigreturn+0x76/0xd0 [1228341.107055] do_int80_syscall_32+0x3e/0x50 [1228341.107058] entry_INT80_compat+0x88/0x8d

abhinav92003 commented 3 years ago

usercopy: Kernel memory overwrite attempt detected to SLUB object 'task_struct' (offset 5248, size 2688)

The following call @https://elixir.bootlin.com/linux/v5.10.40/source/arch/x86/kernel/fpu/signal.c#L434 goes bad.

ret = __copy_from_user(&fpu->state.fxsave, buf_fx, state_size);

The field being written is task_struct.thread.fpu.

I think the __check_heap_object call should have returned at https://elixir.bootlin.com/linux/v5.10.46/source/mm/slub.c#L4058 for a normal access, but maybe n is too large. n is the state_size param in the __copy_from_user call above, which apparently is greater than the task_struct object's size.

state_size is computed based on the given frame at https://elixir.bootlin.com/linux/v5.10.40/source/arch/x86/kernel/fpu/signal.c#L320. I'm trying to check now if we're setting the xstate_size field properly in the frame.

On a side note, I tried setting f_old->uc.uc_mcontext.fpstate to NULL in fixup_rtframe_pointers, and the crash goes away with that. So the issue is definitely with FP state.

abhinav92003 commented 3 years ago

Now that I know where to look, I compared the fpstates from the succeeding native run and the run under DR: Under DR, using frame seen at record_pending_signal:

(gdb) ptype kernel_fpstate_t
type = struct _kernel_fpstate_t {
    unsigned long cw;
    unsigned long sw;
    unsigned long tag;
    unsigned long ipoff;
    unsigned long cssel;
    unsigned long dataoff;
    unsigned long datasel;
    kernel_fpreg_t _st[8];
    unsigned short status;
    unsigned short magic;
    unsigned long _fxsr_env[6];
    unsigned long mxcsr;
    unsigned long reserved;
    kernel_fpxreg_t _fxsr_st[8];
    kernel_xmmreg_t _xmm[8];
    unsigned long padding1[44];
    union {
        unsigned long padding2[12];
        kernel_fpx_sw_bytes_t sw_reserved;
    };
}
(gdb) p (*(struct _kernel_fpstate_t*)frame->uc->uc_mcontext->fpstate).sw_reserved
$14 = {magic1 = 1179670611, extended_size = 2804, xstate_bv = 255, xstate_size = 2688, padding = {0, 0, 0, 0, 0, 0, 0}}
(gdb)

At native app handler:

(gdb) ptype struct _fpstate
type = struct _fpstate {
    __uint32_t cw;
    __uint32_t sw;
    __uint32_t tag;
    __uint32_t ipoff;
    __uint32_t cssel;
    __uint32_t dataoff;
    __uint32_t datasel;
    struct _fpreg _st[8];
    unsigned short status;
    unsigned short magic;
    __uint32_t _fxsr_env[6];
    __uint32_t mxcsr;
    __uint32_t __glibc_reserved1;
    struct _fpxreg _fxsr_st[8];
    struct _xmmreg _xmm[8];
    __uint32_t __glibc_reserved2[56];
}
(gdb) p (*((struct sigcontext)ucxt->uc_mcontext)->fpstate).__glibc_reserved2
$16 = {0 <repeats 32 times>, 52, 384, 384, 4, 4, 3, 1661848, 1661848, 1661848, 19, 19, 4, 1179670611, 2804, 255, 0, 2688, 0, 0, 0, 0, 0, 0, 0}
(gdb) p (*((struct sigcontext)ucxt->uc_mcontext)->fpstate).__glibc_reserved2[44]
$17 = 1179670611
(gdb) p (*((struct sigcontext)ucxt->uc_mcontext)->fpstate).__glibc_reserved2[45]
$18 = 2804
(gdb) p (*((struct sigcontext)ucxt->uc_mcontext)->fpstate).__glibc_reserved2[46]
$19 = 255
(gdb) p (*((struct sigcontext)ucxt->uc_mcontext)->fpstate).__glibc_reserved2[47]
$20 = 0

The sw_reserved field gives details about the extended state (_fpx_sw_bytes). The magic bytes and the extended_size (which denotes total size of fpstate) match. But I see that xstate_size (the actual state size) is actually zero under the native app.

abhinav92003 commented 3 years ago

I'm looking into why xstate_size is zero in the frame during native run.

xstate_size is set at init time using fpu_user_xstate_size: https://elixir.bootlin.com/linux/v5.10.40/source/arch/x86/kernel/fpu/signal.c#L524

fpu_user_xstate_size is in turn also set at init time: https://elixir.bootlin.com/linux/v5.10.40/source/arch/x86/kernel/fpu/xstate.c#L766 This is done using cpu_id: https://elixir.bootlin.com/linux/v5.10.40/source/arch/x86/kernel/fpu/xstate.c#L721

I know for sure that the init doesn't fail or anything because I saw this log in dmesg on rebooting: https://elixir.bootlin.com/linux/v5.10.40/source/arch/x86/kernel/fpu/xstate.c#L862

[    0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x008: 'MPX bounds registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x010: 'MPX CSR'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x020: 'AVX-512 opmask'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x040: 'AVX-512 Hi256'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x080: 'AVX-512 ZMM_Hi256'
[    0.000000] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
[    0.000000] x86/fpu: xstate_offset[3]:  832, xstate_sizes[3]:   64
[    0.000000] x86/fpu: xstate_offset[4]:  896, xstate_sizes[4]:   64
[    0.000000] x86/fpu: xstate_offset[5]:  960, xstate_sizes[5]:   64
[    0.000000] x86/fpu: xstate_offset[6]: 1024, xstate_sizes[6]:  512
[    0.000000] x86/fpu: xstate_offset[7]: 1536, xstate_sizes[7]: 1024
[    0.000000] x86/fpu: Enabled xstate features 0xff, context size is 2560 bytes, using 'compacted' format.

Note that 2560 is fpu_kernel_xstate_size not fpu_user_xstate_size. I think they are different because the former signifies the max accounted-for size? (https://elixir.bootlin.com/linux/v5.10.40/source/arch/x86/kernel/fpu/xstate.c#L760)

task_struct size accounts for fpu_kernel_xstate_size: https://elixir.bootlin.com/linux/v5.10.40/source/arch/x86/kernel/fpu/init.c#L171. But what's happening in the crash (see message: " usercopy: Kernel memory overwrite attempt detected to SLUB object 'task_struct' (offset 5248, size 2688)!") is that we're writing 2688 bytes which is more than that.

abhinav92003 commented 3 years ago

But I see that xstate_size (the actual state size) is actually zero under the native app.

It's actually not. Keeping the struct below in mind,

type = struct _kernel_fpx_sw_bytes_t { u32 magic1; u32 extended_size; u64 xstate_bv; __u32 xstate_size; u32 padding[7]; }

(gdb) p (*((struct sigcontext)ucxt->uc_mcontext)->fpstate).__glibc_reserved2 $16 = {0 <repeats 32 times>, 52, 384, 384, 4, 4, 3, 1661848, 1661848, 1661848, 19, 19, 4, 1179670611, 2804, 255, 0, 2688, 0, 0, 0, 0, 0, 0, 0}

As __glibc_reserved2 is __uint32_t, gdb is printing it 4 bytes at a time, and it's little endian so xstate_bv is 255 (255,0 above) and xstate_size is 2688 -- same as the run under DR :/

The only other difference I see in the two fpstates is in some padding bytes: https://gist.github.com/abhinav92003/17ff1a27973a387216a2fcc5444abcc9. Checking that now.

abhinav92003 commented 3 years ago

The padding difference seems insignificant -- it stores XMM values (https://elixir.bootlin.com/linux/v5.10.40/source/arch/x86/include/uapi/asm/sigcontext.h#L128). The DR version is 0xABABABAB which signifies just unused memory I think, so the native version might be garbage.

I also compared the frame just before the crashing sigreturn (added to above gist). That seems same too as the other two.

abhinav92003 commented 3 years ago

I might have found the issue: if _fxsr_env is not 64 byte aligned, the kernel falls back to restoring fxregs_save instead of xregs_save (https://elixir.bootlin.com/linux/v5.10.40/source/arch/x86/kernel/fpu/signal.c#L335). But in this path state_size is already modified to use fx_sw_user.xstate_size, so it goes to write that many bytes to fxregs_save structure in kernel.

I did check if it was aligned on gdb and it did turned out to be multiple of 0x40, but when I tried adding back the assert at https://github.com/DynamoRIO/dynamorio/blob/f9f95d9c66b0dfe6d03428dac968ab0f4360fc85/core/unix/signal_linux_x86.c#L228, it failed. So maybe I was looking at something wrong.

<Application /usr/local/google/home/sharmaabhinav/dr/build/i5079-1_32/suite/tests/bin/linux.signal0000 (140439).  Internal Error: DynamoRIO debug check failure: /usr/local/google/home/sharmaabhinav/dr/src/i5079-1/core/unix/signal_linux_x86.c:237 ALIGNED(xsave_start, AVX_ALIGNMENT)

This code path is disabled DISABLED_ISSUE_3256 due to #3256 right now.

If the kernel does fall back to a different path due to missing 64-byte alignment, it seems to me that it should also have made sure that state_size is correct. I don't see that pattern in v5.14 code anymore: https://elixir.bootlin.com/linux/v5.14/source/arch/x86/kernel/fpu/signal.c#L322

abhinav92003 commented 3 years ago

I see that we do make space for alignment on the signal stack https://github.com/DynamoRIO/dynamorio/blob/9357d87adb3b13740d682d2524cb70486628e775/core/unix/signal.c#L3130 but we align fpstate not fpstate._fxsr_env: https://github.com/DynamoRIO/dynamorio/blob/9357d87adb3b13740d682d2524cb70486628e775/core/unix/signal.c#L3256

Our approach to alignment is interesting here. We reserve extra bytes and then ALIGN_FORWARD to pick the place to start. I'll need to do something similar except for a point in the middle of the struct.