KSPP / linux

Linux kernel source tree (Kernel Self Protection Project)
https://kernsec.org/wiki/index.php/Kernel_Self_Protection_Project
Other
80 stars 5 forks source link

NULL dereference in UBSAN handler (i386 only) #350

Open kees opened 2 months ago

kees commented 2 months ago

Seen here: https://gitlab.freedesktop.org/drm/amd/-/issues/3323

Appeared under Clang 17 on ARCH=i386.

BUG: kernel NULL pointer dereference, address: 00000404
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
*pdpt = 0000000003723001 *pde = 0000000000000000 
Oops: 0002 [#1] SMP DEBUG_PAGEALLOC PTI
CPU: 1 PID: 187 Comm: (udev-worker) Not tainted 6.9.0-rc3-P3 #1
Hardware name: LENOVO 2007F2G/2007F2G, BIOS 79ETE7WW (2.27 ) 03/21/2011
EIP: __ubsan_handle_out_of_bounds+0x30/0x7c
Code: 83 ec 28 89 c6 64 a1 40 65 02 d4 83 b8 f0 04 00 00 00 75 1e 89 d7 8d 5d cc 89 d8 ba ff 00 00 00 b9 28 00 00 00 e8 3c 9d 31 00 <f0> 0f ba 6e 04 1f 73 0e 83 c4 28 5e 5f 5b 5d 31 c0 31 c9 31 d2 c3
EAX: c351d8b0 EBX: c351d8b0 ECX: 00000000 EDX: 00000000
ESI: 00000400 EDI: 00000010 EBP: c351d8e4 ESP: c351d8b0
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00210246
CR0: 80050033 CR2: 00000404 CR3: 01c44000 CR4: 000006f0

The Code disassembles to:

0:  83 ec 28                sub    esp,0x28
3:  89 c6                   mov    esi,eax
5:  64 a1 40 65 02 d4       mov    eax,fs:0xd4026540
b:  83 b8 f0 04 00 00 00    cmp    DWORD PTR [eax+0x4f0],0x0
12: 75 1e                   jne    0x32
14: 89 5d cc                mov    DWORD PTR [ebp-0x34],ebx
17: 89 d8                   mov    eax,ebx
19: ba ff 00 00 00          mov    edx,0xff
1e: b9 28 00 00 00          mov    ecx,0x28
23: e8 3c 9d 31 00          call   0x319d64
28: f0 0f ba 6e 04 1f       lock bts DWORD PTR [esi+0x4],0x1f

The NULL deref (actually offset NULL + 1028) happened during the bts above, which maps to the test_and_set_bit() below:

static bool was_reported(struct source_location *location)
{
        return test_and_set_bit(REPORTED_BIT, &location->reported);
}

static bool suppress_report(struct source_location *loc)
{
        return current->in_ubsan || was_reported(loc);
}
...
void __ubsan_handle_out_of_bounds(void *_data, void *index)
{
        struct out_of_bounds_data *data = _data;
        char index_str[VALUE_LENGTH];

        if (suppress_report(&data->location))
                return;
...

This should be impossible, though. struct out_of_bounds_data contains location as the first struct:

struct source_location {
        const char *file_name;
        union {
                unsigned long reported;
                struct {
                        u32 line;
                        u32 column;
                };
        };
};

struct out_of_bounds_data {
        struct source_location location;
        ...

So data->location.report should be offset _data + sizeof(void *) (here, 4). This matches the assembly: DWORD PTR [esi+0x4], but %esi is 0x400.

Having a base address of 1024 seems like either a special value, a per-cpu variable, or some failed relocation?

kees commented 2 months ago

Okay, reproduce this in a Debian i386 image. %esi for me is 00000008, fwiw:

[   17.262774] BUG: kernel NULL pointer dereference, address: 0000000c
[   17.263754] #PF: supervisor write access in kernel mode
[   17.263754] #PF: error_code(0x0002) - not-present page
[   17.263754] *pde = 00000000                                                                      
[   17.263754] Oops: 0002 [#1] PREEMPT SMP  
[   17.263754] CPU: 2 PID: 1805 Comm: cat Not tainted 6.9.0-rc2 #1
[   17.263754] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[   17.263754] EIP: __ubsan_handle_out_of_bounds+0x1b/0x150
[   17.263754] Code: 5d c3 90 90 90 90 90 90 90 90 90 90 90 90 90 55 89 e5 53 57 56 83 ec 2c 64 8b b
[   17.263754] EAX: 00000008 EBX: c2b18000 ECX: 91b74c03 EDX: f57a9f88
[   17.263754] ESI: 00000008 EDI: c1fb6d80 EBP: c3de5e58 ESP: c3de5e20
[   17.263754] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010246
[   17.263754] CR0: 80050033 CR2: 0000000c CR3: 02c4c000 CR4: 00350ed0

And it's exactly where the other one is:

$ LLVM=1 scripts/faddr2line clang-17-32/vmlinux __ubsan_handle_out_of_bounds+0x1b/0x150
__ubsan_handle_out_of_bounds+0x1b/0x150:
arch_test_and_set_bit at lib/ubsan.c:0
(inlined by) test_and_set_bit at include/asm-generic/bitops/instrumented-atomic.h:72
(inlined by) was_reported at lib/ubsan.c:112
(inlined by) suppress_report at lib/ubsan.c:117
(inlined by) __ubsan_handle_out_of_bounds at lib/ubsan.c:407
kees commented 2 months ago

Under Clang tip-of-tree, it crashes later ... ?!

[   10.417811] UBSAN: array-index-out-of-bounds in (null):0:16705     
[   10.419513] BUG: kernel NULL pointer dereference, address: 00000000      
[   10.420487] #PF: supervisor read access in kernel mode 
[   10.420487] #PF: error_code(0x0000) - not-present page                             
[   10.420487] *pde = 00000000                                                                      
[   10.420487] Oops: 0000 [#1] PREEMPT SMP                                                          
[   10.420487] CPU: 3 PID: 1794 Comm: cat Not tainted 6.9.0-rc2 #1    
[   10.420487] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[   10.420487] EIP: __ubsan_handle_out_of_bounds+0xab/0x190                                         
[   10.420487] Code: 9a 00 83 c4 04 b8 ff ff ff 7f 23 47 04 ff 77 08 50 ff 37 68 6b f7 75 d6 68 8d a
[   10.420487] EAX: c1e76d60 EBX: f57d9f88 ECX: 00000000 EDX: f57d9f88
[   10.420487] ESI: c2734c00 EDI: c1e76d60 EBP: c3eb1e5c ESP: c3eb1e24
[   10.420487] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010286
[   10.420487] CR0: 80050033 CR2: 00000000 CR3: 042e1000 CR4: 00350ed0                        

Crash is now in type_is_int()

static void val_to_string(char *str, size_t size, struct type_descriptor *type,
                        void *value)
{
        if (type_is_int(type)) {
...
kees commented 2 months ago

Still crashes with Clang-HEAD+v6.1.87 and Clang-14+v6.9-rc4, so bisection seems unlikely.

ernsteiswuerfel commented 2 months ago

Thanks for taking that up another level! I have not realized this to be a clang inherent issue. I just assumed this was clangs' way of telling me there is a "UBSAN: array-index-out-of-bounds" situation. 🤔

If there is further testing required please let me know!

kees commented 2 months ago

Ah, I think I figured it out. The handler calls aren't respecting -mregparm=3. Upstream bug opened: https://github.com/llvm/llvm-project/issues/89670

kees commented 2 months ago

Clang is fixed and the kernel has a work-around for earlier versions with commit c5d49b4773aac19782a57b11f7860f92692b230c.

ernsteiswuerfel commented 2 months ago

Many thanks!