crash-utility / crash

Linux kernel crash utility
https://crash-utility.github.io

Segmentation fault seen while decoding ARM64 kdump. #176

Closed codernavi18 closed 3 months ago

codernavi18 commented 3 months ago

I am using the crash-8.0.4 release (built with make target=ARM64) on my x86_64 host to decode a kdump generated on an ARM64 target. When I open that kdump, crash itself segfaults.

$ sudo ./crash ~/.repos/src/arm64/linux/vmlinux /home/naveen/nfsroot/rootfs-buildroot-arm64/kernel.20240315160627.core.kdump

crash 8.0.4
Copyright (C) 2002-2022  Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
Copyright (C) 1999-2006  Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011, 2020-2022  NEC Corporation
Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
Copyright (C) 2015, 2021  VMware, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions.  Enter "help copying" to see the conditions.
This program has absolutely no warranty.  Enter "help warranty" for details.

GNU gdb (GDB) 10.2
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "--host=x86_64-pc-linux-gnu --target=aarch64-elf-linux".
Type "show configuration" for configuration details.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...

please wait... (determining panic task)Segmentation fault
codernavi18 commented 3 months ago

Here's the backtrace of the crash:

please wait... (determining panic task)
Thread 1 "crash" received signal SIGSEGV, Segmentation fault.
value_search_module_6_4 (value=18446603338276298752, offset=0x7ffffffface0) at symbols.c:5564
5564                if (value < sp->value)
(gdb) bt
#0  value_search_module_6_4 (value=18446603338276298752, offset=0x7ffffffface0) at symbols.c:5564
#1  0x0000555555812bd0 in value_to_symstr (value=18446603338276298752,
    buf=buf@entry=0x7fffffffb9c0 "", radix=10, radix@entry=0) at symbols.c:5872
#2  0x00005555557694a2 in display_memory (addr=<optimized out>, count=2048, flag=208,
    memtype=memtype@entry=1, opt=opt@entry=0x0) at memory.c:1740
#3  0x0000555555769e1f in raw_stack_dump (stackbase=<optimized out>, size=<optimized out>)
    at memory.c:2194
#4  0x00005555557923ff in get_active_set_panic_task () at task.c:8639
#5  0x00005555557930d2 in get_dumpfile_panic_task () at task.c:7628
#6  0x00005555557a89d3 in panic_search () at task.c:7380
#7  get_panic_context () at task.c:6267
#8  task_init () at task.c:687
#9  0x00005555557305b3 in main_loop () at main.c:787
#10 0x0000555555a64331 in captured_main (data=<optimized out>) at main.c:1284
#11 gdb_main (args=<optimized out>) at main.c:1313
#12 0x0000555555a643b0 in gdb_main_entry (argc=<optimized out>, argv=argv@entry=0x7fffffffe508)
    at main.c:1338
#13 0x00005555557d1ece in gdb_main_loop (argc=<optimized out>, argc@entry=3,
    argv=argv@entry=0x7fffffffe508) at gdb_interface.c:81
#14 0x0000555555728dfc in main (argc=3, argv=0x7fffffffe508) at main.c:720
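
A note for readers of the backtrace: gdb prints the value argument in frames #0 and #1 in decimal, which makes it hard to recognize as a kernel virtual address. The sketch below converts it to hex using only standard C library calls. The helper names parse_decimal_address and print_hex_address are made up for this illustration and are not part of crash or gdb.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not part of crash or gdb): parse an address that
 * gdb printed in decimal. Returns 0 on success, -1 on malformed input.
 */
int parse_decimal_address(const char *text, unsigned long long *out)
{
    char *end = NULL;
    unsigned long long v;

    assert(text != NULL && out != NULL);

    errno = 0;
    v = strtoull(text, &end, 10);
    if (errno != 0 || end == text || *end != '\0')
        return -1;      /* overflow, empty string, or trailing junk */

    *out = v;
    return 0;
}

/* Print the address as a zero-padded 64-bit hex value. */
void print_hex_address(FILE *stream, unsigned long long addr)
{
    fprintf(stream, "0x%016llx\n", addr);
}
```

Parsing "18446603338276298752" this way gives 0xffff80007a7e6000. The upper 16 bits are all ones, so this is a kernel-space address; which kernel region it belongs to depends on the target's virtual address layout, which the thread does not state.
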
codernavi18 commented 3 months ago

The kernel module being loaded is a dummy module that just has a null pointer dereference in its init function, to trigger a kernel panic intentionally. The debug symbols are present.

$ /opt/arm-gnu-toolchain-13.2.Rel1-x86_64-aarch64-none-linux-gnu/bin/aarch64-none-linux-gnu-objdump -t drivers/naveen/npdereference.ko

drivers/naveen/npdereference.ko:     file format elf64-littleaarch64

SYMBOL TABLE:
0000000000000000 l    d  .text  0000000000000000 .text
0000000000000000 l    d  .init.text 0000000000000000 .init.text
0000000000000000 l    d  .exit.text 0000000000000000 .exit.text
0000000000000000 l    d  .plt   0000000000000000 .plt
0000000000000000 l    d  .init.plt  0000000000000000 .init.plt
0000000000000000 l    d  .text.ftrace_trampoline    0000000000000000 .text.ftrace_trampoline
0000000000000000 l    d  .rodata.str1.8 0000000000000000 .rodata.str1.8
0000000000000000 l    d  .modinfo   0000000000000000 .modinfo
0000000000000000 l    d  .note.gnu.property 0000000000000000 .note.gnu.property
0000000000000000 l    d  .note.gnu.build-id 0000000000000000 .note.gnu.build-id
0000000000000000 l    d  .note.Linux    0000000000000000 .note.Linux
0000000000000000 l    d  .data  0000000000000000 .data
0000000000000000 l    d  .exit.data 0000000000000000 .exit.data
0000000000000000 l    d  .init.data 0000000000000000 .init.data
0000000000000000 l    d  .gnu.linkonce.this_module  0000000000000000 .gnu.linkonce.this_module
0000000000000000 l    d  .bss   0000000000000000 .bss
0000000000000000 l    d  .note.GNU-stack    0000000000000000 .note.GNU-stack
0000000000000000 l    d  .comment   0000000000000000 .comment
0000000000000000 l    d  .debug_info    0000000000000000 .debug_info
0000000000000000 l    d  .debug_abbrev  0000000000000000 .debug_abbrev
0000000000000000 l    d  .debug_aranges 0000000000000000 .debug_aranges
0000000000000000 l    d  .debug_rnglists    0000000000000000 .debug_rnglists
0000000000000000 l    d  .debug_line    0000000000000000 .debug_line
0000000000000000 l    d  .debug_str 0000000000000000 .debug_str
0000000000000000 l    d  .debug_line_str    0000000000000000 .debug_line_str
0000000000000000 l    d  .debug_frame   0000000000000000 .debug_frame
0000000000000000 l    df *ABS*  0000000000000000 npdereference.c
0000000000000000 l     F .init.text 0000000000000040 null_deref_module_init
0000000000000000 l     F .exit.text 0000000000000024 null_deref_module_exit
0000000000000000 l     O .exit.data 0000000000000008 __UNIQUE_ID___addressable_cleanup_module332
0000000000000000 l     O .init.data 0000000000000008 __UNIQUE_ID___addressable_init_module331
0000000000000000 l     O .modinfo   0000000000000049 __UNIQUE_ID_description330
0000000000000049 l     O .modinfo   0000000000000011 __UNIQUE_ID_author329
000000000000005a l     O .modinfo   000000000000000c __UNIQUE_ID_license328
0000000000000000 l    df *ABS*  0000000000000000 npdereference.mod.c
0000000000000066 l     O .modinfo   0000000000000009 __UNIQUE_ID_depends331
000000000000006f l     O .modinfo   0000000000000009 __UNIQUE_ID_intree330
0000000000000078 l     O .modinfo   0000000000000013 __UNIQUE_ID_name329
000000000000008b l     O .modinfo   0000000000000048 __UNIQUE_ID_vermagic328
0000000000000000 l     O .note.Linux    0000000000000018 _note_15
0000000000000018 l     O .note.Linux    0000000000000018 _note_14
0000000000000000 g     O .gnu.linkonce.this_module  0000000000000440 __this_module
0000000000000000 g     F .exit.text 0000000000000024 cleanup_module
0000000000000000 g     F .init.text 0000000000000040 init_module
0000000000000000         *UND*  0000000000000000 _printk

naveen@workstation:~/.repos/src/arm64/linux$ file drivers/naveen/npdereference.ko
drivers/naveen/npdereference.ko: ELF 64-bit LSB relocatable, ARM aarch64, version 1 (SYSV), BuildID[sha1]=118e35b0267440ef364c551c5890ff934392fb6c, with debug_info, not stripped
naveen@workstation:~/.repos/src/arm64/linux$
liutgnu commented 3 months ago

Here's the backtrace of the crash :


please wait... (determining panic task)
Thread 1 "crash" received signal SIGSEGV, Segmentation fault.
value_search_module_6_4 (value=18446603338276298752, offset=0x7ffffffface0) at symbols.c:5564
5564              if (value < sp->value)

Interesting: the code above shouldn't segfault unless sp is an invalid pointer. Could you please print the value of sp?
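
To make that reasoning concrete: in a comparison like value < sp->value, value is a plain integer, so the only memory access is the load through sp. The sketch below is a simplified, hypothetical symbol lookup in which some per-region tables may be NULL; the guard skips them before the comparison is reached. The struct and function names are invented for this example, do not mirror crash's symbols.c, and this is not the upstream fix.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a symbol table entry; not crash's real struct. */
struct sym_entry {
    unsigned long long value;   /* start address of the symbol */
    const char *name;
};

/* Hypothetical per-region table: syms may be NULL when a region is unused. */
struct sym_region {
    const struct sym_entry *syms;   /* sorted by value, or NULL */
    size_t count;
};

/* Return the closest symbol at or below value, or NULL if there is none. */
const struct sym_entry *
lookup_symbol(const struct sym_region *regions, size_t nregions,
              unsigned long long value, unsigned long long *offset)
{
    const struct sym_entry *best = NULL;

    assert(regions != NULL || nregions == 0);

    for (size_t r = 0; r < nregions; r++) {
        const struct sym_entry *sp = regions[r].syms;

        if (sp == NULL)         /* the guard: skip before touching sp->value */
            continue;

        for (size_t i = 0; i < regions[r].count; i++, sp++) {
            if (value < sp->value)
                break;          /* sorted table: nothing later can match */
            if (best == NULL || sp->value >= best->value)
                best = sp;
        }
    }

    if (best != NULL && offset != NULL)
        *offset = value - best->value;
    return best;
}
```

Without the sp == NULL check, a region whose syms pointer is NULL but whose count is nonzero would fault at exactly that comparison, which is the pattern the gdb output points to.
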

In addition, if sp is a valid pointer, then I guess the line printed here isn't where the segfault really happens. In that case, you can rebuild the crash utility without compiler optimization. To do this, in the crash source tree, run "make target=ARM64" to build crash the first time, then "make clean && make target=ARM64" to clean up and build a second time. The second build is done without compiler optimization :). Then re-test and post your findings here.

If none of this works for you, feel free to send me your vmcore and vmlinux via Google Drive or any other sharing method (by private email if the vmcore shouldn't be made public), so I can take a look myself.

Thanks, Tao Liu

codernavi18 commented 3 months ago

Sorry, my bad. I forgot to mention that sp is NULL.

kdump: https://drive.google.com/file/d/1z55OHcPLuKy5KvsMml1uTJ2kJYf3LqwI/view?usp=drive_link
vmlinux: https://drive.google.com/file/d/1DusF8Ipu24b5VQBYmbUjGg5nfCoPErdM/view?usp=drive_link

The kernel uImage is built from the vanilla Linux 6.5 release for arm64 using defconfig. The module has just two lines of code in its init function to trigger a null pointer dereference, and loading the module triggers the kdump. The kdump was generated with the makedumpfile utility using the command: makedumpfile --message-level 4 -d 17,31 /proc/vmcore "${FILENAME}"
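
As background on the -d 17,31 argument: per the makedumpfile man page, the dump level is a bitmask of page types to leave out of the dump (1 zero pages, 2 cache pages without private, 4 cache pages with private, 8 user data, 16 free pages), and a comma-separated list gives further levels to retry with if writing the dump runs out of space. The sketch below decodes a level into short names for those page types. The function describe_dump_level and the short names are invented for this example and are not part of makedumpfile.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Page types a dump level can exclude; bit N corresponds to entry N. */
static const char *const page_type_names[] = {
    "zero",           /* 1:  zero pages */
    "cache",          /* 2:  cache pages without private */
    "cache-private",  /* 4:  cache pages with private */
    "user",           /* 8:  user data pages */
    "free",           /* 16: free pages */
};

/*
 * Hypothetical helper: write the page types excluded by level into buf
 * as a comma-separated list. Output is truncated if buf is too small.
 */
char *describe_dump_level(unsigned level, char *buf, size_t size)
{
    assert(buf != NULL && size > 0);
    buf[0] = '\0';

    for (unsigned bit = 0; bit < 5; bit++) {
        size_t used = strlen(buf);

        if (!(level & (1u << bit)))
            continue;
        snprintf(buf + used, size - used, "%s%s",
                 used ? ", " : "", page_type_names[bit]);
    }
    return buf;
}
```

Level 17 decodes to "zero, free" and level 31 to all five page types, so under this reading the command above first tries a dump that drops only zero and free pages, then falls back to dropping every excludable page type.
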

liutgnu commented 3 months ago

Patch posted upstream [1]; it works in my testing. Thanks for reporting the bug and providing the vmcore!