Open cjl20062529 opened 4 years ago
----- Original Message -----
Hi: I use crash 7.2.6-3 to parse a vmcore. The vmcore was generated by a 4.19 aarch64 kernel.
When I read the global variables in the module, the values returned by the p command and the rd command are different.
crash /boot/vmlinux vmcore
crash 7.2.6-3
Copyright (C) 2002-2019 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.

GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "aarch64-unknown-linux-gnu"...

WARNING: cannot find NT_PRSTATUS note for cpu: 78
KERNEL: /boot/vmlinux
DUMPFILE: vmcore [PARTIAL DUMP]
CPUS: 96
DATE: Wed Feb 12 16:23:47 2020
UPTIME: 17 days, 13:05:44
LOAD AVERAGE: 5253.54, 5244.11, 5221.62
TASKS: 11580
NODENAME: 121-6
RELEASE: 4.19.aarch64
VERSION: #1 SMP Mon Jul 22 00:00:00 UTC 2019
MACHINE: aarch64 (unknown Mhz)
MEMORY: 96 GB
PANIC: "kernel BUG at /xxx/upi_cache.c:120!"
PID: 29229
COMMAND: "Jpool"
TASK: ffff8022be10be00 [THREAD_INFO: ffff8022be10be00]
CPU: 18
STATE: TASK_RUNNING (PANIC)

crash> mod -s snas_ds ./modules/snas_ds.ko
MODULE NAME SIZE OBJECT FILE
ffff000003ed0900 snas_ds 2887680 ./modules/snas_ds.ko
crash> p g_bCheckMetaCap
g_bCheckMetaCap = $1 = 2432712771
crash>
crash> rd g_bCheckMetaCap
ffff000003ececc0: 0000000000000001 ........
crash>
crash> set debug 31
debug: 31
crash> set debug 31
debug: 31
text hit rate: 0% (0 of 1)
crash> rd g_bCheckMetaCap
<addr: ffff000003ececc0 count: 1 flag: 490 (KVADDR)>
<readmem: ffff000003ececc0, KVADDR, "64-bit KVADDR", 8, (FOE), ffffc570ddb0>
<read_diskdump: addr: ffff000003ececc0 paddr: 202d7e8cecc0 cnt: 8>
read_diskdump: paddr/pfn: 202d7e8cecc0/202d7e8ce -> physical page is cached: 202d7e8ce000
ffff000003ececc0: 0000000000000001 ........
text hit rate: 0% (0 of 1)
crash> p g_bCheckMetaCap
p: per_cpu_symbol_search(g_bCheckMetaCap): NULL
g_bCheckMetaCap = GETBUF(328 -> 0)
$2 = 2432712771
FREEBUF(0)
text hit rate: 50% (1 of 2)
I don't understand why there's no debug output after the "p g_bCheckMetaCap" command? There should be a "<readmem: ..." line with a virtual address and a "gdb_readmem_callback" type string.
Note that the "rd g_bCheckMetaCap" command shows a readmem debug output line with virtual address ffff000003ececc0 and type "64-bit KVADDR".
In any case, both the rd and the p commands should be requesting the same virtual address, which would be the address shown by "sym g_bCheckMetaCap". But presumably that's not the case for some reason.
Hi, g_bCheckMetaCap is defined as
U32 g_bCheckMetaCap = 1
crash> p g_bCheckMetaCap
g_bCheckMetaCap = $1 = 2432712771
I set debug to 31, but no debug info is shown.
crash> rd g_bCheckMetaCap
ffff000003ececc0: 0000000000000001 ........
The p command does not seem to call readmem on the virtual address. The cmd_p function calls the gdb interface to get the value. I read many global variables defined in the module in my vmcore, and some of them are displayed incorrectly.
I don't fully understand how the p command is meant to be used or how it is implemented. Can you give me some guidance?
Below is the cmd_p code.
sp = NULL;
if ((sp = symbol_search(args[optind])) && !args[optind+1]) {
    /* <-- this branch is entered */
    if ((percpu_sp = per_cpu_symbol_search(args[optind])) &&
        display_per_cpu_info(percpu_sp, radix, cpuspec))
        return;
    /* <-- sp->value is the correct virtual address of g_bCheckMetaCap */
    if (module_symbol(sp->value, NULL, NULL, NULL, *gdb_output_radix))
        do_load_module_filter = TRUE;
} else if ((percpu_sp = per_cpu_symbol_search(args[optind])) &&
           display_per_cpu_info(percpu_sp, radix, cpuspec))
    return;
else if (st->flags & LOAD_MODULE_SYMS)
    do_load_module_filter = TRUE;

if (cpuspec) {
    if (sp)
        error(WARNING, "%s is not percpu; cpuspec ignored.\n",
              sp->name);
    else
        /* maybe a valid C expression (e.g. ':') */
        *(cpuspec-1) = ':';
}

process_gdb_output(concat_args(buf1, 0, TRUE), radix,
                   sp ? sp->name : NULL, do_load_module_filter);
That's correct. However, when gdb needs to read the data in order to display it, it calls back into the crash utility's gdb_readmem_callback() function. And gdb_readmem_callback() then does the requested readmem() call.
Hi,
I found that the address passed to gdb_readmem_callback() is incorrect.
crash> rd g_bCheckMetaCap
ffff000003ececc0: 0000000000000001 ........
but gdb_readmem_callback(addr=0xffff000003d79cc0)
Regarding "And gdb_readmem_callback() then does the requested readmem() call.": because the data was read from the cache, readmem() was not called.
Can you tell me how the addr parameter of gdb_readmem_callback() is passed? Thanks.
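To compare the address that rd reads with the one gdb hands to gdb_readmem_callback(), it can help to look at how far apart they are. The following is only a standalone user-space sketch, not crash or gdb code: the function names addr_delta and is_page_multiple are made up for illustration, and the page size is a parameter because the thread does not state the page size of this kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: absolute distance between two kernel virtual addresses. */
uint64_t addr_delta(uint64_t a, uint64_t b)
{
    return a > b ? a - b : b - a;
}

/*
 * Hypothetical helper: returns 1 when delta is a whole number of pages,
 * 0 otherwise. page_size must be a power of two (an assumption of this sketch).
 */
int is_page_multiple(uint64_t delta, uint64_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return (delta & (page_size - 1)) == 0;
}
```

For the two addresses above, addr_delta(0xffff000003ececc0, 0xffff000003d79cc0) is 0x155000, and is_page_multiple(0x155000, 4096) is 1. The low 12 bits of both addresses are identical, which may suggest that gdb has the right offset for the symbol but applies it to the wrong base.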
The "p g_bCheckMetaCap" string is passed to the embedded gdb module, the gdb code evaluates it, and then reads the resultant address via the call-back into gdb_readmem_callback().
Can you give me some suggestions so that I can go to the gdb code to find out why p command returns the wrong address?
The gdb sources are incredibly convoluted, and I am by no means an expert. Start with print_command() in gdb-7.6/gdb/printcmd.c, and go from there. Somewhere in there it will parse the string and evaluate it to an address.
I still haven't found why gdb can't read the address of a module's global variable correctly. I do have a new finding: crash also cannot read the global variables of a module correctly on a live system. My test module is as follows:
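#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/module.h>

/* Initialized global; sym reports it as a (D) data symbol. */
unsigned long along = 0x1234;

/* Global that is only written at init time. */
struct aaa {
    int aa;
    unsigned long bb;
} test;

static int test_init(void)
{
    printk("hello, test begin...\n");
    printk("along=0x%lx\n", along);
    test.aa = 0xabc;
    test.bb = 0x789;
    /* test.aa is an int, so it is printed with %x rather than %lx. */
    printk("test.aa=0x%x test.bb=0x%lx\n", test.aa, test.bb);
    return 0;
}

static void test_exit(void)
{
    printk("bye!\n");
}

/* Registration boilerplate, assumed here; it was not shown in the post. */
module_init(test_init);
module_exit(test_exit);
MODULE_LICENSE("GPL");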
If I use the command mod -s test test.o, and then read the 'along' variable, the following correct information is displayed:
crash> mod -s test test.o
MODULE NAME SIZE OBJECT FILE
ffff000000a24040 test 16384 test.o
crash> p /x along
$1 = 0x1234
crash> sym along
ffff000000a24000 (D) along [test]
crash> p /x &along
$2 = 0xffff000000a24000
If I use the command mod -s test test.ko instead, the information read for 'along' is wrong:
crash> mod -s test test.ko
MODULE NAME SIZE OBJECT FILE
ffff000000a24040 test 16384 test.ko
crash> p /x along
$2 = 0x1400000004
crash> sym along
ffff000000a24000 (D) along [test]
crash> p /x &along
$3 = 0xffff000000a23000
This problem reproduces every time. Can any expert give me some advice? @crash-utility @bhupesh-sharma @k-hagio @lian-bo Thanks.
@bhupesh-sharma @lian-bo (Seems editing a comment doesn't send a notification..)
It looks like RHEL8 also has the same or a similar issue. I could reproduce it on RHEL8.2 for arm64 with its crash-7.2.7-3.el8, though I could not on x86_64. As @cjl20062529 said above, mod -s test test.ko does not work, but mod -s test test.o looks OK:
crash> mod -s test test.ko
MODULE NAME SIZE OBJECT FILE
ffff3efb81930040 test 262144 test.ko
crash> sym -m test
ffff3efb81910000 MODULE START: test
ffff3efb81910000 (t) init_mod
ffff3efb81910000 (T) init_module
ffff3efb81910070 (T) cleanup_module
ffff3efb81910070 (t) exit_mod
ffff3efb81930000 (D) testint
ffff3efb81930008 (D) testlong
ffff3efb81930040 (D) __this_module
ffff3efb81950000 MODULE END: test
crash> rd testint
ffff3efb81930000: 0000000000001234 4.......
crash> p testint
p: gdb request failed: p testint
crash> mod -d test
crash> mod -s test test.o
MODULE NAME SIZE OBJECT FILE
ffff3efb81930040 test 262144 test.o
crash> sym -m test
ffff3efb81910000 MODULE START: test
ffff3efb81910000 (t) init_mod
ffff3efb81910000 (T) init_module
ffff3efb81910070 (T) cleanup_module
ffff3efb81910070 (t) exit_mod
ffff3efb81930000 (D) testint
ffff3efb81930008 (D) testlong
ffff3efb81930040 (d) __this_module
ffff3efb81950000 MODULE END: test
crash> rd testint
ffff3efb81930000: 0000000000001234 4.......
crash> p testint
testint = $10 = 4660
crash> p -x testint
testint = $20 = 0x1234
Yes, this problem is easy to reproduce on arm64, including on RHEL8.2. I have narrowed it down: gdb obtains the wrong address for the variable. The call stack where gdb looks up the variable is roughly as follows:
at objfiles.c:1436
at parse.c:1234
errstring@entry=0xaaaaaaff6840 "", mask=403589793, mask@entry=6) at exceptions.c:557
errstring=0xfffffffff260 "\340\362\377\377\377\377", errstring@entry=0xaaaaaaff6840 "", mask=403589793, mask@entry=6) at exceptions.c:557
In frame 0 of the call stack, 'attr' holds the address of the module's global variable, and attr is obtained from 'die'. I can't find where the die information comes from.
@k-hagio This should be related to gdb behavior rather than a crash issue. If gdb loads test.ko directly, it can be reproduced on the latest gdb-9.2 as follows:
GNU gdb (GDB) 9.2
......
(gdb) p along
$1 = 85899345924    <-- not the correct value

[root@hpe-apollo-cn99xx-14-vm-08 build]# gdb/gdb /home/mod/test/test.o
GNU gdb (GDB) 9.2
......
Reading symbols from /home/mod/test/test.o...
(gdb) p along
$1 = 4660    <-- correct value, as expected (0x1234)
@cjl20062529 Can you help to report this issue in gdb? Thanks.
BTW: I would like to forward this issue to the gdb maintainer(Pedro Alves), who knows more details about gdb, and we can still discuss with them together. Thanks.
@lian-bo, thanks for the info. But I'm not sure whether the gdb behavior you showed is the same as this issue in crash.
I tried the following and it looks to be fixed in gdb-9.2 (and gdb-9.1 as well):
# ../gdb-9.2 /usr/lib/debug/usr/lib/modules/4.18.0-193.el8.aarch64/vmlinux /proc/kcore
...
(gdb) add-symbol-file test.ko 0xffff0000091d0000 -s .data 0xffff0000091f0000
add symbol table from file "test.ko" at
.text_addr = 0xffff0000091d0000
.data_addr = 0xffff0000091f0000
(y or n) y
Reading symbols from test.ko...
(gdb) print /x testlong
$1 = 0x1234 <<-- correct
# ../gdb-8.3 /usr/lib/debug/usr/lib/modules/4.18.0-193.el8.aarch64/vmlinux /proc/kcore
...
(gdb) add-symbol-file test.ko 0xffff0000091d0000 -s .data 0xffff0000091f0000
add symbol table from file "test.ko" at
.text_addr = 0xffff0000091d0000
.data_addr = 0xffff0000091f0000
(y or n) y
Reading symbols from test.ko...
(gdb) p /x testlong
$1 = 0x554e4700000003 <<-- wrong
If this behavior is the issue, it looks to be fixed by this patch to me:
commit 4b610737f02338b2aea7641ab771aa5e137d067c
Author: Tom Tromey <tromey@adacore.com>
Date: Tue Jun 25 12:50:45 2019 -0600
Handle copy relocations
FYI, the add-symbol-file command is seen in crash's debug output:
crash> set debug 1
crash> mod -s test test.ko
...
add-symbol-file test.ko 0xffff0000091d0000 -s .data 0xffff0000091f0000
add symbol table from file "test.ko" at
.text_addr = 0xffff0000091d0000
.data_addr = 0xffff0000091f0000
MODULE NAME SIZE OBJECT FILE
ffff0000091f0040 test 262144 test.ko
crash>
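The command line shown in the debug output has a regular shape: the object file, the text address, then one "-s <section> <address>" pair per extra section. As a rough illustration only, the sketch below assembles such a string so it can be pasted into a standalone gdb session for reproduction. It is not how crash builds the command internally; struct section_addr and build_add_symbol_file are hypothetical names invented for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical pair of section name and load address. */
struct section_addr {
    const char *name;
    uint64_t addr;
};

/*
 * Hypothetical helper: format an add-symbol-file command into buf.
 * Returns the string length on success, or -1 if the object file name
 * is empty or the buffer is too small.
 */
int build_add_symbol_file(char *buf, size_t len, const char *objfile,
                          uint64_t text_addr,
                          const struct section_addr *secs, size_t nsecs)
{
    assert(buf != NULL && objfile != NULL);
    if (len == 0 || strlen(objfile) == 0)
        return -1;

    int n = snprintf(buf, len, "add-symbol-file %s 0x%llx",
                     objfile, (unsigned long long)text_addr);
    if (n < 0 || (size_t)n >= len)
        return -1;

    size_t used = (size_t)n;
    for (size_t i = 0; i < nsecs; i++) {
        /* Append one " -s <section> <address>" pair per extra section. */
        n = snprintf(buf + used, len - used, " -s %s 0x%llx",
                     secs[i].name, (unsigned long long)secs[i].addr);
        if (n < 0 || (size_t)n >= len - used)
            return -1;
        used += (size_t)n;
    }
    return (int)used;
}
```

Called with "test.ko", 0xffff0000091d0000 and a single {".data", 0xffff0000091f0000} entry, this produces the same string as the debug output above.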
@k-hagio It may be similar to this issue, but I'm not sure whether gdb still has another problem here; the behavior looks strange.
I usually load the module with the file command, as shown first below, and that gives the wrong result. If I instead use the add-symbol-file command with explicit section addresses, I get the correct result.
[root@hpe-apollo-cn99xx-14-vm-08 test]# /home/mod/binutils-gdb/build/gdb/gdb /usr/lib/debug/usr/lib/modules/4.18.0-193.el8.aarch64/vmlinux /proc/kcore
GNU gdb (GDB) 8.3.50.20191002-git
......
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/lib/debug/usr/lib/modules/4.18.0-193.el8.aarch64/vmlinux...
[New process 1]
Core was generated by `BOOT_IMAGE=(hd0,gpt2)/vmlinuz-4.18.0-193.el8.aarch64 root=/dev/mapper/rhel_hpe-'.
(gdb) file test.ko
warning: core file may not match specified executable file.
Load new symbol table from "test.ko"? (y or n) y
Reading symbols from test.ko...
(gdb) p along
$1 = 85899345924
(gdb) quit

[root@hpe-apollo-cn99xx-14-vm-08 test]# /home/mod/binutils-gdb/build/gdb/gdb /usr/lib/debug/usr/lib/modules/4.18.0-193.el8.aarch64/vmlinux /proc/kcore
GNU gdb (GDB) 8.3.50.20191002-git
......
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/lib/debug/usr/lib/modules/4.18.0-193.el8.aarch64/vmlinux...
[New process 1]
Core was generated by `BOOT_IMAGE=(hd0,gpt2)/vmlinuz-4.18.0-193.el8.aarch64 root=/dev/mapper/rhel_hpe-'.
(gdb) add-symbol-file test.ko 0xffff3fedc60e0000 -s .data 0xffff3fedc6100000 -s .bss 0xffff3fedc61003c0
add symbol table from file "test.ko" at
.text_addr = 0xffff3fedc60e0000
.data_addr = 0xffff3fedc6100000
.bss_addr = 0xffff3fedc61003c0
(y or n) y
Reading symbols from test.ko...
(gdb) p along
$1 = 4660
(gdb) quit
Good findings. I just tested the commit: 4b610737f0 ("Handle copy relocations") with your method, and I saw the gdb works well and got the correct result as expected.
The above commit may fix the problem we are facing in the crash utility. Given this, should we backport this patch to the crash utility, or rebase to the latest gdb? Any thoughts? Thanks.
This is really good news, thanks. @k-hagio @lian-bo
I tried to upgrade the gdb in crash before, but it failed because the differences are too big. You have shown that this patch fixes the problem, which is really good.
I tried to consult the gdb maintainers, but I couldn't find a communication channel.
I'll see if the patch can be applied to the crash utility later on. Rebasing gdb would be very tough work for us (at least for me), so I think it should be a last resort.
Hmm, I tried to apply the patch to crash, but it looks pretty hard for me because it got changed from C into C++ and I'm also not an expert on gdb. Who can do this?
Another approach I think of is that, it looks like the debug modules (*.ko.debug) in RHEL8 kernel-debuginfo don't reproduce this issue as far as I've checked, so there might be some option or config to be able to avoid this issue?
There are too many differences between gdb-7.6 and gdb-8.3+, and there are some dependencies. It's not easy to backport from the latest gdb. Anyway, let me investigate later to see if this is doable.
It might be worth looking into what happened. BTW, I checked the config but unfortunately didn't find any useful clues. Could some compile options be involved?
I have not found any good workaround or fix for this issue so far. Please use the rd and struct commands instead of p for now.
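When falling back to rd, note that it printed a full 64-bit word in the sessions above, even for a variable such as the U32 g_bCheckMetaCap. The sketch below shows one way to pull the variable's value out of such a word. It is a standalone illustration, not part of crash: value_from_rd_word is a hypothetical name, and it assumes a little-endian machine with the variable stored at the start of the displayed word.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: hex_word is one 64-bit word as printed by rd
 * (hex digits, no 0x prefix). Returns the value of a variable of
 * `size` bytes located at the start of that word, assuming little-endian
 * byte order, so the variable occupies the low-order bytes.
 */
uint64_t value_from_rd_word(const char *hex_word, unsigned size)
{
    assert(size == 1 || size == 2 || size == 4 || size == 8);

    uint64_t word = strtoull(hex_word, NULL, 16);
    if (size == 8)
        return word;

    /* Keep only the low `size` bytes. */
    return word & ((UINT64_C(1) << (size * 8)) - 1);
}
```

With the rd output from the original report, value_from_rd_word("0000000000000001", 4) gives 1, which matches the initializer of g_bCheckMetaCap.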
Hi, I am using crash-8.0.2 with gdb 10.2, and the problem still exists.