crash-utility / crash

Linux kernel crash utility
https://crash-utility.github.io

Error reading global variable value in module with p command #50

Open cjl20062529 opened 4 years ago

cjl20062529 commented 4 years ago

Hi: I use crash 7.2.6-3 to parse a vmcore generated by a 4.19 aarch64 kernel.

When I read global variables in a module, the p command and the rd command return different values.

crash /boot/vmlinux vmcore

crash 7.2.6-3
Copyright (C) 2002-2019 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.

GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "aarch64-unknown-linux-gnu"...

WARNING: cannot find NT_PRSTATUS note for cpu: 78
      KERNEL: /boot/vmlinux
    DUMPFILE: vmcore  [PARTIAL DUMP]
        CPUS: 96
        DATE: Wed Feb 12 16:23:47 2020
      UPTIME: 17 days, 13:05:44
LOAD AVERAGE: 5253.54, 5244.11, 5221.62
       TASKS: 11580
    NODENAME: 121-6
     RELEASE: 4.19.aarch64
     VERSION: #1 SMP Mon Jul 22 00:00:00 UTC 2019
     MACHINE: aarch64  (unknown Mhz)
      MEMORY: 96 GB
       PANIC: "kernel BUG at /xxx/upi_cache.c:120!"
         PID: 29229
     COMMAND: "Jpool"
        TASK: ffff8022be10be00  [THREAD_INFO: ffff8022be10be00]
         CPU: 18
       STATE: TASK_RUNNING (PANIC)

crash> mod -s snas_ds ./modules/snas_ds.ko
     MODULE       NAME        SIZE  OBJECT FILE
ffff000003ed0900  snas_ds  2887680  ./modules/snas_ds.ko
crash> p g_bCheckMetaCap
g_bCheckMetaCap = $1 = 2432712771
crash>
crash> rd g_bCheckMetaCap
ffff000003ececc0:  0000000000000001                    ........
crash>
crash> set debug 31
debug: 31
crash> set debug 31
debug: 31
text hit rate: 0% (0 of 1)
crash> rd g_bCheckMetaCap
<addr: ffff000003ececc0 count: 1 flag: 490 (KVADDR)>
<readmem: ffff000003ececc0, KVADDR, "64-bit KVADDR", 8, (FOE), ffffc570ddb0>
<read_diskdump: addr: ffff000003ececc0 paddr: 202d7e8cecc0 cnt: 8>
read_diskdump: paddr/pfn: 202d7e8cecc0/202d7e8ce -> physical page is cached: 202d7e8ce000
ffff000003ececc0:  0000000000000001                    ........
text hit rate: 0% (0 of 1)
crash> p g_bCheckMetaCap
p: per_cpu_symbol_search(g_bCheckMetaCap): NULL
g_bCheckMetaCap = GETBUF(328 -> 0)
$2 = 2432712771
FREEBUF(0)
text hit rate: 50% (1 of 2)

crash-utility commented 4 years ago


I don't understand why there's no debug output after the "p g_bCheckMetaCap" command. There should be a "<readmem: ..." line with a virtual address and a "gdb_readmem_callback" type string.

Note that the "rd g_bCheckMetaCap" command shows a readmem debug output line with virtual address ffff000003ececc0 and type "64-bit KVADDR".

In any case, both the rd and the p commands should be requesting the same virtual address, which would be the address shown by "sym g_bCheckMetaCap". But presumably that's not the case for some reason.

cjl20062529 commented 4 years ago

Hi, g_bCheckMetaCap is defined as:

U32 g_bCheckMetaCap = 1;

crash> p g_bCheckMetaCap
g_bCheckMetaCap = $1 = 2432712771

I set debug to 31, but no debug info is shown.

crash> rd g_bCheckMetaCap
ffff000003ececc0:  0000000000000001                    ........

The p command does not seem to call readmem() on the virtual address; cmd_p() calls the gdb interface to get the value. I read many global variables defined in the module from my vmcore, and some of them are displayed incorrectly.

I don't fully understand how the p command works or how it is implemented. Can you give me some guidance?

Below is the cmd_p code.

sp = NULL;
if ((sp = symbol_search(args[optind])) && !args[optind+1]) {  // <-- enters this branch
    if ((percpu_sp = per_cpu_symbol_search(args[optind])) &&
        display_per_cpu_info(percpu_sp, radix, cpuspec))
        return;
    if (module_symbol(sp->value, NULL, NULL, NULL, *gdb_output_radix))  // <-- sp->value is the correct virtual address of g_bCheckMetaCap
        do_load_module_filter = TRUE;
} else if ((percpu_sp = per_cpu_symbol_search(args[optind])) &&
           display_per_cpu_info(percpu_sp, radix, cpuspec))
    return;
else if (st->flags & LOAD_MODULE_SYMS)
    do_load_module_filter = TRUE;

if (cpuspec) {
    if (sp)
        error(WARNING, "%s is not percpu; cpuspec ignored.\n",
              sp->name);
    else
        /* maybe a valid C expression (e.g. ':') */
        *(cpuspec-1) = ':';
}

process_gdb_output(concat_args(buf1, 0, TRUE), radix,
                   sp ? sp->name : NULL, do_load_module_filter);

crash-utility commented 4 years ago


That's correct: cmd_p() itself does not call readmem(); it passes the command string to gdb. However, when gdb needs to read the data in order to display it, it calls back into the crash utility's gdb_readmem_callback() function, and gdb_readmem_callback() then performs the requested readmem() call.
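To make that round trip concrete, here is a self-contained toy model, not crash or gdb code: the "debugger" side has already resolved the symbol to an address from its own symbol information and simply asks the host to read it through a callback. All names (`model_readmem`, `model_print_u32`, `DUMP_BASE`) are hypothetical stand-ins. It illustrates why the host's read can be perfectly correct while p and rd still disagree: the host reads whatever address it is handed. In the test, the second slot holds 0x91004043, which is 2432712771 in decimal, the value p printed above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical "dump": 32 bytes of memory mapped at DUMP_BASE. */
#define DUMP_BASE 0x1000u
static unsigned char dump[32];

/* Stand-in for crash's readmem(): copy len bytes at addr out of the dump.
 * Returns 1 on success, 0 if the range is not covered by the dump. */
static int model_readmem(uint64_t addr, void *buf, size_t len)
{
    if (addr < DUMP_BASE || addr - DUMP_BASE + len > sizeof(dump))
        return 0;
    memcpy(buf, dump + (addr - DUMP_BASE), len);
    return 1;
}

/* Callback type the "debugger" uses to ask its host for memory. */
typedef int (*readmem_callback_t)(uint64_t addr, void *buf, size_t len);

/* Stand-in for gdb evaluating "p sym": resolved_addr came from the
 * debugger's OWN symbol tables; the host only reads what it is asked for. */
static int model_print_u32(uint64_t resolved_addr, readmem_callback_t cb,
                           uint32_t *out)
{
    return cb(resolved_addr, out, sizeof(*out));
}
```

If the debugger resolves the symbol to the same address crash uses, both paths return the same bytes; if it resolves a different address, the callback faithfully returns different bytes.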

cjl20062529 commented 4 years ago

Hi,

I found that the input parameters of the gdb_readmem_callback() function are incorrect:

crash> rd g_bCheckMetaCap
ffff000003ececc0:  0000000000000001                    ........

but gdb_readmem_callback() is called with addr=0xffff000003d79cc0.

Regarding "And gdb_readmem_callback() then does the requested readmem() call": because the data was read from the cache, readmem() was not called.

Can you tell me how the addr parameter of gdb_readmem_callback() is passed? Thanks.
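The size of the mismatch can be quantified directly. The sketch below computes the gap between the address crash resolves for g_bCheckMetaCap (0xffff000003ececc0, from rd) and the address passed to gdb_readmem_callback() (0xffff000003d79cc0); it works out to 0x155000. The thread does not establish what the gap means. The page sizes in the check are example parameters only (aarch64 kernels can be built with different page sizes), and both helper names are hypothetical.

```c
#include <stdint.h>

/* Absolute distance between two kernel virtual addresses. */
static uint64_t addr_gap(uint64_t a, uint64_t b)
{
    return a > b ? a - b : b - a;
}

/* 1 if gap is a whole number of pages of page_size bytes, else 0. */
static int gap_is_page_multiple(uint64_t gap, uint64_t page_size)
{
    return page_size != 0 && gap % page_size == 0;
}
```

For example, addr_gap(0xffff000003ececc0, 0xffff000003d79cc0) is 0x155000, a multiple of 4 KiB but not of 64 KiB.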

crash-utility commented 4 years ago


The "p g_bCheckMetaCap" string is passed to the embedded gdb module; gdb evaluates it to an address and then reads that address via the callback into gdb_readmem_callback().

cjl20062529 commented 4 years ago


Can you give me some pointers on where to look in the gdb code to find out why the p command resolves the wrong address?

crash-utility commented 4 years ago


The gdb sources are incredibly convoluted, and I am by no means an expert. Start with print_command() in gdb-7.6/gdb/printcmd.c and go from there. Somewhere in there it will parse the string and evaluate it to an address.

cjl20062529 commented 4 years ago

I still haven't found why gdb cannot resolve the address of the module's global variable correctly. I have a new finding: crash also cannot read the global variables of a module on a live system correctly. My test module is as follows:

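#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/module.h>

unsigned long along = 0x1234;
struct aaa {
    int aa;
    unsigned long bb;
} test;

static int test_init(void)
{
    printk("hello, test begin...\n");
    printk("along=0x%lx\n", along);
    test.aa = 0xabc;
    test.bb = 0x789;
    /* aa is an int, so it is printed with %x rather than %lx */
    printk("test.aa=0x%x  test.bb=0x%lx\n", test.aa, test.bb);
    return 0;
}

static void test_exit(void)
{
    printk("bye!\n");
}

module_init(test_init);
module_exit(test_exit);
MODULE_LICENSE("GPL");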

If I use the command mod -s test test.o, and then read the 'along' variable, the following correct information is displayed:

crash> mod -s test test.o
     MODULE       NAME                 SIZE  OBJECT FILE
ffff000000a24040  test                16384  test.o 
crash> p /x along
$1 = 0x1234
crash> sym along
ffff000000a24000 (D) along [test]
crash> p /x &along
$2 = 0xffff000000a24000

If I use mod -s test test.ko instead, the value read for 'along' is wrong, and p /x &along reports a different address than sym does:

crash> mod -s test test.ko
     MODULE       NAME                 SIZE  OBJECT FILE
ffff000000a24040  test                16384  test.ko 
crash> p /x  along
$2 = 0x1400000004
crash> sym along
ffff000000a24000 (D) along [test]
crash> p /x  &along
$3 = 0xffff000000a23000

This problem reproduces every time. Can any expert give me some advice? @crash-utility @bhupesh-sharma @k-hagio @lian-bo Thanks.

k-hagio commented 4 years ago

@bhupesh-sharma @lian-bo (Seems editing a comment doesn't send a notification..)

It looks like RHEL8 also has the same or a similar issue. I could reproduce it on RHEL8.2 for arm64 with its crash-7.2.7-3.el8, though not on x86_64. As @cjl20062529 said above, mod -s test test.ko fails, but mod -s test test.o looks OK:

crash> mod -s test test.ko
     MODULE       NAME               SIZE  OBJECT FILE
ffff3efb81930040  test             262144  test.ko 
crash> sym -m test
ffff3efb81910000 MODULE START: test
ffff3efb81910000 (t) init_mod
ffff3efb81910000 (T) init_module
ffff3efb81910070 (T) cleanup_module
ffff3efb81910070 (t) exit_mod
ffff3efb81930000 (D) testint
ffff3efb81930008 (D) testlong
ffff3efb81930040 (D) __this_module
ffff3efb81950000 MODULE END: test
crash> rd testint
ffff3efb81930000:  0000000000001234                    4.......
crash> p testint
p: gdb request failed: p testint
crash> mod -d test
crash> mod -s test test.o
     MODULE       NAME               SIZE  OBJECT FILE
ffff3efb81930040  test             262144  test.o 
crash> sym -m test
ffff3efb81910000 MODULE START: test
ffff3efb81910000 (t) init_mod
ffff3efb81910000 (T) init_module
ffff3efb81910070 (T) cleanup_module
ffff3efb81910070 (t) exit_mod
ffff3efb81930000 (D) testint
ffff3efb81930008 (D) testlong
ffff3efb81930040 (d) __this_module
ffff3efb81950000 MODULE END: test
crash> rd testint
ffff3efb81930000:  0000000000001234                    4.......
crash> p testint
testint = $10 = 4660
crash> p -x testint
testint = $20 = 0x1234
cjl20062529 commented 4 years ago


Yes, this problem is easy to reproduce on arm64, including on RHEL8.2. I have narrowed it down: gdb computes the wrong address for the variable. The call stack through which gdb resolves the variable is roughly as follows:

#0  var_decode_location (attr=0xaaaaacaf90f8, sym=0xaaaaad8845b0, cu=0xaaaaac1bd590) at dwarf2read.c:15760
#1  0x0000aaaaaae20d98 in new_symbol_full (die=0xaaaaacaf9080, type=, cu=0xaaaaac1bd590, space=) at dwarf2read.c:15976
#2  0x0000aaaaaae2276c in new_symbol (cu=0xaaaaac1bd590, type=0x0, die=0xaaaaacaf9080) at dwarf2read.c:16222
#3  process_die (die=0xaaaaacaf9080, cu=0xaaaaac1bd590) at dwarf2read.c:7275
#4  0x0000aaaaaae22914 in read_file_scope (cu=0xaaaaac1bd590, die=0xaaaaacab20d0) at dwarf2read.c:8015
#5  process_die (die=0xaaaaacab20d0, cu=0xaaaaac1bd590) at dwarf2read.c:7201
#6  0x0000aaaaaae2653c in process_full_comp_unit (pretend_language=, per_cu=) at dwarf2read.c:7005
#7  process_queue () at dwarf2read.c:6570
#8  dw2_do_instantiate_symtab (per_cu=) at dwarf2read.c:2295
#9  0x0000aaaaaae27b34 in dwarf2_read_symtab (self=0xaaaaacaab140, objfile=0xaaaaacb1da00) at dwarf2read.c:6459
#10 0x0000aaaaaad94684 in psymtab_to_symtab (objfile=objfile@entry=0xaaaaacb1da00, pst=pst@entry=0xaaaaacaab140) at psymtab.c:781
#11 0x0000aaaaaad96224 in lookup_symbol_aux_psymtabs (objfile=0xaaaaacb1da00, block_index=0, name=0xaaaaab723de0 "along", domain=VAR_DOMAIN) at psymtab.c:515
#12 0x0000aaaaaad8efe4 in lookup_symbol_aux_quick (objfile=0xaaaaacb1da00, kind=0, name=0xaaaaab723de0 "along", domain=VAR_DOMAIN) at symtab.c:1645
#13 0x0000aaaaaad8f1ec in lookup_symbol_global_iterator_cb (objfile=0xaaaaacb1da00, cb_data=0xffffffffc010) at symtab.c:1774
#14 0x0000aaaaaadfaed4 in default_iterate_over_objfiles_in_search_order (gdbarch=, cb=0xaaaaaad8f188 , cb_data=0xffffffffc010, current_objfile=)
    at objfiles.c:1436
#15 0x0000aaaaaad8eba8 in lookup_symbol_global (name=0xaaaaab723de0 "along", block=, domain=VAR_DOMAIN) at symtab.c:1804
#16 0x0000aaaaaad8f3fc in lookup_symbol_aux (is_a_field_of_this=0x0, language=language_c, domain=VAR_DOMAIN, block=0x0, name=0xaaaaab723de0 "along") at symtab.c:1380
#17 lookup_symbol_in_language (name=name@entry=0xaaaaab723de0 "along", block=0x0, domain=VAR_DOMAIN, lang=language_c, is_a_field_of_this=0x0) at symtab.c:1213
#18 0x0000aaaaaad8f570 in lookup_symbol (name=name@entry=0xaaaaab723de0 "along", block=, block@entry=0x0, domain=domain@entry=VAR_DOMAIN, is_a_field_of_this=) at symtab.c:1241
#19 0x0000aaaaaad27768 in classify_name (block=0x0) at c-exp.y:2766
#20 0x0000aaaaaad299d0 in c_lex () at c-exp.y:2934
#21 c_parse_internal () at c-exp.c:1938
#22 0x0000aaaaaad2bd70 in c_parse () at c-exp.y:3064
#23 0x0000aaaaaadf0dcc in parse_exp_in_context (stringptr=0x0, stringptr@entry=0xffffffffdfb0, pc=pc@entry=0, block=block@entry=0x0, comma=comma@entry=0, out_subexp=out_subexp@entry=0x0, void_context_p=0)
    at parse.c:1234
#24 0x0000aaaaaadf1034 in parse_exp_1 (stringptr=stringptr@entry=0xffffffffdfd8, pc=pc@entry=0, block=block@entry=0x0, comma=comma@entry=0) at parse.c:1136
#25 0x0000aaaaaadf10bc in parse_expression (string=) at parse.c:1279
#26 0x0000aaaaaad88fc4 in print_command_1 (exp=, voidprint=1) at ./printcmd.c:972
#27 0x0000aaaaaae83e28 in execute_command (p=, from_tty=1) at top.c:484
#28 0x0000aaaaaad93d9c in gdb_command_funnel (req=0xaaaaab220070 , req@entry=0x1) at symtab.c:5174
#29 0x0000aaaaaac2f504 in gdb_interface (req=0x1, req@entry=0xaaaaab220070 ) at gdb_interface.c:397
#30 0x0000aaaaaac2fc4c in gdb_pass_through (cmd=cmd@entry=0xffffffffe948 "p along", fptr=fptr@entry=0x0, flags=flags@entry=8) at gdb_interface.c:332
#31 0x0000aaaaaac59ba4 in process_gdb_output (gdb_request=0xffffffffe948 "p along", radix=radix@entry=0, leader=0xaaaaacf54b40 "along", do_load_module_filter=do_load_module_filter@entry=1) at symbols.c:7323
#32 0x0000aaaaaac63854 in cmd_p () at symbols.c:7305
#33 0x0000aaaaaab958d8 in exec_command () at main.c:879
#34 0x0000aaaaaab95c1c in main_loop () at main.c:826
#35 0x0000aaaaaadc3488 in captured_command_loop (data=) at main.c:258
#36 0x0000aaaaaadc18fc in catch_errors (func=0x1, func@entry=0xaaaaaadc3468 , func_args=0x1, func_args@entry=0x0, errstring=0xfffffffff1a0 "\020\362\377\377\377\377",
    errstring@entry=0xaaaaaaff6840 "", mask=403589793, mask@entry=6) at exceptions.c:557
#37 0x0000aaaaaadc4664 in captured_main (data=) at main.c:1064
#38 0x0000aaaaaadc18fc in catch_errors (func=0xaaaaaab94014 <main+2620>, func@entry=0xaaaaaadc3810 , func_args=0xaaaaab18ac58 , func_args@entry=0xfffffffff248,
    errstring=0xfffffffff260 "\340\362\377\377\377\377", errstring@entry=0xaaaaaaff6840 "", mask=403589793, mask@entry=6) at exceptions.c:557
#39 0x0000aaaaaadc4a14 in gdb_main (args=0xfffffffff248) at main.c:1079
#40 gdb_main_entry (argc=, argv=) at main.c:1099
#41 0x0000aaaaaab94014 in main (argc=43690, argv=0x0) at main.c:707

In frame #0, attr contains the address of the module's global variable, and attr is obtained from die. I can't find where the die information comes from.
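Frame #0 is where gdb turns a variable's DWARF location attribute (DW_AT_location) into an address. A common form for a global variable is the single-operation expression DW_OP_addr (opcode 0x03) followed by a target-address-sized operand. The sketch below decodes only that form from a raw byte buffer, assuming an 8-byte little-endian address as on aarch64 Linux; `decode_dw_op_addr()` is a hypothetical helper, not gdb's var_decode_location(). Keep in mind that in a relocatable object such as a .ko, the operand stored in .debug_info is not yet the load address: relocations and the section addresses given to the debugger still have to be applied, so a mistake in either step could yield a wrong address even when decoding is correct.

```c
#include <stddef.h>
#include <stdint.h>

#define DW_OP_addr 0x03  /* DWARF opcode: push a target-address-sized constant */

/* Hypothetical helper: if expr is exactly "DW_OP_addr <8-byte LE address>",
 * store the address in *out and return 1; otherwise return 0. */
static int decode_dw_op_addr(const uint8_t *expr, size_t len, uint64_t *out)
{
    uint64_t addr = 0;

    if (len != 1 + 8 || expr[0] != DW_OP_addr)
        return 0;
    /* Assemble the operand little-endian, independent of host byte order. */
    for (int i = 7; i >= 0; i--)
        addr = (addr << 8) | expr[1 + i];
    *out = addr;
    return 1;
}
```

The test encodes 0xffff000000a24000 (the address sym reported for along) and decodes it back.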

lian-bo commented 4 years ago


@k-hagio This should be related to gdb behavior, not a crash issue. If gdb loads test.ko directly, the problem can be reproduced with the latest gdb-9.2 as follows:

gdb/gdb /home/mod/test/test.ko
GNU gdb (GDB) 9.2
......
Reading symbols from /home/mod/test/test.ko...
(gdb) p along
$1 = 85899345924          <<-- not the correct value

[root@hpe-apollo-cn99xx-14-vm-08 build]# gdb/gdb /home/mod/test/test.o
GNU gdb (GDB) 9.2
......
Reading symbols from /home/mod/test/test.o...
(gdb) p along
$1 = 4660                 <<-- correct value, as expected (0x1234)
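The decimal value standalone gdb prints for test.ko, 85899345924, is 0x1400000004, the same wrong value crash printed earlier for p /x along with test.ko. That is consistent with standalone gdb and crash's embedded gdb resolving the same wrong location. The snippet below only confirms the conversion and splits the value into its high and low 32-bit halves, which can be easier to compare against raw rd output; `split_u64()` is a hypothetical helper.

```c
#include <stdint.h>

/* Split a 64-bit value into its high and low 32-bit halves. */
static void split_u64(uint64_t v, uint32_t *hi, uint32_t *lo)
{
    *hi = (uint32_t)(v >> 32);
    *lo = (uint32_t)v;
}
```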

@cjl20062529 Can you help report this issue to the gdb project? Thanks.

BTW: I would like to forward this issue to the gdb maintainer (Pedro Alves), who knows gdb in more detail, so that we can discuss it with him together. Thanks.

k-hagio commented 4 years ago

@lian-bo, thanks for the info. But I'm not sure whether the gdb behavior you showed is the same as this issue in crash.

I tried the following and it looks to be fixed in gdb-9.2 (and gdb-9.1 as well):

# ../gdb-9.2 /usr/lib/debug/usr/lib/modules/4.18.0-193.el8.aarch64/vmlinux /proc/kcore
...
(gdb) add-symbol-file test.ko 0xffff0000091d0000 -s .data 0xffff0000091f0000
add symbol table from file "test.ko" at
        .text_addr = 0xffff0000091d0000
        .data_addr = 0xffff0000091f0000
(y or n) y
Reading symbols from test.ko...
(gdb) print /x testlong
$1 = 0x1234            <<-- correct

# ../gdb-8.3 /usr/lib/debug/usr/lib/modules/4.18.0-193.el8.aarch64/vmlinux /proc/kcore
...
(gdb) add-symbol-file test.ko 0xffff0000091d0000 -s .data 0xffff0000091f0000
add symbol table from file "test.ko" at
        .text_addr = 0xffff0000091d0000
        .data_addr = 0xffff0000091f0000
(y or n) y
Reading symbols from test.ko...
(gdb) p /x testlong
$1 = 0x554e4700000003  <<-- wrong

If this behavior is the issue, it looks to me like it was fixed by this patch:

commit 4b610737f02338b2aea7641ab771aa5e137d067c
Author: Tom Tromey <tromey@adacore.com>
Date:   Tue Jun 25 12:50:45 2019 -0600

    Handle copy relocations

FYI, the add-symbol-file command is seen in crash's debug output:

crash> set debug 1
crash> mod -s test test.ko
...
add-symbol-file test.ko 0xffff0000091d0000  -s .data 0xffff0000091f0000
add symbol table from file "test.ko" at
        .text_addr = 0xffff0000091d0000
        .data_addr = 0xffff0000091f0000
     MODULE       NAME               SIZE  OBJECT FILE
ffff0000091f0040  test             262144  test.ko 
crash> 
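The debug output above shows that crash loads module symbols by passing gdb an add-symbol-file command that carries the module's text address plus one `-s <section> <address>` pair per section. The sketch below shows how such a command string could be assembled from a list of section load addresses; `struct sec_addr` and `build_add_symbol_file()` are hypothetical and are not crash's implementation (crash's own output, for instance, uses two spaces before -s).

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical record: a module section name and its load address. */
struct sec_addr {
    const char *name;
    uint64_t addr;
};

/* Build "add-symbol-file FILE 0xTEXT -s NAME 0xADDR ..." into buf.
 * Returns the string length, or -1 if buf is too small. */
static int build_add_symbol_file(char *buf, size_t size, const char *file,
                                 uint64_t text_addr,
                                 const struct sec_addr *secs, size_t nsecs)
{
    int n = snprintf(buf, size, "add-symbol-file %s 0x%" PRIx64,
                     file, text_addr);

    if (n < 0 || (size_t)n >= size)
        return -1;
    for (size_t i = 0; i < nsecs; i++) {
        int m = snprintf(buf + n, size - (size_t)n, " -s %s 0x%" PRIx64,
                         secs[i].name, secs[i].addr);
        if (m < 0 || (size_t)m >= size - (size_t)n)
            return -1;
        n += m;
    }
    return n;
}
```

With test.ko, text at 0xffff0000091d0000 and .data at 0xffff0000091f0000, this yields "add-symbol-file test.ko 0xffff0000091d0000 -s .data 0xffff0000091f0000".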
lian-bo commented 4 years ago


It may be the same issue, but I'm not sure whether gdb has another problem here as well; the behavior looks strange.


I usually load the module with the file command, as follows, and that gives the wrong result. If I instead use add-symbol-file with explicit section addresses, I get the correct result.

[root@hpe-apollo-cn99xx-14-vm-08 test]# /home/mod/binutils-gdb/build/gdb/gdb /usr/lib/debug/usr/lib/modules/4.18.0-193.el8.aarch64/vmlinux /proc/kcore
GNU gdb (GDB) 8.3.50.20191002-git
......
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/lib/debug/usr/lib/modules/4.18.0-193.el8.aarch64/vmlinux...
[New process 1]
Core was generated by `BOOT_IMAGE=(hd0,gpt2)/vmlinuz-4.18.0-193.el8.aarch64 root=/dev/mapper/rhel_hpe-'.
#0  0x0000000000000000 in ?? ()
(gdb) file test.ko
warning: core file may not match specified executable file.
Load new symbol table from "test.ko"? (y or n) y
Reading symbols from test.ko...
(gdb) p along
$1 = 85899345924
(gdb) quit

[root@hpe-apollo-cn99xx-14-vm-08 test]# /home/mod/binutils-gdb/build/gdb/gdb /usr/lib/debug/usr/lib/modules/4.18.0-193.el8.aarch64/vmlinux /proc/kcore
GNU gdb (GDB) 8.3.50.20191002-git
......
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/lib/debug/usr/lib/modules/4.18.0-193.el8.aarch64/vmlinux...
[New process 1]
Core was generated by `BOOT_IMAGE=(hd0,gpt2)/vmlinuz-4.18.0-193.el8.aarch64 root=/dev/mapper/rhel_hpe-'.
#0  0x0000000000000000 in ?? ()
(gdb) add-symbol-file test.ko 0xffff3fedc60e0000 -s .data 0xffff3fedc6100000 -s .bss 0xffff3fedc61003c0
add symbol table from file "test.ko" at
        .text_addr = 0xffff3fedc60e0000
        .data_addr = 0xffff3fedc6100000
        .bss_addr = 0xffff3fedc61003c0
(y or n) y
Reading symbols from test.ko...
(gdb) p along
$1 = 4660
(gdb) quit


Good findings. I just tested the commit: 4b610737f0 ("Handle copy relocations") with your method, and I saw the gdb works well and got the correct result as expected.


The above commit may fix the problem we are facing in the crash utility. Given that, should we backport this patch to the crash utility or rebase onto the latest gdb? Any thoughts? Thanks.

cjl20062529 commented 4 years ago


This is really good news, thanks. @k-hagio @lian-bo

I tried upgrading the gdb in crash before, and it failed because the differences were too large. But you have shown that this patch fixes the problem, which is really good.

I tried to contact the gdb maintainers, but couldn't find a way to reach them.

k-hagio commented 4 years ago

I'll see if the patch can be applied to the crash utility later on. Rebasing gdb would be very tough work for us (at least for me), so I think it would be a last resort.

k-hagio commented 4 years ago

Hmm, I tried to apply the patch to crash, but it looks pretty hard for me because gdb has since been converted from C to C++, and I'm also not an expert on gdb. Who can do this?

Another approach: as far as I've checked, the debug modules (*.ko.debug) in the RHEL8 kernel-debuginfo package don't reproduce this issue, so there might be some build option or config that avoids it.

lian-bo commented 4 years ago


There are too many differences between gdb-7.6 and gdb-8.3+, and there are some dependencies. It's not easy to backport from the latest gdb. Anyway, let me investigate later to see if this is doable.


It might be worth looking into why the RHEL8 debug modules (*.ko.debug) are not affected. BTW: I checked the config but unfortunately didn't see any useful clues; maybe it is a compile option?

k-hagio commented 4 years ago

I have not found any good workaround or fix for this issue so far. Please use rd and struct commands instead of p for now.
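As an illustration of that workaround, the test module's `struct aaa test` can be dumped with rd as raw 64-bit words (for example with something like `rd -64 test 2`) and decoded by hand. The sketch below assumes an LP64, little-endian aarch64 target, where `aa` occupies the low 32 bits of the first word (the upper 32 bits are padding) and `bb` is the second word; `decode_aaa()` is a hypothetical helper, not part of crash.

```c
#include <stdint.h>

/* Mirror of the test module's type; the decoding below assumes LP64
 * (int = 4 bytes, unsigned long = 8 bytes), so bb starts at offset 8. */
struct aaa {
    int aa;
    unsigned long bb;
};

/* Hypothetical helper: rebuild a struct aaa from two 64-bit words as printed
 * by rd (word 0 at the symbol address, word 1 at +8). On a little-endian
 * target aa is the low 32 bits of word 0; the high 32 bits are padding. */
static struct aaa decode_aaa(const uint64_t words[2])
{
    struct aaa v;

    v.aa = (int)(uint32_t)(words[0] & 0xffffffffu);
    v.bb = (unsigned long)words[1];
    return v;
}
```

Masking word 0 keeps whatever happens to be in the padding bytes from leaking into `aa`.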

xuchunmei000 commented 1 year ago


Hi, I am using crash-8.0.2 with gdb 10.2, and the problem still exists.