crash-utility / crash

Linux kernel crash utility
https://crash-utility.github.io

crash cannot open a large vmcore generated on arm64 CentOS 7.6. #138

Open lelbalala opened 1 year ago

lelbalala commented 1 year ago

1. The vmcore comes from this OS:

[root@sds-98 ~]# uname -a
Linux sds-98 4.14.0-115.el7a.0.1.aarch64 #1 SMP Sun Nov 25 20:54:21 UTC 2018 aarch64 aarch64 aarch64 GNU/Linux
[root@sds-98 ~]# cat /etc/redhat-release
CentOS Linux release 7.6.1810 (AltArch)

[root@arm-sds-213 crash-8.0.3]# ll -h ../vmcore
-rw------- 1 root root 5.1G Apr 22 21:48 ../vmcore

2. When the vmcore is large and comes from a physical machine, crash cannot open it, and the errors always look like the output below. Note: even if we do nothing else and simply run "echo c > /proc/sysrq-trigger" to generate the vmcore, crash still fails with the same errors.

[root@arm-sds-213 crash-8.0.3]# ./crash -d 30 /usr/lib/debug/lib/modules/4.14.0-115.el7a.0.1.aarch64/vmlinux ../vmcore

crash 8.0.3
Copyright (C) 2002-2022 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011, 2020-2022 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
Copyright (C) 2015, 2021 VMware, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.

compressed kdump: header->utsname.machine: aarch64
compressed kdump: memory bitmap offset: 20000
diskdump_data:
      filename: ../vmcore
         flags: 6 (KDUMP_CMPRS_LOCAL|ERROR_EXCLUDED)
           dfd: 3
           ofp: 0
  machine_type: 183 (EM_AARCH64)

        header: 23241140
       signature: "KDUMP   "
  header_version: 6
         utsname:
           sysname: Linux
          nodename: sds-98
           release: 4.14.0-115.el7a.0.1.aarch64
           version: #1 SMP Sun Nov 25 20:54:21 UTC 2018
           machine: aarch64
        domainname: (none)
       timestamp:
            tv_sec: 6443ad98
           tv_usec: 0
          status: 2 (DUMP_DH_COMPRESSED_LZO)
      block_size: 65536
    sub_hdr_size: 1
   bitmap_blocks: 2064
       max_mapnr: 541048832
total_ram_blocks: 0
   device_blocks: 0
  written_blocks: 0
     current_cpu: 0
         nr_cpus: 64
  tasks[nr_cpus]: 0
                  0
                  0
                  0
                  0
                  0

....

    sub_header: 0 (n/a)

sub_header_kdump: 23251150
  phys_base: 0
  dump_level: 31 (0x1f) (DUMP_EXCLUDE_ZERO|DUMP_EXCLUDE_CACHE|DUMP_EXCLUDE_CACHE_PRI|DUMP_EXCLUDE_USER_DATA|DUMP_EXCLUDE_FREE)
  split: 0
  start_pfn: (unused)
  end_pfn: (unused)
  offset_vmcoreinfo: 92032 (0x16780)
  size_vmcoreinfo: 1806 (0x70e)
    OSRELEASE=4.14.0-115.el7a.0.1.aarch64
    PAGESIZE=65536
    SYMBOL(init_uts_ns)=ffff000008dc53b8
    SYMBOL(node_online_map)=ffff000008dbd670
    SYMBOL(swapper_pg_dir)=ffff000009910000
    SYMBOL(_stext)=ffff000008081000
    SYMBOL(vmap_area_list)=ffff000008e6df98
    SYMBOL(mem_section)=ffff00000985f200
    LENGTH(mem_section)=64
    SIZE(mem_section)=16
    OFFSET(mem_section.section_mem_map)=0
    SIZE(page)=64
    SIZE(pglist_data)=6912
    SIZE(zone)=1920
    SIZE(free_area)=88
    SIZE(list_head)=16
    SIZE(nodemask_t)=8
    OFFSET(page.flags)=0
    OFFSET(page._refcount)=28
    OFFSET(page.mapping)=8
    OFFSET(page.lru)=32
    OFFSET(page._mapcount)=24
    OFFSET(page.private)=48
    OFFSET(page.compound_dtor)=40
    OFFSET(page.compound_order)=44
    OFFSET(page.compound_head)=32
    OFFSET(pglist_data.node_zones)=0
    OFFSET(pglist_data.nr_zones)=6176
    OFFSET(pglist_data.node_start_pfn)=6184
    OFFSET(pglist_data.node_spanned_pages)=6200
    OFFSET(pglist_data.node_id)=6208
    OFFSET(zone.free_area)=256
    OFFSET(zone.vm_stat)=1664
    OFFSET(zone.spanned_pages)=96
    OFFSET(free_area.free_list)=0
    OFFSET(list_head.next)=0
    OFFSET(list_head.prev)=8
    OFFSET(vmap_area.va_start)=0
    OFFSET(vmap_area.list)=48
    LENGTH(zone.free_area)=14
    SYMBOL(log_buf)=ffff000008dff560
    SYMBOL(log_buf_len)=ffff000008dff558
    SYMBOL(log_first_idx)=ffff000009694b40
    SYMBOL(clear_idx)=ffff000009694b50
    SYMBOL(log_next_idx)=ffff000009694b44
    SIZE(printk_log)=16
    OFFSET(printk_log.ts_nsec)=0
    OFFSET(printk_log.len)=8
    OFFSET(printk_log.text_len)=10
    OFFSET(printk_log.dict_len)=12
    LENGTH(free_area.free_list)=5
    NUMBER(NR_FREE_PAGES)=0
    NUMBER(PG_lru)=5
    NUMBER(PG_private)=12
    NUMBER(PG_swapcache)=9
    NUMBER(PG_slab)=8
    NUMBER(PG_hwpoison)=21
    NUMBER(PG_head_mask)=32768
    NUMBER(PAGE_BUDDY_MAPCOUNT_VALUE)=-128
    NUMBER(HUGETLB_PAGE_DTOR)=2
    NUMBER(VA_BITS)=48
    NUMBER(kimage_voffset)=0xffff000008000000
    NUMBER(PHYS_OFFSET)=0x0
    CRASHTIME=1682156952
  offset_note: 65640 (0x10068)
  size_note: 28200 (0x6e28)
  notes_buf: 23261160
  num_vmcoredd_notes: 0
  num_prstatus_notes: 64
  notes[0]: 23261160 (NT_PRSTATUS)
    si.signo: 0 si.code: 0 si.errno: 0
    cursig: 0 sigpend: 0 sighold: 0
    pid: 0 ppid: 0 pgrp: 0 sid:0
    utime: 0.000000 stime: 0.000000
    cutime: 0.000000 cstime: 0.000000
    X0: 0000803feec80000 X1: ffff000008d50018 X2: ffff000008dc0c44
    X3: 0000000000000001 X4: 0000000000000000 X5: 00000004d05d82ba
    X6: 00000000000034f0 X7: 00000000ffffa66b X8: ffff000008dc6360
    X9: ffff000008d8fe30 X10: 0000000000000d00 X11: 0000000000000000
    X12: 0000000000000000 X13: 0000000000000000 X14: 0000000000000000
    X15: 0000000000000068 X16: 0000000000010000 X17: 0000ffffa1e56a80
    X18: 0000000000000000 X19: ffff000008f224b8 X20: ffff000008d50000
    X21: ffff000008f22000 X22: 0000000000000000 X23: ffff000008dbc50c
    X24: 0000000000000000 X25: 0000000000000000 X26: 0000203ffff99f40
    X27: 0000203ffff730ad X28: 0000000000c00018 X29: ffff000008d8fec0
    LR: ffff0000080858c8 SP: ffff000008d8fec0 PC: ffff0000080858cc
    PSTATE: 60c00009 FPVALID: 00000000
  notes[1]: 232612fc (NT_PRSTATUS)
    si.signo: 0 si.code: 0 si.errno: 0
    cursig: 0 sigpend: 0 sighold: 0
    pid: 0 ppid: 0 pgrp: 0 sid:0
    utime: 0.000000 stime: 0.000000
    cutime: 0.000000 cstime: 0.000000
    X0: 0000803feecb0000 X1: ffff000008d50018 X2: ffff000008dc0c44
    X3: 0000000000000001 X4: 0000000000000000 X5: ffff000009601d00
    X6: 00000000000034b8 X7: 00000000ffffa64e X8: ffff80208744b760
    X9: ffff00001898fea0 X10: 0000000000000d00 X11: ffff800001f80000
    X12: 00000000000000fb X13: 0000000000000000 X14: ffff7fe00fecf000
    X15: 0000000000000008 X16: 0000000000010000 X17: 0000ffffafc1fb30
    X18: 0000000000000016 X19: ffff000008f224b8 X20: ffff000008d50000
    X21: ffff000008f22000 X22: 0000000000000001 X23: ffff000008dbc50c
    X24: 0000000000000000 X25: 0000000000000000 X26: 0000000000000000
    X27: 0000000000000000 X28: 0000000000000000 X29: ffff00001898ff30
    LR: ffff0000080858c8 SP: ffff00001898ff30 PC: ffff0000080858cc
    PSTATE: 60c00009 FPVALID: 00000000
  notes[2]: 23261498 (NT_PRSTATUS)
    si.signo: 0 si.code: 0 si.errno: 0
    cursig: 0 sigpend: 0 sighold: 0
    pid: 0 ppid: 0 pgrp: 0 sid:0
    utime: 0.000000 stime: 0.000000
    cutime: 0.000000 cstime: 0.000000
    X0: 0000803feece0000 X1: ffff000008d50018 X2: ffff000008dc0c44
    X3: 0000000000000001 X4: 0000000000000000 X5: 000006d4504c2180
    X6: ffff803ff7a39168 X7: 0000000000000000 X8: 0000000000000000

<read_diskdump: addr: ffffa03ff7bc0090 paddr: 203ff7bc0090 cnt: 8>
read_diskdump: SEEK_ERROR: paddr/pfn: 203ff7bc0090/203ff7bc max_mapnr: 203fc000
crash: seek error: kernel virtual address: ffffa03ff7bc0090 type: "IRQ stack pointer"
<readmem: ffffa03ff7bf0090, KVADDR, "IRQ stack pointer", 8, (Q), 236f43a8>
<read_diskdump: addr: ffffa03ff7bf0090 paddr: 203ff7bf0090 cnt: 8>
read_diskdump: SEEK_ERROR: paddr/pfn: 203ff7bf0090/203ff7bf max_mapnr: 203fc000
crash: seek error: kernel virtual address: ffffa03ff7bf0090 type: "IRQ stack pointer"
<readmem: ffffa03ff7c20090, KVADDR, "IRQ stack pointer", 8, (Q), 236f43b0>
<read_diskdump: addr: ffffa03ff7c20090 paddr: 203ff7c20090 cnt: 8>
read_diskdump: SEEK_ERROR: paddr/pfn: 203ff7c20090/203ff7c2 max_mapnr: 203fc000
crash: seek error: kernel virtual address: ffffa03ff7c20090 type: "IRQ stack pointer"
<readmem: ffffa03ff7c50090, KVADDR, "IRQ stack pointer", 8, (Q), 236f43b8>
<read_diskdump: addr: ffffa03ff7c50090 paddr: 203ff7c50090 cnt: 8>
read_diskdump: SEEK_ERROR: paddr/pfn: 203ff7c50090/203ff7c5 max_mapnr: 203fc000
crash: seek error: kernel virtual address: ffffa03ff7c50090 type: "IRQ stack pointer"
<readmem: ffffa03ff7c80090, KVADDR, "IRQ stack pointer", 8, (Q), 236f43c0>
<read_diskdump: addr: ffffa03ff7c80090 paddr: 203ff7c80090 cnt: 8>
read_diskdump: SEEK_ERROR: paddr/pfn: 203ff7c80090/203ff7c8 max_mapnr: 203fc000
crash: seek error: kernel virtual address: ffffa03ff7c80090 type: "IRQ stack pointer"
<readmem: ffffa03ff7cb0090, KVADDR, "IRQ stack pointer", 8, (Q), 236f43c8>
<read_diskdump: addr: ffffa03ff7cb0090 paddr: 203ff7cb0090 cnt: 8>
read_diskdump: SEEK_ERROR: paddr/pfn: 203ff7cb0090/203ff7cb max_mapnr: 203fc000
crash: seek error: kernel virtual address: ffffa03ff7cb0090 type: "IRQ stack pointer"
<readmem: ffffa03ff7ce0090, KVADDR, "IRQ stack pointer", 8, (Q), 236f43d0>
<read_diskdump: addr: ffffa03ff7ce0090 paddr: 203ff7ce0090 cnt: 8>
read_diskdump: SEEK_ERROR: paddr/pfn: 203ff7ce0090/203ff7ce max_mapnr: 203fc000
crash: seek error: kernel virtual address: ffffa03ff7ce0090 type: "IRQ stack pointer"
<readmem: ffffa03ff7d10090, KVADDR, "IRQ stack pointer", 8, (Q), 236f43d8>
<read_diskdump: addr: ffffa03ff7d10090 paddr: 203ff7d10090 cnt: 8>
read_diskdump: SEEK_ERROR: paddr/pfn: 203ff7d10090/203ff7d1 max_mapnr: 203fc000
crash: seek error: kernel virtual address: ffffa03ff7d10090 type: "IRQ stack pointer"
overflow_stack: type: 2, TYPE_CODE_ARRAY target_typecode: 8, TYPE_CODE_INT target_length: 8 length: 4096
crash: builtin stackframe.sp offset differs from kernel version
crash: builtin stackframe.pc offset differs from kernel version
GETBUF(344 -> 0)
GNU_PASS_THROUGH: returned via gdb_error_hook (1 buffer in use)
FREEBUF(0)
GETBUF(344 -> 0)
GNU_PASS_THROUGH: returned via gdb_error_hook (1 buffer in use)
FREEBUF(0)
GNU_GET_DATATYPE[kmem_slab_s]: returned via gdb_error_hook
GNU_GET_DATATYPE[slab_s]: returned via gdb_error_hook
GNU_GET_DATATYPE[slab]: returned via gdb_error_hook
GNU_GET_DATATYPE[kmem_cache_s]: returned via gdb_error_hook
GETBUF(344 -> 0)
GNU_PASS_THROUGH: returned via gdb_error_hook (1 buffer in use)
FREEBUF(0)

3. I found this in the crash source (defs.h):

arm64_stackframe_init:

    if (offsetof(struct arm64_stackframe, sp) !=
        MEMBER_OFFSET("stackframe", "sp")) {
        if (CRASHDEBUG(1))
            error(INFO, "builtin stackframe.sp offset differs from kernel version\n");
    }
    if (offsetof(struct arm64_stackframe, fp) !=
        MEMBER_OFFSET("stackframe", "fp")) {
        if (CRASHDEBUG(1))
            error(INFO, "builtin stackframe.fp offset differs from kernel version\n");
    }

struct arm64_stackframe {
    unsigned long fp;
    unsigned long sp;
    unsigned long pc;
};

But in CentOS 7.6 arm64 this struct is defined in arch/arm64/include/asm/stacktrace.h as:

struct stackframe {
    unsigned long fp;
    unsigned long pc;
#ifdef CONFIG_FUNCTION_GRAPH_TRACER
    int graph;
#endif
};
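To make the mismatch concrete, here is a minimal sketch (not crash code) that places the two layouts quoted above side by side and compares member offsets with `offsetof`. The name `kernel_stackframe` and the two helper functions are hypothetical, introduced only for this illustration; the kernel itself calls the struct `stackframe`, and the `graph` member is assumed to be present as in the `crash> struct stackframe` output.

```c
#include <assert.h>
#include <stddef.h>

/* Layout built into crash, as quoted above. */
struct arm64_stackframe {
    unsigned long fp;
    unsigned long sp;
    unsigned long pc;
};

/* Layout of this 4.14 kernel's struct stackframe (hypothetical name here). */
struct kernel_stackframe {
    unsigned long fp;
    unsigned long pc;
    int graph;          /* assumed: CONFIG_FUNCTION_GRAPH_TRACER is enabled */
};

/* fp is the first member of both layouts, so this holds at compile time. */
static_assert(offsetof(struct arm64_stackframe, fp) == 0,
              "fp leads the builtin layout");

/* Nonzero when fp sits at the same offset in both layouts. */
int fp_offset_matches(void)
{
    return offsetof(struct arm64_stackframe, fp) ==
           offsetof(struct kernel_stackframe, fp);
}

/* Nonzero when pc sits at the same offset in both layouts. */
int pc_offset_matches(void)
{
    return offsetof(struct arm64_stackframe, pc) ==
           offsetof(struct kernel_stackframe, pc);
}
```

Here `fp_offset_matches()` is true while `pc_offset_matches()` is false, because the builtin layout has `sp` between `fp` and `pc`; the kernel layout has no `sp` member at all, so no offset can match for it. In the quoted snippet the messages are emitted at INFO level and only under `CRASHDEBUG(1)`, which suggests they are informational; that is a reading of the snippet, not a confirmed diagnosis of the seek errors.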

However, a small vmcore generated on the same OS and arm64 CPU can be debugged with crash. Note: as the output below shows, struct stackframe has no sp member.

[root@arm-sds-213 crash-8.0.3]# ll -h ../vmcore.137
-rw------- 1 root root 376M Apr 22 22:38 ../vmcore.137
[root@arm-sds-213 crash-8.0.3]# ./crash /usr/lib/debug/lib/modules/4.14.0-115.el7a.0.1.aarch64/vmlinux ../vmcore.137

crash 8.0.3
Copyright (C) 2002-2022 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011, 2020-2022 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
Copyright (C) 2015, 2021 VMware, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.

GNU gdb (GDB) 10.2
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "aarch64-unknown-linux-gnu".
Type "show configuration" for configuration details.
Find the GDB manual and other documentation resources online at:
    http://www.gnu.org/software/gdb/documentation/.

For help, type "help".
Type "apropos word" to search for commands related to "word"...

      KERNEL: /usr/lib/debug/lib/modules/4.14.0-115.el7a.0.1.aarch64/vmlinux  [TAINTED]
    DUMPFILE: ../vmcore.137  [PARTIAL DUMP]
        CPUS: 16
        DATE: Fri Apr 21 03:02:00 CST 2023
      UPTIME: 126 days, 12:53:11
LOAD AVERAGE: 0.73, 0.61, 0.48
       TASKS: 666
    NODENAME: 10-252-60-137
     RELEASE: 4.14.0-115.el7a.0.1.aarch64
     VERSION: #1 SMP Sun Nov 25 20:54:21 UTC 2018
     MACHINE: aarch64 (unknown Mhz)
      MEMORY: 24 GB
       PANIC: "sysrq: SysRq : Trigger a crash"
         PID: 12561
     COMMAND: "bash"
        TASK: ffff800403646600 [THREAD_INFO: ffff800403646600]
         CPU: 13
       STATE: TASK_RUNNING (SYSRQ)

crash> struct stackframe
struct stackframe {
    unsigned long fp;
    unsigned long pc;
    int graph;
}
SIZE: 24
crash>

Does crash not support large vmcore files from arm64 CentOS 7.6? I have also tried crash 7.3.0 and 7.3.2; both fail with the same errors.

k-hagio commented 1 year ago
<read_diskdump: addr: ffffa03ff7bc0090 paddr: 203ff7bc0090 cnt: 8>
read_diskdump: SEEK_ERROR: paddr/pfn: 203ff7bc0090/203ff7bc max_mapnr: 203fc000

Hmm, it's strange that pfn is higher than max_mapnr..
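The numbers in the two quoted lines can be checked by hand. The sketch below is illustrative only and is not taken from crash: the function names are hypothetical, and it assumes that max_mapnr is an exclusive upper bound on valid page frame numbers. It derives the pfn from the physical address using the 64 KiB page size reported in the dump header (block_size: 65536) and compares it with max_mapnr.

```c
#include <assert.h>
#include <stdint.h>

/* log2 of a power-of-two page size, e.g. 65536 -> 16. */
unsigned page_shift(uint64_t page_size)
{
    unsigned shift = 0;

    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    while ((page_size >> shift) != 1)
        shift++;
    return shift;
}

/* Page frame number of the page that contains physical address paddr. */
uint64_t paddr_to_pfn(uint64_t paddr, uint64_t page_size)
{
    return paddr >> page_shift(page_size);
}

/* Assumption: a pfn is covered by the dump only if it is below max_mapnr. */
int pfn_in_range(uint64_t pfn, uint64_t max_mapnr)
{
    return pfn < max_mapnr;
}

/* Number of page frames by which pfn lies above max_mapnr (0 if it does not). */
uint64_t pfn_excess(uint64_t pfn, uint64_t max_mapnr)
{
    return pfn > max_mapnr ? pfn - max_mapnr : 0;
}
```

With the values from the log, `paddr_to_pfn(0x203ff7bc0090, 65536)` gives 0x203ff7bc, the header's max_mapnr of 541048832 is 0x203fc000 in hex, and the pfn lies 0x37bc (14268) page frames above it, so `pfn_in_range()` is false for this address.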

dsouzae commented 1 year ago

Check whether you have enough free space in /var/tmp: at least the size of your core file.

I had a similar issue with a 56GB core file. The directory is hard-coded in ramdump.c, whereas the one in symbol.c can be overridden with the TMPDIR environment variable.

The following line has to be changed: https://github.com/crash-utility/crash/blob/342cf340ed0386880fe2a3115d6bef32eabb511b/ramdump.c#L35
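The free-space check suggested above can also be done programmatically. This is a minimal sketch using the POSIX `stat` and `statvfs` calls; the function names are hypothetical and it is not part of crash. It reports whether a directory such as /var/tmp has at least as many free bytes as the core file is large.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/stat.h>
#include <sys/statvfs.h>

/* Pure check: can avail_blocks blocks of block_size bytes hold need_bytes? */
int space_is_enough(uint64_t avail_blocks, uint64_t block_size,
                    uint64_t need_bytes)
{
    uint64_t need_blocks;

    assert(block_size != 0);
    /* Compare in blocks (rounded up) so the multiplication cannot overflow. */
    need_blocks = need_bytes / block_size + (need_bytes % block_size != 0);
    return avail_blocks >= need_blocks;
}

/*
 * Returns 1 if tmp_dir has at least as much free space as core_path is
 * large, 0 if it does not, and -1 if either path cannot be examined.
 */
int tmpdir_can_hold(const char *tmp_dir, const char *core_path)
{
    struct stat st;
    struct statvfs vfs;

    if (stat(core_path, &st) != 0 || statvfs(tmp_dir, &vfs) != 0)
        return -1;
    if (vfs.f_frsize == 0)
        return -1;
    /* f_bavail counts blocks available to unprivileged users. */
    return space_is_enough((uint64_t)vfs.f_bavail, (uint64_t)vfs.f_frsize,
                           (uint64_t)st.st_size);
}
```

For the case in this issue the call would be `tmpdir_can_hold("/var/tmp", "../vmcore")`; a result of 0 would mean the directory is too small for a 5.1G core file.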