crash-utility / crash

Linux kernel crash utility
https://crash-utility.github.io

crash bt -c 0 got wrong info when cpu0 save notes failed #105

Closed xuchunmei000 closed 2 years ago

xuchunmei000 commented 2 years ago

My platform is aarch64 with kernel version 5.10.23, crash 7.2.9, kexec-tools 2.0.21, and makedumpfile 1.6.9. When the system crashed, cpu 0 and some other cpus failed to stop. Below is some information from the vmcore. cpu126 is the panic cpu, and cpu 1 also failed to stop. Using help -D to get the vmcore info, I found that only one ELF note was parsed from the vmcore. It should belong to cpu126, because the other cpus failed to stop, and only cpu126 can show a backtrace.

crash> bt -c 1
PID: 66538  TASK: ffff00081749c200  CPU: 1   COMMAND: "fc_vcpu41"
bt: WARNING: cannot determine starting stack frame for task ffff00081749c200
crash> bt -c 0
PID: 66516  TASK: ffff00084642e300  CPU: 0   COMMAND: "fc_vcpu19"
Segmentation fault
crash> help -D | grep prstatus
  num_prstatus_notes: 1
crash> bt -c 126
PID: 0      TASK: ffff0400064de300  CPU: 126  COMMAND: "swapper/126"
 #0 [ffff8000250f3c90] __crash_kexec at ffff80001013a064
 #1 [ffff8000250f3e30] panic at ffff800010afd028
...
...

I found that when the arm64_get_crash_notes function fails to get crash_notes, it falls back to calling diskdump_get_prstatus_percpu to get the ELF note from nt_prstatus_percpu, so cpu0 gets dd->nt_prstatus_percpu[0] as its note.

dd->nt_prstatus_percpu is parsed from the vmcore for each cpu. When a cpu is offline or fails to stop before the crash, its crash notes or ELF notes are not saved, so using the cpu number as an index into dd->nt_prstatus_percpu returns the wrong note.

Any ideas on how to avoid getting the wrong note for an offline cpu or for a cpu that failed to save its notes?
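The index mismatch described above can be illustrated with a small standalone sketch. It is not crash's code: `struct note`, `lookup_naive`, `map_notes_to_cpus`, `MAX_CPUS` and the online bitmask are all hypothetical names chosen for illustration, and the rule "the k-th saved note belongs to the k-th online cpu" is an assumption about how the dump was written.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CPUS 8

/* Hypothetical stand-in for a parsed NT_PRSTATUS note; not crash's real type. */
struct note {
    int saved_by_cpu;   /* cpu that actually wrote this note */
};

/*
 * Naive lookup: treat the packed note array as if it were indexed by cpu
 * number. This is the failure mode described in the report.
 */
static const struct note *
lookup_naive(const struct note *packed, int num_notes, int cpu)
{
    return (cpu >= 0 && cpu < num_notes) ? &packed[cpu] : NULL;
}

/*
 * Spread the packed notes over a per-cpu table using an online bitmask:
 * the k-th note goes to the k-th online cpu (an assumption), and every
 * other slot is left NULL. Returns the number of notes placed.
 */
static int
map_notes_to_cpus(const struct note *packed, int num_notes,
                  unsigned int online_mask,
                  const struct note *percpu[MAX_CPUS])
{
    int cpu, next = 0;

    assert(packed != NULL && percpu != NULL);

    for (cpu = 0; cpu < MAX_CPUS; cpu++) {
        if ((online_mask & (1u << cpu)) && next < num_notes)
            percpu[cpu] = &packed[next++];
        else
            percpu[cpu] = NULL;
    }
    return next;
}
```

With four notes written by cpus 4 to 7 and an online mask of 0xF0, the naive lookup for cpu 0 returns the note written by cpu 4, while the mapped table has an empty slot for cpu 0, which a caller would then have to check before use.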

k-hagio commented 2 years ago

Thanks for the report. @lian-bo, I haven't been able to look into this yet, but can this also be reproduced on RHEL?

lian-bo commented 2 years ago

So far I haven't seen it on RHEL. Could you please try it on the latest crash-7.3 or crash-8.0? If this is still reproduced, would you mind sharing the vmcore or the reproducible steps?

xuchunmei000 commented 2 years ago

Here are the steps to reproduce. My aarch64 VM has 8 cpus. Before crashing the OS, set some cpus offline:

echo 0 > /sys/devices/system/cpu/cpu0/online
echo 0 > /sys/devices/system/cpu/cpu1/online
echo 0 > /sys/devices/system/cpu/cpu2/online
echo 0 > /sys/devices/system/cpu/cpu3/online

echo c > /proc/sysrq-trigger

Then running bt -c 0 causes a segmentation fault:

crash> bt -c 1
PID: 0      TASK: ffff0000c03510c0  CPU: 1   COMMAND: "swapper/1"
 #0 [ffff800011f73e90] arch_cpu_idle at ffff800010c0b0a4
crash> bt -c 2
PID: 0      TASK: ffff0000c039a180  CPU: 2   COMMAND: "swapper/2"
 #0 [ffff800011f7be90] arch_cpu_idle at ffff800010c0b0a4
crash> bt -c 3
PID: 0      TASK: ffff0000c039b240  CPU: 3   COMMAND: "swapper/3"
 #0 [ffff800011f83e90] arch_cpu_idle at ffff800010c0b0a4
crash> bt -c 0
PID: 0      TASK: ffff8000117fa240  CPU: 0   COMMAND: "swapper/0"
Segmentation fault (core dumped)

lian-bo commented 2 years ago

crash-7.2.9 is old. Can you try the latest upstream crash? I have not been able to reproduce this issue.

xuchunmei000 commented 2 years ago

I used the latest upstream version 8.0.0 with gdb 10.2, and the issue still reproduces.

k-hagio commented 2 years ago

Hmm, I thought that map_cpus_to_prstatus_kdump_cmprs() maps cpus to prstatus notes, but it doesn't on arm64. Is this the cause?

In case I am misunderstanding the situation, could you please send the whole help -D output from the 8-cpu machine?

xuchunmei000 commented 2 years ago

crash> help -D
diskdump_data:
          filename: ./vmcore
             flags: 1c6 (KDUMP_CMPRS_LOCAL|ERROR_EXCLUDED|LZO_SUPPORTED|SNAPPY_SUPPORTED|ZSTD_SUPPORTED)
               dfd: 3
               ofp: ffffb554b510
      machine_type: 183 (EM_AARCH64)

            header: aaab2101ee10
           signature: "KDUMP   "
      header_version: 6
             utsname:
               sysname: Linux
              nodename: localhost
               release: 5.10.60-9.al8.aarch64
               version: #1 SMP Mon Sep 6 20:56:34 CST 2021
               machine: aarch64
            domainname: (none)
           timestamp:
                tv_sec: 61cd7e17
               tv_usec: 0
              status: 2 (DUMP_DH_COMPRESSED_LZO)
          block_size: 4096
        sub_hdr_size: 2
       bitmap_blocks: 262
           max_mapnr: 4286464
    total_ram_blocks: 0
       device_blocks: 0
      written_blocks: 0
         current_cpu: 0
             nr_cpus: 4
      tasks[nr_cpus]: 0
                      0
                      0
                      0

        sub_header: 0 (n/a)

  sub_header_kdump: aaab2101fe20
           phys_base: 40000000
          dump_level: 31 (0x1f) (DUMP_EXCLUDE_ZERO|DUMP_EXCLUDE_CACHE|DUMP_EXCLUDE_CACHE_PRI|DUMP_EXCLUDE_USER_DATA|DUMP_EXCLUDE_FREE)
               split: 0
           start_pfn: (unused)
             end_pfn: (unused)
   offset_vmcoreinfo: 5872 (0x16f0)
     size_vmcoreinfo: 2885 (0xb45)
                      OSRELEASE=5.10.60-9.al8.aarch64
                      BUILD-ID=c7f4708939637fe3985ed53ecb1aad98b94c847a
                      PAGESIZE=4096
                      SYMBOL(init_uts_ns)=ffff8000117fa028
                      SYMBOL(node_online_map)=ffff8000117f1bd0
                      SYMBOL(swapper_pg_dir)=ffff8000113b2000
                      SYMBOL(_stext)=ffff8000100d0000
                      SYMBOL(vmap_area_list)=ffff800011bdb6a0
                      SYMBOL(mem_section)=ffff0003d4783200
                      LENGTH(mem_section)=1024
                      SIZE(mem_section)=16
                      OFFSET(mem_section.section_mem_map)=0
                      NUMBER(SECTION_SIZE_BITS)=30
                      NUMBER(MAX_PHYSMEM_BITS)=48
                      SIZE(page)=64
                      SIZE(pglist_data)=7680
                      SIZE(zone)=1472
                      SIZE(free_area)=88
                      SIZE(list_head)=16
                      SIZE(nodemask_t)=8
                      OFFSET(page.flags)=0
                      OFFSET(page._refcount)=52
                      OFFSET(page.mapping)=24
                      OFFSET(page.lru)=8
                      OFFSET(page._mapcount)=48
                      OFFSET(page.private)=40
                      OFFSET(page.compound_dtor)=16
                      OFFSET(page.compound_order)=17
                      OFFSET(page.compound_head)=8
                      OFFSET(pglist_data.node_zones)=0
                      OFFSET(pglist_data.nr_zones)=6944
                      OFFSET(pglist_data.node_start_pfn)=6952
                      OFFSET(pglist_data.node_spanned_pages)=6968
                      OFFSET(pglist_data.node_id)=6992
                      OFFSET(zone.free_area)=192
                      OFFSET(zone.vm_stat)=1280
                      OFFSET(zone.spanned_pages)=112
                      OFFSET(free_area.free_list)=0
                      OFFSET(list_head.next)=0
                      OFFSET(list_head.prev)=8
                      OFFSET(vmap_area.va_start)=0
                      OFFSET(vmap_area.list)=40
                      LENGTH(zone.free_area)=11
                      SYMBOL(prb)=ffff80001181f330
                      SYMBOL(printk_rb_static)=ffff80001181f370
                      SYMBOL(clear_seq)=ffff800011cfb9e0
                      SIZE(printk_ringbuffer)=80
                      OFFSET(printk_ringbuffer.desc_ring)=0
                      OFFSET(printk_ringbuffer.text_data_ring)=40
                      OFFSET(printk_ringbuffer.fail)=72
                      SIZE(prb_desc_ring)=40
                      OFFSET(prb_desc_ring.count_bits)=0
                      OFFSET(prb_desc_ring.descs)=8
                      OFFSET(prb_desc_ring.infos)=16
                      OFFSET(prb_desc_ring.head_id)=24
                      OFFSET(prb_desc_ring.tail_id)=32
                      SIZE(prb_desc)=24
                      OFFSET(prb_desc.state_var)=0
                      OFFSET(prb_desc.text_blk_lpos)=8
                      SIZE(prb_data_blk_lpos)=16
                      OFFSET(prb_data_blk_lpos.begin)=0
                      OFFSET(prb_data_blk_lpos.next)=8
                      SIZE(printk_info)=88
                      OFFSET(printk_info.seq)=0
                      OFFSET(printk_info.ts_nsec)=8
                      OFFSET(printk_info.text_len)=16
                      OFFSET(printk_info.caller_id)=20
                      OFFSET(printk_info.dev_info)=24
                      SIZE(dev_printk_info)=64
                      OFFSET(dev_printk_info.subsystem)=0
                      LENGTH(printk_info_subsystem)=16
                      OFFSET(dev_printk_info.device)=16
                      LENGTH(printk_info_device)=48
                      SIZE(prb_data_ring)=32
                      OFFSET(prb_data_ring.size_bits)=0
                      OFFSET(prb_data_ring.data)=8
                      OFFSET(prb_data_ring.head_lpos)=16
                      OFFSET(prb_data_ring.tail_lpos)=24
                      SIZE(atomic_long_t)=8
                      OFFSET(atomic_long_t.counter)=0
                      LENGTH(free_area.free_list)=5
                      NUMBER(NR_FREE_PAGES)=0
                      NUMBER(PG_lru)=4
                      NUMBER(PG_private)=13
                      NUMBER(PG_swapcache)=10
                      NUMBER(PG_swapbacked)=19
                      NUMBER(PG_slab)=9
                      NUMBER(PG_hwpoison)=22
                      NUMBER(PG_head_mask)=65536
                      NUMBER(PAGE_BUDDY_MAPCOUNT_VALUE)=-129
                      NUMBER(HUGETLB_PAGE_DTOR)=2
                      NUMBER(PAGE_OFFLINE_MAPCOUNT_VALUE)=-257
                      NUMBER(VA_BITS)=48
                      NUMBER(kimage_voffset)=0xffff7ffc67a00000
                      NUMBER(PHYS_OFFSET)=0x40000000
                      NUMBER(TCR_EL1_T1SZ)=0x10
                      KERNELOFFSET=c0000
                      NUMBER(KERNELPACMASK)=0x0
                      CRASHTIME=1640857111
         offset_note: 4200 (0x1068)
           size_note: 4560 (0x11d0)
           notes_buf: aaab21020e30
  num_vmcoredd_notes: 0
  num_prstatus_notes: 4
            notes[0]: aaab21020e30 (NT_PRSTATUS)
                      si.signo: 0  si.code: 0  si.errno: 0
                      cursig: 0  sigpend: 0  sighold: 0
                      pid: 1408  ppid: 0  pgrp: 0  sid:0
                      utime: 0.000000  stime: 0.000000
                      cutime: 0.000000  cstime: 0.000000
                       X0: ffff0000c8742800   X1: 0000000000000000   X2: ffff00036b6e90c0
                       X3: ffff800011bb22e8   X4: ffff00036b6e90c0   X5: 0000000000000000
                       X6: 000000000000000f   X7: ffff80001181f550   X8: 0000000000000000
                       X9: ffff8000102448fc  X10: 00000000ffff8000  X11: ffff800011adf550
                      X12: 0720072007200720  X13: 0720072007200720  X14: 0720072007200720
                      X15: ffff00036b6e9740  X16: 0000000000000000  X17: 0000000000000000
                      X18: 0000000000000030  X19: ffff00036b6e90c0  X20: ffff800011bb22a8
                      X21: 0000000000000000  X22: ffff800011e08000  X23: ffff80001329bab8
                      X24: ffff800011cf2000  X25: ffff800010cdcbc0  X26: 0000000000000000
                      X27: 0000000000000000  X28: ffff00036b6e90c0  X29: ffff80001329ba70
                       LR: ffff8000102448fc   SP: ffff80001329ba70   PC: ffff8000102449d4
                      PSTATE: 60000085   FPVALID: 00000000
            notes[1]: aaab21020fcc (NT_PRSTATUS)
                      si.signo: 0  si.code: 0  si.errno: 0
                      cursig: 0  sigpend: 0  sighold: 0
                      pid: 0  ppid: 0  pgrp: 0  sid:0
                      utime: 0.000000  stime: 0.000000
                      cutime: 0.000000  cstime: 0.000000
                       X0: 00000000000000e0   X1: ffff800011c60520   X2: 0000000000000001
                       X3: ffff80001097e240   X4: 0000000000000015   X5: 00ffffffffffffff
                       X6: 0000be9186c23431   X7: 00000010ab4a0098   X8: ffff0000c0398d20
                       X9: ffff80001097e268  X10: 0000000000000cc0  X11: 0000000000000000
                      X12: 0000000000000000  X13: 0000000000000000  X14: 0000000000000000
                      X15: 0000000000000000  X16: 0000000000000000  X17: 0000000000000000
                      X18: 0000000000000000  X19: 0000000000000001  X20: ffff800011c605a0
                      X21: ffff0003d4738600  X22: ffff800011c60520  X23: 0000000000000001
                      X24: 000001b6696821aa  X25: 0000000000000000  X26: 0000000000000000
                      X27: 0000000000000000  X28: 0000000000000000  X29: ffff800011f73e90
                       LR: ffff800010c0b0a0   SP: ffff800011f73e90   PC: ffff800010c0b0a8
                      PSTATE: 60c00005   FPVALID: 00000000
            notes[2]: aaab21021168 (NT_PRSTATUS)
                      si.signo: 0  si.code: 0  si.errno: 0
                      cursig: 0  sigpend: 0  sighold: 0
                      pid: 0  ppid: 0  pgrp: 0  sid:0
                      utime: 0.000000  stime: 0.000000
                      cutime: 0.000000  cstime: 0.000000
                       X0: 00000000000000e0   X1: ffff800011c60520   X2: 0000000000000001
                       X3: ffff80001097e240   X4: 0000000000000015   X5: 00ffffffffffffff
                       X6: 0000be9186c23431   X7: 0000000d7156c757   X8: ffff0000c039d020
                       X9: ffff80001097e268  X10: 0000000000000cc0  X11: 0000000000000000
                      X12: 0000000000000000  X13: 0000000000000000  X14: 0000000000000000
                      X15: 0000000000000000  X16: 0000000000000000  X17: 0000000000000000
                      X18: 0000000000000000  X19: 0000000000000001  X20: ffff800011c605a0
                      X21: ffff0003d4759600  X22: ffff800011c60520  X23: 0000000000000001
                      X24: 000001b666fedcd8  X25: 0000000000000000  X26: 0000000000000000
                      X27: 0000000000000000  X28: 0000000000000000  X29: ffff800011f7be90
                       LR: ffff800010c0b0a0   SP: ffff800011f7be90   PC: ffff800010c0b0a8
                      PSTATE: 60c00005   FPVALID: 00000000
            notes[3]: aaab21021304 (NT_PRSTATUS)
                      si.signo: 0  si.code: 0  si.errno: 0
                      cursig: 0  sigpend: 0  sighold: 0
                      pid: 0  ppid: 0  pgrp: 0  sid:0
                      utime: 0.000000  stime: 0.000000
                      cutime: 0.000000  cstime: 0.000000
                       X0: 00000000000000e0   X1: ffff800011c60520   X2: 0000000000000001
                       X3: ffff80001097e240   X4: 0000000000000015   X5: 00ffffffffffffff
                       X6: 0000be9186c23431   X7: 00000012126509af   X8: ffff0000c039e0e0
                       X9: ffff80001097e268  X10: 0000000000000cc0  X11: 0000000000000000
                      X12: 0000000000000000  X13: 0000000000000000  X14: 0000000000000000
                      X15: 0000000000000000  X16: 0000000000000000  X17: 0000000000000000
                      X18: 0000000000000000  X19: 0000000000000001  X20: ffff800011c605a0
                      X21: ffff0003d477a600  X22: ffff800011c60520  X23: 0000000000000001
                      X24: 000001b669615082  X25: 0000000000000000  X26: 0000000000000000
                      X27: 0000000000000000  X28: 0000000000000000  X29: ffff800011f83e90
                       LR: ffff800010c0b0a0   SP: ffff800011f83e90   PC: ffff800010c0b0a8
                      PSTATE: 60c00005   FPVALID: 00000000
       snapshot_task: 0
      num_qemu_notes: 0
        NOTE offsets: 1068 (NT_PRSTATUS)
                      1204 (NT_PRSTATUS)
                      13a0 (NT_PRSTATUS)
                      153c (NT_PRSTATUS)
    offset_eraseinfo: 0 (0x0)
      size_eraseinfo: 0 (0x0)
        start_pfn_64: (unused)
          end_pfn_64: (unused)
        max_mapnr_64: 4286464 (0x416800)

       data_offset: 109000
        block_size: 4096
       block_shift: 12
            bitmap: ffffb52c3010
        bitmap_len: 1073152
         max_mapnr: 4286464 (0x416800)
   dumpable_bitmap: ffffb51bc010
              byte: 0
               bit: 0
   compressed_page: aaab2104c330
         curbufptr: aaab21049320

 page_cache_hdr[0]:
            pg_flags: 1 (PAGE_VALID)
             pg_addr: 3aa3f6000
           pg_bufptr: aaab2103c320
        pg_hit_count: 1
 page_cache_hdr[1]:
            pg_flags: 1 (PAGE_VALID)
             pg_addr: 3aa3f7000
           pg_bufptr: aaab2103d320
        pg_hit_count: 1
 page_cache_hdr[2]:
            pg_flags: 1 (PAGE_VALID)
             pg_addr: 3aa3f8000
           pg_bufptr: aaab2103e320
        pg_hit_count: 1
 page_cache_hdr[3]:
            pg_flags: 1 (PAGE_VALID)
             pg_addr: 3aa3f9000
           pg_bufptr: aaab2103f320
        pg_hit_count: 1
 page_cache_hdr[4]:
            pg_flags: 1 (PAGE_VALID)
             pg_addr: 3aa3fa000
           pg_bufptr: aaab21040320
        pg_hit_count: 1
 page_cache_hdr[5]:
            pg_flags: 1 (PAGE_VALID)
             pg_addr: 3aa3fb000
           pg_bufptr: aaab21041320
        pg_hit_count: 1
 page_cache_hdr[6]:
            pg_flags: 1 (PAGE_VALID)
             pg_addr: 3aa3fc000
           pg_bufptr: aaab21042320
        pg_hit_count: 1
 page_cache_hdr[7]:
            pg_flags: 1 (PAGE_VALID)
             pg_addr: 3aa3fd000
           pg_bufptr: aaab21043320
        pg_hit_count: 1
 page_cache_hdr[8]:
            pg_flags: 1 (PAGE_VALID)
             pg_addr: 3aa4af000
           pg_bufptr: aaab21044320
        pg_hit_count: 1
 page_cache_hdr[9]:
            pg_flags: 1 (PAGE_VALID)
             pg_addr: 3a926b000
           pg_bufptr: aaab21045320
        pg_hit_count: 10
page_cache_hdr[10]:
            pg_flags: 1 (PAGE_VALID)
             pg_addr: 3a9540000
           pg_bufptr: aaab21046320
        pg_hit_count: 2
page_cache_hdr[11]:
            pg_flags: 1 (PAGE_VALID)
             pg_addr: 3a9541000
           pg_bufptr: aaab21047320
        pg_hit_count: 9
page_cache_hdr[12]:
            pg_flags: 1 (PAGE_VALID)
             pg_addr: 3ab6e9000
           pg_bufptr: aaab21048320
        pg_hit_count: 1
page_cache_hdr[13]:
            pg_flags: 1 (PAGE_VALID)
             pg_addr: 3ab6ea000
           pg_bufptr: aaab21049320
        pg_hit_count: 1
page_cache_hdr[14]:
            pg_flags: 1 (PAGE_VALID)
             pg_addr: 3aa3f4000
           pg_bufptr: aaab2104a320
        pg_hit_count: 1
page_cache_hdr[15]:
            pg_flags: 1 (PAGE_VALID)
             pg_addr: 3aa3f5000
           pg_bufptr: aaab2104b320
        pg_hit_count: 1

    page_cache_buf: aaab2103c320
       evict_index: 14
         evictions: 2734
          accesses: 23443
      cached_reads: 20693 (88%)
       valid_pages: aaab2103a250
 total_valid_pages: 154959

k-hagio commented 2 years ago

Thanks. So how does it work with this change?

--- a/diskdump.c
+++ b/diskdump.c
@@ -111,8 +111,7 @@ map_cpus_to_prstatus_kdump_cmprs(void)
    if (pc->flags2 & QEMU_MEM_DUMP_COMPRESSED)  /* notes exist for all cpus */
        goto resize_note_pointers;

-   if (!(online = get_cpus_online()) || (online == kt->cpus) || 
-       machine_type("ARM64"))
+   if (!(online = get_cpus_online()) || (online == kt->cpus))
        goto resize_note_pointers;

    if (CRASHDEBUG(1))

xuchunmei000 commented 2 years ago

I tried, but it does not work.
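One thing that may be worth checking, offered purely as an assumption and not as a diagnosis: once notes are mapped per cpu, a cpu without a saved note has an empty slot, and any consumer that dereferences that slot without a check would still crash. A minimal sketch of a guarded lookup follows. `struct note`, `enum frame_status` and `starting_frame_from_note` are hypothetical names, not functions or types from crash.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reduced note holding only the registers a backtrace needs. */
struct note {
    unsigned long sp;
    unsigned long pc;
};

enum frame_status { FRAME_OK = 0, FRAME_NO_NOTE = -1 };

/*
 * Fetch the starting SP/PC for a cpu from a per-cpu note table.
 * Returns FRAME_NO_NOTE instead of dereferencing when the cpu is out of
 * range or has no saved note; the outputs are left untouched in that case.
 */
static enum frame_status
starting_frame_from_note(const struct note *const *percpu, int ncpus, int cpu,
                         unsigned long *sp, unsigned long *pc)
{
    assert(percpu != NULL && sp != NULL && pc != NULL);

    if (cpu < 0 || cpu >= ncpus || percpu[cpu] == NULL)
        return FRAME_NO_NOTE;

    *sp = percpu[cpu]->sp;
    *pc = percpu[cpu]->pc;
    return FRAME_OK;
}
```

A caller receiving FRAME_NO_NOTE could then print a warning such as "cannot determine starting stack frame" rather than crash.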

k-hagio commented 2 years ago

What is printed by help -D with the patch?

xuchunmei000 commented 2 years ago

Sorry for the late reply:

crash> help -D
diskdump_data:
          filename: /var/crash/127.0.0.1-2021-12-31-01:38:10/vmcore
             flags: 1c6 (KDUMP_CMPRS_LOCAL|ERROR_EXCLUDED|LZO_SUPPORTED|SNAPPY_SUPPORTED|ZSTD_SUPPORTED)
               dfd: 3
               ofp: ffff9f717510
      machine_type: 183 (EM_AARCH64)

            header: aaab02541e10
           signature: "KDUMP   "
      header_version: 6
             utsname:
               sysname: Linux
              nodename: localhost
               release: 5.10.60-9.al8.aarch64
               version: #1 SMP Mon Sep 6 20:56:34 CST 2021
               machine: aarch64
            domainname: (none)
           timestamp:
                tv_sec: 61cd7e17
               tv_usec: 0
              status: 2 (DUMP_DH_COMPRESSED_LZO)
          block_size: 4096
        sub_hdr_size: 2
       bitmap_blocks: 262
           max_mapnr: 4286464
    total_ram_blocks: 0
       device_blocks: 0
      written_blocks: 0
         current_cpu: 0
             nr_cpus: 4
      tasks[nr_cpus]: 0
                      0
                      0
                      0

        sub_header: 0 (n/a)

  sub_header_kdump: aaab02542e20
           phys_base: 40000000
          dump_level: 31 (0x1f) (DUMP_EXCLUDE_ZERO|DUMP_EXCLUDE_CACHE|DUMP_EXCLUDE_CACHE_PRI|DUMP_EXCLUDE_USER_DATA|DUMP_EXCLUDE_FREE)
               split: 0
           start_pfn: (unused)
             end_pfn: (unused)
   offset_vmcoreinfo: 5872 (0x16f0)
     size_vmcoreinfo: 2885 (0xb45)
                      OSRELEASE=5.10.60-9.al8.aarch64
                      BUILD-ID=c7f4708939637fe3985ed53ecb1aad98b94c847a
                      PAGESIZE=4096
                      SYMBOL(init_uts_ns)=ffff8000117fa028
                      SYMBOL(node_online_map)=ffff8000117f1bd0
                      SYMBOL(swapper_pg_dir)=ffff8000113b2000
                      SYMBOL(_stext)=ffff8000100d0000
                      SYMBOL(vmap_area_list)=ffff800011bdb6a0
                      SYMBOL(mem_section)=ffff0003d4783200
                      LENGTH(mem_section)=1024
                      SIZE(mem_section)=16
                      OFFSET(mem_section.section_mem_map)=0
                      NUMBER(SECTION_SIZE_BITS)=30
                      NUMBER(MAX_PHYSMEM_BITS)=48
                      SIZE(page)=64
                      SIZE(pglist_data)=7680
                      SIZE(zone)=1472
                      SIZE(free_area)=88
                      SIZE(list_head)=16
                      SIZE(nodemask_t)=8
                      OFFSET(page.flags)=0
                      OFFSET(page._refcount)=52
                      OFFSET(page.mapping)=24
                      OFFSET(page.lru)=8
                      OFFSET(page._mapcount)=48
                      OFFSET(page.private)=40
                      OFFSET(page.compound_dtor)=16
                      OFFSET(page.compound_order)=17
                      OFFSET(page.compound_head)=8
                      OFFSET(pglist_data.node_zones)=0
                      OFFSET(pglist_data.nr_zones)=6944
                      OFFSET(pglist_data.node_start_pfn)=6952
                      OFFSET(pglist_data.node_spanned_pages)=6968
                      OFFSET(pglist_data.node_id)=6992
                      OFFSET(zone.free_area)=192
                      OFFSET(zone.vm_stat)=1280
                      OFFSET(zone.spanned_pages)=112
                      OFFSET(free_area.free_list)=0
                      OFFSET(list_head.next)=0
                      OFFSET(list_head.prev)=8
                      OFFSET(vmap_area.va_start)=0
                      OFFSET(vmap_area.list)=40
                      LENGTH(zone.free_area)=11
                      SYMBOL(prb)=ffff80001181f330
                      SYMBOL(printk_rb_static)=ffff80001181f370
                      SYMBOL(clear_seq)=ffff800011cfb9e0
                      SIZE(printk_ringbuffer)=80
                      OFFSET(printk_ringbuffer.desc_ring)=0
                      OFFSET(printk_ringbuffer.text_data_ring)=40
                      OFFSET(printk_ringbuffer.fail)=72
                      SIZE(prb_desc_ring)=40
                      OFFSET(prb_desc_ring.count_bits)=0
                      OFFSET(prb_desc_ring.descs)=8
                      OFFSET(prb_desc_ring.infos)=16
                      OFFSET(prb_desc_ring.head_id)=24
                      OFFSET(prb_desc_ring.tail_id)=32
                      SIZE(prb_desc)=24
                      OFFSET(prb_desc.state_var)=0
                      OFFSET(prb_desc.text_blk_lpos)=8
                      SIZE(prb_data_blk_lpos)=16
                      OFFSET(prb_data_blk_lpos.begin)=0
                      OFFSET(prb_data_blk_lpos.next)=8
                      SIZE(printk_info)=88
                      OFFSET(printk_info.seq)=0
                      OFFSET(printk_info.ts_nsec)=8
                      OFFSET(printk_info.text_len)=16
                      OFFSET(printk_info.caller_id)=20
                      OFFSET(printk_info.dev_info)=24
                      SIZE(dev_printk_info)=64
                      OFFSET(dev_printk_info.subsystem)=0
                      LENGTH(printk_info_subsystem)=16
                      OFFSET(dev_printk_info.device)=16
                      LENGTH(printk_info_device)=48
                      SIZE(prb_data_ring)=32
                      OFFSET(prb_data_ring.size_bits)=0
                      OFFSET(prb_data_ring.data)=8
                      OFFSET(prb_data_ring.head_lpos)=16
                      OFFSET(prb_data_ring.tail_lpos)=24
                      SIZE(atomic_long_t)=8
                      OFFSET(atomic_long_t.counter)=0
                      LENGTH(free_area.free_list)=5
                      NUMBER(NR_FREE_PAGES)=0
                      NUMBER(PG_lru)=4
                      NUMBER(PG_private)=13
                      NUMBER(PG_swapcache)=10
                      NUMBER(PG_swapbacked)=19
                      NUMBER(PG_slab)=9
                      NUMBER(PG_hwpoison)=22
                      NUMBER(PG_head_mask)=65536
                      NUMBER(PAGE_BUDDY_MAPCOUNT_VALUE)=-129
                      NUMBER(HUGETLB_PAGE_DTOR)=2
                      NUMBER(PAGE_OFFLINE_MAPCOUNT_VALUE)=-257
                      NUMBER(VA_BITS)=48
                      NUMBER(kimage_voffset)=0xffff7ffc67a00000
                      NUMBER(PHYS_OFFSET)=0x40000000
                      NUMBER(TCR_EL1_T1SZ)=0x10
                      KERNELOFFSET=c0000
                      NUMBER(KERNELPACMASK)=0x0
                      CRASHTIME=1640857111
         offset_note: 4200 (0x1068)
           size_note: 4560 (0x11d0)
           notes_buf: aaab02543e30
  num_vmcoredd_notes: 0
  num_prstatus_notes: 8
            notes[0]: 0
            notes[1]: 0
            notes[2]: 0
            notes[3]: 0
            notes[4]: aaab02543e30 (NT_PRSTATUS)
                      si.signo: 0  si.code: 0  si.errno: 0
                      cursig: 0  sigpend: 0  sighold: 0
                      pid: 1408  ppid: 0  pgrp: 0  sid:0
                      utime: 0.000000  stime: 0.000000
                      cutime: 0.000000  cstime: 0.000000
                       X0: ffff0000c8742800   X1: 0000000000000000   X2: ffff00036b6e90c0
                       X3: ffff800011bb22e8   X4: ffff00036b6e90c0   X5: 0000000000000000
                       X6: 000000000000000f   X7: ffff80001181f550   X8: 0000000000000000
                       X9: ffff8000102448fc  X10: 00000000ffff8000  X11: ffff800011adf550
                      X12: 0720072007200720  X13: 0720072007200720  X14: 0720072007200720
                      X15: ffff00036b6e9740  X16: 0000000000000000  X17: 0000000000000000
                      X18: 0000000000000030  X19: ffff00036b6e90c0  X20: ffff800011bb22a8
                      X21: 0000000000000000  X22: ffff800011e08000  X23: ffff80001329bab8
                      X24: ffff800011cf2000  X25: ffff800010cdcbc0  X26: 0000000000000000
                      X27: 0000000000000000  X28: ffff00036b6e90c0  X29: ffff80001329ba70
                       LR: ffff8000102448fc   SP: ffff80001329ba70   PC: ffff8000102449d4
                      PSTATE: 60000085   FPVALID: 00000000
            notes[5]: aaab02543fcc (NT_PRSTATUS)
                      si.signo: 0  si.code: 0  si.errno: 0
                      cursig: 0  sigpend: 0  sighold: 0
                      pid: 0  ppid: 0  pgrp: 0  sid:0
                      utime: 0.000000  stime: 0.000000
                      cutime: 0.000000  cstime: 0.000000
                       X0: 00000000000000e0   X1: ffff800011c60520   X2: 0000000000000001
                       X3: ffff80001097e240   X4: 0000000000000015   X5: 00ffffffffffffff
                       X6: 0000be9186c23431   X7: 00000010ab4a0098   X8: ffff0000c0398d20
                       X9: ffff80001097e268  X10: 0000000000000cc0  X11: 0000000000000000
                      X12: 0000000000000000  X13: 0000000000000000  X14: 0000000000000000
                      X15: 0000000000000000  X16: 0000000000000000  X17: 0000000000000000
                      X18: 0000000000000000  X19: 0000000000000001  X20: ffff800011c605a0
                      X21: ffff0003d4738600  X22: ffff800011c60520  X23: 0000000000000001
                      X24: 000001b6696821aa  X25: 0000000000000000  X26: 0000000000000000
                      X27: 0000000000000000  X28: 0000000000000000  X29: ffff800011f73e90
                       LR: ffff800010c0b0a0   SP: ffff800011f73e90   PC: ffff800010c0b0a8
                      PSTATE: 60c00005   FPVALID: 00000000
            notes[6]: aaab02544168 (NT_PRSTATUS)
                      si.signo: 0  si.code: 0  si.errno: 0
                      cursig: 0  sigpend: 0  sighold: 0
                      pid: 0  ppid: 0  pgrp: 0  sid:0
                      utime: 0.000000  stime: 0.000000
                      cutime: 0.000000  cstime: 0.000000
                       X0: 00000000000000e0   X1: ffff800011c60520   X2: 0000000000000001
                       X3: ffff80001097e240   X4: 0000000000000015   X5: 00ffffffffffffff
                       X6: 0000be9186c23431   X7: 0000000d7156c757   X8: ffff0000c039d020
                       X9: ffff80001097e268  X10: 0000000000000cc0  X11: 0000000000000000
                      X12: 0000000000000000  X13: 0000000000000000  X14: 0000000000000000
                      X15: 0000000000000000  X16: 0000000000000000  X17: 0000000000000000
                      X18: 0000000000000000  X19: 0000000000000001  X20: ffff800011c605a0
                      X21: ffff0003d4759600  X22: ffff800011c60520  X23: 0000000000000001
                      X24: 000001b666fedcd8  X25: 0000000000000000  X26: 0000000000000000
                      X27: 0000000000000000  X28: 0000000000000000  X29: ffff800011f7be90
                       LR: ffff800010c0b0a0   SP: ffff800011f7be90   PC: ffff800010c0b0a8
                      PSTATE: 60c00005   FPVALID: 00000000
            notes[7]: aaab02544304 (NT_PRSTATUS)
                      si.signo: 0  si.code: 0  si.errno: 0
                      cursig: 0  sigpend: 0  sighold: 0
                      pid: 0  ppid: 0  pgrp: 0  sid:0
                      utime: 0.000000  stime: 0.000000
                      cutime: 0.000000  cstime: 0.000000
                       X0: 00000000000000e0   X1: ffff800011c60520   X2: 0000000000000001
                       X3: ffff80001097e240   X4: 0000000000000015   X5: 00ffffffffffffff
                       X6: 0000be9186c23431   X7: 00000012126509af   X8: ffff0000c039e0e0
                       X9: ffff80001097e268  X10: 0000000000000cc0  X11: 0000000000000000
                      X12: 0000000000000000  X13: 0000000000000000  X14: 0000000000000000
                      X15: 0000000000000000  X16: 0000000000000000  X17: 0000000000000000
                      X18: 0000000000000000  X19: 0000000000000001  X20: ffff800011c605a0
                      X21: ffff0003d477a600  X22: ffff800011c60520  X23: 0000000000000001
                      X24: 000001b669615082  X25: 0000000000000000  X26: 0000000000000000
                      X27: 0000000000000000  X28: 0000000000000000  X29: ffff800011f83e90
                       LR: ffff800010c0b0a0   SP: ffff800011f83e90   PC: ffff800010c0b0a8
                      PSTATE: 60c00005   FPVALID: 00000000
       snapshot_task: 0
      num_qemu_notes: 0
        NOTE offsets: 1068 (NT_PRSTATUS)
                      1204 (NT_PRSTATUS)
                      13a0 (NT_PRSTATUS)
                      153c (NT_PRSTATUS)
    offset_eraseinfo: 0 (0x0)
      size_eraseinfo: 0 (0x0)
        start_pfn_64: (unused)
          end_pfn_64: (unused)
        max_mapnr_64: 4286464 (0x416800)

       data_offset: 109000
        block_size: 4096
       block_shift: 12
            bitmap: ffff9f48f010
        bitmap_len: 1073152
         max_mapnr: 4286464 (0x416800)
   dumpable_bitmap: ffff9f388010
              byte: 0
               bit: 0
   compressed_page: aaab0256f330
         curbufptr: aaab0256c320

 page_cache_hdr[0]:
            pg_flags: 1 (PAGE_VALID)
             pg_addr: 3aa3f6000
           pg_bufptr: aaab0255f320
        pg_hit_count: 1
 page_cache_hdr[1]:
            pg_flags: 1 (PAGE_VALID)
             pg_addr: 3aa3f7000
           pg_bufptr: aaab02560320
        pg_hit_count: 1
 page_cache_hdr[2]:
            pg_flags: 1 (PAGE_VALID)
             pg_addr: 3aa3f8000
           pg_bufptr: aaab02561320
        pg_hit_count: 1
 page_cache_hdr[3]:
            pg_flags: 1 (PAGE_VALID)
             pg_addr: 3aa3f9000
           pg_bufptr: aaab02562320
        pg_hit_count: 1
 page_cache_hdr[4]:
            pg_flags: 1 (PAGE_VALID)
             pg_addr: 3aa3fa000
           pg_bufptr: aaab02563320
        pg_hit_count: 1
 page_cache_hdr[5]:
            pg_flags: 1 (PAGE_VALID)
             pg_addr: 3aa3fb000
           pg_bufptr: aaab02564320
        pg_hit_count: 1
 page_cache_hdr[6]:
            pg_flags: 1 (PAGE_VALID)
             pg_addr: 3aa3fc000
           pg_bufptr: aaab02565320
        pg_hit_count: 1
 page_cache_hdr[7]:
            pg_flags: 1 (PAGE_VALID)
             pg_addr: 3aa3fd000
           pg_bufptr: aaab02566320
        pg_hit_count: 1
 page_cache_hdr[8]:
            pg_flags: 1 (PAGE_VALID)
             pg_addr: 3aa4af000
           pg_bufptr: aaab02567320
        pg_hit_count: 1
 page_cache_hdr[9]:
            pg_flags: 1 (PAGE_VALID)
             pg_addr: 3a926b000
           pg_bufptr: aaab02568320
        pg_hit_count: 10
page_cache_hdr[10]:
            pg_flags: 1 (PAGE_VALID)
             pg_addr: 3a9540000
           pg_bufptr: aaab02569320
        pg_hit_count: 2
page_cache_hdr[11]:
            pg_flags: 1 (PAGE_VALID)
             pg_addr: 3a9541000
           pg_bufptr: aaab0256a320
        pg_hit_count: 9
page_cache_hdr[12]:
            pg_flags: 1 (PAGE_VALID)
             pg_addr: 3ab6e9000
           pg_bufptr: aaab0256b320
        pg_hit_count: 1
page_cache_hdr[13]:
            pg_flags: 1 (PAGE_VALID)
             pg_addr: 3ab6ea000
           pg_bufptr: aaab0256c320
        pg_hit_count: 1
page_cache_hdr[14]:
            pg_flags: 1 (PAGE_VALID)
             pg_addr: 3aa3f4000
           pg_bufptr: aaab0256d320
        pg_hit_count: 1
page_cache_hdr[15]:
            pg_flags: 1 (PAGE_VALID)
             pg_addr: 3aa3f5000
           pg_bufptr: aaab0256e320
        pg_hit_count: 1

    page_cache_buf: aaab0255f320
       evict_index: 14
         evictions: 2734
          accesses: 23443
      cached_reads: 20693 (88%)
       valid_pages: aaab0255d250
 total_valid_pages: 154959

k-hagio commented 2 years ago

Thanks, it looks correctly mapped.

  num_prstatus_notes: 8
            notes[0]: 0
            notes[1]: 0
            notes[2]: 0
            notes[3]: 0
            notes[4]: aaab02543e30 (NT_PRSTATUS)
            ...

I tried, but it does not work.

What errors do you see? The same segfault by bt -c 0?

xuchunmei000 commented 2 years ago

Thanks, it looks correctly mapped.

  num_prstatus_notes: 8
            notes[0]: 0
            notes[1]: 0
            notes[2]: 0
            notes[3]: 0
            notes[4]: aaab02543e30 (NT_PRSTATUS)
            ...

I tried, but it does not work.

What errors do you see? The same segfault by bt -c 0?

yes.

crash> bt -c 1
PID: 0      TASK: ffff0000c03510c0  CPU: 1   COMMAND: "swapper/1"
 #0 [ffff800011f73e90] arch_cpu_idle at ffff800010c0b0a4
crash> bt -c 2
PID: 0      TASK: ffff0000c039a180  CPU: 2   COMMAND: "swapper/2"
 #0 [ffff800011f7be90] arch_cpu_idle at ffff800010c0b0a4
crash> bt -c 2
PID: 0      TASK: ffff0000c039a180  CPU: 2   COMMAND: "swapper/2"
 #0 [ffff800011f7be90] arch_cpu_idle at ffff800010c0b0a4
crash> bt -c 3
PID: 0      TASK: ffff0000c039b240  CPU: 3   COMMAND: "swapper/3"
 #0 [ffff800011f83e90] arch_cpu_idle at ffff800010c0b0a4
crash> bt -c 0
PID: 0      TASK: ffff8000117fa240  CPU: 0   COMMAND: "swapper/0"
Segmentation fault (core dumped)

k-hagio commented 2 years ago

So perhaps dd->nt_prstatus_percpu is not the cause. Is it possible to debug where crash fails? I don't have an arm machine and cannot reproduce this.

xuchunmei000 commented 2 years ago

So perhaps dd->nt_prstatus_percpu is not the cause. Is it possible to debug where crash fails? I don't have an arm machine and cannot reproduce this.

The following is the gdb info. panic_task_regs[0] is the same as panic_task_regs[4], although cpu 0 did not save crash_notes, so it should be empty.

 #0  arm64_is_kernel_exception_frame (bt=bt@entry=0xffffcd47d9f8, stkptr=stkptr@entry=18446603336542697776) at arm64.c:1925
1925        if (INSTACK(regs->sp, bt) && INSTACK(regs->regs[29], bt) &&
[Current thread is 1 (Thread 0xffff8aa7f010 (LWP 128066))]
(gdb) bt
#0  arm64_is_kernel_exception_frame (bt=bt@entry=0xffffcd47d9f8, stkptr=stkptr@entry=18446603336542697776) at arm64.c:1925
#1  0x0000aaaab26b2ef4 in arm64_back_trace_cmd (bt=0xffffcd47d9f8) at arm64.c:2760
#2  0x0000aaaab2684058 in back_trace (bt=0xffffcd47d9f8) at kernel.c:3186
#3  0x0000aaaab2685be4 in cmd_bt () at kernel.c:2789
#4  0x0000aaaab25fe2fc in exec_command () at main.c:892
#5  0x0000aaaab25fe5b8 in main_loop () at main.c:839
#6  0x0000aaaab292216c in captured_main (data=data@entry=0xffffcd47e1e0) at main.c:1284
#7  gdb_main (args=args@entry=0xffffcd47e220) at main.c:1313
#8  0x0000aaaab292225c in gdb_main_entry (argc=<optimized out>, argv=<optimized out>) at main.c:1338
#9  0x0000aaaab25f873c in main (argc=3, argv=0xffffcd47e418) at main.c:720
(gdb) p machdep->machspec->panic_task_regs[0]
$1 = {{user_regs = {regs = {18446462602095896576, 0, 18446462613420150976, 18446603336518673128, 18446462613420150976, 0, 15,
        18446603336514925904, 0, 18446603336492009724, 4294934528, 18446603336517809488, 513418191660123936, 513418191660123936,
        513418191660123936, 18446462613420152640, 0, 0, 48, 18446462613420150976, 18446603336518673064, 0, 18446603336521121792,
        18446603336542698168, 18446603336519983104, 18446603336503118784, 0, 0, 18446462613420150976, 18446603336542698096,
        18446603336492009724}, sp = 18446603336542698096, pc = 18446603336492009940, pstate = 1610612869}, {regs = {18446462602095896576,
        0, 18446462613420150976, 18446603336518673128, 18446462613420150976, 0, 15, 18446603336514925904, 0, 18446603336492009724,
        4294934528, 18446603336517809488, 513418191660123936, 513418191660123936, 513418191660123936, 18446462613420152640, 0, 0, 48,
        18446462613420150976, 18446603336518673064, 0, 18446603336521121792, 18446603336542698168, 18446603336519983104,
        18446603336503118784, 0, 0, 18446462613420150976, 18446603336542698096, 18446603336492009724}, sp = 18446603336542698096,
      pc = 18446603336492009940, pstate = 1610612869}}, orig_x0 = 0, syscallno = 0}
(gdb) p machdep->machspec->panic_task_regs[4]
$2 = {{user_regs = {regs = {18446462602095896576, 0, 18446462613420150976, 18446603336518673128, 18446462613420150976, 0, 15,
        18446603336514925904, 0, 18446603336492009724, 4294934528, 18446603336517809488, 513418191660123936, 513418191660123936,
        513418191660123936, 18446462613420152640, 0, 0, 48, 18446462613420150976, 18446603336518673064, 0, 18446603336521121792,
        18446603336542698168, 18446603336519983104, 18446603336503118784, 0, 0, 18446462613420150976, 18446603336542698096,
        18446603336492009724}, sp = 18446603336542698096, pc = 18446603336492009940, pstate = 1610612869}, {regs = {18446462602095896576,
        0, 18446462613420150976, 18446603336518673128, 18446462613420150976, 0, 15, 18446603336514925904, 0, 18446603336492009724,
        4294934528, 18446603336517809488, 513418191660123936, 513418191660123936, 513418191660123936, 18446462613420152640, 0, 0, 48,
        18446462613420150976, 18446603336518673064, 0, 18446603336521121792, 18446603336542698168, 18446603336519983104,
        18446603336503118784, 0, 0, 18446462613420150976, 18446603336542698096, 18446603336492009724}, sp = 18446603336542698096,
      pc = 18446603336492009940, pstate = 1610612869}}, orig_x0 = 0, syscallno = 0}
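
Editor's note: gdb prints these registers in decimal, while help -D prints them in hex, which makes the two hard to compare by eye. Inside gdb, `p/x` prints the same expression in hex. Outside gdb, a minimal sketch of the conversion could look like this; `format_hex64` is a hypothetical helper name, not part of crash.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format a 64-bit value as 16 zero-padded lowercase hex digits,
 * the same style used for registers in the help -D output.
 * Returns the number of characters written (excluding the NUL).
 * format_hex64 is a hypothetical name used only for this sketch.
 */
static int format_hex64(uint64_t value, char *buf, size_t len)
{
    return snprintf(buf, len, "%016" PRIx64, value);
}
```

For example, passing the sp value shown above (18446603336542698096) with a 17-byte buffer yields ffff80001329ba70, which can then be compared against the SP fields of the NT_PRSTATUS notes.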

xuchunmei000 commented 2 years ago

So perhaps dd->nt_prstatus_percpu is not the cause. Is it possible to debug where crash fails? I don't have an arm machine and cannot reproduce this.

I have some debug information: with your patch, dd->nt_prstatus_percpu is correctly mapped in map_cpus_to_prstatus_kdump_cmprs, but that happens after machdep->machspec->panic_task_regs has already been saved. panic_task_regs is saved by arm64_get_crash_notes, which is called from arm64_init, while map_cpus_to_prstatus_kdump_cmprs is called from task_init, and task_init runs later than machdep_init (arm64_init).
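
Editor's note: the ordering problem described here can be illustrated with a small, self-contained sketch. It is not crash's code; `struct note`, `remap_notes`, `snapshot_notes`, and `NR_CPUS` are hypothetical names chosen for illustration. The point is only that a consumer which copies the per-CPU table before it is remapped keeps the packed (wrong) assignment.

```c
#include <stddef.h>

#define NR_CPUS 8   /* hypothetical CPU count for this sketch */

/* Hypothetical stand-in for a parsed NT_PRSTATUS note. */
struct note {
    int source_cpu;   /* CPU that actually saved this note */
};

/*
 * Spread the first nr_notes packed entries of percpu[] over the CPUs
 * flagged in online[]. CPUs that saved no note end up with NULL.
 */
static void remap_notes(struct note *percpu[NR_CPUS],
                        const int online[NR_CPUS], int nr_notes)
{
    struct note *packed[NR_CPUS];
    int cpu, next = 0;

    for (cpu = 0; cpu < NR_CPUS; cpu++) {
        packed[cpu] = percpu[cpu];
        percpu[cpu] = NULL;
    }
    for (cpu = 0; cpu < NR_CPUS && next < nr_notes; cpu++)
        if (online[cpu])
            percpu[cpu] = packed[next++];
}

/* A consumer that caches its own copy of the table at call time. */
static void snapshot_notes(struct note *cache[NR_CPUS],
                           struct note *const percpu[NR_CPUS])
{
    int cpu;

    for (cpu = 0; cpu < NR_CPUS; cpu++)
        cache[cpu] = percpu[cpu];
}
```

If `snapshot_notes` runs before `remap_notes`, slot 0 of the cache holds the first packed note even when CPU 0 saved nothing; if it runs afterwards, slot 0 is NULL. That is the same shape as panic_task_regs[0] matching panic_task_regs[4] in the gdb output.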

lian-bo commented 2 years ago

I got a vmcore with the same backtrace from a customer; "bt -a" may trigger the segfault on this specific vmcore. It seems to be a similar case.

k-hagio commented 2 years ago

task_init is called later than machdep_init(arm64_init).

Thanks for debugging.

How does this work with the patch above?

--- a/arm64.c
+++ b/arm64.c
@@ -472,7 +472,7 @@ arm64_init(int when)
                arm64_stackframe_init();
                break;

-       case POST_VM:
+       case POST_INIT:
                /*
                 * crash_notes contains machine specific information about the
                 * crash. In particular, it contains CPU registers at the time

xuchunmei000 commented 2 years ago
map_cpus_to_prstatus_kdump_cmprs

Yes, it works when combined with the previous patch for map_cpus_to_prstatus_kdump_cmprs.

k-hagio commented 2 years ago

Thanks for testing. It's just an idea, will check if there is a better way.

k-hagio commented 2 years ago

@lian-bo, could you test this patch with the vmcore you got? I think I will go with this.

--- a/arm64.c
+++ b/arm64.c
@@ -472,7 +472,7 @@ arm64_init(int when)
                arm64_stackframe_init();
                break;

-       case POST_VM:
+       case POST_INIT:
                /*
                 * crash_notes contains machine specific information about the
                 * crash. In particular, it contains CPU registers at the time
diff --git a/diskdump.c b/diskdump.c
index 3e1cfd548c96..d5674276e1fd 100644
--- a/diskdump.c
+++ b/diskdump.c
@@ -111,8 +111,7 @@ map_cpus_to_prstatus_kdump_cmprs(void)
        if (pc->flags2 & QEMU_MEM_DUMP_COMPRESSED)  /* notes exist for all cpus */
                goto resize_note_pointers;

-       if (!(online = get_cpus_online()) || (online == kt->cpus) || 
-           machine_type("ARM64"))
+       if (!(online = get_cpus_online()) || (online == kt->cpus))
                goto resize_note_pointers;

        if (CRASHDEBUG(1))

lian-bo commented 2 years ago

Sure. Crash printed a lot of warnings when running the bt command on my vmcore, but I don't have much time to investigate the details. As you know, I'm currently working on other issues. I will help to test it further once I have time. But anyway, could you post it upstream first? We can continue to talk about it there. Thanks.

k-hagio commented 2 years ago

ok, will post.