Closed xuchunmei000 closed 2 years ago
Thanks for the report. @lian-bo, I haven't been able to look into this yet, but can this also be reproduced on RHEL?
So far I haven't seen it on RHEL. Could you please try it on the latest crash-7.3 or crash-8.0? If this is still reproduced, would you mind sharing the vmcore or the reproducible steps?
Here are the steps to reproduce. My aarch64 VM has 8 CPUs. Before crashing the OS, take some CPUs offline:
echo 0 > /sys/devices/system/cpu/cpu0/online
echo 0 > /sys/devices/system/cpu/cpu1/online
echo 0 > /sys/devices/system/cpu/cpu2/online
echo 0 > /sys/devices/system/cpu/cpu3/online
echo c > /proc/sysrq-trigger
Then run bt -c 0 in crash, which ends in a segmentation fault:
crash> bt -c 1
PID: 0 TASK: ffff0000c03510c0 CPU: 1 COMMAND: "swapper/1"
#0 [ffff800011f73e90] arch_cpu_idle at ffff800010c0b0a4
crash> bt -c 2
PID: 0 TASK: ffff0000c039a180 CPU: 2 COMMAND: "swapper/2"
#0 [ffff800011f7be90] arch_cpu_idle at ffff800010c0b0a4
crash> bt -c 3
PID: 0 TASK: ffff0000c039b240 CPU: 3 COMMAND: "swapper/3"
#0 [ffff800011f83e90] arch_cpu_idle at ffff800010c0b0a4
crash> bt -c 0
PID: 0 TASK: ffff8000117fa240 CPU: 0 COMMAND: "swapper/0"
Segmentation fault (core dumped)
crash-7.2.9 is old. Can you try the latest upstream crash? I have never reproduced this issue.
I used the latest upstream version, 8.0.0 with gdb 10.2, and the issue still reproduces.
Hmm, I thought that map_cpus_to_prstatus_kdump_cmprs() maps CPUs to prstatus notes, but it doesn't on arm64. Is this the cause?
In case I'm misunderstanding the situation, could you please send the whole help -D output from the 8-CPU machine?
crash> help -D
diskdump_data:
filename: ./vmcore
flags: 1c6 (KDUMP_CMPRS_LOCAL|ERROR_EXCLUDED|LZO_SUPPORTED|SNAPPY_SUPPORTED|ZSTD_SUPPORTED)
dfd: 3
ofp: ffffb554b510
machine_type: 183 (EM_AARCH64)
header: aaab2101ee10
signature: "KDUMP "
header_version: 6
utsname:
sysname: Linux
nodename: localhost
release: 5.10.60-9.al8.aarch64
version: #1 SMP Mon Sep 6 20:56:34 CST 2021
machine: aarch64
domainname: (none)
timestamp:
tv_sec: 61cd7e17
tv_usec: 0
status: 2 (DUMP_DH_COMPRESSED_LZO)
block_size: 4096
sub_hdr_size: 2
bitmap_blocks: 262
max_mapnr: 4286464
total_ram_blocks: 0
device_blocks: 0
written_blocks: 0
current_cpu: 0
nr_cpus: 4
tasks[nr_cpus]: 0
0
0
0
sub_header: 0 (n/a)
sub_header_kdump: aaab2101fe20
phys_base: 40000000
dump_level: 31 (0x1f) (DUMP_EXCLUDE_ZERO|DUMP_EXCLUDE_CACHE|DUMP_EXCLUDE_CACHE_PRI|DUMP_EXCLUDE_USER_DATA|DUMP_EXCLUDE_FREE)
split: 0
start_pfn: (unused)
end_pfn: (unused)
offset_vmcoreinfo: 5872 (0x16f0)
size_vmcoreinfo: 2885 (0xb45)
OSRELEASE=5.10.60-9.al8.aarch64
BUILD-ID=c7f4708939637fe3985ed53ecb1aad98b94c847a
PAGESIZE=4096
SYMBOL(init_uts_ns)=ffff8000117fa028
SYMBOL(node_online_map)=ffff8000117f1bd0
SYMBOL(swapper_pg_dir)=ffff8000113b2000
SYMBOL(_stext)=ffff8000100d0000
SYMBOL(vmap_area_list)=ffff800011bdb6a0
SYMBOL(mem_section)=ffff0003d4783200
LENGTH(mem_section)=1024
SIZE(mem_section)=16
OFFSET(mem_section.section_mem_map)=0
NUMBER(SECTION_SIZE_BITS)=30
NUMBER(MAX_PHYSMEM_BITS)=48
SIZE(page)=64
SIZE(pglist_data)=7680
SIZE(zone)=1472
SIZE(free_area)=88
SIZE(list_head)=16
SIZE(nodemask_t)=8
OFFSET(page.flags)=0
OFFSET(page._refcount)=52
OFFSET(page.mapping)=24
OFFSET(page.lru)=8
OFFSET(page._mapcount)=48
OFFSET(page.private)=40
OFFSET(page.compound_dtor)=16
OFFSET(page.compound_order)=17
OFFSET(page.compound_head)=8
OFFSET(pglist_data.node_zones)=0
OFFSET(pglist_data.nr_zones)=6944
OFFSET(pglist_data.node_start_pfn)=6952
OFFSET(pglist_data.node_spanned_pages)=6968
OFFSET(pglist_data.node_id)=6992
OFFSET(zone.free_area)=192
OFFSET(zone.vm_stat)=1280
OFFSET(zone.spanned_pages)=112
OFFSET(free_area.free_list)=0
OFFSET(list_head.next)=0
OFFSET(list_head.prev)=8
OFFSET(vmap_area.va_start)=0
OFFSET(vmap_area.list)=40
LENGTH(zone.free_area)=11
SYMBOL(prb)=ffff80001181f330
SYMBOL(printk_rb_static)=ffff80001181f370
SYMBOL(clear_seq)=ffff800011cfb9e0
SIZE(printk_ringbuffer)=80
OFFSET(printk_ringbuffer.desc_ring)=0
OFFSET(printk_ringbuffer.text_data_ring)=40
OFFSET(printk_ringbuffer.fail)=72
SIZE(prb_desc_ring)=40
OFFSET(prb_desc_ring.count_bits)=0
OFFSET(prb_desc_ring.descs)=8
OFFSET(prb_desc_ring.infos)=16
OFFSET(prb_desc_ring.head_id)=24
OFFSET(prb_desc_ring.tail_id)=32
SIZE(prb_desc)=24
OFFSET(prb_desc.state_var)=0
OFFSET(prb_desc.text_blk_lpos)=8
SIZE(prb_data_blk_lpos)=16
OFFSET(prb_data_blk_lpos.begin)=0
OFFSET(prb_data_blk_lpos.next)=8
SIZE(printk_info)=88
OFFSET(printk_info.seq)=0
OFFSET(printk_info.ts_nsec)=8
OFFSET(printk_info.text_len)=16
OFFSET(printk_info.caller_id)=20
OFFSET(printk_info.dev_info)=24
SIZE(dev_printk_info)=64
OFFSET(dev_printk_info.subsystem)=0
LENGTH(printk_info_subsystem)=16
OFFSET(dev_printk_info.device)=16
LENGTH(printk_info_device)=48
SIZE(prb_data_ring)=32
OFFSET(prb_data_ring.size_bits)=0
OFFSET(prb_data_ring.data)=8
OFFSET(prb_data_ring.head_lpos)=16
OFFSET(prb_data_ring.tail_lpos)=24
SIZE(atomic_long_t)=8
OFFSET(atomic_long_t.counter)=0
LENGTH(free_area.free_list)=5
NUMBER(NR_FREE_PAGES)=0
NUMBER(PG_lru)=4
NUMBER(PG_private)=13
NUMBER(PG_swapcache)=10
NUMBER(PG_swapbacked)=19
NUMBER(PG_slab)=9
NUMBER(PG_hwpoison)=22
NUMBER(PG_head_mask)=65536
NUMBER(PAGE_BUDDY_MAPCOUNT_VALUE)=-129
NUMBER(HUGETLB_PAGE_DTOR)=2
NUMBER(PAGE_OFFLINE_MAPCOUNT_VALUE)=-257
NUMBER(VA_BITS)=48
NUMBER(kimage_voffset)=0xffff7ffc67a00000
NUMBER(PHYS_OFFSET)=0x40000000
NUMBER(TCR_EL1_T1SZ)=0x10
KERNELOFFSET=c0000
NUMBER(KERNELPACMASK)=0x0
CRASHTIME=1640857111
offset_note: 4200 (0x1068)
size_note: 4560 (0x11d0)
notes_buf: aaab21020e30
num_vmcoredd_notes: 0
num_prstatus_notes: 4
notes[0]: aaab21020e30 (NT_PRSTATUS)
si.signo: 0 si.code: 0 si.errno: 0
cursig: 0 sigpend: 0 sighold: 0
pid: 1408 ppid: 0 pgrp: 0 sid:0
utime: 0.000000 stime: 0.000000
cutime: 0.000000 cstime: 0.000000
X0: ffff0000c8742800 X1: 0000000000000000 X2: ffff00036b6e90c0
X3: ffff800011bb22e8 X4: ffff00036b6e90c0 X5: 0000000000000000
X6: 000000000000000f X7: ffff80001181f550 X8: 0000000000000000
X9: ffff8000102448fc X10: 00000000ffff8000 X11: ffff800011adf550
X12: 0720072007200720 X13: 0720072007200720 X14: 0720072007200720
X15: ffff00036b6e9740 X16: 0000000000000000 X17: 0000000000000000
X18: 0000000000000030 X19: ffff00036b6e90c0 X20: ffff800011bb22a8
X21: 0000000000000000 X22: ffff800011e08000 X23: ffff80001329bab8
X24: ffff800011cf2000 X25: ffff800010cdcbc0 X26: 0000000000000000
X27: 0000000000000000 X28: ffff00036b6e90c0 X29: ffff80001329ba70
LR: ffff8000102448fc SP: ffff80001329ba70 PC: ffff8000102449d4
PSTATE: 60000085 FPVALID: 00000000
notes[1]: aaab21020fcc (NT_PRSTATUS)
si.signo: 0 si.code: 0 si.errno: 0
cursig: 0 sigpend: 0 sighold: 0
pid: 0 ppid: 0 pgrp: 0 sid:0
utime: 0.000000 stime: 0.000000
cutime: 0.000000 cstime: 0.000000
X0: 00000000000000e0 X1: ffff800011c60520 X2: 0000000000000001
X3: ffff80001097e240 X4: 0000000000000015 X5: 00ffffffffffffff
X6: 0000be9186c23431 X7: 00000010ab4a0098 X8: ffff0000c0398d20
X9: ffff80001097e268 X10: 0000000000000cc0 X11: 0000000000000000
X12: 0000000000000000 X13: 0000000000000000 X14: 0000000000000000
X15: 0000000000000000 X16: 0000000000000000 X17: 0000000000000000
X18: 0000000000000000 X19: 0000000000000001 X20: ffff800011c605a0
X21: ffff0003d4738600 X22: ffff800011c60520 X23: 0000000000000001
X24: 000001b6696821aa X25: 0000000000000000 X26: 0000000000000000
X27: 0000000000000000 X28: 0000000000000000 X29: ffff800011f73e90
LR: ffff800010c0b0a0 SP: ffff800011f73e90 PC: ffff800010c0b0a8
PSTATE: 60c00005 FPVALID: 00000000
notes[2]: aaab21021168 (NT_PRSTATUS)
si.signo: 0 si.code: 0 si.errno: 0
cursig: 0 sigpend: 0 sighold: 0
pid: 0 ppid: 0 pgrp: 0 sid:0
utime: 0.000000 stime: 0.000000
cutime: 0.000000 cstime: 0.000000
X0: 00000000000000e0 X1: ffff800011c60520 X2: 0000000000000001
X3: ffff80001097e240 X4: 0000000000000015 X5: 00ffffffffffffff
X6: 0000be9186c23431 X7: 0000000d7156c757 X8: ffff0000c039d020
X9: ffff80001097e268 X10: 0000000000000cc0 X11: 0000000000000000
X12: 0000000000000000 X13: 0000000000000000 X14: 0000000000000000
X15: 0000000000000000 X16: 0000000000000000 X17: 0000000000000000
X18: 0000000000000000 X19: 0000000000000001 X20: ffff800011c605a0
X21: ffff0003d4759600 X22: ffff800011c60520 X23: 0000000000000001
X24: 000001b666fedcd8 X25: 0000000000000000 X26: 0000000000000000
X27: 0000000000000000 X28: 0000000000000000 X29: ffff800011f7be90
LR: ffff800010c0b0a0 SP: ffff800011f7be90 PC: ffff800010c0b0a8
PSTATE: 60c00005 FPVALID: 00000000
notes[3]: aaab21021304 (NT_PRSTATUS)
si.signo: 0 si.code: 0 si.errno: 0
cursig: 0 sigpend: 0 sighold: 0
pid: 0 ppid: 0 pgrp: 0 sid:0
utime: 0.000000 stime: 0.000000
cutime: 0.000000 cstime: 0.000000
X0: 00000000000000e0 X1: ffff800011c60520 X2: 0000000000000001
X3: ffff80001097e240 X4: 0000000000000015 X5: 00ffffffffffffff
X6: 0000be9186c23431 X7: 00000012126509af X8: ffff0000c039e0e0
X9: ffff80001097e268 X10: 0000000000000cc0 X11: 0000000000000000
X12: 0000000000000000 X13: 0000000000000000 X14: 0000000000000000
X15: 0000000000000000 X16: 0000000000000000 X17: 0000000000000000
X18: 0000000000000000 X19: 0000000000000001 X20: ffff800011c605a0
X21: ffff0003d477a600 X22: ffff800011c60520 X23: 0000000000000001
X24: 000001b669615082 X25: 0000000000000000 X26: 0000000000000000
X27: 0000000000000000 X28: 0000000000000000 X29: ffff800011f83e90
LR: ffff800010c0b0a0 SP: ffff800011f83e90 PC: ffff800010c0b0a8
PSTATE: 60c00005 FPVALID: 00000000
snapshot_task: 0
num_qemu_notes: 0
NOTE offsets: 1068 (NT_PRSTATUS)
1204 (NT_PRSTATUS)
13a0 (NT_PRSTATUS)
153c (NT_PRSTATUS)
offset_eraseinfo: 0 (0x0)
size_eraseinfo: 0 (0x0)
start_pfn_64: (unused)
end_pfn_64: (unused)
max_mapnr_64: 4286464 (0x416800)
data_offset: 109000
block_size: 4096
block_shift: 12
bitmap: ffffb52c3010
bitmap_len: 1073152
max_mapnr: 4286464 (0x416800)
dumpable_bitmap: ffffb51bc010
byte: 0
bit: 0
compressed_page: aaab2104c330
curbufptr: aaab21049320
page_cache_hdr[0]:
pg_flags: 1 (PAGE_VALID)
pg_addr: 3aa3f6000
pg_bufptr: aaab2103c320
pg_hit_count: 1
page_cache_hdr[1]:
pg_flags: 1 (PAGE_VALID)
pg_addr: 3aa3f7000
pg_bufptr: aaab2103d320
pg_hit_count: 1
page_cache_hdr[2]:
pg_flags: 1 (PAGE_VALID)
pg_addr: 3aa3f8000
pg_bufptr: aaab2103e320
pg_hit_count: 1
page_cache_hdr[3]:
pg_flags: 1 (PAGE_VALID)
pg_addr: 3aa3f9000
pg_bufptr: aaab2103f320
pg_hit_count: 1
page_cache_hdr[4]:
pg_flags: 1 (PAGE_VALID)
pg_addr: 3aa3fa000
pg_bufptr: aaab21040320
pg_hit_count: 1
page_cache_hdr[5]:
pg_flags: 1 (PAGE_VALID)
pg_addr: 3aa3fb000
pg_bufptr: aaab21041320
pg_hit_count: 1
page_cache_hdr[6]:
pg_flags: 1 (PAGE_VALID)
pg_addr: 3aa3fc000
pg_bufptr: aaab21042320
pg_hit_count: 1
page_cache_hdr[7]:
pg_flags: 1 (PAGE_VALID)
pg_addr: 3aa3fd000
pg_bufptr: aaab21043320
pg_hit_count: 1
page_cache_hdr[8]:
pg_flags: 1 (PAGE_VALID)
pg_addr: 3aa4af000
pg_bufptr: aaab21044320
pg_hit_count: 1
page_cache_hdr[9]:
pg_flags: 1 (PAGE_VALID)
pg_addr: 3a926b000
pg_bufptr: aaab21045320
pg_hit_count: 10
page_cache_hdr[10]:
pg_flags: 1 (PAGE_VALID)
pg_addr: 3a9540000
pg_bufptr: aaab21046320
pg_hit_count: 2
page_cache_hdr[11]:
pg_flags: 1 (PAGE_VALID)
pg_addr: 3a9541000
pg_bufptr: aaab21047320
pg_hit_count: 9
page_cache_hdr[12]:
pg_flags: 1 (PAGE_VALID)
pg_addr: 3ab6e9000
pg_bufptr: aaab21048320
pg_hit_count: 1
page_cache_hdr[13]:
pg_flags: 1 (PAGE_VALID)
pg_addr: 3ab6ea000
pg_bufptr: aaab21049320
pg_hit_count: 1
page_cache_hdr[14]:
pg_flags: 1 (PAGE_VALID)
pg_addr: 3aa3f4000
pg_bufptr: aaab2104a320
pg_hit_count: 1
page_cache_hdr[15]:
pg_flags: 1 (PAGE_VALID)
pg_addr: 3aa3f5000
pg_bufptr: aaab2104b320
pg_hit_count: 1
page_cache_buf: aaab2103c320
evict_index: 14
evictions: 2734
accesses: 23443
cached_reads: 20693 (88%)
valid_pages: aaab2103a250
total_valid_pages: 154959
Thanks. So how does it work with this change?
--- a/diskdump.c
+++ b/diskdump.c
@@ -111,8 +111,7 @@ map_cpus_to_prstatus_kdump_cmprs(void)
 	if (pc->flags2 & QEMU_MEM_DUMP_COMPRESSED)  /* notes exist for all cpus */
 		goto resize_note_pointers;
 
-	if (!(online = get_cpus_online()) || (online == kt->cpus) ||
-	    machine_type("ARM64"))
+	if (!(online = get_cpus_online()) || (online == kt->cpus))
 		goto resize_note_pointers;
 
 	if (CRASHDEBUG(1))
I tried, but it does not work.
What is printed by help -D with the patch?
Sorry for the late reply:
crash> help -D
diskdump_data:
filename: /var/crash/127.0.0.1-2021-12-31-01:38:10/vmcore
flags: 1c6 (KDUMP_CMPRS_LOCAL|ERROR_EXCLUDED|LZO_SUPPORTED|SNAPPY_SUPPORTED|ZSTD_SUPPORTED)
dfd: 3
ofp: ffff9f717510
machine_type: 183 (EM_AARCH64)
header: aaab02541e10
signature: "KDUMP "
header_version: 6
utsname:
sysname: Linux
nodename: localhost
release: 5.10.60-9.al8.aarch64
version: #1 SMP Mon Sep 6 20:56:34 CST 2021
machine: aarch64
domainname: (none)
timestamp:
tv_sec: 61cd7e17
tv_usec: 0
status: 2 (DUMP_DH_COMPRESSED_LZO)
block_size: 4096
sub_hdr_size: 2
bitmap_blocks: 262
max_mapnr: 4286464
total_ram_blocks: 0
device_blocks: 0
written_blocks: 0
current_cpu: 0
nr_cpus: 4
tasks[nr_cpus]: 0
0
0
0
sub_header: 0 (n/a)
sub_header_kdump: aaab02542e20
phys_base: 40000000
dump_level: 31 (0x1f) (DUMP_EXCLUDE_ZERO|DUMP_EXCLUDE_CACHE|DUMP_EXCLUDE_CACHE_PRI|DUMP_EXCLUDE_USER_DATA|DUMP_EXCLUDE_FREE)
split: 0
start_pfn: (unused)
end_pfn: (unused)
offset_vmcoreinfo: 5872 (0x16f0)
size_vmcoreinfo: 2885 (0xb45)
OSRELEASE=5.10.60-9.al8.aarch64
BUILD-ID=c7f4708939637fe3985ed53ecb1aad98b94c847a
PAGESIZE=4096
SYMBOL(init_uts_ns)=ffff8000117fa028
SYMBOL(node_online_map)=ffff8000117f1bd0
SYMBOL(swapper_pg_dir)=ffff8000113b2000
SYMBOL(_stext)=ffff8000100d0000
SYMBOL(vmap_area_list)=ffff800011bdb6a0
SYMBOL(mem_section)=ffff0003d4783200
LENGTH(mem_section)=1024
SIZE(mem_section)=16
OFFSET(mem_section.section_mem_map)=0
NUMBER(SECTION_SIZE_BITS)=30
NUMBER(MAX_PHYSMEM_BITS)=48
SIZE(page)=64
SIZE(pglist_data)=7680
SIZE(zone)=1472
SIZE(free_area)=88
SIZE(list_head)=16
SIZE(nodemask_t)=8
OFFSET(page.flags)=0
OFFSET(page._refcount)=52
OFFSET(page.mapping)=24
OFFSET(page.lru)=8
OFFSET(page._mapcount)=48
OFFSET(page.private)=40
OFFSET(page.compound_dtor)=16
OFFSET(page.compound_order)=17
OFFSET(page.compound_head)=8
OFFSET(pglist_data.node_zones)=0
OFFSET(pglist_data.nr_zones)=6944
OFFSET(pglist_data.node_start_pfn)=6952
OFFSET(pglist_data.node_spanned_pages)=6968
OFFSET(pglist_data.node_id)=6992
OFFSET(zone.free_area)=192
OFFSET(zone.vm_stat)=1280
OFFSET(zone.spanned_pages)=112
OFFSET(free_area.free_list)=0
OFFSET(list_head.next)=0
OFFSET(list_head.prev)=8
OFFSET(vmap_area.va_start)=0
OFFSET(vmap_area.list)=40
LENGTH(zone.free_area)=11
SYMBOL(prb)=ffff80001181f330
SYMBOL(printk_rb_static)=ffff80001181f370
SYMBOL(clear_seq)=ffff800011cfb9e0
SIZE(printk_ringbuffer)=80
OFFSET(printk_ringbuffer.desc_ring)=0
OFFSET(printk_ringbuffer.text_data_ring)=40
OFFSET(printk_ringbuffer.fail)=72
SIZE(prb_desc_ring)=40
OFFSET(prb_desc_ring.count_bits)=0
OFFSET(prb_desc_ring.descs)=8
OFFSET(prb_desc_ring.infos)=16
OFFSET(prb_desc_ring.head_id)=24
OFFSET(prb_desc_ring.tail_id)=32
SIZE(prb_desc)=24
OFFSET(prb_desc.state_var)=0
OFFSET(prb_desc.text_blk_lpos)=8
SIZE(prb_data_blk_lpos)=16
OFFSET(prb_data_blk_lpos.begin)=0
OFFSET(prb_data_blk_lpos.next)=8
SIZE(printk_info)=88
OFFSET(printk_info.seq)=0
OFFSET(printk_info.ts_nsec)=8
OFFSET(printk_info.text_len)=16
OFFSET(printk_info.caller_id)=20
OFFSET(printk_info.dev_info)=24
SIZE(dev_printk_info)=64
OFFSET(dev_printk_info.subsystem)=0
LENGTH(printk_info_subsystem)=16
OFFSET(dev_printk_info.device)=16
LENGTH(printk_info_device)=48
SIZE(prb_data_ring)=32
OFFSET(prb_data_ring.size_bits)=0
OFFSET(prb_data_ring.data)=8
OFFSET(prb_data_ring.head_lpos)=16
OFFSET(prb_data_ring.tail_lpos)=24
SIZE(atomic_long_t)=8
OFFSET(atomic_long_t.counter)=0
LENGTH(free_area.free_list)=5
NUMBER(NR_FREE_PAGES)=0
NUMBER(PG_lru)=4
NUMBER(PG_private)=13
NUMBER(PG_swapcache)=10
NUMBER(PG_swapbacked)=19
NUMBER(PG_slab)=9
NUMBER(PG_hwpoison)=22
NUMBER(PG_head_mask)=65536
NUMBER(PAGE_BUDDY_MAPCOUNT_VALUE)=-129
NUMBER(HUGETLB_PAGE_DTOR)=2
NUMBER(PAGE_OFFLINE_MAPCOUNT_VALUE)=-257
NUMBER(VA_BITS)=48
NUMBER(kimage_voffset)=0xffff7ffc67a00000
NUMBER(PHYS_OFFSET)=0x40000000
NUMBER(TCR_EL1_T1SZ)=0x10
KERNELOFFSET=c0000
NUMBER(KERNELPACMASK)=0x0
CRASHTIME=1640857111
offset_note: 4200 (0x1068)
size_note: 4560 (0x11d0)
notes_buf: aaab02543e30
num_vmcoredd_notes: 0
num_prstatus_notes: 8
notes[0]: 0
notes[1]: 0
notes[2]: 0
notes[3]: 0
notes[4]: aaab02543e30 (NT_PRSTATUS)
si.signo: 0 si.code: 0 si.errno: 0
cursig: 0 sigpend: 0 sighold: 0
pid: 1408 ppid: 0 pgrp: 0 sid:0
utime: 0.000000 stime: 0.000000
cutime: 0.000000 cstime: 0.000000
X0: ffff0000c8742800 X1: 0000000000000000 X2: ffff00036b6e90c0
X3: ffff800011bb22e8 X4: ffff00036b6e90c0 X5: 0000000000000000
X6: 000000000000000f X7: ffff80001181f550 X8: 0000000000000000
X9: ffff8000102448fc X10: 00000000ffff8000 X11: ffff800011adf550
X12: 0720072007200720 X13: 0720072007200720 X14: 0720072007200720
X15: ffff00036b6e9740 X16: 0000000000000000 X17: 0000000000000000
X18: 0000000000000030 X19: ffff00036b6e90c0 X20: ffff800011bb22a8
X21: 0000000000000000 X22: ffff800011e08000 X23: ffff80001329bab8
X24: ffff800011cf2000 X25: ffff800010cdcbc0 X26: 0000000000000000
X27: 0000000000000000 X28: ffff00036b6e90c0 X29: ffff80001329ba70
LR: ffff8000102448fc SP: ffff80001329ba70 PC: ffff8000102449d4
PSTATE: 60000085 FPVALID: 00000000
notes[5]: aaab02543fcc (NT_PRSTATUS)
si.signo: 0 si.code: 0 si.errno: 0
cursig: 0 sigpend: 0 sighold: 0
pid: 0 ppid: 0 pgrp: 0 sid:0
utime: 0.000000 stime: 0.000000
cutime: 0.000000 cstime: 0.000000
X0: 00000000000000e0 X1: ffff800011c60520 X2: 0000000000000001
X3: ffff80001097e240 X4: 0000000000000015 X5: 00ffffffffffffff
X6: 0000be9186c23431 X7: 00000010ab4a0098 X8: ffff0000c0398d20
X9: ffff80001097e268 X10: 0000000000000cc0 X11: 0000000000000000
X12: 0000000000000000 X13: 0000000000000000 X14: 0000000000000000
X15: 0000000000000000 X16: 0000000000000000 X17: 0000000000000000
X18: 0000000000000000 X19: 0000000000000001 X20: ffff800011c605a0
X21: ffff0003d4738600 X22: ffff800011c60520 X23: 0000000000000001
X24: 000001b6696821aa X25: 0000000000000000 X26: 0000000000000000
X27: 0000000000000000 X28: 0000000000000000 X29: ffff800011f73e90
LR: ffff800010c0b0a0 SP: ffff800011f73e90 PC: ffff800010c0b0a8
PSTATE: 60c00005 FPVALID: 00000000
notes[6]: aaab02544168 (NT_PRSTATUS)
si.signo: 0 si.code: 0 si.errno: 0
cursig: 0 sigpend: 0 sighold: 0
pid: 0 ppid: 0 pgrp: 0 sid:0
utime: 0.000000 stime: 0.000000
cutime: 0.000000 cstime: 0.000000
X0: 00000000000000e0 X1: ffff800011c60520 X2: 0000000000000001
X3: ffff80001097e240 X4: 0000000000000015 X5: 00ffffffffffffff
X6: 0000be9186c23431 X7: 0000000d7156c757 X8: ffff0000c039d020
X9: ffff80001097e268 X10: 0000000000000cc0 X11: 0000000000000000
X12: 0000000000000000 X13: 0000000000000000 X14: 0000000000000000
X15: 0000000000000000 X16: 0000000000000000 X17: 0000000000000000
X18: 0000000000000000 X19: 0000000000000001 X20: ffff800011c605a0
X21: ffff0003d4759600 X22: ffff800011c60520 X23: 0000000000000001
X24: 000001b666fedcd8 X25: 0000000000000000 X26: 0000000000000000
X27: 0000000000000000 X28: 0000000000000000 X29: ffff800011f7be90
LR: ffff800010c0b0a0 SP: ffff800011f7be90 PC: ffff800010c0b0a8
PSTATE: 60c00005 FPVALID: 00000000
notes[7]: aaab02544304 (NT_PRSTATUS)
si.signo: 0 si.code: 0 si.errno: 0
cursig: 0 sigpend: 0 sighold: 0
pid: 0 ppid: 0 pgrp: 0 sid:0
utime: 0.000000 stime: 0.000000
cutime: 0.000000 cstime: 0.000000
X0: 00000000000000e0 X1: ffff800011c60520 X2: 0000000000000001
X3: ffff80001097e240 X4: 0000000000000015 X5: 00ffffffffffffff
X6: 0000be9186c23431 X7: 00000012126509af X8: ffff0000c039e0e0
X9: ffff80001097e268 X10: 0000000000000cc0 X11: 0000000000000000
X12: 0000000000000000 X13: 0000000000000000 X14: 0000000000000000
X15: 0000000000000000 X16: 0000000000000000 X17: 0000000000000000
X18: 0000000000000000 X19: 0000000000000001 X20: ffff800011c605a0
X21: ffff0003d477a600 X22: ffff800011c60520 X23: 0000000000000001
X24: 000001b669615082 X25: 0000000000000000 X26: 0000000000000000
X27: 0000000000000000 X28: 0000000000000000 X29: ffff800011f83e90
LR: ffff800010c0b0a0 SP: ffff800011f83e90 PC: ffff800010c0b0a8
PSTATE: 60c00005 FPVALID: 00000000
snapshot_task: 0
num_qemu_notes: 0
NOTE offsets: 1068 (NT_PRSTATUS)
1204 (NT_PRSTATUS)
13a0 (NT_PRSTATUS)
153c (NT_PRSTATUS)
offset_eraseinfo: 0 (0x0)
size_eraseinfo: 0 (0x0)
start_pfn_64: (unused)
end_pfn_64: (unused)
max_mapnr_64: 4286464 (0x416800)
data_offset: 109000
block_size: 4096
block_shift: 12
bitmap: ffff9f48f010
bitmap_len: 1073152
max_mapnr: 4286464 (0x416800)
dumpable_bitmap: ffff9f388010
byte: 0
bit: 0
compressed_page: aaab0256f330
curbufptr: aaab0256c320
page_cache_hdr[0]:
pg_flags: 1 (PAGE_VALID)
pg_addr: 3aa3f6000
pg_bufptr: aaab0255f320
pg_hit_count: 1
page_cache_hdr[1]:
pg_flags: 1 (PAGE_VALID)
pg_addr: 3aa3f7000
pg_bufptr: aaab02560320
pg_hit_count: 1
page_cache_hdr[2]:
pg_flags: 1 (PAGE_VALID)
pg_addr: 3aa3f8000
pg_bufptr: aaab02561320
pg_hit_count: 1
page_cache_hdr[3]:
pg_flags: 1 (PAGE_VALID)
pg_addr: 3aa3f9000
pg_bufptr: aaab02562320
pg_hit_count: 1
page_cache_hdr[4]:
pg_flags: 1 (PAGE_VALID)
pg_addr: 3aa3fa000
pg_bufptr: aaab02563320
pg_hit_count: 1
page_cache_hdr[5]:
pg_flags: 1 (PAGE_VALID)
pg_addr: 3aa3fb000
pg_bufptr: aaab02564320
pg_hit_count: 1
page_cache_hdr[6]:
pg_flags: 1 (PAGE_VALID)
pg_addr: 3aa3fc000
pg_bufptr: aaab02565320
pg_hit_count: 1
page_cache_hdr[7]:
pg_flags: 1 (PAGE_VALID)
pg_addr: 3aa3fd000
pg_bufptr: aaab02566320
pg_hit_count: 1
page_cache_hdr[8]:
pg_flags: 1 (PAGE_VALID)
pg_addr: 3aa4af000
pg_bufptr: aaab02567320
pg_hit_count: 1
page_cache_hdr[9]:
pg_flags: 1 (PAGE_VALID)
pg_addr: 3a926b000
pg_bufptr: aaab02568320
pg_hit_count: 10
page_cache_hdr[10]:
pg_flags: 1 (PAGE_VALID)
pg_addr: 3a9540000
pg_bufptr: aaab02569320
pg_hit_count: 2
page_cache_hdr[11]:
pg_flags: 1 (PAGE_VALID)
pg_addr: 3a9541000
pg_bufptr: aaab0256a320
pg_hit_count: 9
page_cache_hdr[12]:
pg_flags: 1 (PAGE_VALID)
pg_addr: 3ab6e9000
pg_bufptr: aaab0256b320
pg_hit_count: 1
page_cache_hdr[13]:
pg_flags: 1 (PAGE_VALID)
pg_addr: 3ab6ea000
pg_bufptr: aaab0256c320
pg_hit_count: 1
page_cache_hdr[14]:
pg_flags: 1 (PAGE_VALID)
pg_addr: 3aa3f4000
pg_bufptr: aaab0256d320
pg_hit_count: 1
page_cache_hdr[15]:
pg_flags: 1 (PAGE_VALID)
pg_addr: 3aa3f5000
pg_bufptr: aaab0256e320
pg_hit_count: 1
page_cache_buf: aaab0255f320
evict_index: 14
evictions: 2734
accesses: 23443
cached_reads: 20693 (88%)
valid_pages: aaab0255d250
total_valid_pages: 154959
Thanks, it looks correctly mapped.
num_prstatus_notes: 8
notes[0]: 0
notes[1]: 0
notes[2]: 0
notes[3]: 0
notes[4]: aaab02543e30 (NT_PRSTATUS)
...
You wrote "I tried, but it does not work." What errors do you see? The same segfault with bt -c 0?
yes.
crash> bt -c 1
PID: 0 TASK: ffff0000c03510c0 CPU: 1 COMMAND: "swapper/1"
#0 [ffff800011f73e90] arch_cpu_idle at ffff800010c0b0a4
crash> bt -c 2
PID: 0 TASK: ffff0000c039a180 CPU: 2 COMMAND: "swapper/2"
#0 [ffff800011f7be90] arch_cpu_idle at ffff800010c0b0a4
crash> bt -c 2
PID: 0 TASK: ffff0000c039a180 CPU: 2 COMMAND: "swapper/2"
#0 [ffff800011f7be90] arch_cpu_idle at ffff800010c0b0a4
crash> bt -c 3
PID: 0 TASK: ffff0000c039b240 CPU: 3 COMMAND: "swapper/3"
#0 [ffff800011f83e90] arch_cpu_idle at ffff800010c0b0a4
crash> bt -c 0
PID: 0 TASK: ffff8000117fa240 CPU: 0 COMMAND: "swapper/0"
Segmentation fault (core dumped)
So perhaps dd->nt_prstatus_percpu is not the cause. Is it possible to debug where crash fails? I don't have an arm machine and cannot reproduce this.
The following is the gdb info. panic_task_regs[0] is the same as panic_task_regs[4], although CPU 0 did not save crash_notes, so its entry should be empty.
#0 arm64_is_kernel_exception_frame (bt=bt@entry=0xffffcd47d9f8, stkptr=stkptr@entry=18446603336542697776) at arm64.c:1925
1925 if (INSTACK(regs->sp, bt) && INSTACK(regs->regs[29], bt) &&
[Current thread is 1 (Thread 0xffff8aa7f010 (LWP 128066))]
(gdb) bt
#0 arm64_is_kernel_exception_frame (bt=bt@entry=0xffffcd47d9f8, stkptr=stkptr@entry=18446603336542697776) at arm64.c:1925
#1 0x0000aaaab26b2ef4 in arm64_back_trace_cmd (bt=0xffffcd47d9f8) at arm64.c:2760
#2 0x0000aaaab2684058 in back_trace (bt=0xffffcd47d9f8) at kernel.c:3186
#3 0x0000aaaab2685be4 in cmd_bt () at kernel.c:2789
#4 0x0000aaaab25fe2fc in exec_command () at main.c:892
#5 0x0000aaaab25fe5b8 in main_loop () at main.c:839
#6 0x0000aaaab292216c in captured_main (data=data@entry=0xffffcd47e1e0) at main.c:1284
#7 gdb_main (args=args@entry=0xffffcd47e220) at main.c:1313
#8 0x0000aaaab292225c in gdb_main_entry (argc=<optimized out>, argv=<optimized out>) at main.c:1338
#9 0x0000aaaab25f873c in main (argc=3, argv=0xffffcd47e418) at main.c:720
(gdb) p machdep->machspec->panic_task_regs[0]
$1 = {{user_regs = {regs = {18446462602095896576, 0, 18446462613420150976, 18446603336518673128, 18446462613420150976, 0, 15,
18446603336514925904, 0, 18446603336492009724, 4294934528, 18446603336517809488, 513418191660123936, 513418191660123936,
513418191660123936, 18446462613420152640, 0, 0, 48, 18446462613420150976, 18446603336518673064, 0, 18446603336521121792,
18446603336542698168, 18446603336519983104, 18446603336503118784, 0, 0, 18446462613420150976, 18446603336542698096,
18446603336492009724}, sp = 18446603336542698096, pc = 18446603336492009940, pstate = 1610612869}, {regs = {18446462602095896576,
0, 18446462613420150976, 18446603336518673128, 18446462613420150976, 0, 15, 18446603336514925904, 0, 18446603336492009724,
4294934528, 18446603336517809488, 513418191660123936, 513418191660123936, 513418191660123936, 18446462613420152640, 0, 0, 48,
18446462613420150976, 18446603336518673064, 0, 18446603336521121792, 18446603336542698168, 18446603336519983104,
18446603336503118784, 0, 0, 18446462613420150976, 18446603336542698096, 18446603336492009724}, sp = 18446603336542698096,
pc = 18446603336492009940, pstate = 1610612869}}, orig_x0 = 0, syscallno = 0}
(gdb) p machdep->machspec->panic_task_regs[4]
$2 = {{user_regs = {regs = {18446462602095896576, 0, 18446462613420150976, 18446603336518673128, 18446462613420150976, 0, 15,
18446603336514925904, 0, 18446603336492009724, 4294934528, 18446603336517809488, 513418191660123936, 513418191660123936,
513418191660123936, 18446462613420152640, 0, 0, 48, 18446462613420150976, 18446603336518673064, 0, 18446603336521121792,
18446603336542698168, 18446603336519983104, 18446603336503118784, 0, 0, 18446462613420150976, 18446603336542698096,
18446603336492009724}, sp = 18446603336542698096, pc = 18446603336492009940, pstate = 1610612869}, {regs = {18446462602095896576,
0, 18446462613420150976, 18446603336518673128, 18446462613420150976, 0, 15, 18446603336514925904, 0, 18446603336492009724,
4294934528, 18446603336517809488, 513418191660123936, 513418191660123936, 513418191660123936, 18446462613420152640, 0, 0, 48,
18446462613420150976, 18446603336518673064, 0, 18446603336521121792, 18446603336542698168, 18446603336519983104,
18446603336503118784, 0, 0, 18446462613420150976, 18446603336542698096, 18446603336492009724}, sp = 18446603336542698096,
pc = 18446603336492009940, pstate = 1610612869}}, orig_x0 = 0, syscallno = 0}
I have some debug information. With your patch, dd->nt_prstatus_percpu is correctly mapped in map_cpus_to_prstatus_kdump_cmprs, but that happens after machdep->machspec->panic_task_regs has already been saved. machdep->machspec->panic_task_regs is filled in by arm64_get_crash_notes, which is called from arm64_init, while map_cpus_to_prstatus_kdump_cmprs is called from task_init. task_init is called later than machdep_init (arm64_init).
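To make the ordering problem concrete, here is a minimal, self-contained C sketch. It is not crash code: note_table, remap_to_online, snapshot, DEMO_NR_CPUS and NO_NOTE are hypothetical names, and notes are modelled as plain integers. It only assumes what the help -D output in this thread shows, namely that the notes sit packed at the front of the table until they are remapped onto the online CPUs.

```c
#define DEMO_NR_CPUS 8
#define NO_NOTE (-1)

/* Hypothetical model: slot[cpu] is a note id, or NO_NOTE if the CPU has none. */
struct note_table {
    int slot[DEMO_NR_CPUS];
};

/*
 * Model of the remapping step: notes start out packed at the front of the
 * table (one per online CPU, in ascending CPU order), so spread them over
 * the online CPUs and clear every other slot.
 */
void remap_to_online(struct note_table *t, int nr_notes, const int *online)
{
    int packed[DEMO_NR_CPUS];
    int cpu, next = 0;

    /* Keep a copy of the packed layout before overwriting the table. */
    for (cpu = 0; cpu < DEMO_NR_CPUS; cpu++)
        packed[cpu] = t->slot[cpu];

    for (cpu = 0; cpu < DEMO_NR_CPUS; cpu++) {
        if (online[cpu] && next < nr_notes)
            t->slot[cpu] = packed[next++];
        else
            t->slot[cpu] = NO_NOTE;
    }
}

/* Model of an init phase that copies whatever the table holds right now. */
void snapshot(const struct note_table *t, int *regs_for_cpu)
{
    int cpu;

    for (cpu = 0; cpu < DEMO_NR_CPUS; cpu++)
        regs_for_cpu[cpu] = t->slot[cpu];
}
```

In this model, with four notes packed at the front and CPUs 0-3 offline, a snapshot taken before remap_to_online hands CPU 0 the first note, which really belongs to CPU 4. A snapshot taken afterwards leaves CPU 0 with NO_NOTE and gives CPU 4 the first note.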
I got a vmcore with the same backtrace from customers; "bt -a" can trigger the segfault on this specific vmcore. It seems to be a similar case.
Thanks for debugging.
How does the following change work together with the patch above?
--- a/arm64.c
+++ b/arm64.c
@@ -472,7 +472,7 @@ arm64_init(int when)
 		arm64_stackframe_init();
 		break;
 
-	case POST_VM:
+	case POST_INIT:
 		/*
 		 * crash_notes contains machine specific information about the
 		 * crash. In particular, it contains CPU registers at the time
Yes, it works, together with the previous patch for map_cpus_to_prstatus_kdump_cmprs.
Thanks for testing. It's just an idea, will check if there is a better way.
@lian-bo, could you test this patch with the vmcore you got? I think I will go with this.
--- a/arm64.c
+++ b/arm64.c
@@ -472,7 +472,7 @@ arm64_init(int when)
 		arm64_stackframe_init();
 		break;
 
-	case POST_VM:
+	case POST_INIT:
 		/*
 		 * crash_notes contains machine specific information about the
 		 * crash. In particular, it contains CPU registers at the time
diff --git a/diskdump.c b/diskdump.c
index 3e1cfd548c96..d5674276e1fd 100644
--- a/diskdump.c
+++ b/diskdump.c
@@ -111,8 +111,7 @@ map_cpus_to_prstatus_kdump_cmprs(void)
 	if (pc->flags2 & QEMU_MEM_DUMP_COMPRESSED)  /* notes exist for all cpus */
 		goto resize_note_pointers;
 
-	if (!(online = get_cpus_online()) || (online == kt->cpus) ||
-	    machine_type("ARM64"))
+	if (!(online = get_cpus_online()) || (online == kt->cpus))
 		goto resize_note_pointers;
 
 	if (CRASHDEBUG(1))
Sure. Crash printed a lot of warnings when running the bt command on my vmcore, but I don't have much time to investigate the details. As you know, I'm currently working on other issues. I will help test it further once I have time. But anyway, could you post it upstream first? We can continue the discussion there. Thanks.
ok, will post.
My platform is aarch64 with kernel version 5.10.23, crash 7.2.9, kexec-tools 2.0.21, and makedumpfile 1.6.9. When the system crashed, CPU 0 and some other CPUs failed to stop. The following is some information about the vmcore. cpu126 is the panic CPU, and CPU 1 also failed to stop. Using help -D to get the vmcore info, I found that only one ELF note was parsed from the vmcore. It should belong to cpu126, because the other CPUs failed to stop, and only cpu126 can show a backtrace.
I found that when arm64_get_crash_notes fails to get crash_notes, it falls back to calling diskdump_get_prstatus_percpu to get the ELF note from nt_prstatus_percpu, so cpu0 gets dd->nt_prstatus_percpu[0] as its note.
dd->nt_prstatus_percpu is parsed from the vmcore for each CPU. When a CPU was offline or failed to stop before the crash, its crash notes (ELF notes) were not saved, so using the CPU number as an index into dd->nt_prstatus_percpu returns the wrong note.
Any ideas on how to avoid getting the wrong note for an offline CPU or for a CPU that failed to save its notes?
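One possible direction, sketched under assumptions rather than taken from crash itself: keep the per-CPU table indexed by CPU number, leave a NULL slot for every CPU that saved no note, and make each lookup bounds-checked so that callers can skip such CPUs instead of reading another CPU's registers. The names percpu_notes and note_for_cpu are hypothetical.

```c
#include <stddef.h>

/* Hypothetical per-CPU note table; a NULL slot means "no note saved". */
struct percpu_notes {
    void **slot;       /* one entry per possible CPU */
    size_t nr_slots;   /* number of entries in slot[] */
};

/*
 * Return the note for `cpu`, or NULL when the CPU is out of range or has
 * no note. Callers are expected to treat NULL as "registers unavailable"
 * rather than falling back to some other CPU's entry.
 */
void *note_for_cpu(const struct percpu_notes *t, size_t cpu)
{
    if (t == NULL || t->slot == NULL || cpu >= t->nr_slots)
        return NULL;
    return t->slot[cpu];
}
```

The sketch only helps if the table has already been remapped to CPU numbers before anything reads it. A packed table would still return a valid-looking but wrong note for CPU 0.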