AmpereComputing / ampere-lts-kernel

Linux 5.4 and 5.10 Longterm kernel (LTS) with Ampere patches

Backport Pseudo-NMI based hard lockup detector #181

Open adamliyi opened 1 year ago

adamliyi commented 1 year ago

https://patchwork.kernel.org/project/linux-arm-kernel/cover/20220903093415.15850-1-lecopzer.chen@mediatek.com/

Backport this series to the 5.15.y tree and test it.

adamliyi commented 1 year ago

Backported, PR here: https://github.com/AmpereComputing/ampere-lts-kernel/pull/182

Kernel config:

- Set CONFIG_ARM64_PSEUDO_NMI=y
- Enable CONFIG_HARDLOCKUP_DETECTOR=y
- Set CONFIG_LKDTM=y (for testing)
- Add irqchip.gicv3_pseudo_nmi=1 to the kernel command line
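
The configuration above can be checked on a running system before testing. This is a minimal sketch, not part of the backport: the function name `check_nmi_watchdog_config` is hypothetical, and it assumes the kernel config is available as a plain-text file (often `/boot/config-$(uname -r)`, but the location varies by distribution).

```shell
#!/bin/sh
# Hypothetical helper: verify the build options and boot parameter listed above.
# Usage: check_nmi_watchdog_config CONFIG_FILE CMDLINE_FILE
check_nmi_watchdog_config() {
    config_file=$1
    cmdline_file=$2
    rc=0

    for opt in CONFIG_ARM64_PSEUDO_NMI CONFIG_HARDLOCKUP_DETECTOR CONFIG_LKDTM; do
        # Accept only an exact "OPTION=y" line, as listed in the comment above;
        # "=m" or "# ... is not set" is reported as missing by this sketch.
        if grep -qx "${opt}=y" "$config_file"; then
            echo "ok: ${opt}=y"
        else
            echo "missing: ${opt}=y"
            rc=1
        fi
    done

    # -F treats the dots literally, -w rejects values such as "=10".
    if grep -qwF 'irqchip.gicv3_pseudo_nmi=1' "$cmdline_file"; then
        echo "ok: irqchip.gicv3_pseudo_nmi=1"
    else
        echo "missing: irqchip.gicv3_pseudo_nmi=1"
        rc=1
    fi

    return "$rc"
}
```

On a typical system this could be called as `check_nmi_watchdog_config "/boot/config-$(uname -r)" /proc/cmdline`; a non-zero return status means at least one setting is absent.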

Boot the system; dmesg shows that pseudo-NMIs and the NMI watchdog are enabled:

[root@adam_mj_cent83 ~]# dmesg | grep -i nmi
[    0.000000] Kernel command line: BOOT_IMAGE=(hd3,gpt2)/vmlinuz-5.15.23+ root=/dev/mapper/cl-root ro rd.lvm.lv=cl/root rd.lvm.lv=cl/swap earlycon=pl011,mmio32,0x100002600000 crashkernel=768M@0x400100000000 crash_kexec_post_notifiers kpti=off irqchip.gicv3_pseudo_nmi=1
[    0.000000] GICv3: Pseudo-NMIs enabled using relaxed ICC_PMR_EL1 synchronisation
[   13.636029] hw perfevents: enabled with armv8_pmuv3_0 PMU driver, 7 counters available, using NMIs
[   13.684007] NMI watchdog: Enabled. Permanently consumes one hw-PMU counter.
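
The three messages above (pseudo-NMIs enabled in GICv3, the PMU driver using NMIs, and the NMI watchdog enabled) can also be checked in a script. This is a rough sketch: the function name `check_nmi_dmesg` is hypothetical, and the patterns are taken from the 5.15.23+ log in this comment, so the exact wording may differ on other kernel versions.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and confirm that the
# three messages seen in the dmesg output above are present.
check_nmi_dmesg() {
    log=$(cat)
    rc=0
    for pattern in \
        'GICv3: Pseudo-NMIs enabled' \
        'hw perfevents: .* using NMIs' \
        'NMI watchdog: Enabled'
    do
        if printf '%s\n' "$log" | grep -q "$pattern"; then
            echo "found: $pattern"
        else
            echo "not found: $pattern"
            rc=1
        fi
    done
    return "$rc"
}
```

Usage would be `dmesg | check_nmi_dmesg`; the function returns non-zero if any of the three messages is absent.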

Trigger a hard lockup with LKDTM:

echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT

The NMI watchdog detects it and dumps the state of the locked-up CPU:

[  757.732740] lkdtm: Performing direct entry HARDLOCKUP
[  806.732804] NMI watchdog: Watchdog detected hard LOCKUP on cpu 87
[  806.732809] Modules linked in: nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink tun bridge stp llc rfkill rpcrdma rdma_ucm ib_srpt ib_isert iscsi_target_mod target_core_mod ib_umad ib_iser ib_ipoib rdma_cm iw_cm ib_cm libiscsi scsi_transport_iscsi cdc_ether usbnet mii mlx5_ib crct10dif_ce ghash_ce gf128mul acpi_ipmi sha1_ce ib_uverbs sbsa_gwdt ipmi_ssif watchdog ib_core ipmi_devintf ipmi_msghandler xgene_hwmon nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c sr_mod cdrom sg ast drm_vram_helper drm_kms_helper syscopyarea sysfillrect sysimgblt mlx5_core fb_sys_fops mlxfw drm_ttm_helper igb psample uas ttm ptp nvme usb_storage i2c_algo_bit drm nvme_core pps_core i2c_designware_platform i2c_designware_core dm_mod i2c_dev fuse sha256_generic libsha256 sha2_ce sha256_arm64
[  806.732896] CPU: 87 PID: 4499 Comm: bash Not tainted 5.15.23+ #4
[  806.732899] Hardware name: WIWYNN Mt.Jade Server System B81.030Z1.0007/Mt.Jade Motherboard, BIOS 2.10.20220531 (SCP: 2.10.20220531) 2022/05/31
[  806.732903] pstate: 40400009 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  806.732906] pc : lkdtm_HARDLOCKUP+0x20/0x28
[  806.732921] lr : lkdtm_do_action+0x2c/0x38
[  806.732924] sp : ffff80002d923cf0
[  806.732925] pmr_save: 00000060
[  806.732926] x29: ffff80002d923cf0 x28: ffff3fffac031140 x27: 0000000000000000
[  806.732928] x26: 0000000000000000 x25: ffff80002d923e10 x24: 000000000000000b
[  806.732930] x23: 000000000000001d x22: ffffcac8da13f5e0 x21: ffff3fffaa3cb000
[  806.732932] x20: ffffcac8da414fa8 x19: 000000000000001e x18: 0000000000000010
[  806.732934] x17: 0000000000000000 x16: 0000000000000000 x15: ffffffffffffffff
[  806.732935] x14: 0000000000000000 x13: 50554b434f4c4452 x12: ffff460effbbbfe8
[  806.732937] x11: 0000000000000003 x10: ffff460effa3bfa8 x9 : ffffcac8d9c442d4
[  806.732939] x8 : 000000000017ffe8 x7 : 0000000000000000 x6 : ffff400f40c8a108
[  806.732941] x5 : ffff400f40c8a108 x4 : 0000000000000000 x3 : ffff400f40c95cf0
[  806.732942] x2 : 2f932eb80a167d00 x1 : 0000000000000000 x0 : 0000000000000060
[  806.732945] Call trace:
[  806.732947]  lkdtm_HARDLOCKUP+0x20/0x28
[  806.732948]  direct_entry+0x148/0x1c8
[  806.732950]  full_proxy_write+0x78/0xa0
[  806.732960]  vfs_write+0x118/0x2a8
[  806.732969]  ksys_write+0x70/0xf0
[  806.732971]  __arm64_sys_write+0x24/0x30
[  806.732973]  invoke_syscall+0x7c/0x100
[  806.732981]  el0_svc_common.constprop.3+0x170/0x1a0
[  806.732984]  do_el0_svc+0x68/0x80
[  806.732985]  el0_svc+0x68/0xd8
[  806.732992]  el0t_64_sync_handler+0x40/0xb8
[  806.732995]  el0t_64_sync+0x180/0x184
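
When repeating this test, the detection delay can be read off the log timestamps. For the log above it is 806.732804 - 757.732740, about 49 seconds between the LKDTM trigger and the watchdog report. The sketch below automates that subtraction; the function name `hardlockup_latency` is hypothetical, and it assumes the default dmesg format with a leading `[seconds.microseconds]` stamp.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and print the number of
# seconds between the LKDTM HARDLOCKUP trigger and the watchdog report.
hardlockup_latency() {
    awk '
        # Extract the seconds value from a leading "[  757.732740]" stamp.
        function stamp(line,    s) {
            s = line
            sub(/^\[ */, "", s)
            sub(/\].*/, "", s)
            return s + 0
        }
        /lkdtm: Performing direct entry HARDLOCKUP/ {
            start = stamp($0)
            have_start = 1
        }
        /Watchdog detected hard LOCKUP/ && have_start {
            printf "%.6f\n", stamp($0) - start
            found = 1
            exit
        }
        # Fail if the trigger and report lines were not both seen, in order.
        END { if (!found) exit 1 }
    '
}
```

Usage would be `dmesg | hardlockup_latency`, run after the lockup has been reported.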