Closed smokris closed 1 year ago
This has something to do with illumos#14892. I think the VERIFY() there may be too strong. I'll have to consult with Patrick about it.
IFF (sic.) the VERIFY() is indeed too strong (i.e. p->p_sigfd can legitimately be NULL here), here's the fix:
diff --git a/usr/src/uts/common/io/signalfd.c b/usr/src/uts/common/io/signalfd.c
index 601afce398..c9be797f58 100644
--- a/usr/src/uts/common/io/signalfd.c
+++ b/usr/src/uts/common/io/signalfd.c
@@ -404,10 +404,11 @@ signalfd_pollers_dissociate(signalfd_state_t *state)
}
VERIFY3P(sp->sp_proc, ==, p);
VERIFY3P(sp->sp_state, ==, state);
- VERIFY3P(p->p_sigfd, !=, NULL);
- sigfd_proc_state_t *pstate = p->p_sigfd;
- list_remove(&pstate->sigfd_list, sp);
+ if (p->p_sigfd != NULL) {
+ sigfd_proc_state_t *pstate = p->p_sigfd;
+ list_remove(&pstate->sigfd_list, sp);
+ }
sp->sp_proc = NULL;
/* Wake any lingering pollers referencing the pollhead */
Otherwise the fix will be more complicated.
Fixed in -gate and in 20230209 respin.
Thanks, @danmcd!
Via email, Dan provided MDB commands that identified that the process leading to this crash was systemd-journald
. I confirmed that I was able to reproduce the panic on 20230209T001143Z by running systemctl restart systemd-journald
in the LX zone. After updating to 20230214T155114Z, I'm no longer able to reproduce the panic.
This morning, one of our systems panicked for the first time in 6 months:
::msgbuf
```console > ::msgbuf MESSAGE SmartOS Version joyent_20230209T001143Z 64-bit Copyright 2022-2023 MNX Cloud, Inc. x86_feature: lgpg x86_feature: tsc x86_feature: msr x86_feature: mtrr x86_feature: pge x86_feature: de x86_feature: cmov x86_feature: mmx x86_feature: mca x86_feature: pae x86_feature: cv8 x86_feature: pat x86_feature: sep x86_feature: sse x86_feature: sse2 x86_feature: htt x86_feature: asysc x86_feature: nx x86_feature: sse3 x86_feature: cx16 x86_feature: cmp x86_feature: tscp x86_feature: mwait x86_feature: cpuid x86_feature: ssse3 x86_feature: sse4_1 x86_feature: sse4_2 x86_feature: clfsh x86_feature: 64 x86_feature: vmx x86_feature: ibrs x86_feature: ibpb x86_feature: stibp x86_feature: ssbd x86_feature: flush_cmd x86_feature: core_thermal x86_feature: lfence_serializing mem = 23058560K (0x57f620000) TSC calibrated using PIT; freq is 1599 MHz ACPI: RSDP 0x00000000000FA9C0 000024 (v02 ACPIAM) ACPI: XSDT 0x00000000DF790100 00007C (v01 SMCI 20131107 MSFT 00000097) ACPI: FACP 0x00000000DF790290 0000F4 (v03 110713 FACP1130 20131107 MSFT 00000097) Firmware Warning (ACPI): 32/64X length mismatch in FADT/Gpe0Block: 128/64 (20180629/tbfadt-750) ACPI: DSDT 0x00000000DF790630 0067B2 (v01 1F580 1F580000 00000000 INTL 20051117) ACPI: FACS 0x00000000DF79E000 000040 ACPI: APIC 0x00000000DF790390 0000D2 (v01 110713 APIC1130 20131107 MSFT 00000097) ACPI: MCFG 0x00000000DF790470 00003C (v01 110713 OEMMCFG 20131107 MSFT 00000097) ACPI: OEMB 0x00000000DF79E040 00007B (v01 110713 OEMB1130 20131107 MSFT 00000097) ACPI: HPET 0x00000000DF79A630 000038 (v01 110713 OEMHPET 20131107 MSFT 00000097) ACPI: DMAR 0x00000000DF79E0C0 000128 (v01 AMI OEMDMAR 00000001 MSFT 00000097) ACPI: SSDT 0x00000000DF79E6D0 000363 (v01 DpgPmm CpuPm 00000012 INTL 20051117) ACPI: EINJ 0x00000000DF79A670 000130 (v01 AMIER AMI_EINJ 20131107 MSFT 00000097) ACPI: BERT 0x00000000DF79A800 000030 (v01 AMIER AMI_BERT 20131107 MSFT 00000097) ACPI: ERST 0x00000000DF79A830 000210 (v01 AMIER AMI_ERST 20131107 MSFT 00000097) ACPI: HEST 0x00000000DF79AA40 0000A8 (v01 AMIER ABC_HEST 20131107 MSFT 00000097) ACPI: 2 ACPI AML tables successfully acquired and loaded ACPI: Enabled 1 GPEs in block 00 to 3F SMBIOS v2.6 loaded (1876 bytes) Skipping psm: xpv_psm root nexus = i86pc NOTICE: iommulib_nexus_register: rootnex-1: Succesfully registered NEXUS i86pc nexops=fffffffffbd81fe0 pseudo0 at root pseudo0 is /pseudo scsi_vhci0 at root scsi_vhci0 is /scsi_vhci NOTICE: reprogram io-range on ppb[0/1/0]: 0x2000 ~ 0x2fff NOTICE: reprogram mem-range on ppb[0/1/0]: 0xf0000000 ~ 0xf00fffff NOTICE: reprogram io-range on ppb[0/3/0]: 0x3000 ~ 0x3fff NOTICE: reprogram mem-range on ppb[0/3/0]: 0xf0100000 ~ 0xf01fffff NOTICE: reprogram io-range on ppb[0/5/0]: 0x4000 ~ 0x4fff NOTICE: reprogram mem-range on ppb[0/5/0]: 0xf0200000 ~ 0xf02fffff NOTICE: reprogram io-range on ppb[0/7/0]: 0x5000 ~ 0x5fff NOTICE: reprogram mem-range on ppb[0/7/0]: 0xf0300000 ~ 0xf03fffff NOTICE: reprogram io-range on ppb[0/9/0]: 0x6000 ~ 0x6fff NOTICE: reprogram mem-range on ppb[0/9/0]: 0xf0400000 ~ 0xf04fffff Reading Intel IOMMU boot options npe0 at root: space 0 offset 0 npe0 is /pci@0,0 PCI Express-device: isa@1f, isa0 NOTICE: apic: local nmi: 255 0x0 1 NOTICE: apic: Using ACPI (MADT) for SMP configuration NOTICE: apic: Using APIC interrupt routing mode NOTICE: amd_iommu: No AMD IOMMU ACPI IVRS table pseudo-device: acpippm0 acpippm0 is /pseudo/acpippm@0 pseudo-device: ppm0 ppm0 is /pseudo/ppm@0 ramdisk0 at root ramdisk0 is /ramdisk root on /ramdisk:a fstype ufs ACPI: Dynamic OEM Table Load: ACPI: SSDT 0xFFFFFE1F2F1D6A88 0004D5 (v01 PmRef P001Cst 00003001 INTL 20051117) NOTICE: SpeedStep support is being disabled due to errors parsing ACPI P-state objects exported by BIOS. acpinex0 at root acpinex0 is /fw pseudo-device: dld0 dld0 is /pseudo/dld@0 PCI Express-device: pci8086,3a40@1c, pcieb0 pcieb0 is /pci@0,0/pci8086,3a40@1c PCI Express-device: pci8086,3a42@1c,1, pcieb1 pcieb1 is /pci@0,0/pci8086,3a42@1c,1 PCI Express-device: pci8086,244e@1e, pci_pci0 pci_pci0 is /pci@0,0/pci8086,244e@1e PCI Express-device: pci15d9,f580@1a,7, ehci0 ehci0 is /pci@0,0/pci15d9,f580@1a,7 PCI Express-device: pci15d9,f580@1d,7, ehci1 ehci1 is /pci@0,0/pci15d9,f580@1d,7 PCI Express-device: pci15d9,f580@1a, uhci0 uhci0 is /pci@0,0/pci15d9,f580@1a PCI Express-device: pci15d9,f580@1a,1, uhci1 uhci1 is /pci@0,0/pci15d9,f580@1a,1 PCI Express-device: pci15d9,f580@1a,2, uhci2 uhci2 is /pci@0,0/pci15d9,f580@1a,2 PCI Express-device: pci15d9,f580@1d, uhci3 uhci3 is /pci@0,0/pci15d9,f580@1d PCI Express-device: pci15d9,f580@1d,1, uhci4 uhci4 is /pci@0,0/pci15d9,f580@1d,1 PCI Express-device: pci15d9,f580@1d,2, uhci5 uhci5 is /pci@0,0/pci15d9,f580@1d,2 8042 device: keyboard@0, kb8042 # 0 kb80420 is /pci@0,0/isa@1f/i8042@1,60/keyboard@0 cpu0: x86 (chipid 0x0 GenuineIntel 106A5 family 6 model 26 step 5 clock 1600 MHz) cpu0: Intel(r) Xeon(r) CPU E5506 @ 2.13GHz KPTI enabled (PCID not supported, INVPCID not supported) cpu1: microcode has been updated from version 0x16 to 0x1d cpu1: x86 (chipid 0x0 GenuineIntel 106A5 family 6 model 26 step 5 clock 1600 MHz) cpu1: Intel(r) Xeon(r) CPU E5506 @ 2.13GHz cpu1 initialization complete - online cpu2: microcode has been updated from version 0x16 to 0x1d cpu2: x86 (chipid 0x0 GenuineIntel 106A5 family 6 model 26 step 5 clock 1600 MHz) cpu2: Intel(r) Xeon(r) CPU E5506 @ 2.13GHz cpu2 initialization complete - online cpu3: microcode has been updated from version 0x16 to 0x1d cpu3: x86 (chipid 0x0 GenuineIntel 106A5 family 6 model 26 step 5 clock 1600 MHz) cpu3: Intel(r) Xeon(r) CPU E5506 @ 2.13GHz cpu3 initialization complete - online USB 2.0 device (usb781,5571) operating at hi speed (USB 2.x) on USB 2.0 root hub: storage@2, scsa2usb0 at bus address 2 SanDisk Firebird USB Flash Drive 4C532000031128119340 scsa2usb0 is /pci@0,0/pci15d9,f580@1a,7/storage@2 /pci@0,0/pci15d9,f580@1a,7/storage@2 (scsa2usb0) online sd0 at scsa2usb0: target 0 lun 0 sd0 is /pci@0,0/pci15d9,f580@1a,7/storage@2/disk@0,0 /pci@0,0/pci15d9,f580@1a,7/storage@2/disk@0,0 (sd0) online pseudo-device: audio0 audio0 is /pseudo/audio@0 USB 2.0 device (usb51d,2) operating at full speed (USB 1.x) on USB 1.10 root hub: input@1, ugen0 at bus address 2 American Power Conversion Back-UPS XS 1500M FW:947.d10 .D USB FW:d10 3B1903X69241 ugen0 is /pci@0,0/pci15d9,f580@1a/input@1 /pci@0,0/pci15d9,f580@1a/input@1 (ugen0) online pseudo-device: stmf_sbd0 stmf_sbd0 is /pseudo/stmf_sbd@0 pseudo-device: lofi0 lofi0 is /pseudo/lofi@0 pseudo-device: lofi1 lofi1 is /pseudo/lofi@1 /pseudo/lofi@1 (lofi1) online pseudo-device: dtrace0 dtrace0 is /pseudo/dtrace@0 pseudo-device: ucode0 ucode0 is /pseudo/ucode@0 pseudo-device: devinfo0 devinfo0 is /pseudo/devinfo@0 pci0 at root: space f7 offset 0 pci0 is /pci@f7,0 iscsi0 at root iscsi0 is /iscsi acpinex: sb@0, acpinex1 acpinex1 is /fw/sb@0 pseudo-device: pseudo1 pseudo1 is /pseudo/zconsnex@1 pseudo-device: pseudo2 pseudo2 is /pseudo/zfdnex@2 ISA-device: pit_beep0 pit_beep0 is /pci@0,0/isa@1f/pit_beep IDE device at targ 0, lun 0 lastlun 0x0 model HUS726060ALA640 ATA/ATAPI-9 supported, majver 0x3fc minver 0x29 IDE device at targ 0, lun 0 lastlun 0x0 model HUS726060ALA640 ATA/ATAPI-9 supported, majver 0x3fc minver 0x29 PCI Express-device: ide@1, ata1 ata1 is /pci@0,0/pci-ide@1f,5/ide@1 UltraDMA mode 5 selected IDE device at targ 1, lun 0 lastlun 0x0 model HUS726060ALA640 ATA/ATAPI-9 supported, majver 0x3fc minver 0x29 PCI Express-device: ide@0, ata2 ata2 is /pci@0,0/pci-ide@1f,2/ide@0 Disk0:(I can't share the full
vmdump
since it contains user data, but I'd be happy to run additionalmdb
commands if needed.)