freebsd / drm-kmod

drm driver for FreeBSD

kernel panic from `intel_gt_sysfs_get_drvdata` after `sysctl -a` (INVARIANTS kernel, master branch 6.1) #280

Open emaste opened 5 months ago

emaste commented 5 months ago

Describe the bug

Panic in strncmp() from intel_gt_sysfs_get_drvdata() upon sysctl -a


Fatal trap 12: page fault while in kernel mode
cpuid = 6; apic id = 06
fault virtual address   = 0x0
fault code      = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80fbf610
stack pointer           = 0x28:0xfffffe0155d3eb40
frame pointer           = 0x28:0xfffffe0155d3eb40
code segment        = base 0x0, limit 0xfffff, type 0x1b
            = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags    = interrupt enabled, resume, IOPL = 0
current process     = 3758 (sysctl)
rdi: 0000000000000000 rsi: ffffffff83dc3f1d rdx: 0000000000000001
rcx: 0000000000000000  r8: 0000000000000000  r9: 0000000000010000
rax: ffffffff813456e0 rbx: fffffe01418f6e88 rbp: fffffe0155d3eb40
r10: 0000000000000001 r11: ffffffff83d38f80 r12: 0000000000000013
r13: fffffe0155d3ecc0 r14: ffffffff83df5950 r15: fffffe01418f6e88
trap number     = 12
panic: page fault
cpuid = 6
time = 1705955573
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0155d3e810
vpanic() at vpanic+0x132/frame 0xfffffe0155d3e940
panic() at panic+0x43/frame 0xfffffe0155d3e9a0
trap_fatal() at trap_fatal+0x40c/frame 0xfffffe0155d3ea00
trap_pfault() at trap_pfault+0xae/frame 0xfffffe0155d3ea70
calltrap() at calltrap+0x8/frame 0xfffffe0155d3ea70
--- trap 0xc, rip = 0xffffffff80fbf610, rsp = 0xfffffe0155d3eb40, rbp = 0xfffffe0155d3eb40 ---
strncmp() at strncmp+0x10/frame 0xfffffe0155d3eb40
intel_gt_sysfs_get_drvdata() at intel_gt_sysfs_get_drvdata+0x1e/frame 0xfffffe0155d3eb60
throttle_reason_bool_show() at throttle_reason_bool_show+0x15/frame 0xfffffe0155d3eb80
sysctl_handle_attr() at sysctl_handle_attr+0x73/frame 0xfffffe0155d3ebd0
sysctl_root_handler_locked() at sysctl_root_handler_locked+0xa2/frame 0xfffffe0155d3ec20
sysctl_root() at sysctl_root+0x22e/frame 0xfffffe0155d3eca0
userland_sysctl() at userland_sysctl+0x184/frame 0xfffffe0155d3ed50
sys___sysctl() at sys___sysctl+0x5c/frame 0xfffffe0155d3ee00
amd64_syscall() at amd64_syscall+0x15e/frame 0xfffffe0155d3ef30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe0155d3ef30
--- syscall (202, FreeBSD ELF64, __sysctl), rip = 0x157fae7cf4fa, rsp = 0x157fadb1b868, rbp = 0x157fadb1b8a0 ---

FreeBSD version Paste the output of uname -aKU

PCI Info

hostb0@pci0:0:0:0:  class=0x060000 rev=0x01 hdr=0x00 vendor=0x8086 device=0x9a14 subvendor=0xf111 subdevice=0x0001
    vendor     = 'Intel Corporation'
    device     = '11th Gen Core Processor Host Bridge/DRAM Registers'
    class      = bridge
    subclass   = HOST-PCI
vgapci0@pci0:0:2:0: class=0x030000 rev=0x01 hdr=0x00 vendor=0x8086 device=0x9a49 subvendor=0xf111 subdevice=0x0001
    vendor     = 'Intel Corporation'
    device     = 'TigerLake-LP GT2 [Iris Xe Graphics]'
    class      = display
    subclass   = VGA
none0@pci0:0:4:0:   class=0x118000 rev=0x01 hdr=0x00 vendor=0x8086 device=0x9a03 subvendor=0xf111 subdevice=0x0001
    vendor     = 'Intel Corporation'
    device     = 'TigerLake-LP Dynamic Tuning Processor Participant'
    class      = dasp
pcib1@pci0:0:6:0:   class=0x060400 rev=0x01 hdr=0x01 vendor=0x8086 device=0x9a09 subvendor=0xf111 subdevice=0x0001
    vendor     = 'Intel Corporation'
    device     = '11th Gen Core Processor PCIe Controller'
    class      = bridge
    subclass   = PCI-PCI
pcib2@pci0:0:7:0:   class=0x060400 rev=0x01 hdr=0x01 vendor=0x8086 device=0x9a23 subvendor=0xf111 subdevice=0x0001
    vendor     = 'Intel Corporation'
    device     = 'Tiger Lake-LP Thunderbolt 4 PCI Express Root Port'
    class      = bridge
    subclass   = PCI-PCI
pcib3@pci0:0:7:1:   class=0x060400 rev=0x01 hdr=0x01 vendor=0x8086 device=0x9a25 subvendor=0xf111 subdevice=0x0001
    vendor     = 'Intel Corporation'
    device     = 'Tiger Lake-LP Thunderbolt 4 PCI Express Root Port'
    class      = bridge
    subclass   = PCI-PCI
pcib4@pci0:0:7:2:   class=0x060400 rev=0x01 hdr=0x01 vendor=0x8086 device=0x9a27 subvendor=0xf111 subdevice=0x0001
    vendor     = 'Intel Corporation'
    device     = 'Tiger Lake-LP Thunderbolt 4 PCI Express Root Port'
    class      = bridge
    subclass   = PCI-PCI
pcib5@pci0:0:7:3:   class=0x060400 rev=0x01 hdr=0x01 vendor=0x8086 device=0x9a29 subvendor=0xf111 subdevice=0x0001
    vendor     = 'Intel Corporation'
    device     = 'Tiger Lake-LP Thunderbolt 4 PCI Express Root Port'
    class      = bridge
    subclass   = PCI-PCI
none1@pci0:0:8:0:   class=0x088000 rev=0x01 hdr=0x00 vendor=0x8086 device=0x9a11 subvendor=0xf111 subdevice=0x0001
    vendor     = 'Intel Corporation'
    device     = 'GNA Scoring Accelerator module'
    class      = base peripheral
none2@pci0:0:10:0:  class=0x118000 rev=0x01 hdr=0x00 vendor=0x8086 device=0x9a0d subvendor=0xf111 subdevice=0x0001
    vendor     = 'Intel Corporation'
    device     = 'Tigerlake Telemetry Aggregator Driver'
    class      = dasp
xhci0@pci0:0:13:0:  class=0x0c0330 rev=0x01 hdr=0x00 vendor=0x8086 device=0x9a13 subvendor=0xf111 subdevice=0x0001
    vendor     = 'Intel Corporation'
    device     = 'Tiger Lake-LP Thunderbolt 4 USB Controller'
    class      = serial bus
    subclass   = USB
none3@pci0:0:13:2:  class=0x0c0340 rev=0x01 hdr=0x00 vendor=0x8086 device=0x9a1b subvendor=0xf111 subdevice=0x0001
    vendor     = 'Intel Corporation'
    device     = 'Tiger Lake-LP Thunderbolt 4 NHI'
    class      = serial bus
    subclass   = USB
none4@pci0:0:13:3:  class=0x0c0340 rev=0x01 hdr=0x00 vendor=0x8086 device=0x9a1d subvendor=0xf111 subdevice=0x0001
    vendor     = 'Intel Corporation'
    device     = 'Tiger Lake-LP Thunderbolt 4 NHI'
    class      = serial bus
    subclass   = USB
none5@pci0:0:18:0:  class=0x070000 rev=0x20 hdr=0x00 vendor=0x8086 device=0xa0fc subvendor=0xf111 subdevice=0x0001
    vendor     = 'Intel Corporation'
    device     = 'Tiger Lake-LP Integrated Sensor Hub'
    class      = simple comms
    subclass   = UART
xhci1@pci0:0:20:0:  class=0x0c0330 rev=0x20 hdr=0x00 vendor=0x8086 device=0xa0ed subvendor=0xf111 subdevice=0x0001
    vendor     = 'Intel Corporation'
    device     = 'Tiger Lake-LP USB 3.2 Gen 2x1 xHCI Host Controller'
    class      = serial bus
    subclass   = USB
none6@pci0:0:20:2:  class=0x050000 rev=0x20 hdr=0x00 vendor=0x8086 device=0xa0ef subvendor=0xf111 subdevice=0x0001
    vendor     = 'Intel Corporation'
    device     = 'Tiger Lake-LP Shared SRAM'
    class      = memory
    subclass   = RAM
ig4iic0@pci0:0:21:0:    class=0x0c8000 rev=0x20 hdr=0x00 vendor=0x8086 device=0xa0e8 subvendor=0xf111 subdevice=0x0001
    vendor     = 'Intel Corporation'
    device     = 'Tiger Lake-LP Serial IO I2C Controller'
    class      = serial bus
ig4iic1@pci0:0:21:1:    class=0x0c8000 rev=0x20 hdr=0x00 vendor=0x8086 device=0xa0e9 subvendor=0xf111 subdevice=0x0001
    vendor     = 'Intel Corporation'
    device     = 'Tiger Lake-LP Serial IO I2C Controller'
    class      = serial bus
ig4iic2@pci0:0:21:3:    class=0x0c8000 rev=0x20 hdr=0x00 vendor=0x8086 device=0xa0eb subvendor=0xf111 subdevice=0x0001
    vendor     = 'Intel Corporation'
    device     = 'Tiger Lake-LP Serial IO I2C Controller'
    class      = serial bus
none7@pci0:0:22:0:  class=0x078000 rev=0x20 hdr=0x00 vendor=0x8086 device=0xa0e0 subvendor=0xf111 subdevice=0x0001
    vendor     = 'Intel Corporation'
    device     = 'Tiger Lake-LP Management Engine Interface'
    class      = simple comms
pcib6@pci0:0:29:0:  class=0x060400 rev=0x20 hdr=0x01 vendor=0x8086 device=0xa0b1 subvendor=0xf111 subdevice=0x0001
    vendor     = 'Intel Corporation'
    device     = 'Tiger Lake-LP PCI Express Root Port'
    class      = bridge
    subclass   = PCI-PCI
isab0@pci0:0:31:0:  class=0x060100 rev=0x20 hdr=0x00 vendor=0x8086 device=0xa082 subvendor=0xf111 subdevice=0x0001
    vendor     = 'Intel Corporation'
    device     = 'Tiger Lake-LP LPC Controller'
    class      = bridge
    subclass   = PCI-ISA
hdac0@pci0:0:31:3:  class=0x040380 rev=0x20 hdr=0x00 vendor=0x8086 device=0xa0c8 subvendor=0xf111 subdevice=0x0001
    vendor     = 'Intel Corporation'
    device     = 'Tiger Lake-LP Smart Sound Technology Audio Controller'
    class      = multimedia
    subclass   = HDA
ichsmb0@pci0:0:31:4:    class=0x0c0500 rev=0x20 hdr=0x00 vendor=0x8086 device=0xa0a3 subvendor=0xf111 subdevice=0x0001
    vendor     = 'Intel Corporation'
    device     = 'Tiger Lake-LP SMBus Controller'
    class      = serial bus
    subclass   = SMBus
none8@pci0:0:31:5:  class=0x0c8000 rev=0x20 hdr=0x00 vendor=0x8086 device=0xa0a4 subvendor=0xf111 subdevice=0x0001
    vendor     = 'Intel Corporation'
    device     = 'Tiger Lake-LP SPI Controller'
    class      = serial bus
nvme0@pci0:1:0:0:   class=0x010802 rev=0x01 hdr=0x00 vendor=0x15b7 device=0x5011 subvendor=0x15b7 subdevice=0x5011
    vendor     = 'Sandisk Corp'
    device     = 'WD PC SN810 / Black SN850 NVMe SSD'
    class      = mass storage
    subclass   = NVM
iwlwifi0@pci0:170:0:0:  class=0x028000 rev=0x1a hdr=0x00 vendor=0x8086 device=0x2725 subvendor=0x8086 subdevice=0x0024
    vendor     = 'Intel Corporation'
    device     = 'Wi-Fi 6 AX210/AX211/AX411 160MHz'
    class      = network

DRM KMOD version

1af4c68be62c22429de556c5aa6e0c8bde584f0c from git

To Reproduce

  1. kldload drm-kmod
  2. sysctl -a

evadot commented 5 months ago

This has been reported by bapt@ and dumbbell@ too, but I don't have a recent enough Intel machine to reproduce it. I guess the culprit is the strncmp at https://github.com/freebsd/drm-kmod/blob/master/drivers/gpu/drm/i915/gt/intel_gt_sysfs.c#L20, which means that we probably have something wrong with our kobj somewhere.
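
The register dump in the report is at least consistent with that guess: the fault virtual address is 0x0 and rdi, which carries the first argument on amd64, is 0, so strncmp() may have been handed a NULL string. Below is a minimal userspace sketch of that failure mode and of a defensive check, assuming the compared string is an object's name field that was never set. `struct fake_kobject` and both function names are hypothetical stand-ins for illustration; this is not the driver's code and not a proposed fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a kernel object; only the name matters here. */
struct fake_kobject {
	const char *name;	/* may be NULL if the object was never named */
};

/*
 * Unguarded check: passes name straight to strncmp(), which reads
 * through the pointer and faults if it is NULL.
 */
bool
name_has_prefix_unsafe(const struct fake_kobject *kobj, const char *prefix)
{
	return (strncmp(kobj->name, prefix, strlen(prefix)) == 0);
}

/* Guarded check: a missing object or a missing name counts as "no match". */
bool
name_has_prefix_safe(const struct fake_kobject *kobj, const char *prefix)
{
	assert(prefix != NULL);
	if (kobj == NULL || kobj->name == NULL)
		return (false);
	return (strncmp(kobj->name, prefix, strlen(prefix)) == 0);
}
```

Calling `name_has_prefix_unsafe()` on an object whose name is NULL would dereference address 0, much like the trap above, while the guarded variant simply reports no match. A guard like this would only hide the symptom; if the name really is unset, the underlying question of why remains open.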

emaste commented 5 months ago

That panic is not reproducible with throttle_reason_attrs stubbed out:

#if 0
        if (GRAPHICS_VER(gt->i915) >= 11) {
                ret = sysfs_create_files(kobj, throttle_reason_attrs);
                if (ret)
                        drm_warn(&gt->i915->drm,
                                 "failed to create gt%u throttle sysfs files (%pe)",
                                 gt->info.id, ERR_PTR(ret));
        }
#endif

However I then get a hang when invoking sysctl -a.

emaste commented 5 months ago

sysctl -a also fails on my daily driver laptop. Same kernel/drm-kmod version as above, hardware is:

vgapci0@pci0:0:2:0:     class=0x030000 rev=0x02 hdr=0x00 vendor=0x8086 device=0x3ea0 subvendor=0x17aa subdevice=0x2292
    vendor     = 'Intel Corporation'
    device     = 'WhiskeyLake-U GT2 [UHD Graphics 620]'
    class      = display
    subclass   = VGA

I updated the title to indicate that this is with GENERIC (so INVARIANTS is enabled). I don't recall whether the failure here was a panic or a hang.

emaste commented 5 months ago

This appears to be resolved with #283.