freebsd / drm-kmod

drm driver for FreeBSD

INVARIANTS panic in intel_engine_init_cmd_parser #282

Closed emaste closed 8 months ago

emaste commented 8 months ago

Describe the bug: panic "node is already on list or was not zeroed" immediately upon boot

[drm] Initialized 5 GT workarounds on global
[drm] Initialized 8 engine workarounds on rcs'0
[drm] Initialized 5 whitelist workarounds on rcs'0
[drm] Initialized 14 context workarounds on rcs'0
panic: node is already on list or was not zeroed
cpuid = 7
time = 1706190275
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00dc5a04e0
vpanic() at vpanic+0x132/frame 0xfffffe00dc5a0610
panic() at panic+0x43/frame 0xfffffe00dc5a0670
intel_engine_init_cmd_parser() at intel_engine_init_cmd_parser+0x5de/frame 0xfffffe00dc5a06e0
intel_engines_init() at intel_engines_init+0x374/frame 0xfffffe00dc5a0760
intel_gt_init() at intel_gt_init+0x177/frame 0xfffffe00dc5a0790
i915_gem_init() at i915_gem_init+0x95/frame 0xfffffe00dc5a07d0
i915_driver_probe() at i915_driver_probe+0xeaa/frame 0xfffffe00dc5a0830
i915_pci_probe() at i915_pci_probe+0xa3/frame 0xfffffe00dc5a0890
linux_pci_attach_device() at linux_pci_attach_device+0x474/frame 0xfffffe00dc5a08e0
device_attach() at device_attach+0x3c5/frame 0xfffffe00dc5a0920
device_probe_and_attach() at device_probe_and_attach+0x70/frame 0xfffffe00dc5a0950
bus_generic_driver_added() at bus_generic_driver_added+0x77/frame 0xfffffe00dc5a0970
devclass_driver_added() at devclass_driver_added+0x3f/frame 0xfffffe00dc5a09b0
devclass_add_driver() at devclass_add_driver+0x138/frame 0xfffffe00dc5a09f0
_linux_pci_register_driver() at _linux_pci_register_driver+0xc1/frame 0xfffffe00dc5a0a20
i915kms_evh() at i915kms_evh+0x223/frame 0xfffffe00dc5a0a50
module_register_init() at module_register_init+0xb6/frame 0xfffffe00dc5a0a80
linker_load_module() at linker_load_module+0xc1f/frame 0xfffffe00dc5a0d80
kern_kldload() at kern_kldload+0x16f/frame 0xfffffe00dc5a0dd0
sys_kldload() at sys_kldload+0x5c/frame 0xfffffe00dc5a0e00
amd64_syscall() at amd64_syscall+0x15e/frame 0xfffffe00dc5a0f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe00dc5a0f30
--- syscall (304, FreeBSD ELF64, kldload), rip = 0xc6b7244bf7a, rsp = 0xc6b702bf4a8, rbp = 0xc6b702bfa20 ---

FreeBSD version FreeBSD 15.0-CURRENT wipbsd-n267714-217416d818df GENERIC amd64 (This is my WIP branch with a number of changes but should not be related; this is the same kernel as in #280 which worked until sysctl -a)

PCI Info

vgapci0@pci0:0:2:0:     class=0x030000 rev=0x02 hdr=0x00 vendor=0x8086 device=0x3ea0 subvendor=0x17aa subdevice=0x2292
    vendor     = 'Intel Corporation'
    device     = 'WhiskeyLake-U GT2 [UHD Graphics 620]'
    class      = display
    subclass   = VGA

DRM KMOD version From git, 1af4c68be62c22429de556c5aa6e0c8bde584f0c

To Reproduce kldload i915kms

Screenshots N/A

Additional context Note this is GENERIC with INVARIANTS

emaste commented 8 months ago

The panic is from

#define hash_add_rcu(ht, node, key) do {                                \
        struct lkpi_hash_head *__head = &(ht)[hash_min(key, HASH_BITS(ht))]; \
        __hash_node_type_assert(node); \
        KASSERT(((struct lkpi_hash_entry *)(node))->entry.cle_prev == NULL, \
            ("node is already on list or was not zeroed")); \
        CK_LIST_INSERT_HEAD(&__head->head, \
            (struct lkpi_hash_entry *)(node), entry); \
} while (0)

which has been there since f9e90c24737f9
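To illustrate why that check can fire on a freshly allocated node, here is a minimal userland sketch. It is not the LinuxKPI code: `demo_node`, `demo_head`, `demo_hash_add` and `demo_fill_junk` are hypothetical stand-ins, and `prev` only plays the role of `entry.cle_prev`. Where the real macro would panic under INVARIANTS, the sketch returns false.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the list linkage in struct lkpi_hash_entry. */
struct demo_node {
	struct demo_node *next;
	struct demo_node **prev;	/* plays the role of entry.cle_prev */
};

struct demo_head {
	struct demo_node *first;
};

/* Mirrors the KASSERT condition: only a NULL back-pointer is accepted. */
static bool
demo_node_looks_unlinked(const struct demo_node *n)
{
	return (n->prev == NULL);
}

/* Insert at the head; returns false where the real macro would panic. */
static bool
demo_hash_add(struct demo_head *h, struct demo_node *n)
{
	if (!demo_node_looks_unlinked(n))
		return (false);
	n->next = h->first;
	if (h->first != NULL)
		h->first->prev = &n->next;
	h->first = n;
	n->prev = &h->first;
	return (true);
}

/* Fill a node with junk bytes, as a non-zeroing allocator may leave it. */
static void
demo_fill_junk(struct demo_node *n)
{
	memset(n, 0xa5, sizeof(*n));
}
```

A node that was zeroed first passes the check. A node left with junk bytes, or one that is already linked, is rejected, and the check cannot tell those two cases apart.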

emaste commented 8 months ago

The panic does not occur with https://github.com/emaste/drm-kmod/commit/b6ecd6f88afe3ae026c940290c65163ce21f1d44 applied. I assume this code has simply not been run with INVARIANTS before.

evadot commented 8 months ago

Hmm, weird. Is kmalloc supposed to bzero in Linux?

evadot commented 8 months ago

Having looked at the Linux code, it seems that they don't check this. Also, hash_add seems to use hlist and not the RCU hash ones that we use.

emaste commented 8 months ago

Linux has kzalloc for a zeroed allocation; kmalloc does not zero. I suspect that the KASSERT in hash_add_rcu is not valid and should be removed (or, if we decide it's actually a valuable check, we have to modify callers to zero the whole allocation, or at least cle_prev).
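The two caller-side options can be sketched in userland C, with `calloc` standing in for a kzalloc-style zeroed allocation and `malloc` for a kmalloc-style one. All names here (`demo_entry`, `demo_desc`, `demo_alloc_zeroed`, `demo_alloc_clear_node`) are hypothetical and are not taken from the driver or from LinuxKPI.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical list linkage; prev plays the role of cle_prev. */
struct demo_entry {
	struct demo_entry *next;
	struct demo_entry **prev;
};

/* Hypothetical hashed object that embeds the linkage. */
struct demo_desc {
	unsigned int cmd;
	struct demo_entry node;
};

/* Option 1: zero the whole allocation (kzalloc-style). */
static struct demo_desc *
demo_alloc_zeroed(void)
{
	return (calloc(1, sizeof(struct demo_desc)));
}

/*
 * Option 2: keep the non-zeroing allocation (kmalloc-style) and clear
 * only the linkage before the node is handed to the hash table.
 */
static struct demo_desc *
demo_alloc_clear_node(void)
{
	struct demo_desc *d;

	d = malloc(sizeof(*d));
	if (d != NULL)
		memset(&d->node, 0, sizeof(d->node));
	return (d);
}
```

Either way the back-pointer is NULL by the time the node reaches the insert. Removing the assertion instead avoids having to touch every caller.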

emaste commented 8 months ago

https://reviews.freebsd.org/D43645

emaste commented 8 months ago

Fixed by https://github.com/freebsd/freebsd-src/commit/7e77089dccd702eb767350a8bd3d20102c4fb591