freebsd / drm-kmod

drm driver for FreeBSD

Uncaught kernel crash (likely FP-related) #228

Closed ngortheone closed 2 months ago

ngortheone commented 1 year ago

FreeBSD 14-CURRENT, most recent main branch build.
CPU: Ryzen 9 5950X
GPU: AMD Radeon Pro W5700 (navi10), amdgpu kernel module
WM: Sway, Wayland, DPI scaling 2 (4K screen)
DRM: 5.10.163

Running a slightly buggy Vulkan application results in a crash and reboot. I was unable to get a stack trace: even with the following sysctls set, the computer hangs for a second or two and then reboots.

debug.debugger_on_trap: 1
debug.debugger_on_recursive_panic: 1
debug.debugger_on_panic: 1
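
The three settings above are shown in the "name: value" form that sysctl(8) prints. For anyone trying to reproduce this, here is a minimal sketch that converts that form into the "name=value" lines used in /etc/sysctl.conf, so the settings are reapplied at boot after each crash. The helper name to_sysctl_conf is hypothetical and not part of any FreeBSD tool.

```shell
#!/bin/sh
# to_sysctl_conf (hypothetical helper): read "name: value" lines, as printed
# by sysctl(8), on stdin and write "name=value" lines on stdout, which is the
# format expected in /etc/sysctl.conf. Lines without a colon pass through
# unchanged.
to_sysctl_conf() {
    sed -e 's/^\([^:]*\):[[:space:]]*/\1=/'
}
```

Possible usage, as root: sysctl debug.debugger_on_trap debug.debugger_on_recursive_panic debug.debugger_on_panic | to_sysctl_conf >> /etc/sysctl.conf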

To reproduce: in the project https://github.com/Yatekii/imgui-wgpu-rs, run the command cargo run --example hello-world (Rust is needed).

The crash happens on certain desktop configurations (DPI scale, window sizes, layout..)

I strongly suspect that this bug, reported on Linux, results in a crash on FreeBSD. The behavior exactly matches this bug description: with the DPI scaling factor set to 1, or with some window configurations, the crash does not happen (or happens less deterministically). The DPI scaling factor changes the number of pixels "visible" to the application, and in certain cases the FP operations do not fail...

(but I have no hard evidence of that, it may be something else)

Please advise on further debugging steps.

P.S. I haven't tested this on X (and I don't have a good setup to do so), and I am not sure if it is relevant...


Details:

Standard generic kernel with debugging enabled

~ » uname -a
FreeBSD zen.hq 14.0-CURRENT FreeBSD 14.0-CURRENT #0 main-n260536-81471049650f: Fri Feb  3 17:43:47 PST 2023     root@zen.hq:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64

my src.conf

WITHOUT_FREEBSD_UPDATE=yes
WITHOUT_NTP=yes
WITHOUT_PORTSNAP=yes
WITHOUT_SENDMAIL=yes
WITHOUT_MAILWRAPPER=yes
WITHOUT_TCP_WRAPPERS=yes
WITH_BIND_NOW=yes
WITH_RETPOLINE=yes
WITH_KERNEL_RETPOLINE=yes

sysctl.conf

kern.elf32.aslr.enable=1
kern.elf32.aslr.honor_sbrk=0
kern.elf32.aslr.pie_enable=1
kern.elf32.allow_wx=0
kern.elf64.aslr.enable=1
kern.elf64.aslr.honor_sbrk=0
kern.elf64.aslr.pie_enable=1
kern.elf64.allow_wx=0
kern.ipc.shm_use_phys=1
kern.msgbuf_show_timestamp=1
kern.randompid=1
kern.random.fortuna.minpoolsize=128

hw.kbd.keymap_restrict_change=4
security.bsd.hardlink_check_gid=1
security.bsd.hardlink_check_uid=1
security.bsd.see_jail_proc=0
security.bsd.see_other_gids=0
security.bsd.see_other_uids=0
security.bsd.stack_guard_page=1
security.bsd.unprivileged_proc_debug=0
security.bsd.unprivileged_read_msgbuf=0

Packages

# pkg info | grep drm
drm-510-kmod-5.10.163          DRM drivers modules
libdrm-2.4.114,1               Userspace interface to kernel Direct Rendering Module services
 # kldstat
Id Refs Address                Size Name
 1  131 0xffffffff80200000  215b5d0 kernel
 2    1 0xffffffff8235c000     7c90 cryptodev.ko
 4    1 0xffffffff82369000   6c4948 zfs.ko
 5    1 0xffffffff82a2e000     a478 cuse.ko
 6    1 0xffffffff83400000   412990 amdgpu.ko
 7    2 0xffffffff83210000    72d78 drm.ko
 8    1 0xffffffff83283000     22a8 iic.ko
 9    3 0xffffffff83286000     30d8 linuxkpi_gplv2.ko
10    4 0xffffffff8328a000     6320 dmabuf.ko
11    1 0xffffffff83291000     c768 ttm.ko
12    1 0xffffffff8329e000    2f198 amdgpu_navi10_sos_bin.ko
13    1 0xffffffff832ce000    292d8 amdgpu_navi10_asd_bin.ko
14    1 0xffffffff832f8000     a3d8 amdgpu_navi10_ta_bin.ko
15    1 0xffffffff83303000    437a0 amdgpu_navi10_smc_bin.ko
16    1 0xffffffff83347000    425d8 amdgpu_navi10_pfp_bin.ko
17    1 0xffffffff8338a000    425d8 amdgpu_navi10_me_bin.ko
18    1 0xffffffff83813000    42558 amdgpu_navi10_ce_bin.ko
19    1 0xffffffff833cd000     cc88 amdgpu_navi10_rlc_bin.ko
20    1 0xffffffff83856000    43a08 amdgpu_navi10_mec_bin.ko
21    1 0xffffffff8389a000    43a08 amdgpu_navi10_mec2_bin.ko
22    1 0xffffffff833da000     a4d8 amdgpu_navi10_sdma_bin.ko
23    1 0xffffffff833e5000     a4d8 amdgpu_navi10_sdma1_bin.ko
24    1 0xffffffff838de000    63f18 amdgpu_navi10_vcn_bin.ko
25    1 0xffffffff833f0000     3370 acpi_wmi.ko
26    1 0xffffffff83942000    86190 if_iwlwifi.ko
27    1 0xffffffff833f4000     3210 intpm.ko
28    1 0xffffffff833f8000     2178 smbus.ko
29    1 0xffffffff833fb000     3360 uhid.ko
30    1 0xffffffff839c9000     33a0 usbhid.ko
31    5 0xffffffff839cd000     31f0 hidbus.ko
32    1 0xffffffff839d1000     3340 wmt.ko
33    1 0xffffffff839d5000     d5ac snd_uaudio.ko
34    1 0xffffffff839e3000     21e0 hms.ko
35    3 0xffffffff839e6000     40a8 hidmap.ko
36    1 0xffffffff839eb000     21e0 hcons.ko
37    1 0xffffffff839ee000     21e0 hsctrl.ko
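
Since the thread goes on to compare behavior across drm-kmod versions, it can help to pull just the version number out of the pkg info output shown above. This is a minimal sketch that assumes the package name follows the drm-NNN-kmod-VERSION pattern seen in this report. The function name drm_kmod_version is hypothetical.

```shell
#!/bin/sh
# drm_kmod_version (hypothetical helper): read `pkg info` output on stdin and
# print the version part of any package named drm-NNN-kmod-VERSION.
# Other packages, such as libdrm, are ignored.
drm_kmod_version() {
    awk '$1 ~ /^drm-[0-9]+-kmod-/ {
        sub(/^drm-[0-9]+-kmod-/, "", $1)   # strip the package name prefix
        print $1                           # what remains is the version
    }'
}
```

Possible usage: pkg info | drm_kmod_version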

pciconf

root@zen:~ # pciconf -lv
hostb0@pci0:0:0:0:  class=0x060000 rev=0x00 hdr=0x00 vendor=0x1022 device=0x1480 subvendor=0x1022 subdevice=0x1480
    vendor     = 'Advanced Micro Devices, Inc. [AMD]'
    device     = 'Starship/Matisse Root Complex'
    class      = bridge
    subclass   = HOST-PCI
none0@pci0:0:0:2:   class=0x080600 rev=0x00 hdr=0x00 vendor=0x1022 device=0x1481 subvendor=0x1022 subdevice=0x1481
    vendor     = 'Advanced Micro Devices, Inc. [AMD]'
    device     = 'Starship/Matisse IOMMU'
    class      = base peripheral
    subclass   = IOMMU
hostb1@pci0:0:1:0:  class=0x060000 rev=0x00 hdr=0x00 vendor=0x1022 device=0x1482 subvendor=0x0000 subdevice=0x0000
    vendor     = 'Advanced Micro Devices, Inc. [AMD]'
    device     = 'Starship/Matisse PCIe Dummy Host Bridge'
    class      = bridge
    subclass   = HOST-PCI
pcib1@pci0:0:1:1:   class=0x060400 rev=0x00 hdr=0x01 vendor=0x1022 device=0x1483 subvendor=0x1022 subdevice=0x1453
    vendor     = 'Advanced Micro Devices, Inc. [AMD]'
    device     = 'Starship/Matisse GPP Bridge'
    class      = bridge
    subclass   = PCI-PCI
pcib2@pci0:0:1:2:   class=0x060400 rev=0x00 hdr=0x01 vendor=0x1022 device=0x1483 subvendor=0x1022 subdevice=0x1453
    vendor     = 'Advanced Micro Devices, Inc. [AMD]'
    device     = 'Starship/Matisse GPP Bridge'
    class      = bridge
    subclass   = PCI-PCI
hostb2@pci0:0:2:0:  class=0x060000 rev=0x00 hdr=0x00 vendor=0x1022 device=0x1482 subvendor=0x0000 subdevice=0x0000
    vendor     = 'Advanced Micro Devices, Inc. [AMD]'
    device     = 'Starship/Matisse PCIe Dummy Host Bridge'
    class      = bridge
    subclass   = HOST-PCI
hostb3@pci0:0:3:0:  class=0x060000 rev=0x00 hdr=0x00 vendor=0x1022 device=0x1482 subvendor=0x0000 subdevice=0x0000
    vendor     = 'Advanced Micro Devices, Inc. [AMD]'
    device     = 'Starship/Matisse PCIe Dummy Host Bridge'
    class      = bridge
    subclass   = HOST-PCI
pcib6@pci0:0:3:1:   class=0x060400 rev=0x00 hdr=0x01 vendor=0x1022 device=0x1483 subvendor=0x1022 subdevice=0x1453
    vendor     = 'Advanced Micro Devices, Inc. [AMD]'
    device     = 'Starship/Matisse GPP Bridge'
    class      = bridge
    subclass   = PCI-PCI
hostb4@pci0:0:4:0:  class=0x060000 rev=0x00 hdr=0x00 vendor=0x1022 device=0x1482 subvendor=0x0000 subdevice=0x0000
    vendor     = 'Advanced Micro Devices, Inc. [AMD]'
    device     = 'Starship/Matisse PCIe Dummy Host Bridge'
    class      = bridge
    subclass   = HOST-PCI
hostb5@pci0:0:5:0:  class=0x060000 rev=0x00 hdr=0x00 vendor=0x1022 device=0x1482 subvendor=0x0000 subdevice=0x0000
    vendor     = 'Advanced Micro Devices, Inc. [AMD]'
    device     = 'Starship/Matisse PCIe Dummy Host Bridge'
    class      = bridge
    subclass   = HOST-PCI
hostb6@pci0:0:7:0:  class=0x060000 rev=0x00 hdr=0x00 vendor=0x1022 device=0x1482 subvendor=0x0000 subdevice=0x0000
    vendor     = 'Advanced Micro Devices, Inc. [AMD]'
    device     = 'Starship/Matisse PCIe Dummy Host Bridge'
    class      = bridge
    subclass   = HOST-PCI
pcib9@pci0:0:7:1:   class=0x060400 rev=0x00 hdr=0x01 vendor=0x1022 device=0x1484 subvendor=0x1022 subdevice=0x1484
    vendor     = 'Advanced Micro Devices, Inc. [AMD]'
    device     = 'Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B]'
    class      = bridge
    subclass   = PCI-PCI
hostb7@pci0:0:8:0:  class=0x060000 rev=0x00 hdr=0x00 vendor=0x1022 device=0x1482 subvendor=0x0000 subdevice=0x0000
    vendor     = 'Advanced Micro Devices, Inc. [AMD]'
    device     = 'Starship/Matisse PCIe Dummy Host Bridge'
    class      = bridge
    subclass   = HOST-PCI
pcib10@pci0:0:8:1:  class=0x060400 rev=0x00 hdr=0x01 vendor=0x1022 device=0x1484 subvendor=0x1022 subdevice=0x1484
    vendor     = 'Advanced Micro Devices, Inc. [AMD]'
    device     = 'Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B]'
    class      = bridge
    subclass   = PCI-PCI
intsmb0@pci0:0:20:0:    class=0x0c0500 rev=0x61 hdr=0x00 vendor=0x1022 device=0x790b subvendor=0x1022 subdevice=0x790b
    vendor     = 'Advanced Micro Devices, Inc. [AMD]'
    device     = 'FCH SMBus Controller'
    class      = serial bus
    subclass   = SMBus
isab0@pci0:0:20:3:  class=0x060100 rev=0x51 hdr=0x00 vendor=0x1022 device=0x790e subvendor=0x1022 subdevice=0x790e
    vendor     = 'Advanced Micro Devices, Inc. [AMD]'
    device     = 'FCH LPC Bridge'
    class      = bridge
    subclass   = PCI-ISA
hostb8@pci0:0:24:0: class=0x060000 rev=0x00 hdr=0x00 vendor=0x1022 device=0x1440 subvendor=0x0000 subdevice=0x0000
    vendor     = 'Advanced Micro Devices, Inc. [AMD]'
    device     = 'Matisse/Vermeer Data Fabric: Device 18h; Function 0'
    class      = bridge
    subclass   = HOST-PCI
hostb9@pci0:0:24:1: class=0x060000 rev=0x00 hdr=0x00 vendor=0x1022 device=0x1441 subvendor=0x0000 subdevice=0x0000
    vendor     = 'Advanced Micro Devices, Inc. [AMD]'
    device     = 'Matisse/Vermeer Data Fabric: Device 18h; Function 1'
    class      = bridge
    subclass   = HOST-PCI
hostb10@pci0:0:24:2:    class=0x060000 rev=0x00 hdr=0x00 vendor=0x1022 device=0x1442 subvendor=0x0000 subdevice=0x0000
    vendor     = 'Advanced Micro Devices, Inc. [AMD]'
    device     = 'Matisse/Vermeer Data Fabric: Device 18h; Function 2'
    class      = bridge
    subclass   = HOST-PCI
hostb11@pci0:0:24:3:    class=0x060000 rev=0x00 hdr=0x00 vendor=0x1022 device=0x1443 subvendor=0x0000 subdevice=0x0000
    vendor     = 'Advanced Micro Devices, Inc. [AMD]'
    device     = 'Matisse/Vermeer Data Fabric: Device 18h; Function 3'
    class      = bridge
    subclass   = HOST-PCI
hostb12@pci0:0:24:4:    class=0x060000 rev=0x00 hdr=0x00 vendor=0x1022 device=0x1444 subvendor=0x0000 subdevice=0x0000
    vendor     = 'Advanced Micro Devices, Inc. [AMD]'
    device     = 'Matisse/Vermeer Data Fabric: Device 18h; Function 4'
    class      = bridge
    subclass   = HOST-PCI
hostb13@pci0:0:24:5:    class=0x060000 rev=0x00 hdr=0x00 vendor=0x1022 device=0x1445 subvendor=0x0000 subdevice=0x0000
    vendor     = 'Advanced Micro Devices, Inc. [AMD]'
    device     = 'Matisse/Vermeer Data Fabric: Device 18h; Function 5'
    class      = bridge
    subclass   = HOST-PCI
hostb14@pci0:0:24:6:    class=0x060000 rev=0x00 hdr=0x00 vendor=0x1022 device=0x1446 subvendor=0x0000 subdevice=0x0000
    vendor     = 'Advanced Micro Devices, Inc. [AMD]'
    device     = 'Matisse/Vermeer Data Fabric: Device 18h; Function 6'
    class      = bridge
    subclass   = HOST-PCI
hostb15@pci0:0:24:7:    class=0x060000 rev=0x00 hdr=0x00 vendor=0x1022 device=0x1447 subvendor=0x0000 subdevice=0x0000
    vendor     = 'Advanced Micro Devices, Inc. [AMD]'
    device     = 'Matisse/Vermeer Data Fabric: Device 18h; Function 7'
    class      = bridge
    subclass   = HOST-PCI
nvme0@pci0:1:0:0:   class=0x010802 rev=0x00 hdr=0x00 vendor=0x144d device=0xa80a subvendor=0x144d subdevice=0xa801
    vendor     = 'Samsung Electronics Co Ltd'
    device     = 'NVMe SSD Controller PM9A1/PM9A3/980PRO'
    class      = mass storage
    subclass   = NVM
xhci0@pci0:2:0:0:   class=0x0c0330 rev=0x00 hdr=0x00 vendor=0x1022 device=0x43ee subvendor=0x1b21 subdevice=0x1142
    vendor     = 'Advanced Micro Devices, Inc. [AMD]'
    device     = '500 Series Chipset USB 3.1 XHCI Controller'
    class      = serial bus
    subclass   = USB
ahci0@pci0:2:0:1:   class=0x010601 rev=0x00 hdr=0x00 vendor=0x1022 device=0x43eb subvendor=0x1b21 subdevice=0x1062
    vendor     = 'Advanced Micro Devices, Inc. [AMD]'
    device     = '500 Series Chipset SATA Controller'
    class      = mass storage
    subclass   = SATA
pcib3@pci0:2:0:2:   class=0x060400 rev=0x00 hdr=0x01 vendor=0x1022 device=0x43e9 subvendor=0x1b21 subdevice=0x0201
    vendor     = 'Advanced Micro Devices, Inc. [AMD]'
    device     = '500 Series Chipset Switch Upstream Port'
    class      = bridge
    subclass   = PCI-PCI
pcib4@pci0:3:7:0:   class=0x060400 rev=0x00 hdr=0x01 vendor=0x1022 device=0x43ea subvendor=0x1b21 subdevice=0x3308
    vendor     = 'Advanced Micro Devices, Inc. [AMD]'
    class      = bridge
    subclass   = PCI-PCI
pcib5@pci0:3:9:0:   class=0x060400 rev=0x00 hdr=0x01 vendor=0x1022 device=0x43ea subvendor=0x1b21 subdevice=0x3308
    vendor     = 'Advanced Micro Devices, Inc. [AMD]'
    class      = bridge
    subclass   = PCI-PCI
iwlwifi0@pci0:4:0:0:    class=0x028000 rev=0x1a hdr=0x00 vendor=0x8086 device=0x2723 subvendor=0x8086 subdevice=0x0084
    vendor     = 'Intel Corporation'
    device     = 'Wi-Fi 6 AX200'
    class      = network
re0@pci0:42:0:0:    class=0x020000 rev=0x15 hdr=0x00 vendor=0x10ec device=0x8168 subvendor=0x1462 subdevice=0x7c56
    vendor     = 'Realtek Semiconductor Co., Ltd.'
    device     = 'RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller'
    class      = network
    subclass   = ethernet
pcib7@pci0:43:0:0:  class=0x060400 rev=0x00 hdr=0x01 vendor=0x1002 device=0x1478 subvendor=0x0000 subdevice=0x0000
    vendor     = 'Advanced Micro Devices, Inc. [AMD/ATI]'
    device     = 'Navi 10 XL Upstream Port of PCI Express Switch'
    class      = bridge
    subclass   = PCI-PCI
pcib8@pci0:44:0:0:  class=0x060400 rev=0x00 hdr=0x01 vendor=0x1002 device=0x1479 subvendor=0x1002 subdevice=0x1479
    vendor     = 'Advanced Micro Devices, Inc. [AMD/ATI]'
    device     = 'Navi 10 XL Downstream Port of PCI Express Switch'
    class      = bridge
    subclass   = PCI-PCI
vgapci0@pci0:45:0:0:    class=0x030000 rev=0x00 hdr=0x00 vendor=0x1002 device=0x7312 subvendor=0x1002 subdevice=0x031e
    vendor     = 'Advanced Micro Devices, Inc. [AMD/ATI]'
    device     = 'Navi 10 [Radeon Pro W5700]'
    class      = display
    subclass   = VGA
hdac0@pci0:45:0:1:  class=0x040300 rev=0x00 hdr=0x00 vendor=0x1002 device=0xab38 subvendor=0x1002 subdevice=0xab38
    vendor     = 'Advanced Micro Devices, Inc. [AMD/ATI]'
    device     = 'Navi 10 HDMI Audio'
    class      = multimedia
    subclass   = HDA
xhci1@pci0:45:0:2:  class=0x0c0330 rev=0x00 hdr=0x00 vendor=0x1002 device=0x7316 subvendor=0x1002 subdevice=0x7316
    vendor     = 'Advanced Micro Devices, Inc. [AMD/ATI]'
    class      = serial bus
    subclass   = USB
none1@pci0:45:0:3:  class=0x0c8000 rev=0x00 hdr=0x00 vendor=0x1002 device=0x7314 subvendor=0x1002 subdevice=0x0408
    vendor     = 'Advanced Micro Devices, Inc. [AMD/ATI]'
    device     = 'Navi 10 USB'
    class      = serial bus
none2@pci0:46:0:0:  class=0x130000 rev=0x00 hdr=0x00 vendor=0x1022 device=0x148a subvendor=0x1022 subdevice=0x148a
    vendor     = 'Advanced Micro Devices, Inc. [AMD]'
    device     = 'Starship/Matisse PCIe Dummy Function'
    class      = non-essential instrumentation
none3@pci0:47:0:0:  class=0x130000 rev=0x00 hdr=0x00 vendor=0x1022 device=0x1485 subvendor=0x1022 subdevice=0x1485
    vendor     = 'Advanced Micro Devices, Inc. [AMD]'
    device     = 'Starship/Matisse Reserved SPP'
    class      = non-essential instrumentation
none4@pci0:47:0:1:  class=0x108000 rev=0x00 hdr=0x00 vendor=0x1022 device=0x1486 subvendor=0x1022 subdevice=0x1486
    vendor     = 'Advanced Micro Devices, Inc. [AMD]'
    device     = 'Starship/Matisse Cryptographic Coprocessor PSPCPP'
    class      = encrypt/decrypt
xhci2@pci0:47:0:3:  class=0x0c0330 rev=0x00 hdr=0x00 vendor=0x1022 device=0x149c subvendor=0x1022 subdevice=0x148c
    vendor     = 'Advanced Micro Devices, Inc. [AMD]'
    device     = 'Matisse USB 3.0 Host Controller'
    class      = serial bus
    subclass   = USB
hdac1@pci0:47:0:4:  class=0x040300 rev=0x00 hdr=0x00 vendor=0x1022 device=0x1487 subvendor=0x1462 subdevice=0x9c56
    vendor     = 'Advanced Micro Devices, Inc. [AMD]'
    device     = 'Starship/Matisse HD Audio Controller'
    class      = multimedia
    subclass   = HDA
ngortheone commented 1 year ago

I can rule out my sysctl.conf: the crash also happens with an empty sysctl.conf.

ngortheone commented 1 year ago

It looks like updating to 5.13 fixes this; so far I haven't had a single crash.

evadot commented 1 year ago

> It looks like update to 5.13 fixes this, so far I didn't have a single crash

Can you test that there is no regression between 5.15 and the drm-515-kmod port? Thanks.

ngortheone commented 10 months ago

@evadot I've been running 5.15 for several months now and I haven't had a single crash.

evadot commented 2 months ago

Closing as it's fixed