freebsd / drm-kmod

drm driver for FreeBSD

Update to Linux 5.13 drivers #224

Closed dumbbell closed 1 year ago

dumbbell commented 1 year ago

This is the backport of the DRM drivers from Linux 5.13.

Progress:

Changes in Linux 5.13

You can read this Phoronix article to learn about the changes in the DRM drivers in Linux 5.13: https://www.phoronix.com/news/Linux-5.13-DRM-Graphics

Patches to linuxkpi

This update depends on the following patches to linuxkpi in FreeBSD:

All patches are merged into main.

How to test

You need to run a recent FreeBSD 14-CURRENT to test it.

Here are some instructions:

  1. You need to check out the FreeBSD main src branch and compile a kernel from that branch:

    git clone https://github.com/freebsd/freebsd-src.git
    cd freebsd-src
    make -j8 buildkernel DEBUG_FLAGS=-g
    
    # This installs the kernel under another name, `kernel.drm`. Thus, you keep the default kernel
    # in case of trouble.
    sudo make installkernel DEBUG_FLAGS=-g INSTKERNNAME=kernel.drm
  2. You need to check out the branch referenced in this pull request and compile it:

    git clone -b update-to-v5.13 https://github.com/dumbbell/drm-kmod.git
    cd drm-kmod
    make -j8 DEBUG_FLAGS=-g
    sudo make install DEBUG_FLAGS=-g KMODDIR=/boot/kernel.drm

    This needs access to the FreeBSD src tree cloned above. I don't remember the name of the variable that points the build at it, but linking /usr/src to your clone is enough.

  3. You will also need the GPU firmware in the kernel.drm directory. To compile and install it:

    git clone https://github.com/dumbbell/drm-kmod-firmware.git
    cd drm-kmod-firmware
    make -j8 DEBUG_FLAGS=-g OSVERSION=1400000
    sudo make install DEBUG_FLAGS=-g KMODDIR=/boot/kernel.drm OSVERSION=1400000
  4. Load the relevant driver(s) as you usually do.

mekanix commented 1 year ago

I regularly build -CURRENT on a Threadripper + 7900 XT based machine and I would like to help with testing. As linux-firmware recently got additions, I guess I will need those too, so I'm wondering whether recent firmware is included. Anyway, please let me know how I can help.

dumbbell commented 1 year ago

Thank you @mekanix!

Unfortunately, your GPU is way too recent for the driver on FreeBSD. Support for the Radeon 7900 XT was added in Linux 6.0 according to Phoronix:

So what ends up being the upstream open-source Linux driver requirements for the new Radeon RX 7900 series? I'm pleased to say it's Linux 6.0+ and Mesa 22.2+!

I don't have a PCI ID to verify that, but we are about 7 Linux releases behind for this GPU.

Edit: Just to clarify, the firmware alone is not enough. We need to update the driver to bring support for new GPUs.

mekanix commented 1 year ago

I found and read that article, too, but that is not the minimal version. After research this is what I've found.

As the 7900 XT driver supports Ubuntu 20.04.5, which runs on kernel 5.15, we are 2, not 6, versions away. The same driver supports SLES 15 SP4, which runs on kernel 5.14, so we might be only one version away.

I know it is not something that will be done quickly, but FreeBSD CURRENT remains on that machine and my question remains: how can I help?

dumbbell commented 1 year ago

The backport of all 5.13 patches is finished. The next task at the top of the TODO is to test it. Things like:

* Wayland and X.Org
* suspend/resume
* gaming and other heavier 3D uses
* assisted video decoding
* all drivers: i915kms, amdgpu, radeonkms

I'm using an i915-based laptop with an amdgpu-based external AMD Radeon 6700 XT GPU on a daily basis. I'm currently chasing a slowdown compared to 5.12 with the Radeon.

If you have access to supported hardware, you could test this branch with it and see how it goes. Is this something you could help with?

mekanix commented 1 year ago

I have a Vega-based laptop, so it will take some time to compile and test on it. I'll write here what I find with amdgpu.

dumbbell commented 1 year ago

I found the issue causing the slowdown with my Radeon. It was a bug in our implementation of atomic_long_sub() in linuxkpi.

I will submit patches to https://reviews.freebsd.org/ and update this pull request description. Meanwhile, feel free to use the freebsd-src branch mentioned in the description.

There is another bug that I don't know how to fix yet, but we should rarely hit it. It comes from a sleep we could perform as part of vm_fault():

Sleeping thread (tid 101392, pid 3574) owns a non-sleepable lock
KDB: stack backtrace of thread 101392:
sched_switch() at sched_switch+0x845/frame 0xfffffe016d209630
mi_switch() at mi_switch+0xc2/frame 0xfffffe016d209650
sleepq_timedwait() at sleepq_timedwait+0x2f/frame 0xfffffe016d209690
linux_add_to_sleepqueue() at linux_add_to_sleepqueue+0x85/frame 0xfffffe016d2096e0
linux_schedule_timeout() at linux_schedule_timeout+0x87/frame 0xfffffe016d209720
dma_fence_default_wait() at dma_fence_default_wait+0x16c/frame 0xfffffe016d209780
dma_fence_wait_timeout() at dma_fence_wait_timeout+0x34/frame 0xfffffe016d209790
dma_resv_wait_timeout_rcu() at dma_resv_wait_timeout_rcu+0x1b0/frame 0xfffffe016d2097f0
ttm_bo_wait() at ttm_bo_wait+0x4e/frame 0xfffffe016d209810
amdgpu_bo_move() at amdgpu_bo_move+0x68f/frame 0xfffffe016d2098b0
ttm_bo_handle_move_mem() at ttm_bo_handle_move_mem+0xdf/frame 0xfffffe016d209910
ttm_bo_swapout() at ttm_bo_swapout+0x264/frame 0xfffffe016d2099b0
ttm_device_swapout() at ttm_device_swapout+0xb2/frame 0xfffffe016d209a10
ttm_global_swapout() at ttm_global_swapout+0x78/frame 0xfffffe016d209a50
ttm_tt_populate() at ttm_tt_populate+0x9d/frame 0xfffffe016d209a90
ttm_bo_vm_fault_reserved() at ttm_bo_vm_fault_reserved+0x253/frame 0xfffffe016d209b30
amdgpu_ttm_fault() at amdgpu_ttm_fault+0x4e/frame 0xfffffe016d209b60
linux_cdev_pager_populate() at linux_cdev_pager_populate+0x126/frame 0xfffffe016d209be0
vm_fault_allocate() at vm_fault_allocate+0x315/frame 0xfffffe016d209c50
vm_fault() at vm_fault+0x2e9/frame 0xfffffe016d209d60
vm_fault_trap() at vm_fault_trap+0x6d/frame 0xfffffe016d209db0
trap_pfault() at trap_pfault+0x1f3/frame 0xfffffe016d209e10
trap() at trap+0x440/frame 0xfffffe016d209f30
calltrap() at calltrap+0x8/frame 0xfffffe016d209f30

The call to ttm_global_swapout() from ttm_tt_populate() is new in TTM in Linux 5.13.

I will continue to use this 5.13 driver on a daily basis to see how it goes.

dumbbell commented 1 year ago

The backport of 5.13 is done, I'm marking this pull request as ready for review.

The last patch is from Linux 5.14 and fixes a regression with hardware video decoding in amdgpu.

I still need to submit linuxkpi patches for review.

gmekicaxcient commented 1 year ago

I get the db> prompt every time I run startx. As there is nothing in /var/crash, how can I extract more info?

dumbbell commented 1 year ago

I always set debug.debugger_on_panic=0 in /etc/sysctl.conf to skip ddb(4) and get a core dump directly.

But from the debugger, you can type call doadump to get that dump.

I didn't test X.Org yet, only Wayland (using Sway).

dumbbell commented 1 year ago

@gmekicaxcient: I also had to set dumpon_flags="-Z" in /etc/rc.conf so that core dumps are compressed. Otherwise my swap partition was too small.
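
The two settings mentioned so far (debug.debugger_on_panic=0 in /etc/sysctl.conf and dumpon_flags="-Z" in /etc/rc.conf) can be added with a small idempotent helper. This is only a sketch: `ensure_line` is a hypothetical function written for this example, not an existing tool, and it does not rewrite a setting that is already present with a different value.

```shell
#!/bin/sh
# ensure_line FILE LINE
# Append LINE to FILE unless an identical line is already there.
# Hypothetical helper for illustration only; it does not touch an
# existing setting that has a different value.
ensure_line() {
    file=$1
    line=$2
    # -F: fixed string, -x: match the whole line, -q: quiet.
    if ! grep -Fqx -- "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Intended use, as root on the test machine:
#   ensure_line /etc/sysctl.conf 'debug.debugger_on_panic=0'
#   ensure_line /etc/rc.conf 'dumpon_flags="-Z"'
```

/etc/sysctl.conf is only read at boot, so running `sysctl debug.debugger_on_panic=0` once applies the setting to the running system as well.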

mekanix commented 1 year ago

I must be doing something wrong. I can't get the dump with either call doadump or debug.debugger_on_panic=0. I do have dumpon_flags="-Z", and when I call doadump I can see Dump complete = 0, but the only file I have in /var/crash is minidump. What am I doing wrong?

dumbbell commented 1 year ago

Here is a checklist:

  1. Do you have a swap partition configured?
  2. Do you have dumpdev="AUTO" or dumpdev set to an existing swap partition in your /etc/rc.conf (it is set to AUTO by default in /etc/defaults/rc.conf)?
  3. Does dumpon -l return a device? The appropriate one?

You can trigger a panic manually to verify that you get the expected behavior regardless of the DRM debugging attempts by running sysctl debug.kdb.panic=1.

mekanix commented 1 year ago

# sysrc -n dumpdev
AUTO
# dumpon -l
nvdp3
# gpart show
=>        40  1000215136  nvd0  GPT  (477G)
          40      532480     1  efi  (260M)
      532520        1024     2  freebsd-boot  (512K)
      533544         984        - free -  (492K)
      534528     4194304     3  freebsd-swap  (2.0G)
     4728832   995485696     4  freebsd-zfs  (475G)
  1000214528         648        - free -  (324K)
# swapinfo
Device          1K-blocks     Used    Avail Capacity
/dev/nvd0p3.eli   2097152        0  2097152     0%

Is it a problem that it's a GELI partition?

dumbbell commented 1 year ago

Good question, I never used encrypted partitions.

Can you try to set the late option in your /etc/fstab for the swap partition? See https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=198598

mekanix commented 1 year ago

That did it. Thank you for the "class on FreeBSD debugging" :o) Anyway, the files are vmcore.1.zst, info.1 and core.txt.1.

dumbbell commented 1 year ago

To get a meaningful core.txt.*, you need to install gdb from the Ports.

vmcore.1.zst is useless alone: the kernel (and all its modules) that created it is required. That said, the core could contain sensitive information (regardless of the availability of the associated kernel)! If it's a computer you use for something other than FreeBSD testing, you should remove it ASAP!

mekanix commented 1 year ago

Oh, I know what a memory dump means security-wise, and yes, this laptop is currently just for testing this PR. Thank you for the recommendations anyway! I will try to see what I can do with gdb and where it will lead me.

dumbbell commented 1 year ago

Once GDB is installed, you can run:

cd /var/crash
unzstd vmcore.1.zst
kgdb121 /boot/kernel.drm/kernel vmcore.1

And from the GDB prompt:

bt

And share the entire output (from the kgdb121 command line to the last line of bt).

JustAnotherHumanBeing commented 1 year ago

As I mentioned in my email message to you, this is what can be seen in the information for the crash:

panic: mutex hrtimer not owned at /root/freebsd-src/sys/kern/kern_mutex.c:204

dumbbell commented 1 year ago

Thank you @JustAnotherHumanBeing for your report. I also saw your email. Could you please share more logs (dmesg) and perhaps a core.txt file after you configure kernel core dumps?

mekanix commented 1 year ago

kgdb121 /boot/kernel/kernel vmcore.0
GNU gdb (GDB) 12.1 [GDB v12.1 for FreeBSD]
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-portbld-freebsd14.0".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /boot/kernel/kernel...
Reading symbols from /usr/lib/debug//boot/kernel/kernel.debug...

Unread portion of the kernel message buffer:

Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 03
fault virtual address   = 0x138
fault code      = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff838c7df4
stack pointer           = 0x28:0xfffffe01103b2aa0
frame pointer           = 0x28:0xfffffe01103b2ae0
code segment        = base 0x0, limit 0xfffff, type 0x1b
            = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags    = interrupt enabled, resume, IOPL = 3
current process     = 97963 (Xorg)
rdi: fffff80002107a00 rsi: fffff80002107a00 rdx:                d
rcx:                d  r8:                0  r9:        400000001
rax:                0 rbx: ffffffff813f4c40 rbp: fffffe01103b2ae0
r10:                0 r11:                0 r12:                7
r13: fffffe011fbcc010 r14: fffff80002107a00 r15: fffff80002107100
trap number     = 12
panic: page fault
cpuid = 3
time = 1673886351
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe01103b2860
vpanic() at vpanic+0x151/frame 0xfffffe01103b28b0
panic() at panic+0x43/frame 0xfffffe01103b2910
trap_fatal() at trap_fatal+0x409/frame 0xfffffe01103b2970
trap_pfault() at trap_pfault+0xab/frame 0xfffffe01103b29d0
calltrap() at calltrap+0x8/frame 0xfffffe01103b29d0
--- trap 0xc, rip = 0xffffffff838c7df4, rsp = 0xfffffe01103b2aa0, rbp = 0xfffffe01103b2ae0 ---
drm_pci_set_busid() at drm_pci_set_busid+0x144/frame 0xfffffe01103b2ae0
drm_setversion() at drm_setversion+0xe4/frame 0xfffffe01103b2b20
drm_ioctl_kernel() at drm_ioctl_kernel+0xc7/frame 0xfffffe01103b2b70
drm_ioctl() at drm_ioctl+0x2b1/frame 0xfffffe01103b2c60
linux_file_ioctl() at linux_file_ioctl+0x307/frame 0xfffffe01103b2cc0
kern_ioctl() at kern_ioctl+0x202/frame 0xfffffe01103b2d30
sys_ioctl() at sys_ioctl+0x12a/frame 0xfffffe01103b2e00
amd64_syscall() at amd64_syscall+0x12e/frame 0xfffffe01103b2f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe01103b2f30
--- syscall (54, FreeBSD ELF64, ioctl), rip = 0x82974894a, rsp = 0x820fd3708, rbp = 0x820fd3750 ---
Uptime: 4m32s
Dumping 968 out of 15137 MB:..2%..12%..22%..32%..42%..52%..62%..72%..81%..91%

__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:59
59      __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb) bt
#0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:59
#1  dump_savectx () at /usr/src/sys/kern/kern_shutdown.c:405
#2  0xffffffff80bee5d5 in dumpsys (di=0x0) at /usr/src/sys/x86/include/dump.h:87
#3  doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:434
#4  kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:541
#5  0xffffffff80beea1e in vpanic (fmt=<optimized out>, ap=ap@entry=0xfffffe01103b28f0) at /usr/src/sys/kern/kern_shutdown.c:979
#6  0xffffffff80bee783 in panic (fmt=<unavailable>) at /usr/src/sys/kern/kern_shutdown.c:903
#7  0xffffffff810d2c89 in trap_fatal (frame=0xfffffe01103b29e0, eva=312) at /usr/src/sys/amd64/amd64/trap.c:955
#8  0xffffffff810d2d3b in trap_pfault (frame=0xfffffe01103b29e0, usermode=false, signo=<optimized out>, ucode=<optimized out>)
    at /usr/src/sys/amd64/amd64/trap.c:763
#9  <signal handler called>
#10 0xffffffff838c7df4 in drm_pci_set_busid (dev=dev@entry=0xfffffe011fbcc010, master=master@entry=0xfffff8006ef25c00)
    at /usr/home/meka/repos/drm-kmod/drivers/gpu/drm/drm_pci.c:196
#11 0xffffffff838bfcf4 in drm_set_busid (dev=0xfffffe011fbcc010, file_priv=<optimized out>)
    at /usr/home/meka/repos/drm-kmod/drivers/gpu/drm/drm_ioctl.c:160
#12 drm_setversion (dev=dev@entry=0xfffffe011fbcc010, data=data@entry=0xfffffe01103b2bb0, file_priv=file_priv@entry=0xfffff803885cca00)
    at /usr/home/meka/repos/drm-kmod/drivers/gpu/drm/drm_ioctl.c:414
#13 0xffffffff838bf6b7 in drm_ioctl_kernel (linux_file=linux_file@entry=0xfffff8001830e780,
    func=func@entry=0xffffffff838bfc10 <drm_setversion>, kdata=kdata@entry=0xfffffe01103b2bb0, flags=2)
    at /usr/home/meka/repos/drm-kmod/drivers/gpu/drm/drm_ioctl.c:806
#14 0xffffffff838bfa61 in drm_ioctl (filp=0xfffff80002107a00, cmd=<optimized out>, arg=65536)
    at /usr/home/meka/repos/drm-kmod/drivers/gpu/drm/drm_ioctl.c:913
#15 0xffffffff80e61f97 in linux_file_ioctl_sub (fp=0xfffff801dc5395f0, filp=0xfffff8001830e780, cmd=<optimized out>, data=<optimized out>,
    fop=<optimized out>, td=<optimized out>) at /usr/src/sys/compat/linuxkpi/common/src/linux_compat.c:1124
#16 linux_file_ioctl (fp=0xfffff801dc5395f0, cmd=<optimized out>, data=<optimized out>, cred=<optimized out>, td=<optimized out>)
    at /usr/src/sys/compat/linuxkpi/common/src/linux_compat.c:1748
#17 0xffffffff80c66be2 in fo_ioctl (fp=0xfffff801dc5395f0, com=3222299655, data=0x400000001, active_cred=0xd, td=<optimized out>)
    at /usr/src/sys/sys/file.h:367
#18 kern_ioctl (td=td@entry=0xfffffe010f972740, fd=<optimized out>, com=com@entry=3222299655,
    data=0x400000001 <error: Cannot access memory at address 0x400000001>, data@entry=0xfffffe01103b2d50 "\001")
    at /usr/src/sys/kern/sys_generic.c:807
#19 0xffffffff80c6692a in sys_ioctl (td=0xfffffe010f972740, uap=0xfffffe010f972b38) at /usr/src/sys/kern/sys_generic.c:715
#20 0xffffffff810d363e in syscallenter (td=<optimized out>) at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:190
#21 amd64_syscall (td=0xfffffe010f972740, traced=0) at /usr/src/sys/amd64/amd64/trap.c:1200
#22 <signal handler called>
#23 0x000000082974894a in ?? ()
Backtrace stopped: Cannot access memory at address 0x820fd3708

JustAnotherHumanBeing commented 1 year ago

Why are we getting different types of crashes? Your crash is "Fatal trap 12: page fault while in kernel mode" and mine is "panic: mutex hrtimer not owned".

evadot commented 1 year ago

This one should be fixed by https://github.com/freebsd/drm-kmod/commit/729dea5ff1b87cba1168049a1b335f8b3293e829. @dumbbell, can you rebase on top of master?

dumbbell commented 1 year ago

> This one should be fixed by 729dea5 @dumbbell can you rebase on top of master ?

Thanks, I rebased the branch and force-pushed.

@gmekicaxcient: This should solve the crash you hit.

dumbbell commented 1 year ago

> Why are we getting different types of crashes? Your crash is "Fatal trap 12: page fault while in kernel mode" and mine is "panic: mutex hrtimer not owned".

@JustAnotherHumanBeing: It could be many things, like a different GPU or different applications.

JustAnotherHumanBeing commented 1 year ago

panic: mutex hrtimer not owned at /root/freebsd-src/sys/kern/kern_mutex.c:204
cpuid = 17
time = 1673889147
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0116f61c20
vpanic() at vpanic+0x151/frame 0xfffffe0116f61c70
panic() at panic+0x43/frame 0xfffffe0116f61cd0
__mtx_assert() at __mtx_assert+0x9d/frame 0xfffffe0116f61ce0
_callout_stop_safe() at _callout_stop_safe+0x56/frame 0xfffffe0116f61d50
linux_hrtimer_try_to_cancel() at linux_hrtimer_try_to_cancel+0x11/frame 0xfffffe0116f61d60
i915_request_retire() at i915_request_retire+0x4b/frame 0xfffffe0116f61db0
engine_retire() at engine_retire+0xa8/frame 0xfffffe0116f61df0
linux_work_fn() at linux_work_fn+0xe2/frame 0xfffffe0116f61e40
taskqueue_run_locked() at taskqueue_run_locked+0xaa/frame 0xfffffe0116f61ec0
taskqueue_thread_loop() at taskqueue_thread_loop+0xc2/frame 0xfffffe0116f61ef0
fork_exit() at fork_exit+0x80/frame 0xfffffe0116f61f30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0116f61f30
--- trap 0x9fae70d5, rip = 0xd222e3cd1413e3ae, rsp = 0xb62e2bc1501fa7a2, rbp = 0x37864858bbe5549e ---
Uptime: 1m16s
Dumping 2277 out of 65300 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

__curthread () at /root/freebsd-src/sys/amd64/include/pcpu_aux.h:59
59      __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb) #0  __curthread () at /root/freebsd-src/sys/amd64/include/pcpu_aux.h:59
#1  dump_savectx () at /root/freebsd-src/sys/kern/kern_shutdown.c:405
#2  0xffffffff80bee5d5 in dumpsys (di=0x0)
    at /root/freebsd-src/sys/x86/include/dump.h:87
#3  doadump (textdump=1) at /root/freebsd-src/sys/kern/kern_shutdown.c:434
#4  kern_reboot (howto=260) at /root/freebsd-src/sys/kern/kern_shutdown.c:541
#5  0xffffffff80beea1e in vpanic (fmt=,
    ap=ap@entry=0xfffffe0116f61cb0)
    at /root/freebsd-src/sys/kern/kern_shutdown.c:979
#6  0xffffffff80bee783 in panic (fmt=)
    at /root/freebsd-src/sys/kern/kern_shutdown.c:903
#7  0xffffffff80bc965d in __mtx_assert (c=,
    c@entry=<error reading variable: value is not available>,
    what=<unavailable>,
    what@entry=<error reading variable: value is not available>,
    file=<unavailable>,
    file@entry=<error reading variable: value is not available>,
    line=<unavailable>,
    line@entry=<error reading variable: value is not available>)
    at /root/freebsd-src/sys/kern/kern_mutex.c:1092
#8  0xffffffff80c0dd76 in _callout_stop_safe (c=0xfffffe01bafce598, flags=0,
    drain=0x0) at /root/freebsd-src/sys/kern/kern_timeout.c:1111
#9  0xffffffff80e67681 in linux_hrtimer_try_to_cancel (
    hrtimer=hrtimer@entry=0xfffffe01bafce570)
    at /root/freebsd-src/sys/compat/linuxkpi/common/src/linux_hrtimer.c:78
#10 0xffffffff8342860b in __rq_cancel_watchdog (rq=0xfffffe01bafce340)
    at /root/drm-kmod/drivers/gpu/drm/i915/i915_request.c:377
#11 i915_request_retire (rq=0xfffffe01bafce340)
    at /root/drm-kmod/drivers/gpu/drm/i915/i915_request.c:392
#12 0xffffffff8351cec8 in retire_requests (tl=0xfffff8003c6d8600)
    at /root/drm-kmod/drivers/gpu/drm/i915/gt/intel_gt_requests.c:22
#13 engine_retire (work=)
    at /root/drm-kmod/drivers/gpu/drm/i915/gt/intel_gt_requests.c:78
#14 0xffffffff80e765d2 in linux_work_fn (context=0xfffffe01a38fd5d8,
    pending=<optimized out>)
    at /root/freebsd-src/sys/compat/linuxkpi/common/src/linux_work.c:299
#15 0xffffffff80c5356a in taskqueue_run_locked (
    queue=queue@entry=0xfffff80001ed2400)
    at /root/freebsd-src/sys/kern/subr_taskqueue.c:514
#16 0xffffffff80c54612 in taskqueue_thread_loop (
    arg=arg@entry=0xfffff80001eb0040)
    at /root/freebsd-src/sys/kern/subr_taskqueue.c:826
#17 0xffffffff80ba5180 in fork_exit (
    callout=0xffffffff80c54550 <taskqueue_thread_loop>,
    arg=0xfffff80001eb0040, frame=0xfffffe0116f61f40)
    at /root/freebsd-src/sys/kern/kern_fork.c:1102
#18
#19 0xd222e3cd1413e3ae in ?? ()
Backtrace stopped: Cannot access memory at address 0xb62e2bc1501fa7a2
(kgdb)

dumbbell commented 1 year ago

@JustAnotherHumanBeing: Thanks! I pushed a fix to the freebsd-src linuxkpi-5.13 branch (or at least, I hope it's fixed as I can't reproduce the problem). You will have to recompile/reinstall the kernel.

JustAnotherHumanBeing commented 1 year ago

Yes, the problem has been fixed. Thanks!

dumbbell commented 1 year ago

I submitted all linuxkpi and lindebugfs patches for review. The links are all listed in the pull request description.

mekanix commented 1 year ago

While Xorg itself works, resume is broken. I can ssh to the machine when this happens so please tell me if I can get any more info out of it. I tested it on CURRENT with drm-kmod from ports and it works, so there's a regression somewhere and I'm not sure I can find it.

evadot commented 1 year ago

Set hw.dri.__drm_debug=0x1FF in /boot/loader.conf and check after resume whether there is anything in dmesg. Also test master from the repo (5.12) to check whether it works on your machine; it does on all of mine (amdgpu and i915kms). If 5.12 works, you could git bisect.

mekanix commented 1 year ago

I'm starting the build of 5.12 and I'll see. As this is the laptop it will take ages, hence my following question: can I build on some other machine and use laptop to deploy? I have a threadripper with CURRENT so if I could build on it and test on laptop I could report my findings faster.

dumbbell commented 1 year ago

Yes, you can build both the kernel and drm-kmod on any computer, as long as you then use the correct combination of kernel+drm-kmod.

I tested suspend/resume in 5.13 on my i915 laptop and it worked.

I have a Radeon GPU (amdgpu) in an external Thunderbolt case, with other devices provided by or connected to it (keyboard, mouse, headphone amp, Ethernet adapter). Thunderbolt is unsupported by FreeBSD and I rely on the laptop's firmware to initialize it so FreeBSD detects devices including the GPU. Suspend/resume kind of worked with it too, but some devices were missing after resume. The GPU was there though.

mekanix commented 1 year ago

I had to tweak NFS on the desktop to get the make working on the laptop. The idea is to have /usr/src, /usr/obj and /repos/drm shared over NFS and it requires -maproot=root for /usr/obj for make to stop complaining. I can confirm the same behavior with clean build and NFS share.

For 5.12, what do I need to do? I see the first commit here is b5dc7ca4a26195ffa95b3c7972fcca4828ed46f1, so in @dumbbell's repo the commit just before that one is 2341c8c7087740dd03754c06b273cff31852a745. Is it OK to assume that's 5.12, or is there some other source?

dumbbell commented 1 year ago

drm-kmod's master branch is 5.12. The idea is to first test if drivers built from that branch work for you or not.

The update-to-v5.13 branch is based on master. Thus you can bisect between the tip of update-to-v5.13 and master. Don't hesitate to ask if you need more guidance with bisecting.
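
The bisect loop described above can be sketched as a few shell functions. This is a rough outline, not a tested procedure: the function names and the DRY_RUN switch are made up for this example, the make invocations are copied from the pull request description, and it assumes master has already been confirmed to work on the machine.

```shell
#!/bin/sh
# Run a command, or only print it when DRY_RUN is set to a non-empty value.
# DRY_RUN is a convention of this sketch, not a git or make feature.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Start bisecting between a bad and a good revision.
# `git bisect start <bad> <good>` marks both ends in one command.
bisect_start() {
    run git bisect start "$1" "$2"
}

# Rebuild and reinstall the modules for the commit git checked out,
# using the same make invocations as the pull request description.
bisect_build() {
    run make -j8 DEBUG_FLAGS=-g &&
    run sudo make install DEBUG_FLAGS=-g KMODDIR=/boot/kernel.drm
}

# Record the verdict ("good" or "bad") after rebooting and testing.
bisect_mark() {
    run git bisect "$1"
}
```

From the drm-kmod clone, one would call `bisect_start update-to-v5.13 master`, then repeat `bisect_build`, reboot, test suspend/resume and `bisect_mark good` or `bisect_mark bad` until git names the first bad commit. `git bisect reset` returns to the original branch afterwards.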

mekanix commented 1 year ago

I get the following error

make DEBUG_FLAGS=-g
===> dmabuf (all)
Warning: Object directory not changed from original /usr/home/meka/repos/drm/drm-kmod/dmabuf
===> linuxkpi (all)
Warning: Object directory not changed from original /usr/home/meka/repos/drm/drm-kmod/linuxkpi
===> ttm (all)
Warning: Object directory not changed from original /usr/home/meka/repos/drm/drm-kmod/ttm
===> drm (all)
Warning: Object directory not changed from original /usr/home/meka/repos/drm/drm-kmod/drm
/usr/local/bin/ccache cc  -O2 -pipe '-DKBUILD_MODNAME="drm"' '-DLINUXKPI_PARAM_PREFIX=drm_' -DDRM_SYSCTL_PARAM_PREFIX=_dri -DLINUXKPI_VERSION=50000 -DCONFIG_DRM_AMDGPU_CIK -DCONFIG_DRM_AMDGPU_SI -DCONFIG_DRM_AMD_DC -DCONFIG_DRM_AMD_DC_SI -DCONFIG_AMD_PMC -DCONFIG_DRM_I915_FORCE_PROBE='"*"' -DCONFIG_DRM_I915_CAPTURE_ERROR -DCONFIG_DRM_I915_USERFAULT_AUTOSUSPEND=250 -DCONFIG_DRM_I915_STOP_TIMEOUT=100 -DCONFIG_DRM_I915_PREEMPT_TIMEOUT=640 -DCONFIG_DRM_I915_HEARTBEAT_INTERVAL=2500 -DCONFIG_DRM_I915_TIMESLICE_DURATION=1 -DCONFIG_DRM_I915_MAX_REQUEST_BUSYWAIT=8000 -DCONFIG_DRM_I915_FENCE_TIMEOUT=10000 -DCONFIG_DRM_MIPI_DSI -DCONFIG_DRM_PANEL_ORIENTATION_QUIRKS -DCONFIG_DRM_FBDEV_EMULATION -DCONFIG_DRM_FBDEV_OVERALLOC=100 -DCONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG -DCONFIG_BACKLIGHT_CLASS_DEVICE -DCONFIG_DEBUG_FS -DCONFIG_DMI -DCONFIG_FB -DCONFIG_MTRR -DCONFIG_PCI -DCONFIG_PM -DCONFIG_SMP -DCONFIG_ACPI -DCONFIG_ACPI_SLEEP -DCONFIG_X86 -DCONFIG_X86_PAT -DCONFIG_64BIT -DCONFIG_AS_MOVNTDQA -DCONFIG_COMPAT -DCONFIG_X86_64 -DCONFIG_DRM_AMD_DC_DCN -DCONFIG_DRM_AMD_DC_DCN3_0 -DCONFIG_DRM_AMD_DC_DCN3_01 -DCONFIG_DRM_AMD_DC_DCN3_02  -fno-strict-aliasing -Werror -D_KERNEL -DKLD_MODULE -nostdinc  -I/usr/home/meka/repos/drm/drm-kmod/linuxkpi/gplv2/include -I/usr/home/meka/repos/drm/drm-kmod/linuxkpi/bsd/include -I/usr/src/sys/compat/linuxkpi/common/include -I/usr/home/meka/repos/drm/drm-kmod/linuxkpi/dummy/include -I/usr/home/meka/repos/drm/drm-kmod/drivers/gpu/drm -I/usr/home/meka/repos/drm/drm-kmod/include -I/usr/home/meka/repos/drm/drm-kmod/include/drm -I/usr/home/meka/repos/drm/drm-kmod/include/uapi -I/usr/home/meka/repos/drm/drm-kmod/drivers/gpu -include /usr/home/meka/repos/drm/drm-kmod/drm/opt_global.h -I. 
-I/usr/src/sys -I/usr/src/sys/contrib/ck/include -fno-common -g -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -fdebug-prefix-map=./machine=/usr/src/sys/amd64/include -fdebug-prefix-map=./x86=/usr/src/sys/x86/include -fdebug-prefix-map=./i386=/usr/src/sys/i386/include     -MD  -MF.depend.drm_modes.o -MTdrm_modes.o -mcmodel=kernel -mno-red-zone -mno-mmx -mno-sse -msoft-float  -fno-asynchronous-unwind-tables -ffreestanding -fwrapv -fstack-protector -Wno-pointer-sign -Wno-format -Wno-format-zero-length   -mno-aes -mno-avx  -std=iso9899:1999 -c /usr/home/meka/repos/drm/drm-kmod/drivers/gpu/drm/drm_modes.c -o drm_modes.o
/usr/home/meka/repos/drm/drm-kmod/drivers/gpu/drm/drm_modes.c:1323:29: error: incompatible function pointer types passing 'int (void *, struct list_head *, struct list_head *)' to parameter of type 'int (*)(void *, const struct list_head *, const struct list_head *)' [-Werror,-Wincompatible-function-pointer-types]
        list_sort(NULL, mode_list, drm_mode_compare);
                                   ^~~~~~~~~~~~~~~~
/usr/src/sys/compat/linuxkpi/common/include/linux/list.h:507:65: note: passing argument to parameter 'cmp' here
extern void list_sort(void *priv, struct list_head *head, int (*cmp)(void *priv,
                                                                ^
1 error generated.
*** Error code 1

Stop.
make[1]: stopped in /usr/home/meka/repos/drm/drm-kmod/drm
*** Error code 1

Stop.
make: stopped in /usr/home/meka/repos/drm/drm-kmod

mekanix commented 1 year ago

Ignore it, I just forgot to switch /usr/src to main branch. That brings me to the question. If I need to switch sources and drm at the same time, how do I bisect? I mean, bisect alone is easy, but how do I do it on two repos?

dumbbell commented 1 year ago

The linuxkpi-5.13 branch in freebsd-src contains one API-breaking change: the commit named "linuxkpi: list_sort()'s callback now takes list arguments". I don't put a commit hash here because it will change with future rebases.

When you get a compilation failure around drm_mode_compare as you reported, you just need to switch between this commit and the tip of the linuxkpi-5.13 branch and recompile/reinstall the kernel.

mekanix commented 1 year ago

a7c53d295565fbcdd67076b5c30fabf3fea1fb10 is the first bad commit. The last few good and bad commits were based on the linuxkpi-5.13 branch, so at least I'm certain it's only about drm. Is there something else I can check? Like reverting that commit on top of update-to-v5.13 of drm-kmod?

mekanix commented 1 year ago

By the way, how can I check the minimal Linux version that supports my hardware, given the ID of the card?

dumbbell commented 1 year ago

Yes, you can revert just this commit and see how it goes.

PCI IDs supported by the amdgpu driver are in drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c. You'll find a huge table with all the PCI IDs:

static const struct pci_device_id pciidlist[] = {
#ifdef  CONFIG_DRM_AMDGPU_SI
    {0x1002, 0x6780, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_TAHITI},
    {0x1002, 0x6784, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_TAHITI},
    {0x1002, 0x6788, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_TAHITI},
...

However, I don't find your (0x744C) in this file, even on the very latest 6.2-rc4 Linux kernel. Perhaps they use a different method for newer hardware? No idea.
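
A lookup like this can be scripted. The sketch below is only an illustration: `pci_id_listed` is a hypothetical helper, and it assumes table entries are written as "{0x1002, 0xNNNN," as in the excerpt above, so an ID matched by some other mechanism would not be found.

```shell
#!/bin/sh
# pci_id_listed FILE DEVICE_ID
# Succeed if DEVICE_ID (e.g. 0x6780) appears as an AMD (vendor 0x1002)
# entry in FILE. Hypothetical helper; it assumes entries are written as
# "{0x1002, 0xNNNN," like the excerpt from amdgpu_drv.c.
pci_id_listed() {
    src_file=$1
    device_id=$2
    # -F: fixed string, -i: hex digits may be upper or lower case,
    # -q: only the exit status matters.
    grep -Fiq -- "{0x1002, ${device_id}," "$src_file"
}

# Intended use, from the top of a Linux or drm-kmod source tree:
#   pci_id_listed drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 0x744C && echo listed
```

Running the same check against checkouts of successive Linux release tags would show the first release whose table lists a given ID.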

evadot commented 1 year ago

> However, I don't find your (0x744C) in this file, even on the very latest 6.2-rc4 Linux kernel. Perhaps they use a different method for newer hardware? No idea.

Not really surprising: navi31 is really new. It looks like the card was released in November 2022, so even for Linux this will take time.

mekanix commented 1 year ago

I can confirm, the code with the commit reverted works. @evadot we strongly suspect 7900xt has to be supported on 5.14

evadot commented 1 year ago

> I can confirm, the code with the commit reverted works. @evadot we strongly suspect 7900xt has to be supported on 5.14

No, your card is RDNA3 and this isn't supported in amdgpu in Linux right now. The latest supported for Linux is RDNA2, and for us this is the same but with fewer devices.

dumbbell commented 1 year ago

> a7c53d2 is the first bad commit

I noticed this commit was reverted in Linux 5.14 for other reasons: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=715bfff397634c44d616e27e11c873be1d442977

christian-moerz commented 1 year ago

To add some more testing: I've checked out main line src, applied your patches and did buildworld/buildkernel with DEBUG turned on. Then compiled your repo (commit 38a3a17ac550eb542627a5ea65b1e44a60648104) including firmware modules.

I've run kldload i915kms and kldunload multiple times on a 12th gen Intel successfully. Obviously, graphics is not (yet) recognized, but there don't seem to be any unexpected side effects otherwise.

Is there anything else I can do test-wise?

dumbbell commented 1 year ago

there don't seem to be any unexpected side effects otherwise

I didn't checked unloading thoroughly. Do you see memory leaks reported by the kernel?

Is there anything else, I can do test-wise?

I pushed an update-to-5.14 branch (+ the corresponding linuxkpi-5.14 branch for freebsd-src) which I haven't created a pull request for yet. The backport has been finished since yesterday and I have barely tested it. I still need to double-check that I didn't miss any patches. You could test that if you want. I know the driver attaches to my Framework's 12th gen GPU :-) But that's all for now: running Sway failed and I didn't test X.Org.

Update: The pull request is ready, see #226.

nevillehay commented 1 year ago

The backport of all 5.13 patches is finished. The next task at the top of the TODO is to test it. Things like:

* Wayland and X.Org

* suspend/resume

* gaming and other 3D heavier uses

* assisted video decoding

* all drivers: i915kms, amdgpu, radeonkms

I'm using an i915-based laptop with an amdgpu-based external AMD Radeon 6700 XT GPU on a daily basis. I'm currently chasing a slow down compared to 5.12 with the Radeon.

If you have access to supported hardware, you could test this branch with it and see how it goes. Is this something you could help with?

I will try this tomorrow with my RX6600. Thanks for your work!

nevillehay commented 1 year ago

This is the backport of the DRM drivers from Linux 5.13.

Progress:

Changes in Linux 5.13

You can read this Phoronix article to learn about the changes in the DRM drivers in Linux 5.13: https://www.phoronix.com/news/Linux-5.13-DRM-Graphics

Patches to linuxkpi

This update depends on the following patches to linuxkpi in FreeBSD:

* ~https://reviews.freebsd.org/D37909~

* ~https://reviews.freebsd.org/D37910~

* ~https://reviews.freebsd.org/D37911~

* ~https://reviews.freebsd.org/D37912~

* ~https://reviews.freebsd.org/D37913~

* ~https://reviews.freebsd.org/D37914~

* ~https://reviews.freebsd.org/D37915~

* ~https://reviews.freebsd.org/D37916~

* https://reviews.freebsd.org/D37932

* ~https://reviews.freebsd.org/D37933~

* ~https://reviews.freebsd.org/D37935~

* ~https://reviews.freebsd.org/D38077~

* ~https://reviews.freebsd.org/D38078~

* ~https://reviews.freebsd.org/D38079~

* ~https://reviews.freebsd.org/D38080~

* ~https://reviews.freebsd.org/D38081~

* https://reviews.freebsd.org/D38082

* ~https://reviews.freebsd.org/D38083~

* ~https://reviews.freebsd.org/D38084~

* ~https://reviews.freebsd.org/D38085~

* ~https://reviews.freebsd.org/D38086~

* ~https://reviews.freebsd.org/D38087~

* ~https://reviews.freebsd.org/D38088~

* ~https://reviews.freebsd.org/D38089~

* ~https://reviews.freebsd.org/D38090~

These patches are maintained in the following repository and branch: https://github.com/dumbbell/freebsd-src/tree/linuxkpi-5.13

How to test

You need to run a recent FreeBSD 14-CURRENT to test it.

Here are some instructions:

1. You need to check out the FreeBSD src branch I mentioned, [`linuxkpi-5.13`](https://github.com/dumbbell/freebsd-src/tree/linuxkpi-5.13), and compile a kernel from that branch:
   ```shell
   git clone -b linuxkpi-5.13 https://github.com/dumbbell/freebsd-src.git
   cd freebsd-src
   make -j8 buildkernel DEBUG_FLAGS=-g

   # This installs the kernel under another name, `kernel.drm`. Thus, you keep the default kernel
   # in case of trouble.
   sudo make installkernel DEBUG_FLAGS=-g INSTKERNNAME=kernel.drm
   ```

2. You need to check out the branch referenced in this pull request and compile it:
   ```shell
   git clone -b update-to-v5.13 https://github.com/dumbbell/drm-kmod.git
   cd drm-kmod
   make -j8 DEBUG_FLAGS=-g
   sudo make install DEBUG_FLAGS=-g KMODDIR=/boot/kernel.drm
   ```

   This will need access to the FreeBSD src tree cloned above. I don't remember the name of the variable to point the build to it. You can link `/usr/src` to your clone and it will be enough.

3. You will need GPU firmwares in the `kernel.drm` directory as well. To compile and install them:
   ```shell
   git clone https://github.com/dumbbell/drm-kmod-firmware.git
   cd drm-kmod-firmware
   make -j8 DEBUG_FLAGS=-g OSVERSION=1400000
   sudo make install DEBUG_FLAGS=-g KMODDIR=/boot/kernel.drm OSVERSION=1400000
   ```

4. Load the relevant driver(s) as you usually do.

Instead of linking /usr/src, you can point the build at your clone with the SYSDIR variable:

export SYSDIR=/path/to/src/sys (the trailing sys is important)
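Since the trailing `sys` is easy to forget, the path can be sanity-checked before starting the build. This is a minimal sketch: `check_sysdir` is a hypothetical helper, and testing for `sys/param.h` is an illustrative heuristic chosen here, not something the drm-kmod build does itself.

```shell
# Check that a path looks like the sys/ directory of a FreeBSD src tree
# before using it as SYSDIR. Hypothetical helper for illustration only.
# Usage: check_sysdir <path>
# Prints the normalized path on success; returns non-zero otherwise.
check_sysdir() {
    # Drop one trailing slash so "/path/to/src/sys/" is accepted too.
    dir=${1%/}
    case $dir in
        */sys) ;;
        *)
            echo "SYSDIR should end in /sys: $1" >&2
            return 1
            ;;
    esac
    # Heuristic: a FreeBSD sys/ directory contains sys/param.h.
    [ -f "$dir/sys/param.h" ] || {
        echo "no sys/param.h under $dir" >&2
        return 1
    }
    echo "$dir"
}
```

A possible use, with the clone location as a placeholder: `SYSDIR=$(check_sysdir /path/to/src/sys) && make -j8 DEBUG_FLAGS=-g SYSDIR="$SYSDIR"`.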

nevillehay commented 1 year ago

I don't know what I'm doing wrong. All the builds went well and when I kldload amdgpu my system freezes completely. I have an RX 6600. Any way to debug a system freeze?