FreeBSDDesktop / kms-drm

the DRM part of the linuxkpi-based KMS

[radeonkms] Random panics after update 4.11.g20181027_1 -> 4.16.g20181215 #130

Open abishai opened 5 years ago

abishai commented 5 years ago

I upgraded ports from the October snapshot to current, and the system began to panic randomly (3 panics already today). Obviously all ports changed, but this one looks like a drm issue.

vgapci0@pci0:4:0:0:     class=0x030000 card=0x30251043 chip=0x677b1002 rev=0x00 hdr=0x00
    vendor     = 'Advanced Micro Devices, Inc. [AMD/ATI]'
    device     = 'Caicos PRO [Radeon HD 7450]'
    class      = display
    subclass   = VGA

Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 03
fault virtual address   = 0x11d8
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff82cc1a94
stack pointer           = 0x28:0xfffffe0090d10040
frame pointer           = 0x28:0xfffffe0090d10070
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 3
current process         = 2005 (Xorg:rcs0)
trap number             = 12
panic: page fault
cpuid = 3
time = 1548497261
KDB: stack backtrace:
#0 0xffffffff80be7977 at kdb_backtrace+0x67
#1 0xffffffff80b9b563 at vpanic+0x1a3
#2 0xffffffff80b9b3b3 at panic+0x43
#3 0xffffffff8107496f at trap_fatal+0x35f
#4 0xffffffff810749c9 at trap_pfault+0x49
#5 0xffffffff81073fee at trap+0x29e
#6 0xffffffff8104f1d5 at calltrap+0x8
#7 0xffffffff82cd99c1 at radeon_sa_bo_new+0x361
#8 0xffffffff82cc8d91 at radeon_ib_get+0x31
#9 0xffffffff82cb41ef at radeon_cs_ioctl+0x25f
#10 0xffffffff82dafcd6 at drm_ioctl_kernel+0xf6
#11 0xffffffff82daff71 at drm_ioctl+0x281
#12 0xffffffff82cbf8ae at radeon_drm_ioctl+0x4e
#13 0xffffffff82e00bd4 at linux_file_ioctl+0x204
#14 0xffffffff80c04f3d at kern_ioctl+0x26d
#15 0xffffffff80c04c5e at sys_ioctl+0x15e
#16 0xffffffff81075449 at amd64_syscall+0x369
#17 0xffffffff8104fabd at fast_syscall_common+0x101

drm-fbsd12.0-kmod-4.16.g20181215   =
drm-kmod-g20181126                 =
libdrm-2.4.96,1                    =

I enabled DEBUG for drm-kmod and am waiting for the next panic, but maybe this is already a known issue? FreeBSD is 12.0.

abishai commented 5 years ago

The same package works on my Intel laptop without any panics.

BSDer commented 5 years ago

Hi, not sure if this is related, but I also get random panics with the same config (12.0, drm-fbsd12.0-kmod-4.16.g20181215):

# kgdb /usr/lib/debug/boot/kernel/kernel.debug /var/crash/vmcore.0
GNU gdb (GDB) 8.2 [GDB v8.2 for FreeBSD]
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-portbld-freebsd12.0".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/lib/debug/boot/kernel/kernel.debug...done.

Unread portion of the kernel message buffer:
panic: BUG ON ret failed at /wrkdirs/usr/ports/graphics/drm-fbsd12.0-kmod/work/kms-drm-71fcc9f/drivers/gpu/drm/ttm/ttm_tt.c:259
cpuid = 1
time = 1549320271
KDB: stack backtrace:
#0 0xffffffff80bf0857 at kdb_backtrace+0x67
#1 0xffffffff80ba47d3 at vpanic+0x1a3
#2 0xffffffff80ba4623 at panic+0x43
#3 0xffffffff829d88e3 at ttm_tt_destroy+0xb3
#4 0xffffffff829dd92a at ttm_bo_cleanup_refs+0x1fa
#5 0xffffffff829dca9a at ttm_bo_delayed_delete+0x1aa
#6 0xffffffff829dcc9a at ttm_bo_delayed_workqueue+0x1a
#7 0xffffffff82a14cd4 at linux_work_fn+0xf4
#8 0xffffffff80c02b14 at taskqueue_run_locked+0x154
#9 0xffffffff80c03ea8 at taskqueue_thread_loop+0x98
#10 0xffffffff80b65453 at fork_exit+0x83
#11 0xffffffff8105a1fe at fork_trampoline+0xe
Uptime: 4m27s
Dumping 3531 MB (6 chunks)
  chunk 0: 1MB (159 pages) ... ok
  chunk 1: 2891MB (739869 pages) 2875 2859 2843 2827 2811 2795 2779 2763 2747 2731 2715 2699 2683 2667 2651 2635 2619 2603 2587 2571 2555 2539 2523 2507 2491 2475 2459 2443 2427 2411 2395 2379 2363 2347 2331 2315 2299 2283 2267 2251 2235 2219 2203 2187 2171 2155 2139 2123 2107 2091 2075 2059 2043 2027 2011 1995 1979 1963 1947 1931 1915 1899 1883 1867 1851 1835 1819 1803 1787 1771 1755 1739 1723 1707 1691 1675 1659 1643 1627 1611 1595 1579 1563 1547 1531 1515 1499 1483 1467 1451 1435 1419 1403 1387 1371 1355 1339 1323 1307 1291 1275 1259 1243 1227 1211 1195 1179 1163 1147 1131 1115 1099 1083 1067 1051 1035 1019 1003 987 971 955 939 923 907 891 875 859 843 827 811 795 779 763 747 731 715 699 683 667 651 635 619 603 587 571 555 539 523 507 491 475 459 443 427 411 395 379 363 347 331 315 299 283 267 251 235 219 203 187 171 155 139 123 107 91 75 59 43 27 11 ... ok
  chunk 2: 1MB (94 pages) ... ok
  chunk 3: 144MB (36864 pages) 129 113 97 81 65 49 33 17 1 ... ok
  chunk 4: 1MB (1 pages) ... ok
  chunk 5: 496MB (126976 pages) 481 465 449 433 417 401 385 369 353 337 321 305 289 273 257 241 225 209 193 177 161 145 129 113 97 81 65 49 33 17

__curthread () at ./machine/pcpu.h:230
230             __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (OFFSETOF_CURTHREAD));
(kgdb) list
225     static __inline __pure2 struct thread *
226     __curthread(void)
227     {
228             struct thread *td;
229
230             __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (OFFSETOF_CURTHREAD));
231             return (td);
232     }
233     #ifdef __clang__
234     #pragma clang diagnostic pop
(kgdb)
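
The `list` output above only shows the frame where the dump thread stopped, not the code that hit the assertion in `ttm_tt.c`. A minimal sketch of a kgdb session that would usually give more context, assuming the same vmcore is loaded and that debug symbols for the drm modules are available (an assumption; `N` is a placeholder for whichever frame `bt` shows for `ttm_tt_destroy`):

```
(kgdb) bt
(kgdb) frame N
(kgdb) list
(kgdb) info locals
```

`bt` prints the full backtrace of the panicking thread, `frame N` selects one entry from it, and `list` and `info locals` then show the source and local variables for that frame rather than for `__curthread`.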
grahamperrin commented 4 years ago

> fault code = supervisor read data, page not present

https://s.put.re/CQqe7eag.png (linked from https://gitter.im/FreeBSDDesktop/Lobby/archives/2019/06/18), is this the same type of panic?

BSDer commented 4 years ago

Sorry, I can't help here anymore; we had to switch to a production-ready Linux distro. Unfortunately FreeBSD still lacks quite a lot in terms of graphics support, and many perceive it as a proof-of-concept platform, hence not suitable for production. Good luck.

johalun commented 4 years ago

@BSDer That's unfortunate. The problem is that all of us working on graphics are doing so in our spare time. If corporations invested and hired people to work on graphics, we'd have solid support in no time. Graphics on FreeBSD just hasn't been a priority for corporations yet, but we hope to change that. Intel is pretty well supported now, amdgpu is not so bad, and radeonkms is the worst one.

andreasdr commented 4 years ago

That's very sad to hear :( I started 3D development on FBSD with the proprietary NVIDIA driver. It worked like a charm. GL with AMDGPU is also very stable and highly usable. AMDGPU with Vulkan has issues, though.

valpackett commented 4 years ago

Vulkan mostly works fine, though I was able to hang the whole system with GTK 3.94's Vulkan backend :D

andreasdr commented 4 years ago

I also have a bug with Vulkan that might be related. When rendering 11000 instanced trees, the system hangs. I will file a bug report for this one later. The code itself is a test of my 3D engine. It also hangs Ubuntu 19.04, but not Windows or MacOSX with MoltenVK.

johalun commented 4 years ago

> I started 3D development on FBSD with the proprietary NVIDIA driver. It worked like a charm.

I tend to forget about that one. I should have written "with regard to open source drivers" (sorry Nvidia) :smiley:

andreasdr commented 4 years ago

It also has no Vulkan :(

BSDer commented 4 years ago

Hi @abishai, is this booting from BIOS or EFI? Thank you.

abishai commented 4 years ago

@BSDer I boot from UEFI with the vt console disabled (known issue). When I switched to i3, the panics became rather infrequent. With compton disabled, I believe I've seen no panics at all. I think acceleration is the cause: the less you use it, the lower the probability of triggering a panic. I miss the old 4.11 drm, it was rock stable :(
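
The thread does not spell out how the console was disabled. One workaround commonly suggested for radeonkms and amdgpu on UEFI systems is the loader tunable sketched below; whether this is the exact setting used here is an assumption.

```
# /boot/loader.conf
# Assumption: commonly suggested workaround for the EFI framebuffer
# conflict; not confirmed as the setting used in this report.
hw.syscons.disable=1
```

With this set, the system console is not available until the drm driver loads, so early boot messages are not visible on screen.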