Closed: cperciva closed this issue 5 years ago
Is this a regression?
I can neither confirm nor deny. This is the first time I've tried drm-next on this hardware.
Disabling hangcheck by setting i915.enable_hangcheck=0 in i915_hangcheck_elapsed fixes the panic, with no apparent adverse effects; so maybe there are actually two bugs here:
Resetting the GPU driver doesn't work.
The hangcheck firing when the GPU didn't actually hang.
Exactly this happens on my i7-5500U (Broadwell) laptop with the latest drm-next as of this writing. It drops me into ddb after a successful modeset from the UEFI framebuffer.
@cperciva, where exactly did you pass that parameter? Certainly not in loader.conf or kenv, as loading the module still results in the same panic, and I'm not sure where you got the i915_hangcheck_elapsed from besides the printed output when module loading panics.
@vishwin That wasn't a parameter, that was me adding a line of code and recompiling.
@cperciva got it. For anyone else who may be searching and wondering, change the variable in i915_params.c.
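The thread does not say where i915_params.c lives in the tree or which line was changed, so the quickest way to find the variable is to search for it. A minimal sketch, assuming you have a drm-next source checkout somewhere on disk; the function name `find_hangcheck_param` and the directory argument are made up for illustration:

```shell
#!/bin/sh
# find_hangcheck_param SRC_DIR
# Hypothetical helper: search every i915_params.c under SRC_DIR and print
# "file:line:text" for each line that mentions enable_hangcheck.
find_hangcheck_param() {
    find "$1" -name i915_params.c -exec grep -Hn 'enable_hangcheck' {} +
}

# Example (the path is an assumption, substitute your own checkout):
#   find_hangcheck_param /usr/local/src
```

Note that this is a source change followed by a rebuild of the module, as @cperciva describes above, not something passed at load time.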
Well, this is weird. This panic and the hang in #165 went away when I switched from building a kernel from the drm-next branch to building a kernel from HEAD and building drm-next-kmod from the ports tree. And I can't see anything at all in the tree which would explain this.
So, uhh... good work guys?
Ok, I figured out why this problem comes and goes. On my laptop, with the code in the tree, I consistently get this panic when I load i915kms if the laptop isn't plugged in. Running on AC power, no panic.
0.o
I get panics randomly when I am on AC as well, but definitely consistent panics (so far) on battery.
What is your hardware? I haven't hit any issues myself, trying to figure out what it corresponds to.
ThinkPad W550s, Intel i7-5500U (Broadwell) with a headless (Optimus) Nvidia Quadro. The Nvidia card is not used at all, nor is the driver for such loaded.
@vishwin Can you paste the output of running 'pciconf -lbVc' as root?
hostb0@pci0:0:0:0: class=0x060000 card=0x222317aa chip=0x16048086 rev=0x09 hdr=0x00
cap 09[e0] = vendor (length 12) Intel cap 0 version 1
vgapci0@pci0:0:2:0: class=0x030000 card=0x222517aa chip=0x16168086 rev=0x09 hdr=0x00
bar [10] = type Memory, range 64, base 0xf2000000, size 16777216, enabled
bar [18] = type Prefetchable Memory, range 64, base 0xc0000000, size 536870912, enabled
bar [20] = type I/O Port, range 32, base 0x4000, size 64, enabled
cap 05[90] = MSI supports 1 message
cap 01[d0] = powerspec 2 supports D0 D3 current D0
cap 13[a4] = PCI Advanced Features: FLR TP
hdac0@pci0:0:3:0: class=0x040300 card=0x222317aa chip=0x160c8086 rev=0x09 hdr=0x00
bar [10] = type Memory, range 64, base 0xf4230000, size 16384, enabled
cap 01[50] = powerspec 2 supports D0 D3 current D0
cap 05[60] = MSI supports 1 message enabled with 1 message
cap 10[70] = PCI-Express 1 root endpoint max data 128(128) FLR NS
xhci0@pci0:0:20:0: class=0x0c0330 card=0x222317aa chip=0x9cb18086 rev=0x03 hdr=0x00
bar [10] = type Memory, range 64, base 0xf4220000, size 65536, enabled
cap 01[70] = powerspec 2 supports D0 D3 current D0
cap 05[80] = MSI supports 8 messages, 64 bit enabled with 1 message
none0@pci0:0:22:0: class=0x078000 card=0x222317aa chip=0x9cba8086 rev=0x03 hdr=0x00
bar [10] = type Memory, range 64, base 0xf4239000, size 32, enabled
cap 01[50] = powerspec 3 supports D0 D3 current D0
cap 05[8c] = MSI supports 1 message, 64 bit
em0@pci0:0:25:0: class=0x020000 card=0x222617aa chip=0x15a38086 rev=0x03 hdr=0x00
bar [10] = type Memory, range 32, base 0xf4200000, size 131072, enabled
bar [14] = type Memory, range 32, base 0xf423e000, size 4096, enabled
bar [18] = type I/O Port, range 32, base 0x4080, size 32, enabled
cap 01[c8] = powerspec 2 supports D0 D3 current D0
cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
cap 13[e0] = PCI Advanced Features: FLR TP
hdac1@pci0:0:27:0: class=0x040300 card=0x222317aa chip=0x9ca08086 rev=0x03 hdr=0x00
bar [10] = type Memory, range 64, base 0xf4234000, size 16384, enabled
cap 01[50] = powerspec 3 supports D0 D3 current D0
cap 05[60] = MSI supports 1 message, 64 bit enabled with 1 message
pcib1@pci0:0:28:0: class=0x060400 card=0x222317aa chip=0x9c9a8086 rev=0xe3 hdr=0x01
cap 10[40] = PCI-Express 2 root port max data 128(128)
link x1(x1) speed 2.5(5.0) ASPM L0s/L1(L0s/L1)
slot 5 power limit 100 mW
cap 05[80] = MSI supports 1 message
cap 0d[90] = PCI Bridge card=0x222317aa
cap 01[a0] = powerspec 3 supports D0 D3 current D0
ecap 0000[100] = unknown 0
ecap 001e[200] = unknown 1
pcib2@pci0:0:28:1: class=0x060400 card=0x222317aa chip=0x9c948086 rev=0xe3 hdr=0x01
cap 10[40] = PCI-Express 2 root port max data 128(128)
link x1(x1) speed 2.5(5.0) ASPM L1(L0s/L1)
slot 2 power limit 100 mW
cap 05[80] = MSI supports 1 message
cap 0d[90] = PCI Bridge card=0x222317aa
cap 01[a0] = powerspec 3 supports D0 D3 current D0
ecap 0000[100] = unknown 0
ecap 001e[200] = unknown 1
pcib3@pci0:0:28:4: class=0x060400 card=0x222317aa chip=0x9c988086 rev=0xe3 hdr=0x01
cap 10[40] = PCI-Express 2 root port max data 128(128)
link x4(x4) speed 5.0(5.0) ASPM L0s/L1(L0s/L1)
slot 4 power limit 250 mW
cap 05[80] = MSI supports 1 message
cap 0d[90] = PCI Bridge card=0x222317aa
cap 01[a0] = powerspec 3 supports D0 D3 current D0
ecap 0000[100] = unknown 0
ecap 001e[200] = unknown 1
ehci0@pci0:0:29:0: class=0x0c0320 card=0x222317aa chip=0x9ca68086 rev=0x03 hdr=0x00
bar [10] = type Memory, range 32, base 0xf423d000, size 1024, enabled
cap 01[50] = powerspec 3 supports D0 D3 current D0
cap 0a[58] = EHCI Debug Port at offset 0xa0 in map 0x14
cap 13[98] = PCI Advanced Features: FLR TP
isab0@pci0:0:31:0: class=0x060100 card=0x222317aa chip=0x9cc38086 rev=0x03 hdr=0x00
cap 09[e0] = vendor (length 12) Intel cap 1 version 0
features: AMT, 4 PCI-e x1 slots
ahci0@pci0:0:31:2: class=0x010601 card=0x222317aa chip=0x9c838086 rev=0x03 hdr=0x00
bar [10] = type I/O Port, range 32, base 0x40a8, size 8, enabled
bar [14] = type I/O Port, range 32, base 0x40b4, size 4, enabled
bar [18] = type I/O Port, range 32, base 0x40a0, size 8, enabled
bar [1c] = type I/O Port, range 32, base 0x40b0, size 4, enabled
bar [20] = type I/O Port, range 32, base 0x4060, size 32, enabled
bar [24] = type Memory, range 32, base 0xf423c000, size 2048, enabled
cap 05[80] = MSI supports 1 message enabled with 1 message
cap 01[70] = powerspec 3 supports D0 D3 current D0
cap 12[a8] = SATA Index-Data Pair
none1@pci0:0:31:3: class=0x0c0500 card=0x222317aa chip=0x9ca28086 rev=0x03 hdr=0x00
bar [10] = type Memory, range 64, base 0xf4238000, size 256, enabled
bar [20] = type I/O Port, range 32, base 0xefa0, size 32, enabled
none2@pci0:0:31:6: class=0x118000 card=0x222317aa chip=0x9ca48086 rev=0x03 hdr=0x00
bar [10] = type Memory, range 64, base 0xf423b000, size 4096, enabled
cap 01[50] = powerspec 3 supports D0 D3 current D0
cap 05[80] = MSI supports 1 message
none3@pci0:2:0:0: class=0xff0000 card=0x222317aa chip=0x522710ec rev=0x01 hdr=0x00
bar [10] = type Memory, range 32, base 0xf4100000, size 4096, enabled
cap 01[40] = powerspec 3 supports D0 D1 D2 D3 current D0
cap 05[50] = MSI supports 1 message, 64 bit
cap 10[70] = PCI-Express 2 endpoint max data 128(128) RO
link x1(x1) speed 2.5(2.5) ASPM L0s/L1(L0s/L1)
ecap 0001[100] = AER 2 0 fatal 0 non-fatal 0 corrected
ecap 0003[140] = Serial 1 00000001004ce000
ecap 0018[150] = LTR 1
ecap 001e[158] = unknown 1
iwm0@pci0:3:0:0: class=0x028000 card=0x52108086 chip=0x095b8086 rev=0x59 hdr=0x00
bar [10] = type Memory, range 64, base 0xf4000000, size 8192, enabled
cap 01[c8] = powerspec 3 supports D0 D3 current D0
cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
cap 10[40] = PCI-Express 2 endpoint max data 128(128) FLR RO NS
link x1(x1) speed 2.5(2.5) ASPM L1(L1)
ecap 0001[100] = AER 1 0 fatal 0 non-fatal 0 corrected
ecap 0003[140] = Serial 1 340286ffff030d90
ecap 0018[14c] = LTR 1
ecap 001e[154] = unknown 1
vgapci1@pci0:8:0:0: class=0x030200 card=0x222517aa chip=0x137a10de rev=0xa2 hdr=0x00
bar [10] = type Memory, range 32, base 0xf3000000, size 16777216, enabled
bar [14] = type Prefetchable Memory, range 64, base 0xe0000000, size 268435456, enabled
bar [1c] = type Prefetchable Memory, range 64, base 0xf0000000, size 33554432, enabled
bar [24] = type I/O Port, range 32, base 0x3000, size 128, enabled
cap 01[60] = powerspec 3 supports D0 D3 current D0
cap 05[68] = MSI supports 1 message, 64 bit
cap 10[78] = PCI-Express 2 endpoint max data 128(256) RO NS
link x4(x4) speed 5.0(8.0) ASPM L0s/L1(L0s/L1)
ecap 0002[100] = VC 1 max VC0
ecap 0018[250] = LTR 1
ecap 001e[258] = unknown 1
ecap 0004[128] = Power Budgeting 1
ecap 000b[600] = Vendor 1 ID 1
ecap 0019[900] = PCIe Sec 1 lane errors 0
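For anyone skimming a listing like this to find the graphics hardware, the display-class entries (class=0x03xxxx) are the interesting ones. A rough sketch that filters them out, assuming the older `pciconf -l` line format shown above with a `chip=` field; the function name `list_display_devices` is made up for illustration:

```shell
#!/bin/sh
# list_display_devices: read `pciconf -l` style output on stdin and print the
# device name plus chip ID for every display-class (0x03xxxx) device.
# Assumes the line format shown in this thread (second field is class=...,
# and a chip=0xDDDDVVVV field is present).
list_display_devices() {
    awk '$2 ~ /^class=0x03/ {
        chip = ""
        for (i = 3; i <= NF; i++)
            if ($i ~ /^chip=/) chip = $i
        print $1, chip
    }'
}

# Example: pciconf -l | list_display_devices
```

On the output above this would report vgapci0 (chip=0x16168086) and vgapci1 (chip=0x137a10de). The low 16 bits of the chip value are the PCI vendor ID: 0x8086 is Intel and 0x10de is NVIDIA, which matches the hardware described earlier.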
For reference, this is what the panic looks like now (as of the latest drm-next-kmod in ports):
ardmore dumped core - see /var/crash/vmcore.0
Mon Oct 23 04:08:09 EDT 2017
FreeBSD ardmore 12.0-CURRENT FreeBSD 12.0-CURRENT #1 fcca5326804(master): Mon Oct 23 03:38:35 EDT 2017 root@ardmore:/usr/local/obj/usr/local/src/sys/GENERIC amd64
panic: page fault
GNU gdb (GDB) 8.0.1 [GDB v8.0.1 for FreeBSD]
Copyright (C) 2017 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-portbld-freebsd12.0".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /boot/kernel/kernel...Reading symbols from /usr/lib/debug//boot/kernel/kernel.debug...done.
done.
Unread portion of the kernel message buffer:
<6>[drm] GPU HANG: ecode 8:0:0xfffffffe, reason: Hang on render ring, action: reset
<6>[drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
<6>[drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
<6>[drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
<6>[drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
<6>[drm] GPU crash dump saved to /sys/class/drm/card0/error
<5>drm/i915: Resetting chip after gpu hang
Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address = 0xa8
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff847f552c
stack pointer = 0x28:0xfffffe0233ee9580
frame pointer = 0x28:0xfffffe0233ee95e0
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 0 (linuxkpi_long_wq_3)
trap number = 12
panic: page fault
cpuid = 1
time = 1508746026
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0233ee9160
vpanic() at vpanic+0x19c/frame 0xfffffe0233ee91e0
panic() at panic+0x43/frame 0xfffffe0233ee9240
trap_fatal() at trap_fatal+0x352/frame 0xfffffe0233ee9290
trap_pfault() at trap_pfault+0x62/frame 0xfffffe0233ee92f0
trap() at trap+0x2c5/frame 0xfffffe0233ee94b0
calltrap() at calltrap+0x8/frame 0xfffffe0233ee94b0
--- trap 0xc, rip = 0xffffffff847f552c, rsp = 0xfffffe0233ee9580, rbp = 0xfffffe0233ee95e0 ---
reset_common_ring() at reset_common_ring+0x12c/frame 0xfffffe0233ee95e0
i915_gem_reset_engine() at i915_gem_reset_engine+0xef/frame 0xfffffe0233ee9640
i915_gem_reset() at i915_gem_reset+0x62/frame 0xfffffe0233ee9670
i915_reset() at i915_reset+0x162/frame 0xfffffe0233ee96d0
i915_reset_and_wakeup() at i915_reset_and_wakeup+0xc9/frame 0xfffffe0233ee9730
i915_handle_error() at i915_handle_error+0x154/frame 0xfffffe0233ee9830
i915_hangcheck_elapsed() at i915_hangcheck_elapsed+0x654/frame 0xfffffe0233ee9970
linux_work_fn() at linux_work_fn+0xf1/frame 0xfffffe0233ee99e0
taskqueue_run_locked() at taskqueue_run_locked+0x15d/frame 0xfffffe0233ee9a40
taskqueue_thread_loop() at taskqueue_thread_loop+0x88/frame 0xfffffe0233ee9a70
fork_exit() at fork_exit+0x84/frame 0xfffffe0233ee9ab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0233ee9ab0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
Uptime: 12s
Dumping 482 out of 8058 MB:..4%..14%..24%..34%..44%..54%..63%..73%..83%..93%
__curthread () at ./machine/pcpu.h:232
232 __asm("movq %%gs:%1,%0" : "=r" (td)
(kgdb) #0 __curthread () at ./machine/pcpu.h:232
#1 doadump (textdump=1) at /usr/local/src/sys/kern/kern_shutdown.c:317
#2 0xffffffff80a6b8d5 in kern_reboot (howto=260)
at /usr/local/src/sys/kern/kern_shutdown.c:385
#3 0xffffffff80a6bec6 in vpanic (fmt=<optimized out>, ap=0xfffffe0233ee9220)
at /usr/local/src/sys/kern/kern_shutdown.c:778
#4 0xffffffff80a6bf13 in panic (fmt=<unavailable>)
at /usr/local/src/sys/kern/kern_shutdown.c:709
#5 0xffffffff80f14d92 in trap_fatal (frame=0xfffffe0233ee94c0, eva=168)
at /usr/local/src/sys/amd64/amd64/trap.c:799
#6 0xffffffff80f14e02 in trap_pfault (frame=0xfffffe0233ee94c0, usermode=0)
at /usr/local/src/sys/amd64/amd64/trap.c:653
#7 0xffffffff80f145c5 in trap (frame=0xfffffe0233ee94c0)
at /usr/local/src/sys/amd64/amd64/trap.c:420
#8 <signal handler called>
#9 0xffffffff847f552c in ?? ()
#10 0xfffffe0233ee95b0 in ?? ()
#11 0xffffffff846f0e29 in intel_crtc_cursor_set (crtc=0xfffffe0002edce20,
file=<optimized out>, handle=<optimized out>, width=49139184,
height=4294901760)
at /usr/local/src/sys/dev/drm2/i915/intel_display.c:6479
#12 0xffffffff846e83ff in assert_panel_unlocked (dev_priv=<optimized out>,
pipe=<optimized out>)
at /usr/local/src/sys/dev/drm2/i915/intel_display.c:1192
#13 ironlake_pch_enable (crtc=<optimized out>)
at /usr/local/src/sys/dev/drm2/i915/intel_display.c:3144
#14 ironlake_crtc_enable (crtc=0xfffffe0002ed4d38)
at /usr/local/src/sys/dev/drm2/i915/intel_display.c:3388
#15 0xffffffff846e8262 in ironlake_enable_pch_pll (intel_crtc=<optimized out>)
at /usr/local/src/sys/dev/drm2/i915/intel_display.c:1603
#16 ironlake_pch_enable (crtc=<optimized out>)
at /usr/local/src/sys/dev/drm2/i915/intel_display.c:3115
#17 ironlake_crtc_enable (crtc=0x1fffe0002ed3000)
at /usr/local/src/sys/dev/drm2/i915/intel_display.c:3388
#18 0xffffffff846dba52 in intel_crt_set_dpms (
encoder=0xffffffff846e8262 <ironlake_crtc_enable+1554>, mode=3)
at /usr/local/src/sys/dev/drm2/i915/intel_crt.c:115
#19 intel_disable_crt (encoder=0xffffffff846e8262 <ironlake_crtc_enable+1554>)
at /usr/local/src/sys/dev/drm2/i915/intel_crt.c:120
#20 0xffffffff8472e5c9 in ?? () from /boot/kernel/i915kms.ko
#21 0xfffffe0002edcdf0 in ?? ()
#22 0x0000000000000003 in ?? ()
#23 0xfffff80007a353e0 in ?? ()
#24 0x0000000000000002 in ?? ()
#25 0xfffffe0233ee9730 in ?? ()
#26 0x00ffffff00000001 in ?? ()
#27 0x0000000000000003 in ?? ()
#28 0x0000000000000000 in ?? ()
(kgdb)
If it's going to panic, it will always do so after exactly 12 seconds of uptime.
Ok, it helps to know where the null pointer dereference is. If @markjdb and @hselasky don't have time I'll take a look on the weekend.
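For anyone reading the dump above: the fault virtual address 0xa8 is the same value as eva=168 in the kgdb trap_fatal frame, and the first frame after the trap marker is reset_common_ring+0x12c. An address that small is consistent with reading a field at offset 0xa8 through a NULL pointer; which structure that is cannot be told from the thread. A small sketch for pulling those two facts out of a saved panic message (both function names are made up for illustration):

```shell
#!/bin/sh
# hex_to_dec HEX: print a 0x-prefixed hexadecimal value in decimal.
hex_to_dec() {
    printf '%d\n' "$1"
}

# faulting_frame: read a panic backtrace on stdin and print the function
# named on the line right after the first "--- trap 0xc" marker.
faulting_frame() {
    awk '/^--- trap 0xc/ { getline; print $1; exit }'
}

hex_to_dec 0xa8    # prints 168

# Example, assuming the backtrace was saved to a file named panic.txt:
#   faulting_frame < panic.txt
```

Run against the backtrace above, `faulting_frame` prints `reset_common_ring()`.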
I have a System76 Galago Pro (https://wiki.freebsd.org/Laptops/System76%20Galago%20Pro) and see exactly the same warning and panic.
I remember there was a tunable you could set to not do that GPU hang check: sysctl compat.linuxkpi.enable_hangcheck=0
I'm currently having consistent panics after updating -CURRENT last night. Trying to disable hangcheck via sysctl, but alas, sysctl: unknown oid 'compat.linuxkpi.enable_hangcheck'
@vishwin You need to set it in /boot/loader.conf.
Don't remember that working either. The panics have stopped for now so I will try it when it decides to repeatedly panic again.
Okay, so the loader.conf tunable works. However, suspend via acpiconf -s 3 is borked with the tunable set to disable hangcheck. Sometimes the screen shuts off, sometimes it hangs on whatever kernel messages are scrolling as the ACPI state changes, but both result in a vegetative state (for lack of a better term) that only a hard reset or power cycle will cure (albeit with having to boot again).
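To summarize the last few comments: the name compat.linuxkpi.enable_hangcheck was rejected by sysctl at runtime ("unknown oid") but took effect when set in /boot/loader.conf. A minimal sketch of adding it without leaving duplicate lines behind; the helper name `set_tunable` is made up for illustration, and whether the tunable exists at all may depend on the drm-next version in use:

```shell
#!/bin/sh
# set_tunable FILE NAME VALUE
# Hypothetical helper: make sure FILE ends up with exactly one NAME="VALUE"
# line, replacing any earlier assignment of NAME.
set_tunable() {
    file=$1 name=$2 value=$3
    tmp=$(mktemp) || return 1
    # Copy every line that does not start with NAME= (exact prefix match).
    if [ -f "$file" ]; then
        awk -v n="$name=" 'index($0, n) != 1' "$file" > "$tmp"
    fi
    printf '%s="%s"\n' "$name" "$value" >> "$tmp"
    # Write back through the original path so ownership and mode are kept.
    cat "$tmp" > "$file" && rm -f "$tmp"
}

# Example (as root, takes effect on the next boot):
#   set_tunable /boot/loader.conf compat.linuxkpi.enable_hangcheck 0
```

Keep the suspend breakage described above in mind before leaving this set permanently.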
Closing this since it has long since been fixed.
Repeatably, about 5 seconds after kldload i915kms (transcribing, apologies for eliding unhelpful boilerplate):
This is on a Core i5-7200U laptop using the latest drm-next code. Any ideas?