I'm using Firefox Nightly (currently built 2019-02-09) on Wayland (natively) with GL acceleration enabled (just GL layer compositing as WebRender+Wayland is incomplete right now). (Also: Mesa 19.0.0-dev as of 2019-02-02, Wayfire as of today, kernel as of yesterday.)
When browsing Reddit redesign (non-old.) in this browser, some weird error happens, with dmesg lines like:
amdgpu_vm_validate_pt_bos() failed.
[drm:amdgpu_ih_process] [drm:amdgpu_cs_ioctl] Failed to process the buffer list -22!
[TTM] Buffer eviction failed
drmn0: failed to get a new IB (-22)
-22 is -EINVAL. Invalid calls from userspace are okay by themselves, but something goes wrong with locking when they happen.
On a NODEBUG kernel, this results in:
- visual artifacts (usually white rectangles blinking over the Reddit page, rarely texture glitches on the whole browser surface, but none of the artifacts are permanent)
- rarely, the browser freezing (though I think that also happened on various other pages randomly)
- the browser process becoming unkillable in lkpi-ww state when trying to kill it after the freeze (or even when closing normally after the artifacts were present?? I think?? Not sure.) (I think I also might have seen drm_global_mutex a couple times?? like #98 but in the firefox process)
I finally decided to run a debug kernel and found some more information. The debug panic is:
panic: userret: Returning with 1 locks held
For some reason ddb thinks Firefox is unmounting something??
But that's clearly not true. Digging around with kgdb shows that it's ioctl(AMDGPU_CS) (44). Another similar crash I got today seems to be ioctl(DRM_RES_CTX) (38 / 0x26). (UPD: another 38, now on YouTube)
Maybe this is the same as #98, maybe there are two different places, but something in amdgpu (probably in error handling code) is not unlocking some important lock.
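To make the suspected bug class concrete, here is a toy userspace sketch of what I mean by "error handling code not unlocking". None of this is actual amdgpu or LinuxKPI code: `held_locks`, `toy_lock`, `toy_unlock`, `submit_buggy`, `submit_fixed` and `locks_held_on_return` are all hypothetical names, and the counter only imitates the kind of check behind the `userret: Returning with 1 locks held` panic.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's per-thread count of held locks. */
static int held_locks = 0;

void toy_lock(void)
{
    held_locks++;
}

void toy_unlock(void)
{
    assert(held_locks > 0); /* unlocking something we never locked is a bug too */
    held_locks--;
}

/* Buggy shape: the -EINVAL path returns without dropping the lock. */
int submit_buggy(bool valid)
{
    toy_lock();
    if (!valid)
        return -EINVAL; /* lock is leaked here */
    toy_unlock();
    return 0;
}

/* Fixed shape: every path goes through a single exit that unlocks. */
int submit_fixed(bool valid)
{
    int ret = 0;

    toy_lock();
    if (!valid) {
        ret = -EINVAL;
        goto out;
    }
    /* the real work would happen here */
out:
    toy_unlock();
    return ret;
}

/* Analogue of the check on return to userspace: locks still held. */
int locks_held_on_return(void)
{
    return held_locks;
}
```

Calling `submit_buggy(false)` returns `-EINVAL` and leaves `locks_held_on_return()` at 1, while `submit_fixed(false)` returns the same error with nothing held. If the real problem has this shape, it would explain why the failure only shows up together with the -22 messages.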
I see one BSDFIXME in amdgpu_cs.c, and interestingly it mentions a mutex struct:
/* BSDFIXME: On FreeBSD we don't store the ww_acquire_ctx in the ww_mutex struct */
/* Double check that the BO is reserved by this CS */
There's a weird locking bug in amdgpu somewhere.