GhostNaN / mpvpaper

A video wallpaper program for wlroots-based Wayland compositors.
GNU General Public License v3.0

Random crashes after ~1 min to 1 h on Intel laptop with iGPU driver (Iris) calling abort() / SIGABRT #25

Closed: gergo-salyi closed this issue 2 years ago

gergo-salyi commented 2 years ago

Hey, first of all, thanks for the recent patches.

I was hoping they would fix some random mpvpaper crashes I've had over the past half year, but sadly nothing changed for me.

cpu/gpu: Intel i5-1035G4 (10th gen mobile)
display: external HDMI monitor connected to the laptop
system: Arch Linux
wm: Sway

mpvpaper: reproduced on both 1.2.1 and the current master branch
mesa: reproduced on both 22.1.7-1 and the current main branch
kernel: reproduced on both 5.19.7.arch1-1 and the current drm-tip

In the coredumps, every crash traces back to the same call chain, abort() < iris_dri.so < libEGL_mesa.so < render() at ../src/main.c:142:

(gdb) bt
#0  0x00007f25f5c1e4dc in  () at /usr/lib/libc.so.6
#1  0x00007f25f5bce998 in raise () at /usr/lib/libc.so.6
#2  0x00007f25f5bb853d in abort () at /usr/lib/libc.so.6
#3  0x00007f25dd926613 in _iris_batch_flush(iris_batch*, char const*, int) (batch=0x5568aa8f8910, file=<optimized out>, line=<optimized out>)
    at ../mesa-main/src/gallium/drivers/iris/iris_batch.c:1121
#4  0x00007f25dd8fc4d7 in iris_fence_flush(pipe_context*, pipe_fence_handle**, unsigned int) (ctx=0x5568aa8f83f0, out_fence=0x7ffff78af8d8, flags=<optimized out>)
    at ../mesa-main/src/gallium/drivers/iris/iris_fence.c:267
#5  0x00007f25dd0dfbba in tc_flush(pipe_context*, pipe_fence_handle**, unsigned int) (_pipe=0x7f25d4679010, fence=0x7ffff78af8d8, flags=1)
    at ../mesa-main/src/gallium/auxiliary/util/u_threaded_context.c:3157
#6  0x00007f25dcce469a in st_flush (flags=1, fence=0x7ffff78af8d8, st=0x5568aa9198b0) at ../mesa-main/src/mesa/state_tracker/st_cb_flush.c:60
#7  st_context_flush(st_context_iface*, unsigned int, pipe_fence_handle**, void (*)(void*), void*)
    (stctxi=0x5568aa9198b0, flags=2, fence=0x7ffff78af8d8, before_flush_cb=0x7f25dcb35490 <notify_before_flush_cb(void*)>, args=0x7ffff78af8e0)
    at ../mesa-main/src/mesa/state_tracker/st_manager.c:808
#8  0x00007f25dcb34e6e in dri_flush(__DRIcontext*, __DRIdrawable*, unsigned int, __DRI2throttleReason)
    (cPriv=<optimized out>, dPriv=<optimized out>, flags=<optimized out>, reason=<optimized out>) at ../mesa-main/src/gallium/frontends/dri/dri_drawable.c:522
#9  0x00007f25dedf8e3b in dri2_wl_swap_buffers_with_damage (disp=0x5568aa8415a0, draw=0x5568aaa3cbd0, rects=<optimized out>, n_rects=<optimized out>)
    at ../mesa-main/src/egl/drivers/dri2/platform_wayland.c:1592
#10 0x00007f25dede6ae8 in dri2_swap_buffers (disp=0x5568aa8415a0, surf=0x5568aaa3cbd0) at ../mesa-main/src/egl/drivers/dri2/egl_dri2.c:2042
#11 0x00007f25dedd5825 in eglSwapBuffers (dpy=<optimized out>, surface=0x5568aaa3cbd0) at ../mesa-main/src/egl/main/eglapi.c:1421
#12 0x00005568a9bcbc0c in render (output=0x5568aa83c150) at ../src/main.c:142
#13 0x00007f25f5b8f536 in  () at /usr/lib/libffi.so.8
#14 0x00007f25f5b8c037 in  () at /usr/lib/libffi.so.8
#15 0x00007f25f5fee645 in  () at /usr/lib/libwayland-client.so.0
#16 0x00007f25f5feee03 in  () at /usr/lib/libwayland-client.so.0
#17 0x00007f25f5feeffc in wl_display_dispatch_queue_pending () at /usr/lib/libwayland-client.so.0
#18 0x00005568a9bcb5db in main (argc=<optimized out>, argv=<optimized out>) at ../src/main.c:1019

I've attached the coredump and the output of a gdb session run with (gdb) thread apply all bt full: coredump-gdb-analysed.txt and core.mpvpaper.1000.6f0499264a6a4d699984290d77c2b318.11720.1662816696000000.gz

I'd like your opinion on whether this is more likely an mpvpaper issue or a Mesa issue. If the latter, do you think I should go ahead and report it on the Mesa issue tracker as an Intel iGPU driver (Iris) bug, potentially pointing the Mesa devs to mpvpaper to reproduce it?

GhostNaN commented 2 years ago

Considering it crashed at https://github.com/GhostNaN/mpvpaper/blob/666f4c9a8fdc7e921073366fc939c335318f723f/src/main.c#L141-L144 and didn't just return an error message, I'd say mpvpaper didn't do anything wrong here.
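The distinction drawn here, between a call that fails by returning an error and one that aborts the whole process, can be illustrated with a small POSIX sketch. It runs a step in a child process and classifies how the child ended. `step_returns_error()`, `step_aborts()` and `run_isolated()` are hypothetical helpers written for this illustration; they are not mpvpaper or Mesa code, and real EGL calls are not involved.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* How a unit of work ended, as seen from outside the process that ran it. */
enum outcome { OUTCOME_OK, OUTCOME_ERROR, OUTCOME_SIGABRT, OUTCOME_OTHER };

/* Hypothetical stand-ins: the first reports failure through its return value
 * (as an EGL call returning EGL_FALSE would); the second calls abort(), as
 * the driver did in frame #3 of the backtrace. */
static int step_returns_error(void) { return -1; }
static int step_aborts(void) { abort(); }

/* Run `step` in a child so an abort() cannot take down the caller,
 * then classify how the child terminated. */
static enum outcome run_isolated(int (*step)(void)) {
    pid_t pid = fork();
    if (pid < 0)
        return OUTCOME_OTHER;            /* could not fork at all */
    if (pid == 0)
        _exit(step() == 0 ? 0 : 1);      /* child: exit code carries the result */

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return OUTCOME_OTHER;
    if (WIFSIGNALED(status) && WTERMSIG(status) == SIGABRT)
        return OUTCOME_SIGABRT;          /* no return value was ever produced */
    if (WIFEXITED(status))
        return WEXITSTATUS(status) == 0 ? OUTCOME_OK : OUTCOME_ERROR;
    return OUTCOME_OTHER;
}
```

An error return gives the caller a chance to log it and carry on; an abort() inside a library gives the caller nothing, which is why the backtrace ends in libc rather than in mpvpaper's own error handling.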

Also, it seems to go from having a pointer to the display to not having one?

#10 0x00007f25dede6ae8 in dri2_swap_buffers (disp=0x5568aa8415a0, surf=0x5568aaa3cbd0) at ../mesa-main/src/egl/drivers/dri2/egl_dri2.c:2042
#11 0x00007f25dedd5825 in eglSwapBuffers (dpy=<optimized out>, surface=0x5568aaa3cbd0) at ../mesa-main/src/egl/main/eglapi.c:1421

But I'm not a graphics expert, so go ahead and take this upstream: even if it is mpvpaper's fault, they might be able to point us in the right direction.

Thank you for your excellent debug logs and effort!

gergo-salyi commented 2 years ago

I'm closing this because, since mpvpaper commit 781320f, this crash has become extremely rare (it has happened only once for me since) and is therefore not reproducible.

(For the record: the crash most likely happened when two I915_GEM_EXECBUFFER2 ioctls from the Iris driver, by random chance, were issued within ~5 µs of each other, causing one of them to fail with errno ENOSPC, on which Mesa chooses to abort. The root cause of this event remains unknown and could be anywhere in the application or its libraries. The Mesa issue I reported is here, although the Mesa devs were not really interested.)
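The failure described above follows the usual Linux ioctl() convention: the call returns -1 and leaves the reason in errno. As a minimal sketch of what propagating ENOSPC to the caller could look like, instead of aborting, consider the following. `fake_execbuffer2()` and `submit_batch()` are hypothetical stand-ins; no real ioctl is issued, and this is not how Mesa's iris_batch.c is written.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the I915_GEM_EXECBUFFER2 ioctl: it fails the way
 * the report describes, returning -1 and setting errno like ioctl() does. */
static int fake_execbuffer2(void) {
    errno = ENOSPC;
    return -1;
}

/* Hypothetical submit helper: reports the failure and hands it back to the
 * caller as a negative errno value instead of calling abort(). */
static int submit_batch(void) {
    if (fake_execbuffer2() == -1) {
        int err = errno;   /* save it before stdio calls can overwrite errno */
        fprintf(stderr, "batch submission failed: %s\n", strerror(err));
        return -err;
    }
    return 0;
}
```

Whether the real driver could recover from a failed submission this way is a question for the Mesa developers; the sketch only shows the error-reporting convention involved.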

Moreover, a week ago I saw a similar backtrace from a crash of vanilla mpv, so most likely this was not, and is not, mpvpaper's fault.

GhostNaN commented 2 years ago

Thank you for your investigation into this issue!