danvd / wlroots-eglstreams

A modular Wayland compositor library with EGLStreams support
MIT License

Segfaults in Sway #17

Closed dbrgn closed 2 years ago

dbrgn commented 3 years ago

I have occasional segfaults in Sway:

[Screenshot: 2021-05-19-130634_1077x136_scrot]

The stack trace looks like this:

       Message: Process 1317 (sway) of user 1000 dumped core.

                Stack trace of thread 1317:
                #0  0x00007f98a8316330 n/a (libwayland-server.so.0 + 0x16330)
                #1  0x0000000400005574 n/a (n/a + 0x0)

This library is owned by the wayland package (version 1.19.0-1 in my case). However, I assume that the root cause of the segfault is in wlroots.

Would it help if I recompiled wayland with debug info?

dbrgn commented 3 years ago

Here's a trace with debug symbols:

           PID: 65282 (sway)
        Signal: 11 (SEGV)
     Timestamp: Wed 2021-05-19 14:55:36 CEST (1min 31s ago)
  Command Line: sway --my-next-gpu-wont-be-nvidia -V
    Executable: /usr/bin/sway
     Disk Size: 20.5M
       Message: Process 65282 (sway) of user 1000 dumped core.

                Stack trace of thread 65282:
                #0  0x00007f065a39a158 wl_event_loop_dispatch (libwayland-server.so.0 + 0xb158)
                #1  0x00007f065a397c37 wl_display_run (libwayland-server.so.0 + 0x8c37)
                #2  0x0000563f609557f2 main (sway + 0x107f2)
                #3  0x00007f065a0acb25 __libc_start_main (libc.so.6 + 0x27b25)
                #4  0x0000563f60955b6e _start (sway + 0x10b6e)

                Stack trace of thread 65285:
                #0  0x00007f065a0778ca __futex_abstimed_wait_common64 (libpthread.so.0 + 0x158ca)
                #1  0x00007f065a071270 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xf270)
                #2  0x00007f0655783dc4 n/a (nouveau_dri.so + 0x78adc4)
                #3  0x00007f065576b808 n/a (nouveau_dri.so + 0x772808)
                #4  0x00007f065a06b259 start_thread (libpthread.so.0 + 0x9259)
                #5  0x00007f065a1835e3 __clone (libc.so.6 + 0xfe5e3)

                Stack trace of thread 65294:
                #0  0x00007f065a0778ca __futex_abstimed_wait_common64 (libpthread.so.0 + 0x158ca)
                #1  0x00007f065a071270 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xf270)
                #2  0x00007f0655783dc4 n/a (nouveau_dri.so + 0x78adc4)
                #3  0x00007f065576b808 n/a (nouveau_dri.so + 0x772808)
                #4  0x00007f065a06b259 start_thread (libpthread.so.0 + 0x9259)
                #5  0x00007f065a1835e3 __clone (libc.so.6 + 0xfe5e3)

                Stack trace of thread 65287:
                #0  0x00007f065a0778ca __futex_abstimed_wait_common64 (libpthread.so.0 + 0x158ca)
                #1  0x00007f065a071270 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xf270)
                #2  0x00007f0655783dc4 n/a (nouveau_dri.so + 0x78adc4)
                #3  0x00007f065576b808 n/a (nouveau_dri.so + 0x772808)
                #4  0x00007f065a06b259 start_thread (libpthread.so.0 + 0x9259)
                #5  0x00007f065a1835e3 __clone (libc.so.6 + 0xfe5e3)

                Stack trace of thread 65289:
                #0  0x00007f065a0778ca __futex_abstimed_wait_common64 (libpthread.so.0 + 0x158ca)
                #1  0x00007f065a071270 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xf270)
                #2  0x00007f0655783dc4 n/a (nouveau_dri.so + 0x78adc4)
                #3  0x00007f065576b808 n/a (nouveau_dri.so + 0x772808)
                #4  0x00007f065a06b259 start_thread (libpthread.so.0 + 0x9259)
                #5  0x00007f065a1835e3 __clone (libc.so.6 + 0xfe5e3)

                Stack trace of thread 65298:
                #0  0x00007f065a0778ca __futex_abstimed_wait_common64 (libpthread.so.0 + 0x158ca)
                #1  0x00007f065a071270 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xf270)
                #2  0x00007f0655783dc4 n/a (nouveau_dri.so + 0x78adc4)
                #3  0x00007f065576b808 n/a (nouveau_dri.so + 0x772808)
                #4  0x00007f065a06b259 start_thread (libpthread.so.0 + 0x9259)
                #5  0x00007f065a1835e3 __clone (libc.so.6 + 0xfe5e3)

                Stack trace of thread 65291:
                #0  0x00007f065a0778ca __futex_abstimed_wait_common64 (libpthread.so.0 + 0x158ca)
                #1  0x00007f065a071270 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xf270)
                #2  0x00007f0655783dc4 n/a (nouveau_dri.so + 0x78adc4)
                #3  0x00007f065576b808 n/a (nouveau_dri.so + 0x772808)
                #4  0x00007f065a06b259 start_thread (libpthread.so.0 + 0x9259)
                #5  0x00007f065a1835e3 __clone (libc.so.6 + 0xfe5e3)

                Stack trace of thread 65301:
                #0  0x00007f065a0778ca __futex_abstimed_wait_common64 (libpthread.so.0 + 0x158ca)
                #1  0x00007f065a071270 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xf270)
                #2  0x00007f065576e814 n/a (nouveau_dri.so + 0x775814)
                #3  0x00007f065576b7d8 n/a (nouveau_dri.so + 0x7727d8)
                #4  0x00007f065a06b259 start_thread (libpthread.so.0 + 0x9259)
                #5  0x00007f065a1835e3 __clone (libc.so.6 + 0xfe5e3)

                Stack trace of thread 65304:
                #0  0x00007f065a0778ca __futex_abstimed_wait_common64 (libpthread.so.0 + 0x158ca)
                #1  0x00007f065a071270 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xf270)
                #2  0x00007f065576e814 n/a (nouveau_dri.so + 0x775814)
                #3  0x00007f065576b7d8 n/a (nouveau_dri.so + 0x7727d8)
                #4  0x00007f065a06b259 start_thread (libpthread.so.0 + 0x9259)
                #5  0x00007f065a1835e3 __clone (libc.so.6 + 0xfe5e3)

                Stack trace of thread 65303:
                #0  0x00007f065a0778ca __futex_abstimed_wait_common64 (libpthread.so.0 + 0x158ca)
                #1  0x00007f065a071270 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xf270)
                #2  0x00007f065576e814 n/a (nouveau_dri.so + 0x775814)
                #3  0x00007f065576b7d8 n/a (nouveau_dri.so + 0x7727d8)
                #4  0x00007f065a06b259 start_thread (libpthread.s
sheepymeh commented 3 years ago

I seem to be able to reproduce some kind of crash by opening a very large number of kitty windows quickly.

The logs seem to change each time it crashes:

First crash

```
Process 1342 (sway) of user 1000 dumped core.

Found module [...]

Stack trace of thread 1342:
#0 0x000055b46d608778 n/a (n/a + 0x0)
```

Second crash

```
Process 4685 (sway) of user 1000 dumped core.

Found module [...]

Stack trace of thread 4685:
#0 0x00007f1ae4220004 n/a (libnvidia-eglcore.so.470.63.01 + 0xe0b004)
#1 0x00007f1ae4220430 n/a (libnvidia-eglcore.so.470.63.01 + 0xe0b430)
#2 0x00007f1ae4274548 n/a (libnvidia-eglcore.so.470.63.01 + 0xe5f548)
#3 0x00007f1ae42d167f n/a (libnvidia-eglcore.so.470.63.01 + 0xebc67f)
#4 0x00007f1ae42d17de n/a (libnvidia-eglcore.so.470.63.01 + 0xebc7de)
#5 0x00007f1ae42d1800 n/a (libnvidia-eglcore.so.470.63.01 + 0xebc800)
#6 0x00007f1ae42c8f5d n/a (libnvidia-eglcore.so.470.63.01 + 0xeb3f5d)
#7 0x00007f1ae3f7372b n/a (libnvidia-eglcore.so.470.63.01 + 0xb5e72b)
#8 0x00007f1ae682c3bf n/a (libwlroots.so.10 + 0x303bf)
#9 0x00005585f3a2a052 update_title_texture (sway + 0x53052)
#10 0x00005585f3a2d5f9 container_update_title_textures (sway + 0x565f9)
#11 0x00005585f3a2d6d3 container_update_representation (sway + 0x566d3)
#12 0x00005585f3a34135 container_detach (sway + 0x5d135)
#13 0x00005585f3a33aed view_unmap (sway + 0x5caed)
#14 0x00005585f39f5d8c handle_unmap (sway + 0x1ed8c)
#15 0x00007f1ae687a14e wlr_signal_emit_safe (libwlroots.so.10 + 0x7e14e)
#16 0x00007f1ae684f430 n/a (libwlroots.so.10 + 0x53430)
#17 0x00007f1ae6871afe n/a (libwlroots.so.10 + 0x75afe)
#18 0x00007f1ae5fbfacd n/a (libffi.so.7 + 0x6acd)
#19 0x00007f1ae5fbf03a n/a (libffi.so.7 + 0x603a)
#20 0x00007f1ae68cd124 n/a (libwayland-server.so.0 + 0xd124)
#21 0x00007f1ae68c857c n/a (libwayland-server.so.0 + 0x857c)
#22 0x00007f1ae68cb07a wl_event_loop_dispatch (libwayland-server.so.0 + 0xb07a)
#23 0x00007f1ae68c8be7 wl_display_run (libwayland-server.so.0 + 0x8be7)
#24 0x00005585f39e78f6 main (sway + 0x108f6)
#25 0x00007f1ae65d3b25 __libc_start_main (libc.so.6 + 0x27b25)
#26 0x00005585f39e7c5e _start (sway + 0x10c5e)
```

Third crash

```
Process 7154 (sway) of user 1000 dumped core.

Found module [...]

Stack trace of thread 7154:
#0 0x00005611f4875b49 render_containers_linear (sway + 0x23b49)
#1 0x00005611f4876463 render_container (sway + 0x24463)
#2 0x00005611f4875b73 render_containers_linear (sway + 0x23b73)
#3 0x00005611f4876463 render_container (sway + 0x24463)
#4 0x00005611f4875b73 render_containers_linear (sway + 0x23b73)
#5 0x00005611f4876463 render_container (sway + 0x24463)
#6 0x00005611f4875b73 render_containers_linear (sway + 0x23b73)
#7 0x00005611f4876463 render_container (sway + 0x24463)
#8 0x00005611f4875b73 render_containers_linear (sway + 0x23b73)
#9 0x00005611f4876463 render_container (sway + 0x24463)
#10 0x00005611f4875b73 render_containers_linear (sway + 0x23b73)
#11 0x00005611f4876463 render_container (sway + 0x24463)
#12 0x00005611f4875b73 render_containers_linear (sway + 0x23b73)
#13 0x00005611f4877e3f output_repaint_timer_handler (sway + 0x25e3f)
#14 0x00005611f4878070 damage_handle_frame (sway + 0x26070)
#15 0x00007f50b012c14e wlr_signal_emit_safe (libwlroots.so.10 + 0x7e14e)
#16 0x00007f50b012c14e wlr_signal_emit_safe (libwlroots.so.10 + 0x7e14e)
#17 0x00007f50b00e4aa9 n/a (libwlroots.so.10 + 0x36aa9)
#18 0x00007f50af797c87 drmHandleEvent (libdrm.so.2 + 0xdc87)
#19 0x00007f50b00e367e n/a (libwlroots.so.10 + 0x3567e)
#20 0x00007f50b017d07a wl_event_loop_dispatch (libwayland-server.so.0 + 0xb07a)
#21 0x00007f50b017abe7 wl_display_run (libwayland-server.so.0 + 0x8be7)
#22 0x00005611f48628f6 main (sway + 0x108f6)
#23 0x00007f50afe85b25 __libc_start_main (libc.so.6 + 0x27b25)
#24 0x00005611f4862c5e _start (sway + 0x10c5e)
```

After each of the logs there would also be the following:

Stack trace of thread 8877:
  #0  0x00007f50aff5da3c recv (libc.so.6 + 0xffa3c)
  #1  0x00007f50aeedf153 n/a (libEGL_nvidia.so.0 + 0x79153)
  #2  0x00007f50aeedff1d n/a (libEGL_nvidia.so.0 + 0x79f1d)
  #3  0x00007f50aeed507d n/a (libEGL_nvidia.so.0 + 0x6f07d)
  #4  0x00007f50aef2271e n/a (libEGL_nvidia.so.0 + 0xbc71e)
  #5  0x00007f50afe44259 start_thread (libpthread.so.0 + 0x9259)
  #6  0x00007f50aff5c5e3 __clone (libc.so.6 + 0xfe5e3)

for every window of kitty opened.

Doing the same for Firefox or the foot terminal did not lead to this issue, so I think it has to do with kitty and the fact that it uses the GPU for rendering. Perhaps it's linked to this issue?
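
For anyone who wants to try the same reproduction, here is a minimal sketch of a helper that starts many copies of a client in quick succession. The function names `spawn_many` and `stop_all`, the window count and the delay are all assumptions for illustration, not something taken from the reports above.

```python
import subprocess
import time


def spawn_many(command, count, delay=0.0):
    """Start `count` copies of `command` without waiting for any of them.

    `command` is an argument list such as ["kitty"]; `delay` is an optional
    pause in seconds between launches (hypothetical tuning knob).
    """
    procs = []
    for _ in range(count):
        procs.append(subprocess.Popen(command))
        if delay:
            time.sleep(delay)
    return procs


def stop_all(procs, timeout=5.0):
    """Terminate every process that is still running and reap it."""
    for proc in procs:
        if proc.poll() is None:
            proc.terminate()
    for proc in procs:
        try:
            proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            # The client ignored SIGTERM; force it to exit.
            proc.kill()
            proc.wait()
```

Something like `windows = spawn_many(["kitty"], count=50)` followed by `stop_all(windows)` would approximate the steps described here. Whether 50 windows is enough to trigger the crash is a guess.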

danvd commented 3 years ago

Looking at the crashed thread 7154, it seems that sway's recursive rendering led to a stack overflow.
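
To make the repetition in that trace easier to see, here is a rough sketch that counts how often each sway symbol appears in a pasted trace. The regular expression assumes the frame layout shown in this thread (`#N 0xADDR symbol (module + 0xOFFSET)`), and the helper names `parse_frames` and `symbol_counts` are made up for illustration. The sample is frames #0 to #13 of thread 7154 as posted above.

```python
import re
from collections import Counter

# One frame as printed in this thread, e.g.
# "#3 0x00005611f4876463 render_container (sway + 0x24463)".
FRAME_RE = re.compile(
    r"#(\d+)\s+0x[0-9a-f]+\s+(\S+)\s+\(([^)]*?)\s\+\s0x[0-9a-f]+\)"
)

SAMPLE = """
#0 0x00005611f4875b49 render_containers_linear (sway + 0x23b49)
#1 0x00005611f4876463 render_container (sway + 0x24463)
#2 0x00005611f4875b73 render_containers_linear (sway + 0x23b73)
#3 0x00005611f4876463 render_container (sway + 0x24463)
#4 0x00005611f4875b73 render_containers_linear (sway + 0x23b73)
#5 0x00005611f4876463 render_container (sway + 0x24463)
#6 0x00005611f4875b73 render_containers_linear (sway + 0x23b73)
#7 0x00005611f4876463 render_container (sway + 0x24463)
#8 0x00005611f4875b73 render_containers_linear (sway + 0x23b73)
#9 0x00005611f4876463 render_container (sway + 0x24463)
#10 0x00005611f4875b73 render_containers_linear (sway + 0x23b73)
#11 0x00005611f4876463 render_container (sway + 0x24463)
#12 0x00005611f4875b73 render_containers_linear (sway + 0x23b73)
#13 0x00005611f4877e3f output_repaint_timer_handler (sway + 0x25e3f)
"""


def parse_frames(trace):
    """Return (index, symbol, module) tuples for every frame found in `trace`."""
    return [(int(i), sym, mod) for i, sym, mod in FRAME_RE.findall(trace)]


def symbol_counts(trace, module="sway"):
    """Count how often each resolved symbol of `module` appears in `trace`."""
    return Counter(
        sym
        for _, sym, mod in parse_frames(trace)
        if mod == module and sym != "n/a"
    )


if __name__ == "__main__":
    for symbol, hits in symbol_counts(SAMPLE).most_common():
        print(symbol, hits)
```

On this sample, `render_containers_linear` is counted 7 times and `render_container` 6 times. The count only shows that the two functions call each other repeatedly in the visible frames; it cannot confirm on its own that the stack was exhausted.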

sheepymeh commented 3 years ago

I couldn't reproduce on another machine running an AMD iGPU, so I think that it might be linked to the proprietary NVIDIA drivers. This issue also occurs when I open many instances of mpv with -vo=gpu --opengl-es=yes quickly.

danvd commented 2 years ago

Closing this. Reopen if issue is still here.