WayfireWM / wayfire

A modular and extensible wayland compositor
https://wayfire.org/
MIT License

Display loss recovery attempt for track-wlroots 0.18 branch #2458

Open kode54 opened 2 weeks ago

kode54 commented 2 weeks ago

Here is my attempt at implementing display loss recovery for wlroots 0.18:

https://gist.github.com/kode54/58b9e30ed73f82e1cfb040fe84f36c66

It doesn't work so well.

The last attempt crashes with this backtrace:

#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, 
    no_tid=no_tid@entry=0) at pthread_kill.c:44
44       return INTERNAL_SYSCALL_ERROR_P (ret) ? INTERNAL_SYSCALL_ERRNO (ret) : 0;

#0  __pthread_kill_implementation
    (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#1  0x0000790b680a5463 in __pthread_kill_internal (threadid=<optimized out>, signo=6)
    at pthread_kill.c:78
#2  0x0000790b6804c120 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3  0x0000790b680334c3 in __GI_abort () at abort.c:79
#4  0x0000790b680333df in __assert_fail_base
    (fmt=0x790b681c3c20 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x5aacbd5146cd "handle || is_shutting_down()", file=file@entry=0x5aacbd5143de "../src/core/output-layout.cpp", line=line@entry=1709, function=function@entry=0x5aacbd517ae0 "wf::output_t* wf::output_layout_t::impl::get_output_coords_at(const wf::pointf_t&, wf::pointf_t&)") at assert.c:94
#5  0x0000790b68044177 in __assert_fail
    (assertion=0x5aacbd5146cd "handle || is_shutting_down()", file=0x5aacbd5143de "../src/core/output-layout.cpp", line=1709, function=0x5aacbd517ae0 "wf::output_t* wf::output_layout_t::impl::get_output_coords_at(const wf::pointf_t&, wf::pointf_t&)") at assert.c:103
#6  0x00005aacbd45f7a8 in wf::output_layout_t::impl::get_output_coords_at(wf::pointf_t const&, wf::pointf_t&) [clone .part.0] [clone .lto_priv.0]
    (closest=<optimized out>, origin=<optimized out>, this=<optimized out>)
    at ../src/core/output-layout.cpp:1709
#7  0x00005aacbd4782c0 in wf::output_layout_t::impl::get_output_coords_at
    (origin=<synthetic pointer>..., this=0x5aace2698960, closest=...) at ../src/core/core.cpp:297
#8  wf::output_layout_t::get_output_coords_at (this=<optimized out>, origin=..., closest=...)
    at ../src/core/output-layout.cpp:1762
#9  wf::compositor_core_impl_t::reconfigure_outputs (this=0x5aace0cd5430)
    at ../src/core/core.cpp:239
#10 0x00005aacbd513796 in main::{lambda(void*)#1}::operator()(void*) const [clone .isra.0]
    (__closure=0x5aace18aca20, data=<optimized out>) at ../src/main.cpp:458
#11 0x00005aacbd442c82 in std::function<void(void*)>::operator()
    (this=<optimized out>, __args#0=<optimized out>)
    at /usr/include/c++/14.2.1/bits/std_function.h:591
#12 wf::wl_listener_wrapper::emit (this=<optimized out>, data=<optimized out>)
    at ../src/wl-listener-wrapper.tpp:57
#13 wf::handle_wrapped_listener (listener=<optimized out>, data=<optimized out>)
    at ../src/wl-listener-wrapper.tpp:10
#14 0x0000790b68a0342e in wl_signal_emit_mutable
    (signal=signal@entry=0x5aace0fc13b8, data=data@entry=0x0)
    at ../wayland-1.23.0/src/wayland-server.c:2314
#15 0x0000790b68913b5f in begin_gles2_buffer_pass
    (buffer=0x5aace1e82560, prev_ctx=0x7ffc170a37a0, timer=0x0)
    at ../wlroots-hidpi-xprop/render/gles2/pass.c:258
#16 gles2_begin_buffer_pass
    (wlr_renderer=<optimized out>, wlr_buffer=0x5aace1e4eb30, options=<optimized out>)
    at ../wlroots-hidpi-xprop/render/gles2/renderer.c:262
#17 0x0000790b6890ce35 in wlr_renderer_begin_buffer_pass
    (renderer=<optimized out>, buffer=<optimized out>, options=<optimized out>)
    at ../wlroots-hidpi-xprop/render/wlr_renderer.c:304
#18 0x00005aacbd4f71da in wf::swapchain_damage_manager_t::start_frame (this=0x5aace1745de0)
    at ../src/output/render-manager.cpp:331
#19 wf::render_manager::impl::paint (this=0x5aace1d6b1b0) at ../src/output/render-manager.cpp:1130
#20 0x00005aacbd442ce6 in std::function<void()>::operator() (this=<optimized out>)
    at /usr/include/c++/14.2.1/bits/std_function.h:591
#21 handle_timeout (data=<optimized out>) at ../src/util.cpp:31
#22 0x0000790b68a053a6 in wl_timer_heap_dispatch (timers=0x5aace0cd5388)
    at ../wayland-1.23.0/src/event-loop.c:527
#23 wl_event_loop_dispatch (loop=0x5aace0cd5330, timeout=<optimized out>, timeout@entry=-1)
    at ../wayland-1.23.0/src/event-loop.c:1098
#24 0x0000790b68a0710f in wl_display_run (display=0x5aace0cd5240)
    at ../wayland-1.23.0/src/wayland-server.c:1530
#25 0x00005aacbd4410bb in main (argc=<optimized out>, argv=<optimized out>) at ../src/main.cpp:509

It then drops to a terminal, fails to restart cage as my login manager, and hangs the GPU completely.

ammen99 commented 2 weeks ago

If you're doing this, you need to make every plugin which has GL state (textures, framebuffers, programs) reload its state as well.

kode54 commented 2 weeks ago

May as well reload them all, then.

To provoke a reset, at least on amdgpu:

# cat /sys/kernel/debug/dri/0/amdgpu_gpu_recover

ammen99 commented 2 weeks ago

> May as well reload them all, then.
>
> To provoke a reset, at least on amdgpu:
>
> # cat /sys/kernel/debug/dri/0/amdgpu_gpu_recover

Unloading a plugin might lose a lot of temporary state, which is not what we want in the ideal case. Not to mention that some plugins cannot be unloaded safely.

soreau commented 2 weeks ago

I can't really think of an 'unloadable' plugin that also does GL stuff. Are there any?

ammen99 commented 2 weeks ago

> I can't really think of an 'unloadable' plugin that also does GL stuff. Are there any?

I'd prefer to not make assumptions, maybe such a plugin will come in the future.

soreau commented 2 weeks ago

Sure, but I was thinking more along the lines of testing with no plugins loaded first. If that works, then maybe hinge on the unloadable flag for now until it works, then consider adding a new flag.
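
A rough sketch of that interim policy, assuming "unloadable" means a plugin that reports it can be unloaded safely. The `plugin_entry_t` record and the `reload_unloadable_plugins` function are hypothetical names made up for illustration and are not Wayfire's plugin loader API. The reload itself is only simulated by a counter.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record for a loaded plugin; not Wayfire's real plugin type.
struct plugin_entry_t
{
    std::string name;
    bool unloadable; // assumed meaning: the plugin can be unloaded safely
    int load_count;  // how many times the plugin has been (re)loaded
};

// After a GPU reset, reload only the plugins that declare themselves
// unloadable. Returns the names of the plugins that were left untouched, so
// the caller can log them as candidates for a future, more specific flag.
std::vector<std::string> reload_unloadable_plugins(
    std::vector<plugin_entry_t>& plugins)
{
    std::vector<std::string> skipped;
    for (auto& plugin : plugins)
    {
        assert(plugin.load_count > 0 && "plugin must have been loaded once");
        if (plugin.unloadable)
        {
            // Stand-in for tearing the plugin down and initializing it again.
            ++plugin.load_count;
        } else
        {
            skipped.push_back(plugin.name);
        }
    }

    return skipped;
}
```

Plugins that end up in the skipped list would keep whatever GL objects they held before the reset, which is the case a dedicated flag or notification would have to cover.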

kode54 commented 1 week ago

Maybe instead a notification should be plumbed to plugins that need it, to notify them that they need to free and reallocate their GPU resources? Would be better than forcing a full unload.

ammen99 commented 1 week ago

> Maybe instead a notification should be plumbed to plugins that need it, to notify them that they need to free and reallocate their GPU resources? Would be better than forcing a full unload.

Yes that's the best solution.
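
One way such a notification could look, as a minimal sketch only. `gpu_reset_notifier_t` and `gl_plugin_t` are hypothetical names invented here; the sketch does not use Wayfire's signal API or any wlroots call. It only shows a plugin dropping and recreating its GPU-side objects while keeping its other state.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>

// Hypothetical notification hub owned by the compositor core.
class gpu_reset_notifier_t
{
  public:
    using callback_t = std::function<void()>;
    using handle_t   = std::uint64_t;

    // Register a callback; the returned handle is used to unsubscribe.
    handle_t subscribe(callback_t cb)
    {
        assert(cb && "callback must be callable");
        const handle_t id = next_id++;
        callbacks.emplace(id, std::move(cb));
        return id;
    }

    void unsubscribe(handle_t id)
    {
        callbacks.erase(id);
    }

    // Assumed to be called once the renderer has been recreated after a reset.
    void emit_gpu_reset()
    {
        // Iterate over a copy so a callback may unsubscribe itself safely.
        auto snapshot = callbacks;
        for (auto& entry : snapshot)
        {
            entry.second();
        }
    }

  private:
    handle_t next_id = 1;
    std::unordered_map<handle_t, callback_t> callbacks;
};

// Hypothetical plugin that owns GPU objects as well as non-GPU state.
class gl_plugin_t
{
  public:
    explicit gl_plugin_t(gpu_reset_notifier_t& hub) : notifier(hub)
    {
        allocate_gpu_resources();
        handle = notifier.subscribe([this]
        {
            // The old context is gone, so handles are dropped, not deleted.
            release_gpu_resources();
            allocate_gpu_resources();
        });
    }

    ~gl_plugin_t()
    {
        notifier.unsubscribe(handle);
    }

    gl_plugin_t(const gl_plugin_t&) = delete;
    gl_plugin_t& operator=(const gl_plugin_t&) = delete;

    int user_state = 0;             // temporary state that must survive
    int gpu_generation = 0;         // counts (re)allocations of GPU objects
    bool has_gpu_resources = false;

  private:
    void allocate_gpu_resources()
    {
        has_gpu_resources = true;
        ++gpu_generation;
    }

    void release_gpu_resources()
    {
        has_gpu_resources = false;
    }

    gpu_reset_notifier_t& notifier;
    gpu_reset_notifier_t::handle_t handle = 0;
};
```

Compared with a full unload, only the GPU-side objects are rebuilt here; `user_state` is untouched, which is the temporary state the thread wants to avoid losing.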

kode54 commented 1 day ago

New attempt without any plugins that would have GL state. New backtrace:

#0  __pthread_kill_implementation
    (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0)
    at pthread_kill.c:44
#1  0x00007da0c01b6463 in __pthread_kill_internal (threadid=<optimized out>, signo=6)
    at pthread_kill.c:78
#2  0x00007da0c015d120 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3  0x00007da0c01444c3 in __GI_abort () at abort.c:79
#4  0x00007da0c01443df in __assert_fail_base
    (fmt=0x7da0c02d4c20 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x6362c73a2c88 "wlr_texture_is_gles2(texture)", file=file@entry=0x6362c73a291a "../src/core/opengl.cpp", line=line@entry=580, function=function@entry=0x6362c73a65c0 "wf::texture_t::texture_t(wlr_texture*, std::optional<wlr_fbox>)") at assert.c:94
#5  0x00007da0c0155177 in __assert_fail
    (assertion=0x6362c73a2c88 "wlr_texture_is_gles2(texture)", file=0x6362c73a291a "../src/core/opengl.cpp", line=580, function=0x6362c73a65c0 "wf::texture_t::texture_t(wlr_texture*, std::optional<wlr_fbox>)") at assert.c:103
#6  0x00006362c72fa240 in wf::texture_t::texture_t
    (this=this@entry=0x7fff1604d4a0, texture=0x6362f40c78f0, viewport=Python Exception <class 'gdb.error'>: value has been optimized out
..., this=<optimized out>, texture=<optimized out>, viewport=Python Exception <class 'gdb.error'>: value has been optimized out
...) at ../src/core/opengl.cpp:580
#7  0x00006362c7378926 in wf::scene::wlr_surface_node_t::wlr_surface_render_instance_t::render (this=0x6362f41f62f0, target=..., region=...) at ../src/view/wlr-surface-node.cpp:317
#8  0x00006362c73841b3 in wf::scene::render_instance_t::render
    (this=<optimized out>, target=..., region=..., custom_data=std::any [no contained value]) at ../src/api/wayfire/scene-render.hpp:121
#9  wf::scene::run_render_pass (params=..., flags=flags@entry=3)
    at ../src/output/render-manager.cpp:1227
#10 0x00006362c7385146 in wf::render_manager::impl::render_output (this=0x6362f39fb180)
    at ../src/output/render-manager.cpp:1092
#11 wf::render_manager::impl::paint (this=0x6362f39fb180)
    at ../src/output/render-manager.cpp:1141
#12 0x00006362c7386428 in wf::render_manager::impl::impl(wf::output_t*)::{lambda(void*)#1}::operator()(void*) const (__closure=0x6362f39fb180) at ../src/output/render-manager.cpp:968
#13 std::__invoke_impl<void, wf::render_manager::impl::impl(wf::output_t*)::{lambda(void*)#1}&, void*>(std::__invoke_other, wf::render_manager::impl::impl(wf::output_t*)::{lambda(void*)#1}&, void*&&) (__f=...) at /usr/include/c++/14.2.1/bits/invoke.h:61
#14 std::__invoke_r<void, wf::render_manager::impl::impl(wf::output_t*)::{lambda(void*)#1}&, void*>(wf::render_manager::impl::impl(wf::output_t*)::{lambda(void*)#1}&, void*&&)
    (__fn=...) at /usr/include/c++/14.2.1/bits/invoke.h:111
#15 std::_Function_handler<void (void*), wf::render_manager::impl::impl(wf::output_t*)::{lambda(void*)#1}>::_M_invoke(std::_Any_data const&, void*&&)
    (__functor=..., __args#0=<optimized out>)
    at /usr/include/c++/14.2.1/bits/std_function.h:290
#16 0x00006362c72d0e82 in std::function<void(void*)>::operator()
    (this=<optimized out>, __args#0=<optimized out>)
    at /usr/include/c++/14.2.1/bits/std_function.h:591
#17 wf::wl_listener_wrapper::emit (this=<optimized out>, data=<optimized out>)
    at ../src/wl-listener-wrapper.tpp:57
#18 wf::handle_wrapped_listener (listener=<optimized out>, data=<optimized out>)
    at ../src/wl-listener-wrapper.tpp:10
#19 0x00007da0c0acc47e in wl_signal_emit_mutable
    (signal=<optimized out>, data=0x6362f3928f30)
    at ../wayland-1.23.1/src/wayland-server.c:2314
#20 0x00007da0c0acdefc in wl_event_loop_dispatch_idle (loop=loop@entry=0x6362f27e2330)
    at ../wayland-1.23.1/src/event-loop.c:970
#21 0x00007da0c0ace177 in wl_event_loop_dispatch
    (loop=0x6362f27e2330, timeout=<optimized out>, timeout@entry=-1)
    at ../wayland-1.23.1/src/event-loop.c:1110
#22 0x00007da0c0ad01f7 in wl_display_run (display=0x6362f27e2240)
    at ../wayland-1.23.1/src/wayland-server.c:1530
#23 0x00006362c72cf2db in main (argc=<optimized out>, argv=<optimized out>)
    at ../src/main.cpp:515

ammen99 commented 1 day ago

Some wild guesses based on the stack trace. Wayfire keeps a reference to the surface's texture/buffer:

https://github.com/WayfireWM/wayfire/blob/6796b08545b40c74f08242868b6287a6ede41550/src/view/wlr-surface-node.cpp#L61

Depending on how wlroots has implemented GPU reset handling, maybe they change the texture/buffer pointer? So after the reset, we still hold on to the old texture until a new buffer is committed, but the old texture isn't valid anymore because of the GPU reset?
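
If that guess is right, one possible guard is to tag each cached texture with a renderer "generation" and refuse to use it after a reset. The sketch below is purely illustrative: `renderer_t`, `texture_handle_t` and `surface_texture_cache_t` are hypothetical stand-ins, not wlroots or Wayfire types, and whether wlroots exposes anything like a generation counter is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a renderer; the generation grows on each reset.
struct renderer_t
{
    std::uint64_t generation = 1;

    void simulate_gpu_reset()
    {
        ++generation;
    }
};

// Hypothetical stand-in for a GPU texture owned by a surface.
struct texture_handle_t
{
    std::uint64_t created_in_generation;
    int id;
};

// Caches the texture of the last committed buffer, as a surface node might.
class surface_texture_cache_t
{
  public:
    // Called when the client commits a new buffer.
    void on_commit(const renderer_t& renderer, int texture_id)
    {
        assert(texture_id >= 0 && "texture ids are non-negative in this sketch");
        cached = texture_handle_t{renderer.generation, texture_id};
        has_cached = true;
    }

    // Returns the cached texture, or nullptr when there is none or when it
    // was created before the most recent reset and is therefore stale.
    const texture_handle_t *get_for_render(const renderer_t& renderer)
    {
        if (has_cached && (cached.created_in_generation != renderer.generation))
        {
            has_cached = false; // drop the stale handle instead of using it
        }

        return has_cached ? &cached : nullptr;
    }

  private:
    texture_handle_t cached{0, 0};
    bool has_cached = false;
};
```

With a check like this, the render path would skip the surface (or draw a placeholder) until the client commits a new buffer, rather than handing a pre-reset texture to the renderer.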