Open kode54 opened 2 weeks ago
If you're doing this, you need to make every plugin which has GL state (textures, framebuffers, programs) reload its state as well.
May as well reload them all, then.
To provoke a reset, at least on amdgpu:
# cat /sys/kernel/debug/dri/0/amdgpu_gpu_recover
Unloading a plugin might lose a lot of temporary state, which is not what we want in the ideal case. Not to mention that some plugins cannot be unloaded safely.
I can't really think of an 'unloadable' plugin that also does GL stuff. Are there any?
I'd prefer to not make assumptions, maybe such a plugin will come in the future.
Sure, but I was thinking more along the lines of testing with no plugins loaded first. If that works, then maybe hinge on the unloadable flag for now until it works with plugins too, and then consider adding a new flag.
Maybe instead a notification should be plumbed to plugins that need it, to notify them that they need to free and reallocate their GPU resources? Would be better than forcing a full unload.
Yes, that's the best solution.
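To make the notification idea concrete, here is a rough sketch of what such plumbing could look like. Everything in it (`gpu_reset_notifier_t`, `callbacks_t`, `handle_reset`) is a made-up name for illustration, not an existing Wayfire or wlroots API, and it assumes the core has some way of learning that the renderer was lost:

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <utility>

// Hypothetical sketch: none of these names exist in Wayfire or wlroots.
// Plugins that own GL objects register a pair of callbacks; the core
// invokes them around a renderer re-creation instead of unloading plugins.
class gpu_reset_notifier_t
{
  public:
    struct callbacks_t
    {
        std::function<void()> release;  // drop handles from the lost context
        std::function<void()> recreate; // allocate again on the new context
    };

    // Returns an id the plugin can use to unsubscribe when it is unloaded.
    int subscribe(callbacks_t cb)
    {
        assert(cb.release && cb.recreate);
        int id = next_id++;
        subscribers.emplace(id, std::move(cb));
        return id;
    }

    void unsubscribe(int id)
    {
        subscribers.erase(id);
    }

    // Called by the core once it learns that the GPU context was lost.
    // All releases run before any recreate, so no plugin allocates on the
    // new context while another still holds objects from the old one.
    void handle_reset(const std::function<void()>& recreate_renderer)
    {
        for (auto& entry : subscribers)
        {
            entry.second.release();
        }

        recreate_renderer();

        for (auto& entry : subscribers)
        {
            entry.second.recreate();
        }
    }

  private:
    int next_id = 0;
    std::map<int, callbacks_t> subscribers;
};
```

A plugin with textures or framebuffers would subscribe in its init and unsubscribe in its fini; plugins without GL state would never touch this, so they keep their temporary state across a reset.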
New attempt without any plugins that would have GL, new backtrace:
#0 __pthread_kill_implementation
(threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0)
at pthread_kill.c:44
#1 0x00007da0c01b6463 in __pthread_kill_internal (threadid=<optimized out>, signo=6)
at pthread_kill.c:78
#2 0x00007da0c015d120 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3 0x00007da0c01444c3 in __GI_abort () at abort.c:79
#4 0x00007da0c01443df in __assert_fail_base
(fmt=0x7da0c02d4c20 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x6362c73a2c88 "wlr_texture_is_gles2(texture)", file=file@entry=0x6362c73a291a "../src/core/opengl.cpp", line=line@entry=580, function=function@entry=0x6362c73a65c0 "wf::texture_t::texture_t(wlr_texture*, std::optional<wlr_fbox>)") at assert.c:94
#5 0x00007da0c0155177 in __assert_fail
(assertion=0x6362c73a2c88 "wlr_texture_is_gles2(texture)", file=0x6362c73a291a "../src/core/opengl.cpp", line=580, function=0x6362c73a65c0 "wf::texture_t::texture_t(wlr_texture*, std::optional<wlr_fbox>)") at assert.c:103
#6 0x00006362c72fa240 in wf::texture_t::texture_t
(this=this@entry=0x7fff1604d4a0, texture=0x6362f40c78f0, viewport=Python Exception <class 'gdb.error'>: value has been optimized out
..., this=<optimized out>, texture=<optimized out>, viewport=Python Exception <class 'gdb.error'>: value has been optimized out
...) at ../src/core/opengl.cpp:580
#7 0x00006362c7378926 in wf::scene::wlr_surface_node_t::wlr_surface_render_instance_t::render (this=0x6362f41f62f0, target=..., region=...) at ../src/view/wlr-surface-node.cpp:317
#8 0x00006362c73841b3 in wf::scene::render_instance_t::render
(this=<optimized out>, target=..., region=..., custom_data=std::any [no contained value]) at ../src/api/wayfire/scene-render.hpp:121
#9 wf::scene::run_render_pass (params=..., flags=flags@entry=3)
at ../src/output/render-manager.cpp:1227
#10 0x00006362c7385146 in wf::render_manager::impl::render_output (this=0x6362f39fb180)
at ../src/output/render-manager.cpp:1092
#11 wf::render_manager::impl::paint (this=0x6362f39fb180)
at ../src/output/render-manager.cpp:1141
#12 0x00006362c7386428 in wf::render_manager::impl::impl(wf::output_t*)::{lambda(void*)#1}::operator()(void*) const (__closure=0x6362f39fb180) at ../src/output/render-manager.cpp:968
#13 std::__invoke_impl<void, wf::render_manager::impl::impl(wf::output_t*)::{lambda(void*)#1}&, void*>(std::__invoke_other, wf::render_manager::impl::impl(wf::output_t*)::{lambda(void*)#1}&, void*&&) (__f=...) at /usr/include/c++/14.2.1/bits/invoke.h:61
#14 std::__invoke_r<void, wf::render_manager::impl::impl(wf::output_t*)::{lambda(void*)#1}&, void*>(wf::render_manager::impl::impl(wf::output_t*)::{lambda(void*)#1}&, void*&&)
(__fn=...) at /usr/include/c++/14.2.1/bits/invoke.h:111
#15 std::_Function_handler<void (void*), wf::render_manager::impl::impl(wf::output_t*)::{lambda(void*)#1}>::_M_invoke(std::_Any_data const&, void*&&)
(__functor=..., __args#0=<optimized out>)
at /usr/include/c++/14.2.1/bits/std_function.h:290
#16 0x00006362c72d0e82 in std::function<void(void*)>::operator()
(this=<optimized out>, __args#0=<optimized out>)
at /usr/include/c++/14.2.1/bits/std_function.h:591
#17 wf::wl_listener_wrapper::emit (this=<optimized out>, data=<optimized out>)
at ../src/wl-listener-wrapper.tpp:57
#18 wf::handle_wrapped_listener (listener=<optimized out>, data=<optimized out>)
at ../src/wl-listener-wrapper.tpp:10
#19 0x00007da0c0acc47e in wl_signal_emit_mutable
(signal=<optimized out>, data=0x6362f3928f30)
at ../wayland-1.23.1/src/wayland-server.c:2314
#20 0x00007da0c0acdefc in wl_event_loop_dispatch_idle (loop=loop@entry=0x6362f27e2330)
at ../wayland-1.23.1/src/event-loop.c:970
#21 0x00007da0c0ace177 in wl_event_loop_dispatch
(loop=0x6362f27e2330, timeout=<optimized out>, timeout@entry=-1)
at ../wayland-1.23.1/src/event-loop.c:1110
#22 0x00007da0c0ad01f7 in wl_display_run (display=0x6362f27e2240)
at ../wayland-1.23.1/src/wayland-server.c:1530
#23 0x00006362c72cf2db in main (argc=<optimized out>, argv=<optimized out>)
at ../src/main.cpp:515
Some wild guesses based on the stack trace: Wayfire keeps a reference to the surface's texture/buffer.
Depending on how wlroots has implemented GPU reset handling, maybe they change the texture/buffer pointer? So after the reset, we still hold on to the old texture until a new buffer is committed, but the old texture isn't valid anymore because of the GPU reset?
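If that guess is right, one way to guard against it would be to tag the cached texture with a "renderer epoch" and refuse to hand it out once the epoch has moved on. The sketch below is only an illustration of that idea: `texture_handle_t`, `renderer_epoch_t` and `cached_surface_texture_t` are hypothetical stand-ins, not Wayfire or wlroots types, and it assumes the compositor bumps the epoch whenever it is told the renderer was lost.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-ins; these are not Wayfire or wlroots types.
struct texture_handle_t
{
    std::uint32_t gl_id;
};

// Incremented every time the renderer is lost and re-created.
struct renderer_epoch_t
{
    std::uint64_t value = 0;

    void on_renderer_lost()
    {
        ++value;
    }
};

// Remembers the last committed texture together with the epoch in which
// it was created, so a texture from a lost context is never rendered.
class cached_surface_texture_t
{
  public:
    void on_commit(texture_handle_t tex, const renderer_epoch_t& epoch)
    {
        assert(tex.gl_id != 0); // 0 is never a valid GL texture name
        current = entry_t{tex, epoch.value};
    }

    // Returns nothing if the texture predates the current renderer; the
    // caller would then skip the surface until the client commits again.
    std::optional<texture_handle_t> get(const renderer_epoch_t& epoch) const
    {
        if (current && (current->epoch == epoch.value))
        {
            return current->tex;
        }

        return std::nullopt;
    }

  private:
    struct entry_t
    {
        texture_handle_t tex;
        std::uint64_t epoch;
    };

    std::optional<entry_t> current;
};
```

With something like this, the render path would check `get()` and skip drawing instead of hitting the `wlr_texture_is_gles2(texture)` assertion. Whether skipping is acceptable visually, or whether the buffer should be re-imported instead, is a separate question.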
Here is my attempt at display loss recovery implementation for wlroots 0.18:
https://gist.github.com/kode54/58b9e30ed73f82e1cfb040fe84f36c66
It doesn't work so well.
Last attempt crashes with this backtrace:
And then it drops to a terminal and fails to restart cage as my login manager, and hangs the GPU completely.