galister / wlx-overlay-s

Access your Wayland/X11 desktop from Monado/WiVRn/SteamVR. Now with Vulkan!
GNU General Public License v3.0
100 stars, 18 forks

Log panic error due to "DeviceLost" #53

Open: AdalynBlack opened this issue 3 days ago

AdalynBlack commented 3 days ago

Hi, I keep getting the following error after using wlx-overlay-s on Fedora Linux, with ALVR, SteamVR, a Quest 2, and Wayland on an NVIDIA GPU:

ERROR [log_panics] thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: DeviceLost': /home/runner/.cargo/git/checkouts/vulkano-cb672043253a6e8d/b9f3e89/vulkano/src/command_buffer/traits.rs:381
   0: <backtrace::capture::Backtrace as core::default::Default>::default
   1: log_panics::Config::install_panic_hook::{{closure}}
   2: std::panicking::rust_panic_with_hook
   3: std::panicking::begin_panic_handler::{{closure}}
   4: std::sys_common::backtrace::__rust_end_short_backtrace
   5: rust_begin_unwind
   6: core::panicking::panic_fmt
   7: core::result::unwrap_failed
   8: core::ptr::drop_in_place<vulkano::command_buffer::traits::CommandBufferExecFuture<vulkano::sync::future::now::NowFuture>>
   9: wlx_overlay_s::graphics::WlxCommandBuffer::build_and_execute_now
  10: <wlx_overlay_s::gui::Canvas<D,S> as wlx_overlay_s::backend::overlay::OverlayRenderer>::render
  11: wlx_overlay_s::backend::openvr::openvr_run
  12: wlx_overlay_s::main
  13: std::sys_common::backtrace::__rust_begin_short_backtrace
  14: std::rt::lang_start::{{closure}}
  15: std::rt::lang_start_internal
  16: main
  17: __libc_start_call_main
  18: __libc_start_main_impl
  19: _start

Attached is the full log file as well. I'll try attaching rust-gdb to get more data, and I'll add everything it gives me as soon as the crash happens again under rust-gdb.

wlx.log

galister commented 3 days ago

Hi, what driver are you on? Did you by any chance try on more than one version?

AdalynBlack commented 3 days ago

I am using the akmod-nvidia package on Fedora. According to nvidia-settings, the driver version is 550.90.07 and the NVML version is 12.550.90.07. According to DNF, I am using akmod-nvidia version 3:550.90.07-1.fc40 from the RPM Fusion nonfree NVIDIA driver repo.

I've only tried one version of wlx-overlay-s: v0.4.2, the x86_64 AppImage.

Also, I ran the program through rust-gdb and got the following error:

ERROR [log_panics] thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: DeviceLost': /home/runner/.cargo/git/checkouts/vulkano-cb672043253a6e8d/b9f3e89/vulkano/src/command_buffer/traits.rs:381
   0: <backtrace::capture::Backtrace as core::default::Default>::default
   1: log_panics::Config::install_panic_hook::{{closure}}
   2: std::panicking::rust_panic_with_hook
   3: std::panicking::begin_panic_handler::{{closure}}
   4: std::sys_common::backtrace::__rust_end_short_backtrace
   5: rust_begin_unwind
   6: core::panicking::panic_fmt
   7: core::result::unwrap_failed
   8: core::ptr::drop_in_place<vulkano::command_buffer::traits::CommandBufferExecFuture<vulkano::sync::future::now::NowFuture>>
   9: wlx_overlay_s::graphics::WlxCommandBuffer::build_and_execute_now
  10: wlx_overlay_s::gui::Canvas<D,S>::render_bg
  11: <wlx_overlay_s::gui::Canvas<D,S> as wlx_overlay_s::backend::overlay::OverlayRenderer>::render
  12: wlx_overlay_s::backend::openvr::openvr_run
  13: wlx_overlay_s::main
  14: std::sys_common::backtrace::__rust_begin_short_backtrace
  15: std::rt::lang_start::{{closure}}
  16: std::rt::lang_start_internal
  17: main
  18: __libc_start_call_main
             at /usr/src/debug/glibc-2.39-15.fc40.x86_64/csu/../sysdeps/nptl/libc_start_call_main.h:58:16
  19: __libc_start_main_impl
             at /usr/src/debug/glibc-2.39-15.fc40.x86_64/csu/../csu/libc-start.c:360:3
  20: _start

Not much new there, but hopefully it helps. rust-gdb couldn't find all of the debug symbols it was looking for, and they don't seem to exist in my repos. Specifically, it reported that the symbols for libgcc-14.1.1-6.fc40.x86_64, libstdc++-14.1.1-6.fc40.x86_64, xorg-x11-drv-nvidia-libs-550.90.07-1.fc40.x86_64, zlib-ng-compat-2.1.6-5.fc40.x86_64, and libselinux-3.6-4.fc40.x86_64 weren't found, although the libselinux one only comes up after the crash.

AdalynBlack commented 3 days ago

Similar error on v0.4.0 as well:

ERROR [log_panics] thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: DeviceLost': /home/runner/.cargo/git/checkouts/vulkano-cb672043253a6e8d/94f50f1/vulkano/src/command_buffer/traits.rs:381
   0: <backtrace::capture::Backtrace as core::default::Default>::default
   1: log_panics::Config::install_panic_hook::{{closure}}
   2: std::panicking::rust_panic_with_hook
   3: std::panicking::begin_panic_handler::{{closure}}
   4: std::sys_common::backtrace::__rust_end_short_backtrace
   5: rust_begin_unwind
   6: core::panicking::panic_fmt
   7: core::result::unwrap_failed
   8: core::ptr::drop_in_place<vulkano::command_buffer::traits::CommandBufferExecFuture<vulkano::sync::future::now::NowFuture>>
   9: wlx_overlay_s::graphics::WlxCommandBuffer::build_and_execute_now
  10: <wlx_overlay_s::gui::Canvas<D,S> as wlx_overlay_s::backend::overlay::OverlayRenderer>::render
  11: wlx_overlay_s::backend::openvr::openvr_run
  12: wlx_overlay_s::main
  13: std::sys_common::backtrace::__rust_begin_short_backtrace
  14: std::rt::lang_start::{{closure}}
  15: std::rt::lang_start_internal
  16: main
  17: __libc_start_call_main
             at /usr/src/debug/glibc-2.39-15.fc40.x86_64/csu/../sysdeps/nptl/libc_start_call_main.h:58:16
  18: __libc_start_main_impl
             at /usr/src/debug/glibc-2.39-15.fc40.x86_64/csu/../csu/libc-start.c:360:3
  19: _start

galister commented 3 days ago

vulkan losing your graphics device is likely going to be a system or driver issue, i'm afraid.

one thing i would try is to install an older driver version and see if it works with that.

also, are you actually using selinux? not sure if i saw anyone try to use this on selinux.
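For context on the backtraces above: the unwrap() that panics runs while a vulkano CommandBufferExecFuture is being dropped inside WlxCommandBuffer::build_and_execute_now, so a DeviceLost reported by the driver takes down the whole process. The sketch below does not reproduce vulkano's API. It assumes hypothetical SubmitError, Recovery, submit_and_wait and render_frame names purely to illustrate the general pattern of handing a lost device back to the render loop instead of unwrapping it. Per the Vulkan model, a lost logical device cannot be reused, so the only real recovery is rebuilding it.

```rust
/// Hypothetical stand-in for the result of a GPU submission.
/// This is NOT vulkano's error type; it only models the two cases that matter here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SubmitError {
    /// The driver reported that the logical device was lost (VK_ERROR_DEVICE_LOST).
    DeviceLost,
    /// Any other submission failure.
    Other,
}

/// What the render loop should do after a failed submission (hypothetical policy).
#[derive(Debug, PartialEq, Eq)]
enum Recovery {
    RecreateDevice,
    SkipFrame,
}

/// Hypothetical "submit and wait": `simulated` lets the caller fake a driver error.
fn submit_and_wait(simulated: Option<SubmitError>) -> Result<(), SubmitError> {
    match simulated {
        Some(err) => Err(err),
        None => Ok(()),
    }
}

/// Renders one frame, turning a GPU error into a recovery decision instead of
/// calling unwrap() and panicking the whole process.
fn render_frame(simulated: Option<SubmitError>) -> Option<Recovery> {
    match submit_and_wait(simulated) {
        Ok(()) => None,
        Err(SubmitError::DeviceLost) => {
            // A lost device cannot be reused: everything created from it must be rebuilt.
            eprintln!("GPU submission failed: DeviceLost; recreating device");
            Some(Recovery::RecreateDevice)
        }
        Err(err) => {
            eprintln!("GPU submission failed: {err:?}; skipping frame");
            Some(Recovery::SkipFrame)
        }
    }
}
```

Whether rebuilding the device is practical depends on the application; the point of the sketch is only that the error reaches code that can log it and decide, rather than a panic hook.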

AdalynBlack commented 3 days ago

I'm using a mostly stock Fedora Workstation install, which I think uses SELinux by default, or at least AppArmor. I'll look into drivers and see if I can find one that seems stable. It probably doesn't help that I'm running a 3070 on Wayland, though lol. I tried X11, but it crashed before it even connected to SteamVR because it couldn't find the process.

AdalynBlack commented 3 days ago

From my own research, it looks like DeviceLost could also be the result of undefined behaviour in the code, although that seems unlikely given that the issue doesn't seem to affect most people.

galister commented 3 days ago

Maybe check whether 0.3.x also had this. We updated Vulkano since then.

AdalynBlack commented 2 days ago

I ran v0.3.2 and got a very different error, with identical behaviour. The full log is attached to this comment: wlx.log. That wasn't actually the log file from the 0.3.2 run; I guess it didn't log. Basically, it was a bunch of OS error 24 messages saying too many files were open. I'm going to try raising the file descriptor limit with ulimit to see if that works around the issue for the time being.

Running ulimit -n 4096 didn't help. I'm re-running with rust-gdb and piping all of the output to a log file so I can report back the next time it crashes.
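On Linux, "os error 24" is EMFILE: the process has hit its per-process open-file limit, which is the soft limit that ulimit -n sets. A minimal, Linux-only sketch for checking how close a process is to that limit, assuming it can read its own /proc/self/fd and /proc/self/limits. The function names are illustrative, not part of wlx-overlay-s.

```rust
use std::fs;
use std::io;

/// Count this process's open file descriptors by listing /proc/self/fd (Linux only).
/// The count includes the descriptor used to read the directory itself.
fn count_open_fds() -> io::Result<usize> {
    Ok(fs::read_dir("/proc/self/fd")?.count())
}

/// Read the soft "Max open files" limit (what `ulimit -n` sets) from /proc/self/limits.
/// Returns None if the limit is "unlimited" or the line cannot be parsed.
fn soft_open_files_limit() -> io::Result<Option<u64>> {
    let limits = fs::read_to_string("/proc/self/limits")?;
    Ok(limits
        .lines()
        .find(|line| line.starts_with("Max open files"))
        // Columns: "Max" "open" "files" <soft> <hard> "files"
        .and_then(|line| line.split_whitespace().nth(3))
        .and_then(|soft| soft.parse().ok()))
}

/// "os error 24" is EMFILE on Linux: the process hit its open-file limit.
fn is_too_many_open_files(err: &io::Error) -> bool {
    err.raw_os_error() == Some(24)
}
```

If the descriptor count keeps climbing toward the limit over time, raising ulimit -n only delays the error; the underlying cause is descriptors that are never closed.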

AdalynBlack commented 2 days ago

Here's my log from running v0.3.2 in rust-gdb: wlx.log

I think it might come from the windows created for new notifications not being cleaned up properly? Could be completely wrong, though.
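The hypothesis above is the reporter's and is unconfirmed. To illustrate the mechanism it implies, here is a minimal, Linux-only sketch in which each hypothetical "resource" holds an open file: when its handle is forgotten instead of dropped, one descriptor leaks per resource, which is how repeated notifications could eventually produce os error 24. It opens /dev/null as an assumed stand-in for whatever a real notification would hold; none of these names come from wlx-overlay-s.

```rust
use std::fs::{self, File};
use std::io;
use std::mem;

/// Count open file descriptors via /proc/self/fd (Linux only).
fn count_open_fds() -> io::Result<usize> {
    Ok(fs::read_dir("/proc/self/fd")?.count())
}

/// Simulate creating `n` hypothetical resources that each hold an open file.
/// If `leak` is true, the handles are forgotten instead of dropped, so their
/// descriptors are never closed; this mimics cleanup that never happens.
fn open_resources(n: usize, leak: bool) -> io::Result<()> {
    for _ in 0..n {
        let file = File::open("/dev/null")?;
        if leak {
            mem::forget(file); // Drop never runs: the fd stays open
        }
        // Otherwise `file` is dropped at the end of this iteration and its fd is closed.
    }
    Ok(())
}

/// Return how many descriptors were left open after creating `n` resources.
fn fd_growth(n: usize, leak: bool) -> io::Result<isize> {
    let before = count_open_fds()? as isize;
    open_resources(n, leak)?;
    let after = count_open_fds()? as isize;
    Ok(after - before)
}
```

Sampling the descriptor count like this before and after a batch of notifications would be one way to test the hypothesis on a real build.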

galister commented 2 days ago

does it still happen on the latest release if you enable pw_fallback? (see the logs for how)

AdalynBlack commented 2 days ago

I'll give that a try. I've set the config and I'll test it out tomorrow.

AdalynBlack commented 2 days ago

I decided to run it in the background (no headset connected), and so far so good. I've never had it run this long without a crash, although the lack of a headset could be the reason for that. I'll try it without the fallback mode and see if that crashes.

Follow-up: without a headset, it isn't crashing without the fallback either. I'll check properly tomorrow when I have time.

AdalynBlack commented 7 hours ago

This seems to have fixed the issue for me (I haven't updated from v3.0.2 btw, but that one was also crashing similarly)

galister commented 7 hours ago

does obs also misbehave if you leave the screen capture running for long? it's not a perfect match, as obs uses egl, and wlx uses vulkan, but it would be an interesting find if it also has issues