dbalsom / martypc

An IBM PC/XT emulator written in Rust.

"Device Lost" panic at high resolutions #84

Open dbalsom opened 8 months ago

dbalsom commented 8 months ago

I bought an ultrawide monitor recently and started getting very frequent crashes in MartyPC. I was experimenting with lower resolutions for testing and noticed the crashing stopped. Other testers have not had the same issue.

The native resolution of my monitor is 3440x1440, and my graphics card is a Nvidia 3080ti, on Windows 10. Driver version 546.65. If you have a similar issue please report the same specs here.

EDIT: It's definitely resolution related, setting a lower resolution stopped the crashes, but they resumed as soon as I went back to native resolution. It also appears that only the Vulkan backend is affected. After setting WGPU_BACKEND=dx12 the crashes also stopped. This would also indicate that it's not my bug.
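
For anyone unfamiliar with the override: WGPU_BACKEND is an environment variable read at startup. Below is a minimal sketch of how a frontend might interpret such a value. The `Backend` enum and both helper functions are hypothetical names used for illustration; this is not wgpu's actual parsing code, and only the three backend names mentioned in this thread are recognized.

```rust
/// Graphics backends named in this thread. This enum is a local stand-in
/// for illustration, not wgpu's own type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Vulkan,
    Dx12,
    Gl,
}

/// Parse a comma-separated backend list such as "dx12" or "gl,vulkan".
/// Names are trimmed and matched case-insensitively; unknown names are skipped.
pub fn parse_backends(value: &str) -> Vec<Backend> {
    value
        .split(',')
        .filter_map(|name| match name.trim().to_ascii_lowercase().as_str() {
            "vulkan" => Some(Backend::Vulkan),
            "dx12" => Some(Backend::Dx12),
            "gl" => Some(Backend::Gl),
            _ => None,
        })
        .collect()
}

/// Read the override from the environment, falling back to `default`
/// when the variable is unset or names no known backend.
pub fn backends_from_env(default: &[Backend]) -> Vec<Backend> {
    let parsed = std::env::var("WGPU_BACKEND")
        .map(|v| parse_backends(&v))
        .unwrap_or_default();
    if parsed.is_empty() {
        default.to_vec()
    } else {
        parsed
    }
}
```

With this sketch, a value of `dx12` yields a single-entry list, so the Vulkan path is never considered.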

jmd-z commented 2 days ago

I don't receive a "Device Lost" message, so I don't know if this is related or not.

Fedora 40, nvidia GeForce 1660Ti, nvidia proprietary drivers ver. 560 with desktop resolution 3840 x 2160

Fresh release build using branches version_0_2_0, version_0_2_3 and version_0_3_0. Run via cd install && cargo run -r, no config file edits.

It fairly reliably occurs when I pause emulation, obscure martypc behind another window, and then bring it back to the foreground. For 12 runs: 9 crashed on restoring the obscured window, 1 crashed when the emulation was first paused, 1 crashed when the window was first obscured, and 1 instance would not crash. Regardless of when it crashed, the stack trace was always the same. I have yet to trigger this when the emulation is running or when using a dev build. Perhaps it is timing related? In my case setting my desktop resolution lower (1920 x 1080) does not make a difference. nvidia-smi shows martypc using 25MiB GPU RAM. This doesn't change when paused/running, foreground or obscured.

Tail of terminal output:

Selected machine config ibm5160 has resolved the following ROM sets:
  glabios_xt_turbo
Using default audio device: default
Loaded keyboard mapping file: ./configs/keyboard_layouts/keyboard_US.toml
Segmentation fault (core dumped)

Stack trace:

(gdb) where
#0  0x00007f9b5e810857 in ?? () from /lib64/libnvidia-glcore.so.560.35.03
#1  0x00007f9b5ec279e9 in ?? () from /lib64/libnvidia-glcore.so.560.35.03
#2  0x00007f9b5ec0d380 in ?? () from /lib64/libnvidia-glcore.so.560.35.03
#3  0x00007f9b5eb4b78e in ?? () from /lib64/libnvidia-glcore.so.560.35.03
#4  0x00005612fac147c1 in wgpu_hal::vulkan::instance::<impl wgpu_hal::Surface<wgpu_hal::vulkan::Api> for wgpu_hal::vulkan::Surface>::acquire_texture ()
#5  0x00005612faaee66e in wgpu_core::present::<impl wgpu_core::global::Global<G>>::surface_get_current_texture ()
#6  0x00005612fab49d46 in <wgpu::backend::wgpu_core::ContextWgpuCore as wgpu::context::Context>::surface_get_current_texture ()
#7  0x00005612fab57742 in <T as wgpu::context::DynContext>::surface_get_current_texture ()
#8  0x00005612faa864c9 in wgpu::Surface::get_current_texture ()
#9  0x00005612fa501b95 in pixels::Pixels::render_with ()
#10 0x00005612fa500de5 in <display_backend_pixels::PixelsBackend as display_backend_trait::DisplayBackend<marty_egui::context::GuiRenderContext>>::render ()
#11 0x00005612fa408a9d in <display_manager_wgpu::WgpuDisplayManager as frontend_common::display_manager::DisplayManager<display_backend_pixels::PixelsBackend,marty_egui::context::GuiRenderContext,winit::window::WindowId,winit::window::Window>>::for_each_backend ()
#12 0x00005612fa36a32e in frontend_common::timestep_manager::TimestepManager::wm_update ()
#13 0x00005612fa414577 in martypc_desktop_wgpu::event_loop::handle_event ()
#14 0x00005612fa44e56c in winit::platform_impl::platform::wayland::event_loop::EventLoop<T>::pump_events ()
#15 0x00005612fa44ffab in winit::platform_impl::platform::wayland::event_loop::EventLoop<T>::run_on_demand ()
#16 0x00005612fa37be3b in winit::platform_impl::platform::EventLoop<T>::run ()
#17 0x00005612fa35f965 in martypc_desktop_wgpu::run ()
#18 0x00005612fa34bdc3 in std::sys::backtrace::__rust_begin_short_backtrace ()
#19 0x00005612fa34bdb9 in std::rt::lang_start::{{closure}} ()
#20 0x00005612faf744d0 in std::rt::lang_start_internal ()
#21 0x00005612fa34be15 in main ()

dbalsom commented 2 days ago

that is a crash within libnvidia-glcore.so, being called by an upstream library wgpu - your video drivers shouldn't crash, so that's a bug nvidia has to fix

it's interesting it doesn't happen when the emulation is running. there's not much different going on video-wise when emulation is paused - i'm still updating the screen. the video driver shouldn't know or care that the emulation is paused.

can you get it to reliably crash just pausing emulation while the window is in the foreground?

jmd-z commented 1 day ago

No. Pause & Resume repeatedly, Pause & interact with all the menus, switch to another app for a while: all work fine. Well, it pegs one core at 100% cpu, so the fan gets a little noisy, but no stability issues that I can trigger as long as it remains on the screen. I can then resume, run some programs, minimize martypc, restore it and then obscure it with another window and then switch back just fine. After all that, Pause and then obscure the window again and it crashes within a few seconds, not waiting to be brought to the foreground this time.

dbalsom commented 1 day ago

can you show me a shot of your performance window while paused? (emulator->performance)

are you pausing from the Machine menu or from the CPU Control window?

jmd-z commented 1 day ago

So far, from the machine menu.

[Screenshot: Screenshot_20240930_173823]

I'm seeing the same behavior though from debug->CPU Control now that I've tried that.

dbalsom commented 1 day ago

the only thing I can postulate is that the screen texture stops being updated when the machine is paused, even though it is continually rendered. maybe somehow this causes the display driver to discard the texture when the program loses focus, and we crash with an invalid texture reference when it is shown again?

I am not sure what I can do about this. I could set a flag to keep copying the buffer while paused, but that seems like a hack. best I can figure is to update wgpu and hope it has been fixed somehow
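
To make the flag idea concrete, here is a rough sketch of what "keep copying the buffer while paused" could look like. Everything in it is hypothetical: `Display`, its fields, and `upload_while_paused` are illustrative stand-ins, not MartyPC's or the pixels crate's actual types, and the "texture" is modelled as a plain byte vector rather than a GPU resource.

```rust
/// Hypothetical stand-in for a display backend: `frame` is the CPU-side
/// buffer the emulator draws into, `texture` models the GPU-side copy.
pub struct Display {
    pub frame: Vec<u8>,
    pub texture: Vec<u8>,
    pub uploads: usize,
    pub presents: usize,
}

impl Display {
    pub fn new(len: usize) -> Self {
        Self {
            frame: vec![0; len],
            texture: vec![0; len],
            uploads: 0,
            presents: 0,
        }
    }

    /// Called once per window-manager update.
    /// The texture is re-uploaded when the machine is running, or when
    /// `upload_while_paused` is set; presentation happens either way.
    pub fn update(&mut self, paused: bool, upload_while_paused: bool) {
        if !paused || upload_while_paused {
            self.texture.copy_from_slice(&self.frame);
            self.uploads += 1;
        }
        self.presents += 1;
    }
}
```

Whether re-uploading an unchanged buffer would actually avoid the driver crash is untested; the sketch only shows where such a flag would sit in the update path.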

dbalsom commented 1 day ago

I see you are using vulkan. can you try running with this env set to fall back to opengl:

WGPU_BACKEND=gl

and see if you still see the crashes

jmd-z commented 1 day ago

I've tried, but can't get it to start. Apparently there is an issue in wgpu with nvidia on Linux at the moment: https://github.com/gfx-rs/wgpu/issues/4751

WGPU_BACKEND=gl cargo run -r
Failed to create window target!: No suitable wgpu::Adapter found.

And of course WGPU_BACKEND=gl,vulkan cargo run -r ends up using the vulkan backend - like it should when gl isn't available.
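
The two outcomes above match a simple "first requested backend that is available" rule. A minimal sketch of that rule, assuming backends are identified by plain names; `pick_backend` is a hypothetical helper for illustration and does not reflect how wgpu actually enumerates adapters:

```rust
/// Return the first requested backend that is also available.
/// Both lists hold plain names such as "gl" or "vulkan".
pub fn pick_backend<'a>(requested: &[&'a str], available: &[&str]) -> Option<&'a str> {
    requested
        .iter()
        .copied()
        .find(|name| available.iter().any(|a| *a == *name))
}
```

Under this model, requesting only `gl` on a system where just `vulkan` is usable returns `None` (the "No suitable wgpu::Adapter found" case), while requesting `gl,vulkan` falls through to `vulkan`.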