gfx-rs / wgpu

A cross-platform, safe, pure-Rust graphics API.
https://wgpu.rs
Apache License 2.0

Hang in VkDestroyDevice on Windows, if a device is dropped from a thread_local static variable #4973

Open AdrianEddy opened 8 months ago

AdrianEddy commented 8 months ago

Description: I'm encountering a hang on Windows with the Vulkan backend when a wgpu::Device is dropped from a static thread_local variable as its thread ends, even though another device is still in use on the main thread. When the thread_local data is dropped, and the wgpu::Device with it, wgpu destroys the Vulkan device and gets stuck at this line.

This doesn't happen with the DX12 backend. The OpenGL backend panics instead:

    thread '<unnamed>' panicked at 'called Result::unwrap() on an Err value: Os { code: 170, kind: ResourceBusy, message: "The requested resource is in use." }', wgpu-hal\src\gles\device.rs:618:43

I didn't check Linux or macOS. The hang also doesn't happen if I hold just the wgpu::Device in the thread_local variable, without the wgpu::Queue.

Repro steps: I can reproduce it consistently using the hello_triangle example with the following modifications:

  1. Add this code before async fn run:
    struct Test {
        queue: wgpu::Queue,
        device: wgpu::Device,
    }
    thread_local! {
        static TEST: std::cell::RefCell<Option<Test>> = std::cell::RefCell::new(None);
    }
  2. Add this code after let (device, queue) = adapter.request_device(...):
    let (device2, queue2) = adapter.request_device(&wgpu::DeviceDescriptor::default(), None).await.unwrap();
    std::thread::spawn(move || {
        TEST.with(|t| {
            *t.borrow_mut() = Some(Test {
                queue: queue2,
                device: device2,
            });
        });
        std::thread::sleep(std::time::Duration::from_secs(5));
        println!("thread ends");
    });

Expected vs observed behavior: I expected it to just drop the device without affecting the rest of the app, but instead it hangs forever.

Platform: Windows 11, latest master (d03e290), NVIDIA RTX 3080 Ti with driver 546.33

cwfitzgerald commented 8 months ago

Could you post backtraces from all relevant threads after it gets stuck?

AdrianEddy commented 8 months ago

There are really only 2 threads: main (which still runs the event loop as usual) and the newly created thread that gets stuck. The backtrace of the stuck thread is:

>   nvoglv64.dll!00007ffc7b6f6b8a() Unknown
    nvoglv64.dll!00007ffc7bbacfb7() Unknown
    nvoglv64.dll!00007ffc7bb7d19b() Unknown
    VkLayer_khronos_validation.dll!vulkan_layer_chassis::DestroyDevice(VkDevice_T * device, const VkAllocationCallbacks * pAllocator) Line 546  C++
    [External Code] 
    wgpu-examples.exe!ash::device::Device::destroy_device(enum2$<core::option::Option<ref$<ash::vk::definitions::AllocationCallbacks>>> self) Line 966  Rust
    wgpu-examples.exe!wgpu_hal::vulkan::DeviceShared::free_resources() Line 305 Rust
    wgpu-examples.exe!wgpu_hal::vulkan::device::impl$4::exit(wgpu_hal::vulkan::Device self, wgpu_hal::vulkan::Queue queue) Line 833 Rust
    wgpu-examples.exe!wgpu_core::device::resource::impl$1::drop<wgpu_hal::vulkan::Api>(wgpu_core::device::resource::Device<wgpu_hal::vulkan::Api> * self) Line 158  Rust
    wgpu-examples.exe!core::ptr::drop_in_place<wgpu_core::device::resource::Device<wgpu_hal::vulkan::Api>>(wgpu_core::device::resource::Device<wgpu_hal::vulkan::Api> *) Line 497   Rust
    wgpu-examples.exe!alloc::sync::Arc<wgpu_core::device::resource::Device<wgpu_hal::vulkan::Api>>::drop_slow<wgpu_core::device::resource::Device<wgpu_hal::vulkan::Api>>() Line 1263   Rust
    wgpu-examples.exe!alloc::sync::impl$27::drop<wgpu_core::device::resource::Device<wgpu_hal::vulkan::Api>>(alloc::sync::Arc<wgpu_core::device::resource::Device<wgpu_hal::vulkan::Api>> * self) Line 1899 Rust
    wgpu-examples.exe!core::ptr::drop_in_place<alloc::sync::Arc<wgpu_core::device::resource::Device<wgpu_hal::vulkan::Api>>>(alloc::sync::Arc<wgpu_core::device::resource::Device<wgpu_hal::vulkan::Api>> *) Line 497   Rust
    wgpu-examples.exe!core::mem::drop<alloc::sync::Arc<wgpu_core::device::resource::Device<wgpu_hal::vulkan::Api>>>(alloc::sync::Arc<wgpu_core::device::resource::Device<wgpu_hal::vulkan::Api>> _x) Line 987   Rust
    wgpu-examples.exe!wgpu_core::global::Global<wgpu_core::identity::IdentityManagerFactory>::device_drop<wgpu_core::identity::IdentityManagerFactory,wgpu_hal::vulkan::Api>(wgpu_core::id::Id<wgpu_core::device::resource::Device<wgpu_hal::empty::Api>> self) Line 2279   Rust
    wgpu-examples.exe!wgpu::backend::direct::impl$7::device_drop(wgpu::backend::direct::Context * self, wgpu_core::id::Id<wgpu_core::device::resource::Device<wgpu_hal::empty::Api>> * device, wgpu::backend::direct::Device * _device_data) Line 1458  Rust
    wgpu-examples.exe!wgpu::context::impl$5::device_drop<wgpu::backend::direct::Context>(wgpu::backend::direct::Context * self, wgpu::context::ObjectId * device, ref$<dyn$<core::any::Any,core::marker::Send,core::marker::Sync>>) Line 2369   Rust
    wgpu-examples.exe!wgpu::impl$27::drop(wgpu::Device * self) Line 2663    Rust
    wgpu-examples.exe!core::ptr::drop_in_place<wgpu::Device>(wgpu::Device *) Line 497   Rust
    wgpu-examples.exe!core::ptr::drop_in_place<wgpu_examples::hello_triangle::Test>(wgpu_examples::hello_triangle::Test *) Line 497 Rust
    wgpu-examples.exe!core::ptr::drop_in_place<enum2$<core::option::Option<wgpu_examples::hello_triangle::Test>>>(enum2$<core::option::Option<wgpu_examples::hello_triangle::Test>> *) Line 497 Rust
    wgpu-examples.exe!core::ptr::drop_in_place<core::cell::UnsafeCell<enum2$<core::option::Option<wgpu_examples::hello_triangle::Test>>>>(core::cell::UnsafeCell<enum2$<core::option::Option<wgpu_examples::hello_triangle::Test>>> *) Line 497 Rust
    wgpu-examples.exe!core::ptr::drop_in_place<core::cell::RefCell<enum2$<core::option::Option<wgpu_examples::hello_triangle::Test>>>>(core::cell::RefCell<enum2$<core::option::Option<wgpu_examples::hello_triangle::Test>>> *) Line 497   Rust
    wgpu-examples.exe!core::ptr::drop_in_place<enum2$<core::option::Option<core::cell::RefCell<enum2$<core::option::Option<wgpu_examples::hello_triangle::Test>>>>>>(enum2$<core::option::Option<core::cell::RefCell<enum2$<core::option::Option<wgpu_examples::hello_triangle::Test>>>>> *) Line 497   Rust
    [Inline Frame] wgpu-examples.exe!core::mem::drop(enum2$<core::option::Option<core::cell::RefCell<enum2$<core::option::Option<wgpu_examples::hello_triangle::Test>>>>> _x) Line 987  Rust
    wgpu-examples.exe!std::sys::common::thread_local::fast_local::destroy_value::closure$0<core::cell::RefCell<enum2$<core::option::Option<wgpu_examples::hello_triangle::Test>>>>(std::sys::common::thread_local::fast_local::destroy_value::closure_env$0<core::cell::RefCell<enum2$<core::option::Option<wgpu_examples::hello_triangle::Test>>>> *) Line 243 Rust
    wgpu-examples.exe!core::ops::function::FnOnce::call_once<std::sys::common::thread_local::fast_local::destroy_value::closure_env$0<core::cell::RefCell<enum2$<core::option::Option<wgpu_examples::hello_triangle::Test>>>>,tuple$<>>(std::sys::common::thread_local::fast_local::destroy_value::closure_env$0<core::cell::RefCell<enum2$<core::option::Option<wgpu_examples::hello_triangle::Test>>>>) Line 250  Rust
    wgpu-examples.exe!core::panic::unwind_safe::impl$23::call_once<tuple$<>,std::sys::common::thread_local::fast_local::destroy_value::closure_env$0<core::cell::RefCell<enum2$<core::option::Option<wgpu_examples::hello_triangle::Test>>>>>(core::panic::unwind_safe::AssertUnwindSafe<std::sys::common::thread_local::fast_local::destroy_value::closure_env$0<core::cell::RefCell<enum2$<core::option::Option<wgpu_examples::hello_triangle::Test>>>>> self) Line 272   Rust
    wgpu-examples.exe!std::panicking::try::do_call<core::panic::unwind_safe::AssertUnwindSafe<std::sys::common::thread_local::fast_local::destroy_value::closure_env$0<core::cell::RefCell<enum2$<core::option::Option<wgpu_examples::hello_triangle::Test>>>>>,tuple$<>>(unsigned char * data) Line 502    Rust
    [External Code] 
    wgpu-examples.exe!std::panicking::try<tuple$<>,core::panic::unwind_safe::AssertUnwindSafe<std::sys::common::thread_local::fast_local::destroy_value::closure_env$0<core::cell::RefCell<enum2$<core::option::Option<wgpu_examples::hello_triangle::Test>>>>>>(core::panic::unwind_safe::AssertUnwindSafe<std::sys::common::thread_local::fast_local::destroy_value::closure_env$0<core::cell::RefCell<enum2$<core::option::Option<wgpu_examples::hello_triangle::Test>>>>> f) Line 464   Rust
    [Inline Frame] wgpu-examples.exe!std::panic::catch_unwind(core::panic::unwind_safe::AssertUnwindSafe<std::sys::common::thread_local::fast_local::destroy_value::closure_env$0<core::cell::RefCell<enum2$<core::option::Option<wgpu_examples::hello_triangle::Test>>>>> f) Line 142  Rust
    wgpu-examples.exe!std::sys::common::thread_local::fast_local::destroy_value<core::cell::RefCell<enum2$<core::option::Option<wgpu_examples::hello_triangle::Test>>>>(unsigned char * ptr) Line 240   Rust
    wgpu-examples.exe!std::sys::windows::thread_local_dtor::run_keyless_dtors() Line 27 Rust
    wgpu-examples.exe!std::sys::windows::thread_local_key::on_tls_callback() Line 251   Rust
    [External Code] 

Main thread:

    [External Code] 
>   wgpu-examples.exe!winit::platform_impl::platform::event_loop::impl$3::wait_and_dispatch_message::get_msg_with_timeout(windows_sys::Windows::Win32::UI::WindowsAndMessaging::MSG * msg, enum2$<core::option::Option<core::time::Duration>>) Line 327 Rust
    wgpu-examples.exe!winit::platform_impl::platform::event_loop::impl$3::wait_and_dispatch_message::wait_for_msg(windows_sys::Windows::Win32::UI::WindowsAndMessaging::MSG * msg, enum2$<core::option::Option<core::time::Duration>>) Line 354 Rust
    wgpu-examples.exe!winit::platform_impl::platform::event_loop::EventLoop<tuple$<>>::wait_and_dispatch_message<tuple$<>>(enum2$<core::option::Option<core::time::Duration>> self) Line 387    Rust
    wgpu-examples.exe!winit::platform_impl::platform::event_loop::EventLoop<tuple$<>>::run_on_demand<tuple$<>,wgpu_examples::hello_triangle::run::async_fn$0::closure_env$1>(wgpu_examples::hello_triangle::run::async_fn$0::closure_env$1 self) Line 241   Rust
    wgpu-examples.exe!winit::platform_impl::platform::event_loop::EventLoop<tuple$<>>::run<tuple$<>,wgpu_examples::hello_triangle::run::async_fn$0::closure_env$1>(winit::platform_impl::platform::event_loop::EventLoop<tuple$<>> self, wgpu_examples::hello_triangle::run::async_fn$0::closure_env$1 event_handler) Line 218  Rust
    wgpu-examples.exe!winit::event_loop::EventLoop<tuple$<>>::run<tuple$<>,wgpu_examples::hello_triangle::run::async_fn$0::closure_env$1>(winit::event_loop::EventLoop<tuple$<>> self, wgpu_examples::hello_triangle::run::async_fn$0::closure_env$1 event_handler) Line 249    Rust
    wgpu-examples.exe!wgpu_examples::hello_triangle::run::async_fn$0(core::pin::Pin<ref_mut$<enum2$<wgpu_examples::hello_triangle::run::async_fn_env$0>>>, core::task::wake::Context *) Line 109    Rust
    wgpu-examples.exe!pollster::block_on<enum2$<wgpu_examples::hello_triangle::run::async_fn_env$0>>(enum2$<wgpu_examples::hello_triangle::run::async_fn_env$0> fut) Line 128   Rust
    wgpu-examples.exe!wgpu_examples::hello_triangle::main() Line 195    Rust
    wgpu-examples.exe!wgpu_examples::main() Line 229    Rust
    wgpu-examples.exe!core::ops::function::FnOnce::call_once<void (*)(),tuple$<>>(void(*)()) Line 250   Rust
    [Inline Frame] wgpu-examples.exe!core::hint::black_box(tuple$<>) Line 135   Rust
    wgpu-examples.exe!std::sys_common::backtrace::__rust_begin_short_backtrace<void (*)(),tuple$<>>(void(*)() f) Line 141   Rust
    wgpu-examples.exe!std::rt::lang_start::closure$0<tuple$<>>(std::rt::lang_start::closure_env$0<tuple$<>> *) Line 166 Rust
    [Inline Frame] wgpu-examples.exe!std::rt::lang_start_internal::closure$2() Line 148 Rust
    [Inline Frame] wgpu-examples.exe!std::panicking::try::do_call() Line 500    Rust
    [Inline Frame] wgpu-examples.exe!std::panicking::try() Line 464 Rust
    [Inline Frame] wgpu-examples.exe!std::panic::catch_unwind() Line 142    Rust
    wgpu-examples.exe!std::rt::lang_start_internal() Line 148   Rust
    wgpu-examples.exe!std::rt::lang_start<tuple$<>>(void(*)() main, __int64 argc, unsigned char * * argv, unsigned char sigpipe) Line 165   Rust
    [External Code] 

AdrianEddy commented 8 months ago

If thread_local is not used, the device can be dropped in another thread without issues, i.e. this works fine:

std::thread::spawn(move || {
    let test = Test { queue2, device2 };
    std::thread::sleep(std::time::Duration::from_secs(5));
    println!("thread ends {:?}", test.device2.limits());
});
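
One difference between the two cases is when the destructor runs. A local is dropped when the closure returns, while a thread_local value is dropped later, during thread teardown (the backtrace above shows the drop being reached from on_tls_callback). The following is a minimal std-only sketch of that ordering. It does not use wgpu: Noisy, SLOT, and drop_order are hypothetical names introduced purely for illustration, and the sketch does not reproduce the hang itself.

```rust
use std::cell::RefCell;
use std::sync::mpsc::{channel, Sender};
use std::thread;

// Hypothetical stand-in for a GPU handle: logs its label when dropped.
struct Noisy {
    label: &'static str,
    log: Sender<&'static str>,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        // Ignore send errors: the receiver may already be gone.
        let _ = self.log.send(self.label);
    }
}

thread_local! {
    static SLOT: RefCell<Option<Noisy>> = RefCell::new(None);
}

// Spawns a thread that owns one local and one thread_local value,
// and returns the events in the order they were logged.
fn drop_order() -> Vec<&'static str> {
    let (tx, rx) = channel();
    thread::spawn(move || {
        SLOT.with(|s| {
            *s.borrow_mut() = Some(Noisy { label: "thread_local", log: tx.clone() });
        });
        let _local = Noisy { label: "local", log: tx.clone() };
        let _ = tx.send("closure body done");
        // `_local` is dropped when the closure returns;
        // the value in SLOT is dropped afterwards, while the thread exits.
    })
    .join()
    .unwrap();
    // The iterator ends once every Sender clone has been dropped.
    rx.iter().collect()
}
```

Calling drop_order() should log the local's drop before the thread_local one, so in the thread_local case the device is destroyed only once the thread is already shutting down.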

Pushing it to another thread on thread_local drop also seems to work fine:

impl Drop for Test {
    fn drop(&mut self) {
        let queue2 = self.queue2.take().unwrap();
        let device2 = self.device2.take().unwrap();
        std::thread::spawn(move || {
            drop(queue2);
            drop(device2);
        });
    }
}

Interestingly enough, pushing only queue2 to a new thread on drop, and letting device2 drop on the same thread, also somehow makes it work.
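
For reference, the Drop workaround above implies that the fields of Test are Option values, since it calls take() on them. A self-contained sketch of that shape is shown here with a hypothetical Resource type standing in for wgpu::Queue and wgpu::Device, so that it runs without a GPU. Resource, run_workaround, and the channel-based logging are assumptions added for illustration and are not part of the original repro. Whether spawning a thread from a thread_local destructor is reliable on every platform is also an assumption here.

```rust
use std::cell::RefCell;
use std::sync::mpsc::{channel, Sender};
use std::thread::{self, ThreadId};

// Hypothetical stand-in for wgpu::Queue / wgpu::Device:
// reports its name and the thread it was dropped on.
struct Resource {
    name: &'static str,
    dropped_on: Sender<(&'static str, ThreadId)>,
}

impl Drop for Resource {
    fn drop(&mut self) {
        let _ = self.dropped_on.send((self.name, thread::current().id()));
    }
}

// Fields are Options so that Drop can move the handles out with take().
struct Test {
    queue2: Option<Resource>,
    device2: Option<Resource>,
}

impl Drop for Test {
    fn drop(&mut self) {
        let queue2 = self.queue2.take();
        let device2 = self.device2.take();
        // Hand both handles to a fresh thread so they are not destroyed
        // inside the exiting thread's TLS destructor.
        thread::spawn(move || {
            drop(queue2);
            drop(device2);
        });
    }
}

thread_local! {
    static TEST: RefCell<Option<Test>> = RefCell::new(None);
}

// Returns the id of the thread that owned TEST and the recorded drop events.
fn run_workaround() -> (ThreadId, Vec<(&'static str, ThreadId)>) {
    let (tx, rx) = channel();
    let owner = thread::spawn(move || {
        TEST.with(|t| {
            *t.borrow_mut() = Some(Test {
                queue2: Some(Resource { name: "queue", dropped_on: tx.clone() }),
                device2: Some(Resource { name: "device", dropped_on: tx }),
            });
        });
        thread::current().id()
    });
    let owner_id = owner.join().unwrap();
    // Wait for both drop events from the helper thread.
    let events = rx.iter().take(2).collect();
    (owner_id, events)
}
```

In this sketch both handles are dropped on the helper thread, queue first and then device, and neither drop runs on the thread that owned the thread_local.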

cwfitzgerald commented 4 months ago

I'm seeing this in the multiple_devices test on Windows sometimes. Trying to drop the device hangs with a basically identical call stack inside nvoglv64.

teoxoy commented 2 months ago

I can't pinpoint this to something that we are doing wrong.