gfx-rs / wgpu

A cross-platform, safe, pure-Rust graphics API.
https://wgpu.rs
Apache License 2.0

Segfault when creating `Instance` in multiple threads on Linux under Nvidia #5930

Open Xaeroxe opened 4 months ago

Xaeroxe commented 4 months ago

Description

ThreadSanitizer reports a data race when multiple threads initialize a wgpu::Instance at the same time. This only happens with an Nvidia GPU under Linux. The drivers come from the Ubuntu package nvidia-driver-535-server, on Ubuntu 20.04 with x86_64 architecture. The race frequently leads to SIGSEGV, among other problems. I recommend locking access to the body of this function behind some kind of global Mutex. Maybe this should be done in wgpu_core? Doing this made the problem go away in my own code base. Creating wgpu::Device and wgpu::Adapter doesn't seem to cause this problem; only creating wgpu::Instance does.

Repro steps

Call wgpu::Instance::new() on multiple threads simultaneously on a system matching the above description.
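For anyone trying to reproduce this, here is a minimal sketch of a harness that releases several threads at the same moment, so the calls overlap as much as possible. It uses only the standard library. The helper name `run_simultaneously` is hypothetical and not part of wgpu. The closure is where the `wgpu::Instance::new` call would go, with whatever arguments your wgpu version expects.

```rust
use std::sync::{Arc, Barrier};
use std::thread;

/// Hypothetical helper (not part of wgpu): runs `work` on `threads` OS threads,
/// holding every thread at a barrier first so the calls start together.
/// Returns each thread's result in thread-index order.
pub fn run_simultaneously<T, F>(threads: usize, work: F) -> Vec<T>
where
    T: Send + 'static,
    F: Fn(usize) -> T + Send + Sync + 'static,
{
    let barrier = Arc::new(Barrier::new(threads));
    let work = Arc::new(work);

    let handles: Vec<thread::JoinHandle<T>> = (0..threads)
        .map(|index| {
            let barrier = Arc::clone(&barrier);
            let work = Arc::clone(&work);
            thread::spawn(move || {
                // Block until all threads have arrived, then run the work at once.
                barrier.wait();
                (*work)(index)
            })
        })
        .collect();

    // Joining in spawn order keeps the results in thread-index order.
    handles
        .into_iter()
        .map(|handle| handle.join().expect("worker thread panicked"))
        .collect()
}
```

In a test, the closure passed to `run_simultaneously` would create the instance and return it (or just drop it). Running that test under ThreadSanitizer with the command below should then exercise the racy path on an affected system.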

Expected vs observed behavior

I expect this function to be thread safe. On my system it is not thread safe.

Extra materials

Command executed:

$ RUSTFLAGS="-Zsanitizer=thread" cargo +nightly test -Zbuild-std --target x86_64-unknown-linux-gnu

Snippet from stderr:

WARNING: ThreadSanitizer: data race (pid=1190610)
  Write of size 8 at 0x721000006200 by thread T4:
    #0 free ??:? (...) (BuildId: 868c54771bedf5ad)
    #1 _dl_find_dso_for_object ??:? (ld-linux-x86-64.so.2+0x17b9b) (BuildId: db0420f708b806cf03260aadb916c330049580b7)
    #2 <null> <null> (libvulkan.so.1+0x2b8b7) (BuildId: 492c3c03dd3072e417126b40987c20ba8bdc235c)
    #3 ash::prelude::read_into_uninitialized_vector ??:? (...-4d7d2a9c196fecca+0x4b4290c) (BuildId: 868c54771bedf5ad)
    #4 ash::entry::Entry::enumerate_instance_extension_properties wgpu_hal.d2ca7a18933cd7c8-cgu.10:? (...-4d7d2a9c196fecca+0x4b3da46) (BuildId: 868c54771bedf5ad)
    #5 wgpu_hal::vulkan::instance::<impl wgpu_hal::vulkan::Instance>::enumerate_instance_extension_properties wgpu_hal.d2ca7a18933cd7c8-cgu.01:? (...-4d7d2a9c196fecca+0x496fc1e) (BuildId: 868c54771bedf5ad)
    #6 wgpu_hal::vulkan::instance::<impl wgpu_hal::vulkan::Instance>::desired_extensions ??:? (...-4d7d2a9c196fecca+0x496fcfb) (BuildId: 868c54771bedf5ad)
    #7 wgpu_hal::vulkan::instance::<impl wgpu_hal::Instance for wgpu_hal::vulkan::Instance>::init ??:? (...-4d7d2a9c196fecca+0x49729c2) (BuildId: 868c54771bedf5ad)
    #8 wgpu_core::instance::Instance::new::init wgpu_core.f85910e2654faa5f-cgu.08:? (...-4d7d2a9c196fecca+0x47ba5ef) (BuildId: 868c54771bedf5ad)
    #9 wgpu_core::instance::Instance::new ??:? (...-4d7d2a9c196fecca+0x47b9795) (BuildId: 868c54771bedf5ad)
    #10 wgpu_core::global::Global::new ??:? (...-4d7d2a9c196fecca+0x47b4aa9) (BuildId: 868c54771bedf5ad)
    #11 <wgpu::backend::wgpu_core::ContextWgpuCore as wgpu::context::Context>::init ??:? (...-4d7d2a9c196fecca+0x43a5e11) (BuildId: 868c54771bedf5ad)
    #12 wgpu::Instance::new ??:? (...-4d7d2a9c196fecca+0x436d2f8) (BuildId: 868c54771bedf5ad)

Platform

Ubuntu 20.04, Nvidia GPU, driver package nvidia-driver-535-server

Wumpf commented 4 months ago

I marked this as a driver bug since I believe enumerate_instance_extension_properties should be thread-safe. That said, this is a bug that wgpu-hal (!) should likely work around by protecting the call with a global mutex, just as you suggest 👍
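Until such a workaround lands, the global-mutex idea can be applied on the application side. The following is a minimal sketch using only the standard library. It is not how wgpu-hal implements anything, and the names `INSTANCE_INIT_LOCK` and `create_serialized` are hypothetical. It assumes that all instance creation in the process goes through this one helper.

```rust
use std::sync::{Mutex, MutexGuard, PoisonError};

/// Hypothetical process-wide lock (the name is illustrative, not a wgpu item).
/// It guards no data; it only serializes the code run while it is held.
static INSTANCE_INIT_LOCK: Mutex<()> = Mutex::new(());

/// Runs `create` while holding the global lock, so that at most one thread
/// executes the guarded call at any time.
pub fn create_serialized<T>(create: impl FnOnce() -> T) -> T {
    // A poisoned lock only means an earlier caller panicked inside `create`.
    // The guarded value is `()`, so there is no broken state to worry about.
    let _guard: MutexGuard<'_, ()> = INSTANCE_INIT_LOCK
        .lock()
        .unwrap_or_else(PoisonError::into_inner);
    create()
    // `_guard` is dropped here, releasing the lock for the next thread.
}
```

The `wgpu::Instance::new` call would go inside the closure passed to `create_serialized`. This only helps if every thread uses the helper, which is why a lock inside wgpu-hal around the Vulkan entry point call would be the more robust fix.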