gfx-rs / wgpu

A cross-platform, safe, pure-Rust graphics API.
https://wgpu.rs
Apache License 2.0

Compute pass submitted from a separate thread causes GPU hang or device lost #4877

Open lisyarus opened 9 months ago

lisyarus commented 9 months ago

What I am doing

I am generating texture mipmaps using a simple compute shader. The texture loading process is asynchronous and happens in a separate thread, which loads the texture from file, creates the texture object, writes data to it (queue.write_texture), and submits compute passes that generate mipmap levels.
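
For readers unfamiliar with the setup, the per-level work can be sketched as follows. This is only an illustration, not the project's actual code: the names `MipLevel`, `mipChain` and `kWorkgroupSize` are hypothetical, and the 8x8 workgroup size is an assumption that would have to match the compute shader.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// One generated mip level: its size in texels and the workgroups to dispatch.
struct MipLevel {
    std::uint32_t width, height;
    std::uint32_t groupsX, groupsY;
};

// Assumed workgroup size; it must match @workgroup_size in the shader.
constexpr std::uint32_t kWorkgroupSize = 8;

// Returns levels 1..N of the chain for a base texture of the given size.
// Level 0 is the uploaded image itself and needs no dispatch.
std::vector<MipLevel> mipChain(std::uint32_t width, std::uint32_t height) {
    std::vector<MipLevel> levels;
    while (width > 1 || height > 1) {
        // Each level halves both dimensions, clamped to at least one texel.
        width = std::max<std::uint32_t>(1, width / 2);
        height = std::max<std::uint32_t>(1, height / 2);
        // Round up so partially covered workgroups are still dispatched.
        levels.push_back({width, height,
                          (width + kWorkgroupSize - 1) / kWorkgroupSize,
                          (height + kWorkgroupSize - 1) / kWorkgroupSize});
    }
    return levels;
}
```

Each entry corresponds to one dispatch that reads the previous level and writes the current one; the texture itself would be created with `levels.size() + 1` mip levels.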

The relevant code of the project is here.

Observed behavior

Usually, everything works fine, but occasionally the GPU hangs, after which either I have to kill the process manually, or the process crashes once I switch to a different window, with the following error (with RUST_BACKTRACE=full):

thread '<unnamed>' panicked at src/lib.rs:546:5:
Error in wgpuQueueSubmit: Validation Error

Caused by:
    Parent device is lost

stack backtrace:
   0:     0x7f264e0eb3ec - std::backtrace_rs::backtrace::libunwind::trace::ha69d38c49f1bf263
                               at /rustc/a28077b28a02b92985b3a3faecf92813155f1ea1/library/std/src/../../backtrace/src/backtrace/libunwind.rs:93:5
   1:     0x7f264e0eb3ec - std::backtrace_rs::backtrace::trace_unsynchronized::h93125d0b85fd543c
                               at /rustc/a28077b28a02b92985b3a3faecf92813155f1ea1/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2:     0x7f264e0eb3ec - std::sys_common::backtrace::_print_fmt::h8d65f438e8343444
                               at /rustc/a28077b28a02b92985b3a3faecf92813155f1ea1/library/std/src/sys_common/backtrace.rs:67:5
   3:     0x7f264e0eb3ec - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h41751d2af6c8033a
                               at /rustc/a28077b28a02b92985b3a3faecf92813155f1ea1/library/std/src/sys_common/backtrace.rs:44:22
   4:     0x7f264e111d5c - core::fmt::rt::Argument::fmt::h5db2f552d8a28f63
                               at /rustc/a28077b28a02b92985b3a3faecf92813155f1ea1/library/core/src/fmt/rt.rs:138:9
   5:     0x7f264e111d5c - core::fmt::write::h99465148a27e4883
                               at /rustc/a28077b28a02b92985b3a3faecf92813155f1ea1/library/core/src/fmt/mod.rs:1114:21
   6:     0x7f264e0e8dbe - std::io::Write::write_fmt::hee8dfd57bd179ab2
                               at /rustc/a28077b28a02b92985b3a3faecf92813155f1ea1/library/std/src/io/mod.rs:1763:15
   7:     0x7f264e0eb1d4 - std::sys_common::backtrace::_print::h019a3cee3e814da4
                               at /rustc/a28077b28a02b92985b3a3faecf92813155f1ea1/library/std/src/sys_common/backtrace.rs:47:5
   8:     0x7f264e0eb1d4 - std::sys_common::backtrace::print::h55694121c2ddf918
                               at /rustc/a28077b28a02b92985b3a3faecf92813155f1ea1/library/std/src/sys_common/backtrace.rs:34:9
   9:     0x7f264e0ec5d3 - std::panicking::default_hook::{{closure}}::h29cbe3da3891b0b0
                               at /rustc/a28077b28a02b92985b3a3faecf92813155f1ea1/library/std/src/panicking.rs:272:22
  10:     0x7f264e0ec2f4 - std::panicking::default_hook::h881e76b2b8c74280
                               at /rustc/a28077b28a02b92985b3a3faecf92813155f1ea1/library/std/src/panicking.rs:292:9
  11:     0x7f264e0ecb55 - std::panicking::rust_panic_with_hook::hcc36e25b6e33969c
                               at /rustc/a28077b28a02b92985b3a3faecf92813155f1ea1/library/std/src/panicking.rs:731:13
  12:     0x7f264e0eca51 - std::panicking::begin_panic_handler::{{closure}}::ha415efb0f69f41f9
                               at /rustc/a28077b28a02b92985b3a3faecf92813155f1ea1/library/std/src/panicking.rs:609:13
  13:     0x7f264e0eb916 - std::sys_common::backtrace::__rust_end_short_backtrace::h395fe90f99451e4e
                               at /rustc/a28077b28a02b92985b3a3faecf92813155f1ea1/library/std/src/sys_common/backtrace.rs:170:18
  14:     0x7f264e0ec7a2 - rust_begin_unwind
                               at /rustc/a28077b28a02b92985b3a3faecf92813155f1ea1/library/std/src/panicking.rs:597:5
  15:     0x7f264dcb1e05 - core::panicking::panic_fmt::h452a83e54ecd764e
                               at /rustc/a28077b28a02b92985b3a3faecf92813155f1ea1/library/core/src/panicking.rs:72:14
  16:     0x7f264de57f3f - wgpu_native::handle_error_fatal::h3d188bffb22328b3
  17:     0x7f264de6d683 - wgpuQueueSubmit
  18:     0x559bc5a50458 - _ZN6Engine4Impl12renderShadowERKSt6vectorISt10shared_ptrI12RenderObjectESaIS4_EE
  19:     0x559bc5a54cab - _ZN6Engine4Impl6renderEP15WGPUTextureImplRKSt6vectorISt10shared_ptrI12RenderObjectESaIS6_EERK6CameraRK3BoxRKNS_13LightSettingsE
  20:     0x559bc5a4bf4e - main
  21:     0x7f264d6f19d2 - <unknown>
  22:     0x7f264d6f1a85 - __libc_start_main
  23:     0x559bc5a4c701 - _start
  24:                0x0 - <unknown>
fatal runtime error: failed to initiate panic, error 5

(I'm guessing that Parent device is lost indicates that most of this stacktrace is pretty much irrelevant?)

Expected behavior

The compute pass should finish correctly and always generate the mipmaps without crashing or hanging.

Tech stack

I am using a trunk build of wgpu-native to get features that, as I understand it, were merged only recently: better synchronization and filterable floating-point textures. The project is written in C++ (compiled with gcc 13.2.1 20230826), uses SDL2 for window creation, and runs wgpu on the Vulkan backend.

The wgpu device is requested with the float32-filterable feature enabled.

Additional notes

I observed frequent hangs earlier, and noticed that I never ended the compute pass. After adding the appropriate wgpuComputePassEncoderEnd call, the hangs became much less frequent, and now occur about once every 5-10 runs of the program.
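
One way to make a forgotten end call harder to repeat is a small scope guard, sketched below under assumptions. `ScopeExit`, `FakePass` and `recordPass` are hypothetical names; `FakePass` stands in for the real encoder handle so the sketch compiles without wgpu headers, and the code relies on C++17 class template argument deduction.

```cpp
#include <utility>

// Runs a cleanup callable exactly once when the guard leaves scope.
template <typename F>
class ScopeExit {
public:
    explicit ScopeExit(F cleanup) : cleanup_(std::move(cleanup)) {}
    ~ScopeExit() { cleanup_(); }
    ScopeExit(const ScopeExit&) = delete;
    ScopeExit& operator=(const ScopeExit&) = delete;

private:
    F cleanup_;
};

// Hypothetical stand-in for a compute pass encoder handle.
struct FakePass {
    bool ended = false;
};

// Records into the pass; the guard ends it on every return path.
bool recordPass(FakePass& pass, bool bailOutEarly) {
    // In real code the lambda would call wgpuComputePassEncoderEnd(pass).
    ScopeExit endPass([&pass] { pass.ended = true; });
    if (bailOutEarly) return false;
    // ... set pipeline, set bind groups, dispatch workgroups ...
    return true;
}
```

This only addresses the missing end call; it does not explain the remaining, less frequent hangs.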

I also observed that if the compute pass does literally nothing (is created, then immediately ended, turned into a command buffer, and submitted to the queue), the hangs still occur, though even less frequently.

If the whole compute pass (from creating an encoder to submitting the command buffer to the queue) is moved to the main rendering thread, the hangs disappear, and everything works as expected.

Update: curiously, if I move just the wgpuQueueSubmit to the main rendering thread (while the texture creation and compute pass encoding are still in the separate thread), the hangs go away as well.
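
The workaround from the update can be sketched as a hand-off queue: the loader thread encodes as before, then pushes a closure that performs the submit, and the render thread runs those closures once per frame. This is a minimal sketch, not the project's code; `SubmitQueue` is a hypothetical name, and the sketch sidesteps the hang without explaining its cause.

```cpp
#include <cstddef>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Thread-safe list of deferred submissions. Worker threads push closures;
// the render thread drains them once per frame.
class SubmitQueue {
public:
    // Safe to call from any thread.
    void push(std::function<void()> submit) {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.push_back(std::move(submit));
    }

    // Call from the render thread only. Returns how many submissions ran.
    std::size_t drain() {
        std::vector<std::function<void()>> batch;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            batch.swap(pending_);
        }
        // Run the closures outside the lock so workers are never blocked
        // behind a slow submit.
        for (auto& submit : batch) submit();
        return batch.size();
    }

private:
    std::mutex mutex_;
    std::vector<std::function<void()>> pending_;
};
```

In the loader thread the closure would capture the finished command buffer and call wgpuQueueSubmit on it; the render loop would call `drain()` before its own submits.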

System info

Operating system: 6.1.19-gentoo

CPU info:

Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         39 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  16
  On-line CPU(s) list:   0-15
Vendor ID:               GenuineIntel
  Model name:            Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz
    CPU family:          6
    Model:               158
    Thread(s) per core:  2
    Core(s) per socket:  8
    Socket(s):           1
    Stepping:            13
    CPU(s) scaling MHz:  52%
    CPU max MHz:         5000.0000
    CPU min MHz:         800.0000
    BogoMIPS:            7200.00
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology
                         nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm
                         3dnowprefetch cpuid_fault invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt intel_pt
                         xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d arch_capabilities
Virtualization features: 
  Virtualization:        VT-x
Caches (sum of all):     
  L1d:                   256 KiB (8 instances)
  L1i:                   256 KiB (8 instances)
  L2:                    2 MiB (8 instances)
  L3:                    16 MiB (1 instance)
NUMA:                    
  NUMA node(s):          1
  NUMA node0 CPU(s):     0-15
Vulnerabilities:         
  Itlb multihit:         KVM: Mitigation: VMX disabled
  L1tf:                  Not affected
  Mds:                   Not affected
  Meltdown:              Not affected
  Mmio stale data:       Vulnerable: Clear CPU buffers attempted, no microcode; SMT vulnerable
  Retbleed:              Mitigation; Enhanced IBRS
  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:            Mitigation; Enhanced IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS SW sequence
  Srbds:                 Vulnerable: No microcode
  Tsx async abort:       Vulnerable: Clear CPU buffers attempted, no microcode; SMT vulnerable

GPU: NVIDIA GeForce GTX 1060 6GB

Wgpu limits, as returned by wgpuAdapterGetLimits:

    maxTextureDimension1D: 32768
    maxTextureDimension2D: 32768
    maxTextureDimension3D: 16384
    maxTextureArrayLayers: 2048
    maxBindGroups: 8
    maxBindGroupsPlusVertexBuffers: 32765
    maxBindingsPerBindGroup: 1000
    maxDynamicUniformBuffersPerPipelineLayout: 15
    maxDynamicStorageBuffersPerPipelineLayout: 16
    maxSampledTexturesPerShaderStage: 1048576
    maxSamplersPerShaderStage: 1048576
    maxStorageBuffersPerShaderStage: 1048576
    maxStorageTexturesPerShaderStage: 1048576
    maxUniformBuffersPerShaderStage: 15
    maxUniformBufferBindingSize: 65536
    maxStorageBufferBindingSize: 2147483648
    minUniformBufferOffsetAlignment: 256
    minStorageBufferOffsetAlignment: 32
    maxVertexBuffers: 16
    maxBufferSize: 18446744073709551615
    maxVertexAttributes: 32
    maxVertexBufferArrayStride: 2048
    maxInterStageShaderComponents: 128
    maxInterStageShaderVariables: 32765
    maxColorAttachments: 4002613514
    maxColorAttachmentBytesPerSample: 32685
    maxComputeWorkgroupStorageSize: 49152
    maxComputeInvocationsPerWorkgroup: 1536
    maxComputeWorkgroupSizeX: 1536
    maxComputeWorkgroupSizeY: 1024
    maxComputeWorkgroupSizeZ: 64
    maxComputeWorkgroupsPerDimension: 65535

lisyarus commented 9 months ago

I'd love to help debug this issue, but making a minimal reproducible example is tricky: the less work is happening on the GPU, the less frequently the issue reproduces.

Is there anything else I can do to help?

cwfitzgerald commented 8 months ago

If you haven't already, install the Vulkan SDK, make sure you've hooked up a logger to wgpu, and run with the validation layers enabled. This might catch the error if it's particularly egregious.

Otherwise it might be some UB in the usage of wgpu-native.

If it's not that, it might necessitate the use of NVIDIA Aftermath.