calebwin / emu

The write-once-run-anywhere GPGPU library for Rust
https://calebwin.github.io/emu
MIT License

`.finish()` stage of the shader compilation segfaults on NVIDIA Vulkan driver #59

Open mikidep opened 2 years ago

mikidep commented 2 years ago

I am running Ubuntu 22.04 with emu_core 0.1.1. `info()?.name` reports "NVIDIA GeForce RTX 3050 Ti Laptop GPU", and the driver is version 515 of the official NVIDIA Linux driver, installed through APT.

The problem seems to be related to the presence of one storage buffer, the one called prec_mat: if I remove it from both the shader and the SpirvBuilder, the issue does not occur. I am using rust-gpu to write my shader. Note that if my integrated AMD GPU is selected instead, the code runs fine.

Below is the full stack trace:

___lldb_unnamed_symbol462 (@___lldb_unnamed_symbol462:301)
___lldb_unnamed_symbol11106 (@___lldb_unnamed_symbol11106:2200)
___lldb_unnamed_symbol11107 (@___lldb_unnamed_symbol11107:19)
___lldb_unnamed_symbol16036 (@___lldb_unnamed_symbol16036:120)
___lldb_unnamed_symbol11528 (@___lldb_unnamed_symbol11528:60)
___lldb_unnamed_symbol11308 (@___lldb_unnamed_symbol11308:258)
_nv002nvvm (@_nv002nvvm:11)
___lldb_unnamed_symbol58166 (@___lldb_unnamed_symbol58166:66)
___lldb_unnamed_symbol58168 (@___lldb_unnamed_symbol58168:583)
___lldb_unnamed_symbol58169 (@___lldb_unnamed_symbol58169:146)
___lldb_unnamed_symbol58181 (@___lldb_unnamed_symbol58181:164)
___lldb_unnamed_symbol58182 (@___lldb_unnamed_symbol58182:8)
___lldb_unnamed_symbol58172 (@___lldb_unnamed_symbol58172:148)
___lldb_unnamed_symbol58204 (@___lldb_unnamed_symbol58204:91)
___lldb_unnamed_symbol57964 (@___lldb_unnamed_symbol57964:70)
___lldb_unnamed_symbol57965 (@___lldb_unnamed_symbol57965:28)
ash::vk::features::DeviceFnV1_0::create_compute_pipelines (/home/mikidep/.cargo/registry/src/github.com-1ecc6299db9ec823/ash-0.31.0/src/vk/features.rs:5094)
gfx_backend_vulkan::device::<impl gfx_hal::device::Device<gfx_backend_vulkan::Backend> for gfx_backend_vulkan::Device>::create_compute_pipeline (/home/mikidep/.cargo/registry/src/github.com-1ecc6299db9ec823/gfx-backend-vulkan-0.5.11/src/device.rs:1044)
wgpu_core::device::<impl wgpu_core::hub::Global<G>>::device_create_compute_pipeline (/home/mikidep/.cargo/registry/src/github.com-1ecc6299db9ec823/wgpu-core-0.5.6/src/device/mod.rs:1932)
wgpu_device_create_compute_pipeline (/home/mikidep/.cargo/registry/src/github.com-1ecc6299db9ec823/wgpu-native-0.5.1/src/device.rs:347)
wgpu::Device::create_compute_pipeline (/home/mikidep/.cargo/registry/src/github.com-1ecc6299db9ec823/wgpu-0.5.2/src/lib.rs:906)
emu_core::device::Device::compile (/home/mikidep/.cargo/registry/src/github.com-1ecc6299db9ec823/emu_core-0.1.1/src/device.rs:611)
emu_core::compile::SpirvOrFinished<P,C>::finish (/home/mikidep/.cargo/registry/src/github.com-1ecc6299db9ec823/emu_core-0.1.1/src/compile.rs:305)
scene_emu::main (/home/mikidep/Documenti/Codice/scene-emu/src/main.rs:104)
core::ops::function::FnOnce::call_once (@core::ops::function::FnOnce::call_once:6)
std::sys_common::backtrace::__rust_begin_short_backtrace (@std::sys_common::backtrace::__rust_begin_short_backtrace:6)
std::rt::lang_start::{{closure}} (@std::rt::lang_start::{{closure}}:7)
core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once (@std::rt::lang_start_internal:184)
std::panicking::try::do_call (@std::rt::lang_start_internal:183)
std::panicking::try (@std::rt::lang_start_internal:183)
std::panic::catch_unwind (@std::rt::lang_start_internal:183)
std::rt::lang_start_internal::{{closure}} (@std::rt::lang_start_internal:183)
std::panicking::try::do_call (@std::rt::lang_start_internal:183)
std::panicking::try (@std::rt::lang_start_internal:183)
std::panic::catch_unwind (@std::rt::lang_start_internal:183)
std::rt::lang_start_internal (@std::rt::lang_start_internal:183)
std::rt::lang_start (@std::rt::lang_start:13)
main (@main:10)
__libc_start_call_main (@__libc_start_call_main:29)
__libc_start_main_impl (@__libc_start_main@@GLIBC_2.34:43)
_start (@_start:15)

I am also attaching relevant Rust code and disassembled shader SPIR-V code:

Below are extracts from those source files, showing where the offending parameter is declared:

(in main.rs)

    let spirv = SpirvBuilder::new()
        .set_entry_point_name("main")
        .add_param_mut::<[u32]>() // alpha
        .add_param_mut::<[StackSym]>() // stack
        .add_param_mut::<[usize]>() // gives_stack
        .add_param_mut::<[u32]>() // prec_mat
        .add_param::<usize>() // length
        .add_param::<usize>() // chunk_size
        .add_param::<u32>() // term_thresh
        .set_code_with_u8(std::io::Cursor::new(code))?
        .build();
    let c = compile::<Spirv<_>, SpirvCompile, _, GlobalCache>(spirv)?.finish()?;

The segfault happens on the last line, in the `.finish()` call.

(in lib.rs)

    #[spirv(compute(threads(4)))]
    pub fn main(
        #[spirv(global_invocation_id)] id: UVec3,
        #[spirv(storage_buffer, descriptor_set = 0, binding = 0)] alpha: &mut [u32],
        #[spirv(storage_buffer, descriptor_set = 0, binding = 1)] stack: &mut [StackSym],
        #[spirv(storage_buffer, descriptor_set = 0, binding = 2)] gives_stack: &mut [usize],
        #[spirv(storage_buffer, descriptor_set = 0, binding = 3)] prec_mat: &mut [u32],
        #[spirv(storage_buffer, descriptor_set = 0, binding = 4)] length: &mut usize,
        #[spirv(storage_buffer, descriptor_set = 0, binding = 5)] chunk_size: &mut usize,
        #[spirv(storage_buffer, descriptor_set = 0, binding = 6)] term_thresh: &mut u32,
    ) { // ...
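
One thing that may be worth ruling out while debugging: the extracts above use `usize` on both the host side (`add_param_mut::<[usize]>()`, `add_param::<usize>()`) and the shader side. On a 64-bit host `usize` is 8 bytes, whereas rust-gpu's SPIR-V targets are, as far as I know, 32-bit, so the two sides may disagree about element width. This is not a confirmed cause of the segfault, only a layout difference that is easy to check. The sketch below is plain host-side Rust with no Emu calls; the helper names, the assumed 4-byte device width, and the element count are all hypothetical.

```rust
use std::mem::size_of;

/// Bytes occupied by one element of type `T` in a host-side slice.
fn host_stride_bytes<T>() -> usize {
    size_of::<T>()
}

/// Bytes covered by `len` elements when each element is `stride` bytes wide.
fn buffer_bytes(len: usize, stride: usize) -> usize {
    len * stride
}

fn main() {
    // Assumption: the shader target treats `usize` as a 32-bit value (4 bytes).
    const ASSUMED_DEVICE_USIZE_BYTES: usize = 4;

    // Hypothetical element count for a buffer such as `gives_stack`.
    let len = 8;

    let host = buffer_bytes(len, host_stride_bytes::<usize>());
    let device = buffer_bytes(len, ASSUMED_DEVICE_USIZE_BYTES);

    println!("host side:   {} elements = {} bytes", len, host);
    println!("device side: {} elements = {} bytes (assumed)", len, device);

    if host != device {
        println!("usize width differs between host and shader; a fixed-width type such as u32 avoids this");
    } else {
        println!("usize width matches on this host");
    }
}
```

If the widths do differ, declaring those parameters as `u32` on both sides would remove that variable from the investigation, whether or not it turns out to be related to the crash.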

I understand that the issue is probably related to NVIDIA's Vulkan implementation, but maybe you know something about this kind of problem. Thank you in advance.

calebwin commented 1 year ago

Interesting, thanks for reporting the issue. Not sure if it can be fixed within Emu though. May be an issue in WGPU or the Vulkan implementation.