calebwin / emu

The write-once-run-anywhere GPGPU library for Rust
https://calebwin.github.io/emu
MIT License

Example panics at runtime (`COPY_DST` flag) #60

Open wbrickner opened 2 years ago

wbrickner commented 2 years ago

Hello, running the compute example:

```rust
use emu_core::prelude::*;
use emu_glsl::*;
use zerocopy::*;

#[repr(C)]
#[derive(AsBytes, FromBytes, Copy, Clone, Default, Debug, GlslStruct)]
struct Shape {
    x: u32,
    y: u32,
    w: i32,
    h: i32,
    r: [i32; 2],
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // ensure that a device pool has been initialized
    // this should be called before every time you assume you have devices to use
    // that goes for both library users and application users
    futures::executor::block_on(assert_device_pool_initialized());
    println!("{:?}", take()?.lock().unwrap().info.as_ref().unwrap());

    // create some data on GPU
    // even mutate it once loaded to GPU
    let mut shapes: DeviceBox<[Shape]> = vec![Default::default(); 1024].as_device_boxed_mut()?;
    let mut x: DeviceBox<[i32]> = vec![0; 1024].as_device_boxed_mut()?;
    shapes.set(vec![
        Shape {
            x: 0,
            y: 0,
            w: 100,
            h: 100,
            r: [2, 9]
        };
        1024
    ])?;

    // compile GlslKernel to SPIR-V
    // then, we can either inspect the SPIR-V or finish the compilation by generating a DeviceFnMut
    // then, run the DeviceFnMut
    let c = compile::<GlslKernel, GlslKernelCompile, Vec<u32>, GlobalCache>(
        GlslKernel::new()
            .spawn(64)
            .share("float stuff[64]")
            .param_mut::<[Shape], _>("Shape[] shapes")
            .param_mut::<[i32], _>("int[] x")
            .param::<i32, _>("int scalar")
            .with_struct::<Shape>()
            .with_const("int c", "7")
            .with_helper_code(
                r#"
Shape flip(Shape s) {
    s.x = s.x + s.w;
    s.y = s.y + s.h;
    s.w *= -1;
    s.h *= -1;
    s.r = ivec2(5, 3);
    return s;
}
"#,
            )
            .with_kernel_code(
                "shapes[gl_GlobalInvocationID.x] = flip(shapes[gl_GlobalInvocationID.x]); x[gl_GlobalInvocationID.x] = scalar + c + int(gl_WorkGroupID.x);",
            ),
    )?
    .finish()?;

    unsafe {
        spawn(16).launch(call!(c, &mut shapes, &mut x, &DeviceBox::new(10)?))?;
    }

    // download from GPU and print out
    println!("{:?}", futures::executor::block_on(shapes.get())?);
    println!("{:?}", futures::executor::block_on(x.get())?);

    Ok(())
}
```
Running `cargo run` yields:

```
    Finished dev [unoptimized + debuginfo] target(s) in 0.44s
     Running `target/debug/emu_test`
Limits {
    max_bind_groups: 4,
    max_dynamic_uniform_buffers_per_pipeline_layout: 8,
    max_dynamic_storage_buffers_per_pipeline_layout: 4,
    max_sampled_textures_per_shader_stage: 16,
    max_samplers_per_shader_stage: 16,
    max_storage_buffers_per_shader_stage: 4,
    max_storage_textures_per_shader_stage: 4,
    max_uniform_buffers_per_shader_stage: 12,
    max_uniform_buffer_binding_size: 16384,
    max_push_constant_size: 0,
}
{ name: "Intel(R) Iris(TM) Plus Graphics 655", vendor_id: 0, device_id: 0, device_type: IntegratedGpu }
wgpu error: Validation Error

Caused by:
    In CommandEncoder::copy_buffer_to_buffer
    Copy error
    destination buffer/texture is missing the `COPY_DST` usage flag
      note: destination = `<Buffer-(4, 1, Metal)>`

thread 'main' panicked at 'Handling wgpu errors as fatal by default', /Users/wbrickner/.cargo/registry/src/github.com-1ecc6299db9ec823/wgpu-0.7.0/src/backend/direct.rs:1896:5
stack backtrace:
   0: std::panicking::begin_panic
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/panicking.rs:616:12
   1: wgpu::backend::direct::default_error_handler
             at /Users/wbrickner/.cargo/registry/src/github.com-1ecc6299db9ec823/wgpu-0.7.0/src/backend/direct.rs:1896:5
   2: core::ops::function::Fn::call
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/ops/function.rs:70:5
   3: <alloc::boxed::Box<F,A> as core::ops::function::Fn<Args>>::call
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/alloc/src/boxed.rs:1875:9
   4: wgpu::backend::direct::ErrorSinkRaw::handle_error
             at /Users/wbrickner/.cargo/registry/src/github.com-1ecc6299db9ec823/wgpu-0.7.0/src/backend/direct.rs:1883:9
   5: wgpu::backend::direct::Context::handle_error
             at /Users/wbrickner/.cargo/registry/src/github.com-1ecc6299db9ec823/wgpu-0.7.0/src/backend/direct.rs:109:9
   6: wgpu::backend::direct::Context::handle_error_nolabel
             at /Users/wbrickner/.cargo/registry/src/github.com-1ecc6299db9ec823/wgpu-0.7.0/src/backend/direct.rs:121:9
   7: <wgpu::backend::direct::Context as wgpu::Context>::command_encoder_copy_buffer_to_buffer
             at /Users/wbrickner/.cargo/registry/src/github.com-1ecc6299db9ec823/wgpu-0.7.0/src/backend/direct.rs:1542:13
   8: wgpu::CommandEncoder::copy_buffer_to_buffer
             at /Users/wbrickner/.cargo/registry/src/github.com-1ecc6299db9ec823/wgpu-0.7.0/src/lib.rs:1954:9
   9: emu_core::device::Device::get::{{closure}}
             at /Users/wbrickner/.cargo/git/checkouts/emu-7973979264d9dc07/9fe3db3/emu_core/src/device.rs:391:9
  10: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/future/mod.rs:91:19
  11: emu_core::boxed::<impl emu_core::device::DeviceBox<[T]>>::get::{{closure}}
             at /Users/wbrickner/.cargo/git/checkouts/emu-7973979264d9dc07/9fe3db3/emu_core/src/boxed.rs:298:23
  12: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/future/mod.rs:91:19
  13: futures_executor::local_pool::block_on::{{closure}}
             at /Users/wbrickner/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-executor-0.3.21/src/local_pool.rs:315:23
  14: futures_executor::local_pool::run_executor::{{closure}}
             at /Users/wbrickner/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-executor-0.3.21/src/local_pool.rs:90:37
  15: std::thread::local::LocalKey<T>::try_with
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/thread/local.rs:442:16
  16: std::thread::local::LocalKey<T>::with
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/thread/local.rs:418:9
  17: futures_executor::local_pool::run_executor
             at /Users/wbrickner/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-executor-0.3.21/src/local_pool.rs:86:5
  18: futures_executor::local_pool::block_on
             at /Users/wbrickner/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-executor-0.3.21/src/local_pool.rs:315:5
  19: emu_test::main
             at ./src/main.rs:71:22
  20: core::ops::function::FnOnce::call_once
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/ops/function.rs:227:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
```

My understanding is that wgpu buffers must have their usage declared correctly (with some amount of detail) at creation time.
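For reference, wgpu validates this when the copy command is encoded: a copy whose destination buffer was not created with `COPY_DST` fails validation, which is exactly the error above. Here is a minimal self-contained model of that check (simplified bitflags and a stub `Buffer` type for illustration only, not wgpu's actual API):

```rust
// Simplified model of wgpu's usage-flag validation. Real wgpu uses the
// `BufferUsage` bitflags type and performs this check inside
// `CommandEncoder::copy_buffer_to_buffer`.
const MAP_READ: u32 = 1 << 0;
const COPY_SRC: u32 = 1 << 1;
const COPY_DST: u32 = 1 << 2;

struct Buffer {
    usage: u32, // fixed at creation time; cannot be changed later
}

fn copy_buffer_to_buffer(src: &Buffer, dst: &Buffer) -> Result<(), String> {
    if src.usage & COPY_SRC == 0 {
        return Err("source buffer is missing the `COPY_SRC` usage flag".into());
    }
    if dst.usage & COPY_DST == 0 {
        return Err("destination buffer is missing the `COPY_DST` usage flag".into());
    }
    Ok(())
}

fn main() {
    // A staging buffer created for uploads only cannot later be the
    // destination of a readback copy.
    let upload_staging = Buffer { usage: COPY_SRC };
    let readback_staging = Buffer { usage: MAP_READ | COPY_DST };
    let storage = Buffer { usage: COPY_SRC | COPY_DST };

    assert!(copy_buffer_to_buffer(&storage, &upload_staging).is_err());
    assert!(copy_buffer_to_buffer(&storage, &readback_staging).is_ok());
}
```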

wbrickner commented 2 years ago

Weird! I see that you are setting the copy destination flag correctly. Following the execution from `.as_device_boxed_mut()` above:

https://github.com/calebwin/emu/blob/9fe3db33b1e19063e227e753c5c48677cd37439d/emu_core/src/device.rs#L260-L302

Any idea what the issue is?

wbrickner commented 2 years ago

Update: removing the

```rust
shapes.set(vec![
  Shape {
    x: 0,
    y: 0,
    w: 100,
    h: 100,
    r: [2, 9]
  };
  1024
])?;
```

eliminates the error, and the program always completes successfully. Alternatively, keeping the `.set` call but removing the final `.get` call also prevents the panic.

Are the usage flags being corrupted by `.set`?

wbrickner commented 2 years ago

Ok, here's the problem:

https://github.com/calebwin/emu/commit/80a28e78e651c7c02410238dcb654000ad9772ba#diff-c42b8503ace5f383ffbc7c9f91463d31dc54fc36b2d3974c49a3e2fc434024e9R331-R337

When setting, the existing staging buffer is discarded and replaced with a new staging buffer created with only the `COPY_SRC` usage flag.

Adding, at the end of the function, the construction of another fresh staging buffer with `MAP_READ | COPY_DST` (the usage it normally has from the `DeviceBox` construction pathway) solves the issue: when it comes time to read, the correct usage flags are present.

I would prefer not to fix it this way, though: it seems quite wasteful to prepare a new allocation and immediately discard it. Evidently, mixing `COPY_SRC`, `COPY_DST`, and `MAP_READ` permissions on a single buffer is not allowed. The usage-flag paradigm is the root problem here, and I'm unsure how fundamentally important it really is. Perhaps two staging buffers could be lazily prepared, one for copies to the GPU device and one for copies from it.
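The two-staging-buffer idea could look roughly like this. This is only a sketch against stub types (`Usage`, `StagingBuffer`, `create_buffer`, and the field names are all hypothetical, not emu's actual code): each direction gets its own lazily created staging buffer, so `set` never clobbers the flags that `get` relies on.

```rust
// Sketch: lazily created, per-direction staging buffers. All types here
// are stand-ins; emu's real DeviceBox wraps wgpu buffers instead.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Usage(u32);
const COPY_SRC: Usage = Usage(1);            // upload direction (set)
const MAP_READ_COPY_DST: Usage = Usage(2 | 4); // readback direction (get)

struct StagingBuffer {
    usage: Usage,
}

fn create_buffer(usage: Usage) -> StagingBuffer {
    StagingBuffer { usage }
}

struct DeviceBoxInner {
    upload_staging: Option<StagingBuffer>,   // COPY_SRC, used by set()
    readback_staging: Option<StagingBuffer>, // MAP_READ | COPY_DST, used by get()
}

impl DeviceBoxInner {
    fn upload(&mut self) -> &StagingBuffer {
        // Lazily create the upload-direction buffer; never reuse the
        // readback one, since combining COPY_SRC with MAP_READ | COPY_DST
        // on a single buffer is what triggered the validation error.
        self.upload_staging
            .get_or_insert_with(|| create_buffer(COPY_SRC))
    }

    fn readback(&mut self) -> &StagingBuffer {
        self.readback_staging
            .get_or_insert_with(|| create_buffer(MAP_READ_COPY_DST))
    }
}

fn main() {
    let mut b = DeviceBoxInner {
        upload_staging: None,
        readback_staging: None,
    };
    // A set followed by a get touches two distinct buffers, each created
    // at most once, each with the usage its direction requires.
    assert_eq!(b.upload().usage, COPY_SRC);
    assert_eq!(b.readback().usage, MAP_READ_COPY_DST);
}
```

This avoids the repeated allocate-and-discard of the current workaround, at the cost of holding at most one extra staging buffer per `DeviceBox`.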

calebwin commented 1 year ago

@wbrickner Thanks for doing this investigation. I see how the staging buffer creation is problematic...

If anyone has a PR that fixes this, I can review/edit/merge.