ScanMountGoat / wgsl_to_wgpu

Generate typesafe Rust bindings from WGSL shaders to wgpu
MIT License

Typesafe initialization with BindGroup0::from_data() #28

Open JMLX42 opened 1 year ago

JMLX42 commented 1 year ago

Hi there!

Great project!

As someone who struggled a lot to get proper shader <=> CPU bindings, I really love the idea of leveraging Rust code generation + strong typing to make this easier.

To make it even easier, would it make sense to implement a BindGroup0::from_data() method that relies on encase to create the bind group directly from the uniform data:

pub fn from_data(device: &wgpu::Device, #(#fields),*) -> Self {
    use wgpu::util::DeviceExt;
    use encase::ShaderType;

    let mut buffer = encase::DynamicUniformBuffer::new(Vec::new());
    #(
        buffer.write(#field_names).unwrap();
        buffer.set_offset(device.limits().min_uniform_buffer_offset_alignment as u64);
    )*
    let bytes = buffer.into_inner();
    let uniform_buffer = device.create_buffer_init(&wgpu::util::BufferInitDescriptor {
        label: None,
        contents: &bytes,
        usage: wgpu::BufferUsages::UNIFORM | wgpu::BufferUsages::COPY_DST,
    });

    Self::from_bindings(
        device,
        #bind_group_layout_name {
            #(#bindings),*
        },
    )
}

Example of the generated output:

        pub fn from_data(
            device: &wgpu::Device,
            material: &super::Material,
            alpha: &super::Alpha,
        ) -> Self {
            use wgpu::util::DeviceExt;
            use encase::ShaderType;
            let mut buffer = encase::DynamicUniformBuffer::new(Vec::new());
            buffer.write(material).unwrap();
            buffer.set_offset(device.limits().min_uniform_buffer_offset_alignment as u64);
            buffer.write(alpha).unwrap();
            buffer.set_offset(device.limits().min_uniform_buffer_offset_alignment as u64);
            let bytes = buffer.into_inner();
            let uniform_buffer = device
                .create_buffer_init(&wgpu::util::BufferInitDescriptor {
                    label: None,
                    contents: &bytes,
                    usage: wgpu::BufferUsages::UNIFORM | wgpu::BufferUsages::COPY_DST,
                });
            Self::from_bindings(
                device,
                BindGroupLayout0 {
                    material: wgpu::BufferBinding {
                        buffer: &uniform_buffer,
                        offset: 0,
                        size: Some(super::Material::min_size()),
                    },
                    alpha: wgpu::BufferBinding {
                        buffer: &uniform_buffer,
                        offset: (1u32
                            * device.limits().min_uniform_buffer_offset_alignment)
                            as u64,
                        size: Some(super::Alpha::min_size()),
                    },
                },
            )
        }

Usage example:

let material = shader::Material {
  color: nalgebra::Vector3::new(1.0, 1.0, 0.0),
};
let alpha = shader::Alpha { value: 1.0 };
let bind_group0 = shader::bind_groups::BindGroup0::from_data(&device, &material, &alpha);

If that makes sense I'll open a PR.

ScanMountGoat commented 1 year ago

I used to have something similar in one of my WGPU projects. You can simplify that code by taking the struct instead of the fields and making the function generic. This assumes you're using encase instead of bytemuck for uniform and storage buffers.

use encase::{internal::WriteInto, ShaderSize, ShaderType, StorageBuffer};
use wgpu::util::DeviceExt;

// Extension trait for helper methods on wgpu::Device.
pub trait DeviceBufferExt {
    fn create_buffer_from_data<T: ShaderType + WriteInto + ShaderSize>(
        &self,
        label: &str,
        data: &[T],
        usage: wgpu::BufferUsages,
    ) -> wgpu::Buffer;
}

impl DeviceBufferExt for wgpu::Device {
    fn create_buffer_from_data<T: ShaderType + WriteInto + ShaderSize>(
        &self,
        label: &str,
        data: &[T],
        usage: wgpu::BufferUsages,
    ) -> wgpu::Buffer {
        // Storage buffers also satisfy uniform buffer alignment requirements.
        let mut buffer = StorageBuffer::new(Vec::new());
        buffer.write(&data).unwrap();

        self.create_buffer_init(&wgpu::util::BufferInitDescriptor {
            label: Some(label),
            contents: &buffer.into_inner(),
            usage,
        })
    }
}

let camera_buffer = device.create_buffer_from_data(
    "Camera Buffer",
    &[crate::shader::model::CameraTransforms {
        model_view_matrix: glam::Mat4::IDENTITY,
        mvp_matrix: glam::Mat4::IDENTITY,
        mvp_inv_matrix: glam::Mat4::IDENTITY,
        camera_pos: glam::vec4(0.0, 0.0, -1.0, 1.0),
        screen_dimensions: glam::vec4(1.0, 1.0, 1.0, 1.0),
    }],
    wgpu::BufferUsages::UNIFORM | wgpu::BufferUsages::COPY_DST,
);
JMLX42 commented 1 year ago

@ScanMountGoat nice!

But how does it work if there is no buffer.set_offset(device.limits().min_uniform_buffer_offset_alignment as u64);?

I had to use it because otherwise I get the following error:

thread 'main' panicked at 'wgpu error: Validation Error

Caused by:
    In Device::create_bind_group
    buffer offset 16 does not respect device's requested `min_uniform_buffer_offset_alignment` limit 256

What happens if you call create_buffer_from_data() with more than one item in data?

And more importantly: how do you make sure it fits the bind group layout? AFAIK you'll have to check that manually, whereas my solution makes sure there can be no mistake. Maybe we could find the best of both worlds using a builder pattern?

ScanMountGoat commented 1 year ago

To make it even easier, would it make sense to implement a BindGroup0::from_data() method relying on encase to create the bind group directly from the uniform data:

This would be convenient, but it might encourage bad practices in some cases. The main issue is that the types like BindGroup0 own the bind group and not the underlying data itself. Allocating new buffers each time you want to update the bind group is going to be less efficient than storing the buffer somewhere and updating it as needed with queue.write_buffer(...).
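Roughly what I mean (a sketch only; MaterialResources and the helper function are made up, and the shader module paths follow your usage example):

// Keep the buffer next to the bind group so the allocation can be reused.
struct MaterialResources {
    buffer: wgpu::Buffer,
    bind_group: shader::bind_groups::BindGroup0,
}

fn update_material(
    queue: &wgpu::Queue,
    resources: &MaterialResources,
    material: &shader::Material,
) {
    // encase handles the uniform layout; write_buffer reuses the existing allocation
    // instead of creating a new buffer and bind group every update.
    let mut contents = encase::UniformBuffer::new(Vec::new());
    contents.write(material).unwrap();
    queue.write_buffer(&resources.buffer, 0, &contents.into_inner());
}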

ScanMountGoat commented 1 year ago

But how does it work if there is no buffer.set_offset(device.limits().min_uniform_buffer_offset_alignment as u64);?

This is the alignment for offsets into the buffer. In my application I use a storage buffer for transforming the vertex positions instead of a vertex shader. Each object has its own offset into the shared buffer that must be aligned to 256. This is not to be confused with the alignment requirements for fields, which are much less strict. https://wgpu.rs/doc/wgpu/struct.Limits.html#structfield.min_uniform_buffer_offset_alignment

When using a DynamicUniformBuffer or DynamicStorageBuffer, the write methods return the aligned offset used for the start of the write (usually 0, 256, 512, etc if the writes are small). This should be automatically enforced by encase. Can you post some more details on the code you're using that isn't working as expected?

let offset0 = dynamic_storage_buffer.write(&vertices0).unwrap();
let offset1 = dynamic_storage_buffer.write(&vertices1).unwrap();
...
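Those offsets can then be passed straight into the buffer bindings, something like this (a sketch only, continuing the snippet above):

use wgpu::util::DeviceExt;

let bytes = dynamic_storage_buffer.into_inner();
let buffer = device.create_buffer_init(&wgpu::util::BufferInitDescriptor {
    label: None,
    contents: &bytes,
    usage: wgpu::BufferUsages::STORAGE,
});

// The offsets returned by write() are already aligned by encase.
let binding0 = wgpu::BufferBinding {
    buffer: &buffer,
    offset: offset0,
    size: None,
};
let binding1 = wgpu::BufferBinding {
    buffer: &buffer,
    offset: offset1,
    size: None,
};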

And more important: how do you make sure it fits the bind group layout? AFAIK you'll have to check that manually. Whereas my solution makes sure there can be no mistake. Maybe we could find the best of both world using a builder pattern?

The struct is the one generated by wgsl_to_wgpu. If you derive encase, you can use the struct instead of the fields. I would like for this to enforce using the right type for the right bind group like you mentioned, but we would need to think of a way to do it that still allows updates to the uniform/storage buffer without allocating a new buffer every time.
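For reference, a rough sketch of the struct side of this (the exact derives and math types depend on the WriteOptions; this assumes encase's glam feature):

#[derive(Debug, Copy, Clone, PartialEq, encase::ShaderType)]
pub struct Material {
    pub color: glam::Vec3,
}

// Writing the whole struct lets encase handle field alignment and padding:
let mut buffer = encase::UniformBuffer::new(Vec::new());
buffer
    .write(&Material {
        color: glam::vec3(1.0, 1.0, 0.0),
    })
    .unwrap();
let bytes = buffer.into_inner();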

JMLX42 commented 1 year ago

The struct is the one generated by wgsl_to_wgpu. If you derive encase, you can use the struct instead of the fields.

What if you have more than 1 struct? How is it guaranteed they are written in the right order in the uniform buffer?

I would like for this to enforce using the right type for the right bind group like you mentioned, but we would need to think of a way to do it that still allows updates to the uniform/storage buffer without allocating a new buffer every time.

Yes my solution is not good enough.

pub fn from_data(
  device: &wgpu::Device,
  buffer: Option<encase::DynamicUniformBuffer<Vec<u8>>>,
  material: Option<&super::Material>,
  alpha: Option<&super::Alpha>,
) {
  let mut buffer = buffer.unwrap_or_else(|| encase::DynamicUniformBuffer::new(Vec::new()));

  if let Some(material) = material {
    let offset = buffer.write(material).unwrap();
  }

  // etc...
}
ScanMountGoat commented 1 year ago

What if you have more than 1 struct? How is it guaranteed they are written in the right order in the uniform buffer?

I think I may have misunderstood your original question. Are you putting all the resources for a bind group in a single buffer? The current code generated by wgsl_to_wgpu takes bindings rather than buffers so that you can decide whether you want one or more buffers. I'm not sure how straightforward this would be to handle in a typesafe way.

Some people opt to put all their data for each object into a shared buffer with offsets for each object. If you have a very small struct like a color, this potentially wastes a lot of space on offset alignment since each struct is essentially padded to at least 256 bytes. Another option is to have a fixed or dynamically sized array in a single struct that each object indexes into in the shader. Both are valid options, which is why the current generated code is just taking "untyped" bindings.
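A sketch of the second option on the Rust side (names and the array size are made up; assumes encase's glam feature):

#[derive(encase::ShaderType)]
struct PerObject {
    transform: glam::Mat4,
    color: glam::Vec4,
}

#[derive(encase::ShaderType)]
struct Objects {
    // One buffer holds every object; the shader indexes into this array with an
    // instance index instead of relying on 256-byte-aligned per-object offsets.
    objects: [PerObject; 64],
}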

One other point to consider is whether wgsl_to_wgpu should allow for sharing data between shader modules. For example, is a buffer containing camera struct A valid for shader B if the structs are "identical" or just if the buffer has enough data? A strongly typed function generated for each bind group would not allow this, since each shader's generated file is a separate Rust module.

JMLX42 commented 1 year ago

Are you putting all the resources for a bind group in a single buffer?

IDK. I am just starting with the WGPU/WGSL API. What's the point of @group(n) otherwise?

But if there is no @group <=> uniform buffer 1 to 1 relationship, then you are absolutely right: my code is even more limiting.

Both are valid options, which is why the current generated code is just taking "untyped" bindings.

So it's the developer's job to map the generated structs to the actual buffers, which is error-prone.

Would be nice to have the best of both worlds:

  • Flexibility in the group/buffer relationship.
  • Strongly typed + automagically managed uniform buffer filling.

One other point to consider is whether wgsl_to_wgpu should allow for sharing data between shader modules.

IMHO that's a valid use case. Any big enough program will re-use some common structs (cameras and materials are a good example) in multiple shaders. If I remember correctly, in GLSL/OpenGL a shader can be "linked" from multiple source modules.

Can we not do the same with WGSL and have some kind of linking where:

  1. one or more modules define some structs
  2. other module(s) reuse 1. and define the VS/FS main functions

?

This way the structs would be generated only once.

ScanMountGoat commented 1 year ago

IDK. I am just starting with the WGPU/WGSL API. What's the point of @group(n) otherwise?

This was one of the more confusing concepts to learn in WGPU and part of the reason for making this library. The distinction between "group" and "binding" feels a bit arbitrary. There are hardware limits on how many groups can be created. There are also performance reasons for grouping the resources. Different resources need to be updated at different frequencies, like updating the camera each frame or the materials for each model. See descriptor table frequency (DX12) and descriptor set frequency (Vulkan). I haven't found the performance difference to be relevant to my applications, so I allow for setting all the bind groups at once or calling bind_group.set(...) individually if needed.
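For example (the exact generated names and the set_bind_groups signature may differ between versions, so treat this as a sketch):

// Set each group individually when only some of them change...
bind_group0.set(&mut render_pass);
bind_group1.set(&mut render_pass);

// ...or set everything for the shader in one call.
shader::bind_groups::set_bind_groups(
    &mut render_pass,
    shader::bind_groups::BindGroups {
        bind_group0: &bind_group0,
        bind_group1: &bind_group1,
    },
);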

So it's the developer's job to map the generated structs with the actual buffers. Which is error prone.

Yeah. This is less than ideal. The runtime validation in WGPU catches some errors but not all.

Would be nice to have the best of both worlds:

  • Flexibility in the group/buffer relationship.
  • Strongly typed + automagically managed uniform buffer filling.

The challenge is getting both. The second point could be handled using code like what you described. The first point is flexibility not only in the relationships between groups and buffers but also between bindings within a group and buffers. Each binding in the bind group that is a struct in the shader must be assigned a buffer binding. The buffer binding must refer to data in a buffer that has enough bytes, or that was also initialized using the appropriate struct, depending on how strict we want to be. One approach is to use a unique wrapper type for each wgpu::BufferBinding instead of the "untyped" wgpu::BufferBinding. The question is how to generate bindings like MaterialBinding or CameraTransformBinding in a typesafe way.
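A hypothetical shape for such a wrapper (nothing like this is generated today):

pub struct MaterialBinding<'a>(wgpu::BufferBinding<'a>);

impl<'a> MaterialBinding<'a> {
    // Pins the binding size to the struct's minimum size. A stricter version could also
    // require a typed buffer wrapper proving the buffer was filled with Material data.
    pub fn new(buffer: &'a wgpu::Buffer, offset: wgpu::BufferAddress) -> Self {
        Self(wgpu::BufferBinding {
            buffer,
            offset,
            size: Some(<super::Material as encase::ShaderType>::min_size()),
        })
    }
}

The generated BindGroupLayout0 could then take a MaterialBinding instead of a raw wgpu::BufferBinding, which would catch mismatched bindings at compile time.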

One other point to consider is whether wgsl_to_wgpu should allow for sharing data between shader modules.

IMHO that a valid use case. Any big enough program will re-use some common structs (cameras and materials are a good example) in multiple shaders. If I remember correctly, in GLSL/OpenGL a shader can be "linked" from multiple source modules.

That's correct. It works like C where you have to declare the function signatures and can then link the definition in another file. If you have a lot of code it can also be more efficient to compile since each section of code can be compiled once and linked many times. I found it to be faster than a program shader cache in some cases.

Can we not do the same with WGSL and have some kind of linking where:

  1. one or more modules define some structs
  2. other module(s) reuse 1. and define the VS/FS main functions

?

This way the structs would be generated only once.

I don't think WGPU supports this kind of linking. It's not as relevant in WGSL since you can just put multiple entry points in the same file and share code that way. The issue is that you end up with shared bind groups between the entry points. You would need to process the shader files yourself using some kind of #include mechanism. I guess you could do this yourself in the build script before calling the wgsl_to_wgpu code on the generated string. This would still require changes to wgsl_to_wgpu to not assume a single source file for each shader. Allowing different source files for each shader stage is something I need to support at some point.
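A rough sketch of that preprocessing in a build script (the include marker and file layout are invented; the create_shader_module call follows the README and may need adjusting):

// build.rs
use wgsl_to_wgpu::{create_shader_module, WriteOptions};

fn main() {
    println!("cargo:rerun-if-changed=src/shader.wgsl");
    println!("cargo:rerun-if-changed=src/shared_structs.wgsl");

    let shared = std::fs::read_to_string("src/shared_structs.wgsl").unwrap();
    let shader = std::fs::read_to_string("src/shader.wgsl").unwrap();

    // Naive string-level "include": splice the shared struct definitions into the shader
    // before running code generation on the combined source.
    let combined = shader.replace("//#include shared_structs", &shared);

    let text = create_shader_module(&combined, "shader.wgsl", WriteOptions::default()).unwrap();
    let out_dir = std::env::var("OUT_DIR").unwrap();
    std::fs::write(format!("{out_dir}/shader.rs"), text).unwrap();
}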

JMLX42 commented 1 year ago

The challenge is getting both

Agreed. I'll take another shot at this and let you know.

The issue is that you end up with shared bind groups between the entry points. You would need to process the shader files yourself using some kind of #include mechanism.

Sure:

  1. Bevy already has a pre-processor which supports #import
  2. there is also the naga_oil crate

(Pre-processors in general and naga_oil in particular are why IMHO create_shader_module() should accept an optional existing shader Module, but that's another story.)

But AFAIK it would still lead to the same generated structs in different mods, with no way for the compiler to know they are supposed to be exactly the same. So any type safety would have to be bypassed.
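For example, even if two generated modules contain byte-for-byte identical structs, the compiler still treats them as unrelated types (module and field names made up):

mod shader_a {
    pub struct CameraTransforms { pub mvp_matrix: [[f32; 4]; 4] }
}
mod shader_b {
    pub struct CameraTransforms { pub mvp_matrix: [[f32; 4]; 4] }
}

fn bind_camera_a(_camera: &shader_a::CameraTransforms) {}

fn main() {
    let camera = shader_b::CameraTransforms { mvp_matrix: [[0.0; 4]; 4] };
    // bind_camera_a(&camera); // error[E0308]: mismatched types, despite identical fields
}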