WGSL preprocessor and modules

badicsalex commented 1 year ago

I'm planning on introducing WGSL preprocessing to one of my projects, as complexity is quickly getting out of hand.

What I'm looking at right now is bevy's syntax, since it's supported by wgsl analyzer. It could be implemented as a separate crate, or, since it's such a simple piece of code, it could probably be added to wgsl_to_wgpu directly too. It could be interesting for types especially: #imports could probably be turned into use declarations in the generated rust code in the future.

Alternatives:

There's the wgsl_preprocessor crate, but I'm not sure how widespread it is.
There are also some general purpose preprocessors out there.
There's the option of doing the preprocessing as a build.rs step too. In this case, no modification to wgsl_to_wgpu is needed.

I'd be happy to cobble together a PR, but I wanted to get some second opinions here first.

ScanMountGoat commented 1 year ago

There's the wgsl_preprocessor crate, but I'm not sure how widespread it is.

I don't think WGSL has any kind of official preprocessor. I don't have any strong opinions at the moment, but this feels like something that users should handle rather than wgsl_to_wgpu itself.

There may need to be some API changes to make this easier for users to implement. I'd be interested in any feedback you may have when trying to use a preprocessor without modifying wgsl_to_wgpu.

With the current wgsl_to_wgpu, you would need to preprocess the file, write to a new file, and then include this file and source when calling create_shader_module in build.rs. The main downside is that you now have to also store the processed files on disk. I suppose we could modify create_shader_module to not always assume people are using include_str! for the WGSL source.

imports could probably be turned into use declarations in the generated rust code in the future

This sounds like each WGSL file before preprocessing would generate its own Rust module like shared_types.rs and shader.rs. A relevant use case could be wanting to define the same type in different shaders like a shared camera struct.

In order for wgsl_to_wgpu to to figure out the right module for the use statements, we may need to look at the preprocessor statements. The current implementation assumes all the WGSL code is already in a single file.

badicsalex commented 1 year ago

For now I'll write my preprocessed and generated files to target/ then. Having an option to directly give a shader string (instead of a filename) to the wgsl_to_wgpu builder would be useful.

The current implementation assumes all the WGSL code is already in a single file.

This is my biggest issue actually, because this also implies that there is only a single pipeline layout, which is not great for heterogeneous compute-heavy use-cases like the computer vision stuff I do.

ScanMountGoat commented 1 year ago

This is my biggest issue actually, because this also implies that there is only a single pipeline layout, which is not great for heterogeneous compute-heavy use-cases like the computer vision stuff I do.

Is this related to #41? Is there a way you would like to structure your kernels, types, etc that isn't possible with wgsl_to_wgpu?

badicsalex commented 1 year ago

My specific use-case is image processing with image preprocessing, keypoint detection, descriptor calculation, frame matching and pose estimation. This is basically a chain of operations where the types should be shared between the links, but you really don't want to bind all buffers in all steps.

The problem is that wgpu emits barriers related to the bound buffers, regardless of whether they are used by the actual shader entry point or not. (I know this for sure because I checked with qrenderdoc). Now if I don't want to have these unnecessary barriers, I have to avoid binding the unused buffers in the first place, but I can only do that by putting different compute steps in different WGSL scripts, because storage buffers are not parameters but global variables. This is not a wgsl_to_wgpu limitation but a fundamental limitation of WGSL itself.

Fortunately I haven't run into actual parallelization issues because of these barriers, but I probably will in the future.

Oh, and I only use bind group 0 for compute kernels, I didn't want to deal with multiple groups. But I'm not really sure if that would solve the above problem.

ScanMountGoat commented 1 year ago

Having an option to directly give a shader string (instead of a filename) to the wgsl_to_wgpu builder would be useful.

I've added a function that doesn't require an include path on the latest commit. This could also be part of the writer options, but it might not be worth a breaking change for people's build scripts.

badicsalex commented 1 year ago

It's going to fit my use-case just fine, thanks. And yeah, breaking changes just for this sounds unnecessary.

(If you ever plan on breaking compatibility, I suggest introducing a builder pattern for the options, or at least marking the options struct non_exhaustive.)

ScanMountGoat commented 1 year ago

Bevy's syntax seems to be the most popular. It looks like they've switched to using naga-oil and working with naga::Module directly instead of being a simple text preprocessor. I could create a function that takes a naga::Module instead of a source string. The issue is guaranteeing that the same module is used when loading the shader at runtime. One option is to just write to a string again using the WGSL backend for naga.

https://bevyengine.org/news/bevy-0-11/#improved-shader-imports https://crates.io/crates/naga_oil

badicsalex commented 1 year ago

You could also use some very simple serializer (eg. bincode) on the naga::Module and put that into the generated source. Then deserialize it at runtime.

ScanMountGoat commented 1 year ago

I've been playing around with naga-oil and passing a naga::Module instead of a source string. It's easy enough to convert the module back to WGSL if needed using naga. The main issue is that naga-oil appears to be doing some sort of hashing to automatically generate type names when flattening the hierarchy of WGSL files from import statements. The flattened module probably isn't what should be used for generating the Rust structs.

In theory, we should be able to generate a Rust module for each WGSL "module". This would allow users to specify types like shared_types::Camera. I'm not sure how much work this would be to implement in practice or how heavily this would depend on implementation details of specific crates like naga-oil.

bconnorwhite commented 11 months ago

I don't think WGSL has any kind of official preprocessor. I don't have any strong opinions at the moment, but this feels like something that users should handle rather than wgsl_to_wgpu itself.

What seems more useful than a preprocessor is the ability to output only a subset of the Rust bindings

So for example, if I have //!import ./other.wgsl in main.wgsl, I'm fine with bringing my own system to parse the import and load each file, but I want to generate one Rust file for main.wgsl, and a separate file for other.wgsl.

Here's a proof of concept where I just skip each struct/bindgroup that is also present in one of the imports:

pub fn create_shader_module_with_imports(
    wgsl_source: &str,
    imported_sources: Vec<String>, // ex: code from other.wgsl
    options: WriteOptions,
) -> Result<String, CreateModuleError> {
    ...

Then if I run this once on main.wgsl (+ other.wgslvia the import Vec), and once with just other.wgsl I get the code for each independently that I can then split into two different Rust files

ScanMountGoat commented 11 months ago

It looks like the specific example you linked could be covered by pasting the contents of other.wgsl into main.wgsl and then generating Rust bindings for that using create_shader_module_embedded.

Checking the types in each imported module makes sense for preventing duplicates, but there's still the issue of how users will refer to the correct types. The shader.rs file may reference types in other modules like other::Uniforms. There's also the issue of disambiguating the same struct name in two different imported modules like in my previous post.

What seems more useful than a preprocessor is the ability to output only a subset of the Rust bindings

Is there a specific use case you had in mind with this? I think the current approach of accepting a processed WGSL source instead of a path handles most use cases I've encountered, but I realize that not every application is structured the same way.

bconnorwhite commented 11 months ago

I'm using it to create a shared.wgsl with bindings that are used by both shader1.wgsl and shader2.wgsl.

The problem with pasting shared.wgsl into each of the other two is that you end up with duplicate Rust structs

I think you shouldn't end up with references to types in other modules unless you split the wgsl files weirdly

ScanMountGoat commented 11 months ago

Can you be more specific about the project you're working on? Someone mentioned above how they were having trouble with chaining operations for a computer vision workflow, for example. Does shared.wgslcontain types, bind groups, or just functions? Are you trying to implement something more advanced than a simple preprocessor with #include and #ifdef?

bconnorwhite commented 11 months ago

shared.wgsl just has a struct and a bind group

I'm just trying to set a bind group that is shared by all shaders in a render pass instead of creating identical bind groups for each shader

ScanMountGoat commented 11 months ago

shared.wgsl just has a struct and a bind group

I'm just trying to set a bind group that is shared by all shaders in a render pass instead of creating identical bind groups for each shader

That makes more sense now. Thanks.

If all bind groups are shared, you can just define multiple entry points in the same WGSL file. This is technically still possible in a single file even if the bind groups aren't shared, but wgsl_to_wgpu doesn't support that yet because of #41. The workaround is to make a separate WGSL file for each entry point. If you only need to share uniform or storage buffers, you could just put the types in shared.wgsl. That would require some way to refer to types in other generated Rust modules.

When dealing with this in my own code, I duplicate the types in each WGSL file but only create one bind group at runtime. This works but is less than ideal.

bconnorwhite commented 11 months ago

Yeah I've been doing the same thing, just having duplicate files and then only using parts of the code generated by wgsl_to_wgpu.

ScanMountGoat commented 11 months ago

I've been thinking through how this would work in the simple case without any naming conflicts across modules. If we take a list of referenced modules as in your example, we can simply add a use shared::* to import all the structs from the bindings to shared.wgsl.

This could also be applied to bind groups, but it feels a bit weird to define some of the bind groups for an entry point in an external file. Bind groups are compatible if they have the same bind group layout. The proof of concept linked above also assumes that each pipeline has the same group index for the shared bind group. We would need some way to detect that the layouts are the same regardless of whether it's at @group(0) or @group(1).

bconnorwhite commented 11 months ago

To keep the API the same I'm using (name, source) in that example, so you can even pass in something like "super::super::shared" for the use statement. This is pretty hacky though, using paths instead of the source might be better.

I agree that defining bind groups in a separate file is a bit weird. I have a single "global" bindgroup that is always group(0), and a second per-pass bindgroup that is always at group(1), and then each of my shaders start at group(2). I think just importing the structs still enables a setup like this though

Swoorup commented 9 months ago

I made a fork mostly for my own use where I've added support for import syntax. For structs, I have it such that the generated rust modules hierarchy are the exactly same as you'd have in naga_oil's wgsl modules.

But for bindings that might come from "other" wgsl files, I've decided to make it work more like a direct preprocessor import. (which in this case means, demangling naga oil's name and directly using the item name)

Probably easier to show examples: Entry point - https://github.com/Swoorup/wgsl-bindgen/blob/main/wgsl_bindgen/tests/test_shaders/main.wgsl Generated - https://github.com/Swoorup/wgsl-bindgen/blob/main/wgsl_bindgen/tests/expected/bindgen_main.out.rs#L79

The generator code you'd normally use in build.rs: https://github.com/Swoorup/wgsl-bindgen/blob/main/wgsl_bindgen/tests/bindgen_tests.rs#L28-L41

ScanMountGoat commented 8 months ago

It doesn't look like it's feasible to use naga_oil at the moment without relying on internal implementation details related to module path name mangling. I reran my test code on the latest version, and it doesn't look like much has changed since https://github.com/ScanMountGoat/wgsl_to_wgpu/issues/40#issuecomment-1738381297.

We probably only need to generate Rust code for types and constants in shared modules. The functions and bind groups can be handled by just parsing the final processed naga module from naga_oil. There may be edge cases where the entry function name gets mangled from imports.

I'm open to adding this in the future if naga_oil provides a reversible way to demangle names or some other method to access the type information we need.

ScanMountGoat / wgsl_to_wgpu

WGSL preprocessor and modules #40

imports could probably be turned into use declarations in the generated rust code in the future