EmbarkStudios / rust-gpu

🐉 Making Rust a first-class language and ecosystem for GPU shaders 🚧
https://shader.rs
Apache License 2.0

Ported shadertoy shader runs very slow (fractal pyramid) #1106

Open LegNeato opened 9 months ago

LegNeato commented 9 months ago

I ported the shadertoy shader from https://www.shadertoy.com/view/tsXBzS to rust-gpu (code at https://github.com/LegNeato/rust-gpu/tree/fractal). When I run it, shadertoy on the web is way faster than rust-gpu + wgpu locally, even with a small window.

This is the command I am running:

cargo run --release --bin wgpu_runner -- -s FractalPyramid

I'm on a MacBook Pro with an M1 Pro on macOS 13.5.2.

Cazadorro commented 9 months ago

Have you tried it without rust-gpu? For example, using GLSL directly in wgpu, or using SPIR-V generated from GLSL (glslang) or HLSL in wgpu? Otherwise I'm not sure how this isolates the problem to rust-gpu, even if rust-gpu is the problem.

LegNeato commented 9 months ago

I am using rust-gpu so I can use my Rust knowledge for graphics programming. I am not a graphics programmer who wants to use Rust. It is a good suggestion though, I'll poke around and see if I can eliminate some variables. Thank you!

LegNeato commented 9 months ago

Adding #[inline(always)] to the functions in the shader speeds it up a bit. It is still slow, but noticeably faster than before.
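
For reference, this is roughly the shape of the change; the helper below is a made-up stand-in, not the actual functions in my port:

```rust
use spirv_std::glam::Vec3;

// Made-up distance-estimator helper standing in for the real shader functions.
// Adding #[inline(always)] to helpers like this is the change that gave the
// speedup mentioned above.
#[inline(always)]
fn map(p: Vec3) -> f32 {
    p.length() - 1.0
}
```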

Cazadorro commented 9 months ago

First, I finally had a proper look at your code, and I think you may have a bunch of implicitly slow behavior going on. Primitive choice matters a lot on GPUs because of the massive differences in performance, particularly between i32, f32, and f64 (yes, all three have completely different performance characteristics that vary wildly from platform to platform; on Ada, i32 throughput is about 1/2 that of f32, and f64 is, I believe, either 1/32 or 1/64 of f32). I think you might have a bunch of f64 math by accident here, but it's hard to tell, since Rust doesn't really give a let x = 1.0 literal a concrete type until after it's been used, and I'm not sure how Rust-GPU handles this.

For example, this:

Vec3::new(0.2, 0.7, 0.9)

seems fine, but this:

let mut t = 0.0;

might not be fine and could result in t being a double (at least with Rust-GPU's code-gen), which on Nvidia would mean somewhere around 1/32 the performance of f32 for basic float math, depending on whether you're on a consumer graphics GPU or a server-grade GPU. I would probably consider this a bug in Rust-GPU if it doesn't handle this properly.
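
One cheap way to rule that out is to pin the literals to f32 explicitly. This is just a sketch of the pattern, not your actual code:

```rust
// Sketch: force the accumulator to f32 so no f64 math can sneak in,
// regardless of how the literal would otherwise be inferred.
fn march_depth() -> f32 {
    let mut t: f32 = 0.0; // or: let mut t = 0.0f32;
    for _ in 0..64 {
        t += 0.01; // inferred as f32 because t is f32
    }
    t
}
```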

Next, have you run the SPIR-V output through spirv-opt? I don't think Rust-GPU does any serious optimization passes, though I could be wrong, and by default shaderc and glslang do run their generated SPIR-V through spirv-opt if I remember correctly. OpenGL works differently if you don't use the SPIR-V extensions (i.e. like WebGL): it's basically doing a bunch of optimization passes implicitly behind the scenes at the driver level, before the shader is even turned into GPU IR (or, if you're on Safari, who knows what's happening).
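
If not, it's a quick experiment with the standalone tool from SPIRV-Tools; the file names below are placeholders for whatever your build emits:

```
# -O runs spirv-opt's standard performance pass set
spirv-opt -O shader.spv -o shader.opt.spv
```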

One way or another, you're going to have to compare rust-gpu's generated SPIR-V against glslang/shaderc's for the same code, to see whether those actually produce faster SPIR-V. To determine that you need to isolate variables: take the SPIR-V from those tools, run it through your current codebase, and benchmark. Luckily that shader doesn't look too complicated, so it should be pretty straightforward to also port it to standalone Vulkan GLSL (likely easier than the rust-gpu port was), but that's something you need to do.
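
For the GLSL side of the comparison, glslangValidator can produce the SPIR-V directly, assuming you have glslang installed (file names are placeholders):

```
# -V compiles the GLSL source to SPIR-V with Vulkan semantics
glslangValidator -V fractal_pyramid.frag -o fractal_pyramid.spv
```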

#[derive(Clone, Copy)]
struct Time(f32);

LegNeato commented 9 months ago

Awesome, thanks for the pointers! I hope to look into this more after the holidays.