EmbarkStudios / rust-gpu

🐉 Making Rust a first-class language and ecosystem for GPU shaders 🚧
https://shader.rs
Apache License 2.0

Ported shadertoy shader runs very slow (fractal pyramid) #1106

Open LegNeato opened 9 months ago

LegNeato commented 9 months ago

I ported the shadertoy shader from https://www.shadertoy.com/view/tsXBzS to rust-gpu (code at https://github.com/LegNeato/rust-gpu/tree/fractal). When I run it, shadertoy on the web is way faster than rust-gpu + wgpu locally, even with a small window.

This is the command I am running:

cargo run --release --bin wgpu_runner -- -s FractalPyramid

I'm on a MacBook Pro with an M1 Pro on macOS 13.5.2.

Cazadorro commented 9 months ago

Have you tried it without rust-gpu? For example, using GLSL directly in wgpu, or using SPIR-V generated from GLSL (glslang) or HLSL in wgpu? Otherwise I'm not sure how this isolates the problem to rust-gpu, even if rust-gpu is the problem.

LegNeato commented 9 months ago

I am using rust-gpu so I can use my Rust knowledge for graphics programming. I am not a graphics programmer who wants to use Rust. It is a good suggestion though, I'll poke around and see if I can eliminate some variables. Thank you!

LegNeato commented 9 months ago

Adding #[inline(always)] to the functions in the shader speeds it up a bit. It is still slow, but noticeably faster than before.
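
For reference, this is roughly the shape of the change; the helper below is a made-up stand-in, not the actual functions in my port:

```rust
use spirv_std::glam::Vec3;

// Made-up distance-estimator helper standing in for the real shader functions.
// Adding #[inline(always)] to helpers like this is the change that gave the
// speedup mentioned above.
#[inline(always)]
fn map(p: Vec3) -> f32 {
    p.length() - 1.0
}
```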

Cazadorro commented 9 months ago

First, I finally had a proper look at your code, and I think you may have a bunch of implicitly slow behavior going on. Primitive choice matters a lot on GPUs because of the massive differences in performance, particularly between i32, f32, and f64 (yes, all three have completely different performance characteristics that vary wildly from platform to platform; on Ada, i32 throughput is about 1/2 that of f32, and f64 is, I believe, either 1/32 or 1/64 of f32). I think you might have a bunch of f64 math by accident here, but it's hard to tell, since Rust doesn't really give a let x = 1.0 literal a concrete type until after it's been used, and I'm not sure how Rust-GPU handles this.

For example, this:

Vec3::new(0.2, 0.7, 0.9)

seems fine, but this:

let mut t = 0.0;

might not be fine and could result in t being a double (at least with Rust-GPU's code-gen), which on Nvidia would mean somewhere around 1/32 the performance of f32 for basic float math, depending on whether you're on a consumer graphics GPU or a server-grade GPU. I would probably consider this a bug in Rust-GPU if it doesn't handle this properly.
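
One cheap way to rule that out is to pin the literals to f32 explicitly. This is just a sketch of the pattern, not your actual code:

```rust
// Sketch: force the accumulator to f32 so no f64 math can sneak in,
// regardless of how the literal would otherwise be inferred.
fn march_depth() -> f32 {
    let mut t: f32 = 0.0; // or: let mut t = 0.0f32;
    for _ in 0..64 {
        t += 0.01; // inferred as f32 because t is f32
    }
    t
}
```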

Next, have you run the SPIR-V output through spirv-opt? I don't think Rust-GPU does any serious optimization passes, though I could be wrong, and by default shaderc and glslang do run their generated SPIR-V through spirv-opt if I remember correctly. OpenGL works differently if you don't use the SPIR-V extensions (i.e. like WebGL): it's basically doing a bunch of optimization passes implicitly behind the scenes at the driver level, before the shader is even turned into GPU IR (or, if you're on Safari, who knows what's happening).
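
If not, it's a quick experiment with the standalone tool from SPIRV-Tools; the file names below are placeholders for whatever your build emits:

```
# -O runs spirv-opt's standard performance pass set
spirv-opt -O shader.spv -o shader.opt.spv
```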

One way or another, you're going to have to compare rust-gpu's generated SPIR-V against glslang/shaderc's for the same code, to see whether those actually produce faster SPIR-V. To determine that you need to isolate variables: take the SPIR-V from those tools, run it through your current codebase, and benchmark. Luckily that shader doesn't look too complicated, so it should be pretty straightforward to also port it to standalone Vulkan GLSL (likely easier than the rust-gpu port was), but that's something you need to do.
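
For the GLSL side of the comparison, glslangValidator can produce the SPIR-V directly, assuming you have glslang installed (file names are placeholders):

```
# -V compiles the GLSL source to SPIR-V with Vulkan semantics
glslangValidator -V fractal_pyramid.frag -o fractal_pyramid.spv
```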

#[derive(Clone, Copy)]
struct Time(f32);

LegNeato commented 9 months ago

Awesome, thanks for the pointers! I hope to look into this more after the holidays.