Open NilOmniscient opened 4 weeks ago
I consistently get ~1100ms with C# and ~380ms with GDScript.
Godot v4.3.rc3.mono - Windows 10.0.19045 - Vulkan (Mobile) - dedicated NVIDIA GeForce GTX 1660 Ti with Max-Q Design (NVIDIA; 32.0.15.5612) - AMD Ryzen 7 3750H with Radeon Vega Mobile Gfx (8 Threads)
--- Debugging process started ---
Godot Engine v4.3.rc3.mono.official.03afb92ef - https://godotengine.org
Vulkan 1.3.278 - Forward Mobile - Using Device #0: NVIDIA - NVIDIA GeForce GTX 1660 Ti with Max-Q Design
C# Time Elapsed: 1113
--- Debugging process stopped ---
Set GDScriptShader
--- Debugging process started ---
Godot Engine v4.3.rc3.mono.official.03afb92ef - https://godotengine.org
Vulkan 1.3.278 - Forward Mobile - Using Device #0: NVIDIA - NVIDIA GeForce GTX 1660 Ti with Max-Q Design
GDScript Time Elapsed: 377
--- Debugging process stopped ---
Godot Engine v4.3.rc3.mono.official.03afb92ef - https://godotengine.org
Vulkan 1.3.278 - Forward Mobile - Using Device #0: Intel - Intel(R) Xe Graphics (TGL GT2)
C# Time Elapsed: 3312ms
Set GDScriptShader
Godot Engine v4.3.rc3.mono.official.03afb92ef - https://godotengine.org
Vulkan 1.3.278 - Forward Mobile - Using Device #0: Intel - Intel(R) Xe Graphics (TGL GT2)
GDScript Time Elapsed: 153ms
This is what I normally get. I think it's interesting that your C# is faster than mine, but the GDScript is slower for you.
I don't have dotnet set up on this device. But looking through the code, I can see that this is measuring not just the time to run the compute shader, but the time to create the rendering device, compile the shader, load/create all the resources, and read the data back from the shader afterwards.
As a next step, someone will need to do some finer-grained profiling to see where the difference is coming from. My gut tells me that it won't be from running the compute shader; it's more likely going to come from reading the storage buffer back from the GPU. My guess is that C# ends up doing more memory allocations and copying the memory around more times.
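For anyone picking up the profiling, here is a minimal sketch of how each stage could be timed separately on the C# side with sub-millisecond precision. `StageTimer`, `ProfileDispatch`, and the `outputBuffer` parameter are hypothetical names for illustration, not part of the MRP; the sketch assumes `rd` is a local `RenderingDevice` whose compute list has already been recorded and that `outputBuffer` is the storage buffer's `Rid`.

```csharp
using System.Diagnostics;
using Godot;

// Hypothetical helper, not from the MRP: times one stage at a time.
public static class StageTimer
{
    // Runs `stage` once and prints its wall-clock duration in milliseconds.
    public static void Measure(string label, System.Action stage)
    {
        var sw = Stopwatch.StartNew();
        stage();
        sw.Stop();
        GD.Print($"{label}: {sw.Elapsed.TotalMilliseconds:F3}ms");
    }

    // Assumes the compute list was already recorded on `rd` and that
    // `outputBuffer` is the storage buffer the shader writes to.
    public static void ProfileDispatch(RenderingDevice rd, Rid outputBuffer)
    {
        Measure("C# rd.Submit()", rd.Submit);
        Measure("C# rd.Sync()", rd.Sync);
        // Readback is timed on its own so it is not folded into Sync().
        Measure("C# rd.BufferGetData()", () => rd.BufferGetData(outputBuffer));
    }
}
```

Timing the readback separately from `Sync()` should show whether the gap is in waiting on the GPU or in copying the buffer into managed memory.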
In my initial Issue Description, I do also list the individual runtimes for just the rd.Submit() and rd.Sync(). I did some testing before I submitted the bug to make sure I wasn't causing most of my headache due to porting the code badly.
The actual runtimes on my machine for each rd.Submit() and rd.Sync() combo, no other things: ~5ms for GDScript, and ~300-500ms for C#.
It loops 9 times (the algorithm is based on a 2008 talk by Li-Yi Wei on parallel Poisson disk sampling, which processes on the GPU in 9 steps), so the vast majority of the discrepancy should be there.
@NilOmniscient Regardless of what language you are using, the shader runs exactly the same. It's definitely not the shader that is running differently, which is why we need to profile the entire thing and see where the discrepancy is coming from.
@clayjohn I get that. The shader runs on the GPU, completely separate from everything else.
I was simply pointing out that when I originally had more logging in there, whatever happens inside rd.Submit() and rd.Sync() seemed to be what was causing the largest gaps, and was hoping that might help narrow things down more.
Thank you for the clarification! That result is extremely weird as there should be no difference between calling submit and sync from C# or GDScript. In both cases you are just making a call directly into an internal engine function.
Maybe @raulsntos has some ideas about how performance could be affected in such a case?
Let me add back the extra logging and resubmit the MRP (and add logs from my machine) just in case I'm remembering wrong.
Bear in mind I'm on a different machine right now, so it'll probably be more in line with tetrapod's results.
This is a Windows machine with a Ryzen 5600X CPU and an RX 6700 XT GPU. New logs as follows. The biggest time difference is in rd.Sync(). Uploading the project with more detailed logging inside each version's ShaderHelper.RunShader().
--- Debugging process started ---
Godot Engine v4.3.stable.mono.official.77dcf97d8 - https://godotengine.org
Vulkan 1.3.280 - Forward Mobile - Using Device #0: AMD - AMD Radeon RX 6700 XT
C# Uniform Set Creation: 1ms
C# Pipeline Creation: 0ms
C# Compute List Creation: 0ms
C# rd.Submit(): 0ms
C# rd.Sync(): 40ms
C# rd.FreeRid(pipeline) && rd.FreeRid(uniformSet): 0ms
Time Elapsed: 446
Set GDScriptShader
--- Debugging process stopped ---
--- Debugging process started ---
Godot Engine v4.3.stable.mono.official.77dcf97d8 - https://godotengine.org
Vulkan 1.3.280 - Forward Mobile - Using Device #0: AMD - AMD Radeon RX 6700 XT
GDScript Pipeline creation: 0ms
GDScript Uniform Set Creation: 1ms
GDScript Compute List Creation: 0ms
GDScript rd.submit(): 0ms
GDScript rd.sync(): 1ms
GDScript rd.free_rid(pipeline) && rd.free_rid(uniform_set): 0ms
Time Elapsed: 117
Tested versions
System information
EndeavorOS Linux (Arch based). CPU - Intel i7-1165G7 (iGPU). Drivers - vulkan-intel/mesa
Issue description
Project using a GPU Poisson Disk Sampling shader. When run via the C# version of rd.Submit(), each shader run takes upwards of 300-500ms. When called from a GDScript version, though (even from inside a C# program), each shader run takes < 5ms.
Huge discrepancy, and not entirely sure if this is a bug, or just a current limitation of the C# implementation.
Steps to reproduce
Load up MRP, in Main, toggle "Gd Script Shader" export variable on and off. Time Elapsed in ms is posted to Output terminal. GDScript version of Shader Code runs in < 150ms total whereas the C# version of the Shader code takes almost 3.5s.
Both use the same GLSL file.
Minimal reproduction project (MRP)
Archive.zip