Sergio0694 / ComputeSharp

A .NET library to run C# code in parallel on the GPU through DX12, D2D1, and dynamically generated HLSL compute and pixel shaders, with the goal of making GPU computing easy to use for all .NET developers! 🚀
MIT License
2.65k stars 122 forks source link

Remove shader linking heuristic, rework compile option presets #807

Closed Sergio0694 closed 1 month ago

Sergio0694 commented 1 month ago

Closes #804

Description

This PR includes changes around shader linking:

rickbrew commented 1 month ago

My main concern here is with D2D1CompileOptions.OptimizeForSpeed, as per our discussion on Discord.

Normally when a compiler has an "optimize for speed" option (which... all of them do?), it doesn't come with a huge compromise like PartialPrecision does. That flag should not be used casually -- it can significantly affect rendering output, especially for an effect graph with multiple shaders in it due to compounding precision errors, and only some hardware (not NVIDIA!) will even honor it. This makes it a massive debugging hazard, as you could see wildly different output on two different systems and have no idea why, and it could take hours to nail down why.

(I have seen major precision errors due to using D2D1_BUFFER_PRECISION_FLOAT16 btw, or using _UNKNOWN on a non-Float32 render target in certain cases, and I had to manually set Precision on several effects to Float32!)

D2D1CompileOptions.Default is intended to lead folks down the happy path, but a lot of people will see OptimizeForSpeed and think that Default is insufficient, will blindly apply OptimizeForSpeed because they think Default isn't "fast", and will thus inadvertently fall down into a pit of misery and failure.

Please remove OptimizeForSpeed -- an option like that should not introduce precision problems like this. If you look at VC++'s /O1, /O2 and /Os, /Ot, you'll note that they do not include /fp:fast . PartialPrecision goes way beyond /fp:fast and should not be in any list of stock optimization options unless it is very clear that this is the case, e.g. calling it OptimizeForSpeedSacrificingPrecision (not necessarily a good name).

In addition, OptimizeForSize is kind of a misnomer. It's not really an "optimization" setting, as it doesn't affect the speed of the generated code, it just removes things that may not be necessary (reflection, linking). Lack of linking can of course affect performance at runtime, but not always, and it's up to D2D's discretion at runtime -- not the compiler's at compile-time. Again, I think it will lead people in the wrong direction due to mis-analogy with other compiler's "optimize for size" options.