jimblandy opened 1 week ago
Will this cause problems due to slow 64-bit arithmetic? Is it possible for the GPU to actually run 2^32 loop iterations before the device is lost?
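One way to avoid slow 64-bit arithmetic is to keep the counter as two 32-bit words and carry by hand. The C++ below is a minimal sketch of that idea. It assumes the generated shader can express the same operations on 32-bit unsigned integers. `SplitCounter` and `run_bounded` are hypothetical names, not Naga's output. Each iteration costs two compares, a select, and two 32-bit adds.

```cpp
#include <cstdint>

// Hypothetical 64-bit iteration counter stored as two 32-bit words, for
// targets where native 64-bit integer arithmetic is slow or unavailable.
struct SplitCounter {
    uint32_t hi = 0;  // upper 32 bits
    uint32_t lo = 0;  // lower 32 bits

    // True once the counter holds 2^64 - 1, i.e. both words are saturated.
    bool exhausted() const { return hi == UINT32_MAX && lo == UINT32_MAX; }

    // Add one, carrying from the low word into the high word.
    void increment() {
        hi += (lo == UINT32_MAX) ? 1u : 0u;  // carry iff lo is about to wrap
        lo += 1u;                            // unsigned wrap-around is well defined
    }

    // For inspection only: the combined 64-bit value.
    uint64_t value() const { return (uint64_t{hi} << 32) | lo; }
};

// Shape of a loop guarded by the counter. The exit condition and body are
// placeholders supplied by the caller; returns the iterations executed.
template <typename Cond, typename Body>
uint64_t run_bounded(Cond keep_going, Body body) {
    SplitCounter bound;
    while (keep_going()) {
        if (bound.exhausted()) break;  // give up after 2^64 - 1 iterations
        bound.increment();
        body();
    }
    return bound.value();
}
```

The exhaustion check compares both words against `UINT32_MAX` rather than reconstructing a 64-bit value, so the guarded loop never needs a 64-bit add or compare.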
I was proposing to add this to the WGSL spec first to avoid UB in both the Metal compiler and DXC. DXC doesn't make the same assumptions as Metal, but it does technically change the behavior of the program as written in WGSL (see https://github.com/gfx-rs/wgpu/issues/6528#issuecomment-2476185624).
My main argument is that, because WGSL allows loops to be infinite and we target MSL and HLSL (both based on C++), we need to prove to downstream compilers that loops are finite. Currently the only way to do that is with volatiles, but that's expensive; if we add an upper bound on the number of loop iterations, we should be in a better position performance-wise. Loops can even be unrolled, as you found in https://github.com/gfx-rs/wgpu/issues/6528#issuecomment-2477277145.
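The two approaches can be compared on a small C++ loop, since MSL is C++-based. This is a hand-written sketch, not Naga's generated code, and the function names are hypothetical. It relies on the C++ forward-progress rule: an implementation may assume that every thread eventually terminates or performs an observable action such as a volatile access.

```cpp
#include <cstdint>

// Original shape: a side-effect-free loop that never terminates when, e.g.,
// step == 0 and start != target. C++ lets the compiler assume such a loop
// terminates, so the non-terminating case is undefined behavior.

// (1) Volatile guard: the volatile read is an observable side effect, so the
//     compiler may no longer assume termination. It also cannot reason about
//     the read, which blocks unrolling and similar optimizations.
uint32_t step_until_volatile(uint32_t start, uint32_t step, uint32_t target) {
    volatile bool force_exit = false;
    uint32_t x = start;
    while (x != target) {
        if (force_exit) break;  // opaque to the optimizer
        x += step;              // unsigned wrap-around is well defined
    }
    return x;
}

// (2) Iteration bound: the loop provably runs at most 2^64 - 1 times, so it
//     is finite without any side effect the optimizer has to preserve.
uint32_t step_until_bounded(uint32_t start, uint32_t step, uint32_t target) {
    uint32_t x = start;
    for (uint64_t iters = 0; x != target; ++iters) {
        if (iters == UINT64_MAX) break;  // upper bound on the trip count
        x += step;
    }
    return x;
}
```

Both versions compute the same result whenever the original loop terminates. They differ only in what the optimizer is allowed to assume about them.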
> Is it possible for the GPU to actually run 2^32 loop iterations before the device is lost?
We could run some tests to see if this is the case. A more expensive way to make sure of this is to add the mechanism ourselves, but that's nontrivial. It might be enough to tell developers that loops have an upper bound and that they might hit it.
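If developers are simply told about the bound, the thing worth documenting is what a loop does when it reaches it: it exits silently and execution continues with whatever state the loop had reached. The sketch below makes that visible with a deliberately small `limit` parameter. The parameter and the `LoopOutcome` struct are hypothetical, for illustration only.

```cpp
#include <cstdint>

// What a bounded loop reports when it stops.
struct LoopOutcome {
    uint32_t x;           // loop state at exit
    uint64_t iterations;  // iterations actually executed
    bool hit_limit;       // true if the bound, not the loop condition, ended it
};

// `limit` is hypothetical so the effect can be observed quickly; the limits
// discussed in this issue are 2^32 or 2^64 - 1.
LoopOutcome step_until_limited(uint32_t start, uint32_t step, uint32_t target,
                               uint64_t limit) {
    uint32_t x = start;
    uint64_t iters = 0;
    while (x != target) {
        if (iters == limit) return {x, iters, true};  // silent early exit
        x += step;
        ++iters;
    }
    return {x, iters, false};
}
```

With `step == 0` the unbounded loop would spin forever. Here it returns after `limit` iterations with `x` unchanged, and nothing in the result signals an error unless the caller checks for it.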
A 2^64 - 1 limit lets us assume that the limit will never actually be reached, because the GPU cannot execute enough cycles to hit it in a human lifetime.
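A back-of-the-envelope check of the numbers in this thread, written as small C++ helpers. The rate of 10^9 iterations per second per invocation (roughly one iteration per cycle at 1 GHz) is an assumption chosen to be generous, not a measurement.

```cpp
#include <cmath>

// Assumption: one loop iteration per nanosecond per invocation.
constexpr double kIterationsPerSecond = 1e9;
// Julian year, in seconds.
constexpr double kSecondsPerYear = 365.25 * 24 * 60 * 60;

double seconds_for(double iterations) { return iterations / kIterationsPerSecond; }
double years_for(double iterations) { return seconds_for(iterations) / kSecondsPerYear; }

// 2^k as a double (exact for these exponents).
double pow2(int k) { return std::ldexp(1.0, k); }
```

Under that assumption, `seconds_for(pow2(32))` is about 4.29 seconds. That is short enough that a 32-bit limit could plausibly be reached, depending on how quickly the platform loses the device. `years_for(pow2(64))` is about 585 years. The `- 1` in 2^64 - 1 is far below double precision and does not change the estimate.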
Rather than injecting branches on volatile bools, Naga's backend for Metal Shading Language should avoid undefined behavior simply by imposing an iteration limit on every loop. Unlike volatile bools, these limits are something the optimizer can reason about and eliminate when possible. Optimizations like unrolling are defeated by a volatile bool, but not by an iteration limit.
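To illustrate why an iteration limit need not cost anything, here is a hypothetical loop with a statically known trip count wrapped in the same kind of limit. The comments describe what a compiler can in principle prove. Whether a given compiler actually deletes the check and unrolls the loop is what the Godbolt experiments linked in this issue examine.

```cpp
#include <cstdint>

// Hypothetical example: a 16-iteration loop guarded by an iteration limit.
// `iters` never exceeds 15 inside the loop, so the limit check is provably
// always false; a compiler may delete it and still unroll the loop. A
// volatile read in the same position could not be removed.
uint32_t sum_first_16(const uint32_t* data) {
    uint32_t sum = 0;
    uint64_t iters = 0;
    for (uint32_t i = 0; i < 16; ++i) {
        if (iters == UINT64_MAX) break;  // provably dead: iters <= 15 here
        ++iters;
        sum += data[i];
    }
    return sum;
}
```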
See a full description of the idea, with godbolt experiments, here: https://github.com/gfx-rs/wgpu/issues/6528#issuecomment-2477277145