KhronosGroup / GLSL

GLSL Shading Language Specification and Extensions

allow `edge1 <= edge0` in `smoothstep` #260

Open greggman opened 1 month ago

greggman commented 1 month ago

The current definition of smoothstep is

Returns 0.0 if x <= edge0 and 1.0 if x >= edge1, and performs smooth Hermite interpolation between 0 and 1 when edge0 < x < edge1. This is useful in cases where you would want a threshold function with a smooth transition. This is equivalent to:

genFType t;
t = clamp ((x - edge0) / (edge1 - edge0), 0, 1);
return t * t * (3 - 2 * t);

(And similarly for doubles.) Results are undefined if edge0 >= edge1.

I'm curious whether any GPUs actually produce different results for edge1 <= edge0. AFAICT, no such GPUs exist, as code using edge1 <= edge0 is ubiquitous.

All implementations seem to be effectively

    if (edge0 == edge1) {
       return edge0 < x ? 1.0 : 0.0;
    }
    t = clamp ((x - edge0) / (edge1 - edge0), 0, 1);
    return t * t * (3 - 2 * t);

This is not undefined for edge1 <= edge0.
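To make the edge1 < edge0 case concrete, here is a minimal Python sketch of the spec formula alone (no special case). The names `clamp` and `smoothstep_formula` are made up for illustration, and this is ordinary double-precision arithmetic on a CPU, not a claim about what any particular GPU does.

```python
def clamp(v, lo, hi):
    # Same meaning as GLSL clamp: min(max(v, lo), hi).
    return min(max(v, lo), hi)


def smoothstep_formula(edge0, edge1, x):
    # Direct transcription of the formula quoted from the spec.
    # Assumes edge0 != edge1; the equal-edges case divides by zero.
    t = clamp((x - edge0) / (edge1 - edge0), 0.0, 1.0)
    return t * t * (3.0 - 2.0 * t)


if __name__ == "__main__":
    # Reversed edges: edge0 = 1.0, edge1 = 0.0.
    for x in (-1.0, 0.25, 0.5, 2.0):
        print(x, smoothstep_formula(1.0, 0.0, x))
```

With reversed edges the formula gives a mirrored ramp: `smoothstep_formula(1.0, 0.0, 0.25)` evaluates to 0.84375, which is `1.0 - smoothstep_formula(0.0, 1.0, 0.25)`, and the result is 1.0 for x <= edge1 and 0.0 for x >= edge0.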

If no GPUs actually produce different results when edge1 <= edge0 it would be great to update the spec and tests to make that a requirement. Especially given its ubiquitous use.

As it is, given that the spec says this is undefined, the WebGPU spec is trying to help developers avoid the undefined behavior. But developers are pushing back, saying they've been using it for years and have never run into a GPU that actually behaves differently. As one public example, they point out that tons of shaders on Shadertoy use edge1 <= edge0.

gnl21 commented 1 month ago

I don't think that we would want to make a change without SPIR-V also making some sort of change, otherwise we would just push the problem from WebGPU into GLSL compilers, at least for Vulkan, so I'll create a SPIR-V issue to discuss this as well.

In general I'd be in favour of finding some way to improve the definedness of these operations if everyone has implemented them the same way, to save WebGPU having to work around the existing limits, but there has to be a high bar for adding functionality to things that have been specced in a particular way for years. We can ask vendors to respond about what their implementations do.

I imagine that if implementations have just followed the spec formula here then they will be fine, except perhaps in the case of x == edge0 == edge1, which is a particular special case.
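A rough Python sketch of why x == edge0 == edge1 stands apart, assuming IEEE 754-style division (an assumption; real GPU arithmetic may differ). `ieee_div` and `raw_t` are hypothetical helpers: Python raises on float division by zero, so the IEEE result is emulated by hand, and `raw_t` is the interpolation parameter before the clamp.

```python
import math


def ieee_div(a, b):
    # Emulates IEEE 754 division for a zero divisor (assumed semantics).
    if b != 0.0:
        return a / b
    if a == 0.0 or math.isnan(a):
        return math.nan  # 0/0 is NaN
    # Nonzero / zero is an infinity whose sign combines both operands' signs.
    return math.copysign(math.inf, a) * math.copysign(1.0, b)


def raw_t(edge0, edge1, x):
    # The value passed to clamp() in the spec formula.
    return ieee_div(x - edge0, edge1 - edge0)


if __name__ == "__main__":
    for x in (0.0, 0.5, 1.0):
        print(x, raw_t(0.5, 0.5, x))
```

Under that assumption, with edge0 == edge1 the parameter is -inf below the edge and +inf above it, both of which clamp cleanly to 0 and 1. Exactly at the edge it is NaN, and what clamp does with a NaN input is the part that could plausibly vary between implementations.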