Open unfinishedprogram opened 11 months ago
Yeah I think that makes sense, especially if it means behaviour is the same with and without SIMD.
Turns out there are unstable https://doc.rust-lang.org/core/primitive.f32.html#method.maximum and https://doc.rust-lang.org/core/primitive.f32.html#method.minimum functions that are similar to what you suggest, although there are issues with stabilizing them https://github.com/rust-lang/rust/issues/91079.
Interesting, I guess some digging on godbolt is required to determine if this manual "ternary style" replacement compiles to the expected instruction on all architectures.
Another thing I didn't consider with this proposal, is non-optimized code-gen. This method could be a substantial performance degradation for optimized builds. However, I'm not sure how important this is for the goals of this library or if this is even the case. It would need to be tested.
Currently, the implementation of
min_element
andmax_element
for floating point vec3 uses the built inmin
/max
functions. These functions explicitly handleNaN
values, and do not propagate them, as is explained here. This is potentially desired behavior, however, this is semantically different from the meaning ofmin
in vectorized contexts whereNaN
values would be propagated.The extra handling of
NaN
s in these functions incurs significant performance overhead. Benchmarking on my computer using a custom f32::min implementation, which does not check forNaN
, was ~25% faster.We can see why this is so much faster by comparing the generated assembly: https://godbolt.org/z/3Ph1hWx8e
If I'm missing something please let me know, but otherwise this seems like an easy change which could make a significant difference in performance for some use-cases. I didn't have the time to look into other uses of floating-point min throughout the library but I'd guess it's replacement could improve performance elsewhere as well.