kwea123 closed this issue 1 year ago
Yes, I've had the same experience. I tried them mainly because the NVIDIA CUDA profiler kept suggesting these operations to improve speed. In practice, though, they don't seem to make anything faster. They just obfuscate the code and make it harder to understand.
I see you replaced basically every operation with these intrinsics, but according to this there is zero speedup; they only serve to make rounding predictable.
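For anyone reading later, here is a minimal sketch of what "predictable rounding" means in practice. It assumes the intrinsics in question are the `_rn` family (`__fmul_rn`, `__fadd_rn`), which the thread doesn't actually name. The kernel names and test data are made up for illustration and are not taken from the repo.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Plain operators: with nvcc's default -fmad=true, a * b + c may be
// contracted into a single fused multiply-add (one rounding step).
__global__ void madd_plain(const float* a, const float* b, const float* c,
                           float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = a[i] * b[i] + c[i];
}

// Intrinsics: __fmul_rn and __fadd_rn each round to nearest-even and are
// never merged into an FMA, so the result has two rounding steps no matter
// which compiler flags are used.
__global__ void madd_intrinsic(const float* a, const float* b, const float* c,
                               float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = __fadd_rn(__fmul_rn(a[i], b[i]), c[i]);
}

int main() {
    const int n = 1 << 20;
    float *a, *b, *c, *plain, *intr;
    cudaMallocManaged(&a, n * sizeof(float));
    cudaMallocManaged(&b, n * sizeof(float));
    cudaMallocManaged(&c, n * sizeof(float));
    cudaMallocManaged(&plain, n * sizeof(float));
    cudaMallocManaged(&intr, n * sizeof(float));

    // Hypothetical inputs chosen so the product is close to 1 and the sum
    // cancels, which is where one rounding vs. two tends to show up.
    for (int i = 0; i < n; ++i) {
        a[i] = 1.0f + i * 1e-7f;
        b[i] = 1.0f - i * 1e-7f;
        c[i] = -1.0f;
    }

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    madd_plain<<<blocks, threads>>>(a, b, c, plain, n);
    madd_intrinsic<<<blocks, threads>>>(a, b, c, intr, n);
    cudaDeviceSynchronize();

    // Count elements where the two kernels disagree bit-for-bit.
    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        if (plain[i] != intr[i]) ++mismatches;
    }
    printf("elements that differ: %d of %d\n", mismatches, n);

    cudaFree(a); cudaFree(b); cudaFree(c); cudaFree(plain); cudaFree(intr);
    return 0;
}
```

If the two kernels disagree, the difference comes from rounding (fused vs. separate multiply and add), not from one version being faster. Compiling with `-fmad=false` should make the plain version behave like the intrinsic one, which is probably a less intrusive way to get reproducible results than rewriting every operation.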