The current implementation of `glm_vec4_normalize` uses `_mm_rsqrt_ps`. While its relative error (<= 1.5*2^-12) is small, when used in more complicated functions it can lead to inconsistent results as the imprecision amplifies.
For example, modify test/gtx/gtx_matrix_factorisation.cpp to operate on floats instead of doubles: replace `glm::dmat` with `glm::mat` and reduce both `T const epsilon = ...` values to 1E-3.
On instruction sets that support it, I see little issue with replacing `_mm_rsqrt_ps` with `_mm_div_ps` + `_mm_sqrt_ps` (or `vdivq_f32` + `vsqrtq_f32` on A64). Newton iterations are not as much of a win as they once were on modern processors.
@gottfriedleibniz, would you say that limiting the `_mm_rsqrt_ps` implementation to a lowp version is good enough, and using the `_mm_div_ps` + `_mm_sqrt_ps` implementation for mediump and highp?