g-truc / glm

OpenGL Mathematics (GLM)
https://glm.g-truc.net
Other
9.1k stars 2.12k forks source link

glm_vec4_normalize #1224

Open gottfriedleibniz opened 7 months ago

gottfriedleibniz commented 7 months ago

The current implementation of glm_vec4_normalize uses _mm_rsqrt_ps. While its relative error (<= 1.5*2^-12) is small, when used in more complicated functions it can lead inconsistent results as imprecision amplifies.

For example, modifying test/gtx/gtx_matrix_factorisation.cpp to operate on floats instead of doubles: replacing glm::dmat with glm::mat and reducing both T const epsilon = ... values to 1E-3:

# Scalar:
└> g++ -I${GLM_PATH} gtx_matrix_factorisation.cpp
└> ./a.out ; echo $?
0

# SSE:
└> g++ -I${GLM_PATH} -DGLM_FORCE_SSE2 -DGLM_FORCE_DEFAULT_ALIGNED_GENTYPES gtx_matrix_factorisation.cpp
└> ./a.out ; echo $?
2

# Arm64+NEON (different box):
└> g++ -I${GLM_PATH} -DGLM_FORCE_NEON -DGLM_FORCE_DEFAULT_ALIGNED_GENTYPES gtx_matrix_factorisation.cpp
└> ./a.out ; echo $?
2

On instruction sets that support it, I see little issue w/ replacing _mm_rsqrt_ps with _mm_div_ps+_mm_sqrt_ps (or vdivq_f32+vsqrtq_f32 on A64). Newton iterations are not much of a win as they once were on modern processors.

christophe-lunarg commented 7 months ago

Would you say @gottfriedleibniz that limiting the _mm_rsqrt_ps implementation to a lowp version good enough? And use the _mm_div_ps+_mm_sqrt_ps implemenation for mediump and highp.

gottfriedleibniz commented 7 months ago

Yeah, that seems fine.