Closed jmather-sesi closed 3 weeks ago
@jserv I think maybe we should remove all the FORCE_INLINE_OPTNONE
and using OPTNONE
for the failed tests.
Warning will be added in the root README to inform users about the potential optimization issues.
How do you think about it
Looking a bit deeper, I'm not entirely sure if _MM_SET_ROUNDING_MODE
is working as expected. I'm working on a math library, and all of the numerical tests pass on x86. On Arm however, if I call _mm_setcsr(*flag)
provided by sse2neon, the tests fail. If I instead call fesetexceptflag(flag, FE_ALL_EXCEPT);
, it works as expected.
I think maybe we should remove all the
FORCE_INLINE_OPTNONE
and usingOPTNONE
for the failed tests. Warning will be added in the root README to inform users about the potential optimization issues.
The OPTNONE
changes introduced in #638 have resulted in a known performance regression. To mitigate this, we should maintain the enforcement of static inline for _MM_SET_ROUNDING_MODE
where possible. Additionally, it is time to re-evaluate the use of the volatile keyword in conjunction with our supported compilers.
I think maybe we should remove all the
FORCE_INLINE_OPTNONE
and usingOPTNONE
for the failed tests. Warning will be added in the root README to inform users about the potential optimization issues.The
OPTNONE
changes introduced in #638 have resulted in a known performance regression. To mitigate this, we should maintain the enforcement of static inline for_MM_SET_ROUNDING_MODE
where possible.
Yes. OPTNONE
is the keyword that we use in the tests. So removing FORCE_INLINE_OPTNONE
and replace it with OPTNONE
shifts the responsibility of avoiding optimization issue to the developers side, and mitigate the performance change we made in the https://github.com/DLTcollab/sse2neon/pull/638
I will check fesetexceptflag
with different optimization options too
_MM_SET_ROUNDING_MODE
is declared asFORCE_INLINE_OPTNONE
, which lowers performance. Unless I'm missing something, wouldn't it be better to define ther
union asvolatile
instead?