DLTcollab / sse2neon

A translator from Intel SSE intrinsics to Arm/Aarch64 NEON implementation
MIT License

`SSE2NEON_PRECISE_MINMAX` clarification #527

Closed: shord closed this issue 1 year ago

shord commented 2 years ago

I was just looking through the library's other settings, and it's not clear whether SSE2NEON_PRECISE_MINMAX is needed for correct handling of infinity. It's probably fine, but the header file doesn't say much about the problems it solves. I don't see any inf values in the testing code either.

jserv commented 2 years ago

For the macro SSE2NEON_PRECISE_MINMAX, @marktwtn did put some TODOs in tests/impl.cpp, but there is no corresponding conditional build to switch it on.

Cuda-Chen commented 1 year ago

Hi @shord, SSE2NEON_PRECISE_MINMAX is used to reproduce the exact SSE behavior when handling NaN values. For the infinity comparison, let me run some tests and I will post the results in this thread. Please stay tuned.
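To make the NaN point concrete, here is a minimal scalar sketch of one lane of SSE min/max. It assumes the documented MINPS/MAXPS rule that the second operand is returned whenever the comparison fails, which includes any comparison involving NaN. The function names are hypothetical, and this is a model for illustration, not sse2neon's actual code.

```cpp
#include <cmath>
#include <limits>

// Hypothetical scalar model of a single lane of SSE MINPS:
// "a < b" is false if either operand is NaN, so b is returned in that case.
inline float sse_min_lane(float a, float b)
{
    return a < b ? a : b;
}

// Hypothetical scalar model of a single lane of SSE MAXPS:
// "a > b" is false if either operand is NaN, so b is returned in that case.
inline float sse_max_lane(float a, float b)
{
    return a > b ? a : b;
}
```

Under this model, sse_min_lane(NaN, 1.0f) yields 1.0f while sse_min_lane(1.0f, NaN) yields NaN. A NaN-propagating minimum, which is how I understand the AArch64 FMIN instruction to behave, would return NaN in both cases, and that asymmetry seems to be the gap the macro addresses. Infinity needs no special treatment here, because ordinary comparisons already order it correctly.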

Cuda-Chen commented 1 year ago

Hi @shord, in the initial infinity test, both Arm targets pass the positive-infinity case.

Hi @jserv, in this initial test I found that sse2neon lacks tests for positive/negative infinity. The integer/float/double test data is generated by the following constructor in tests/impl.cpp:

SSE2NEONTestImpl::SSE2NEONTestImpl(void)
{
    mTestFloatPointer1 = (float *) platformAlignedAlloc(sizeof(__m128));
    mTestFloatPointer2 = (float *) platformAlignedAlloc(sizeof(__m128));
    mTestIntPointer1 = (int32_t *) platformAlignedAlloc(sizeof(__m128i));
    mTestIntPointer2 = (int32_t *) platformAlignedAlloc(sizeof(__m128i));
    srand(0);
    for (uint32_t i = 0; i < MAX_TEST_VALUE; i++) {
        mTestFloats[i] = ranf(-100000, 100000);
        mTestInts[i] = (int32_t) ranf(-100000, 100000);
    }
}

I would like to add tests that cover infinity values.