JishinMaster / simd_utils

A header only library implementing common mathematical functions using SIMD intrinsics
BSD 2-Clause "Simplified" License

Compilation failing on Mac M1 (arm) #3

Closed: jerinphilip closed this issue 1 year ago

jerinphilip commented 1 year ago

Hey, thanks for keeping development going. I previously used this library in an Android build and got it working. I'm now on the latest master (since it has the fixes for the M_PI constant), building with AppleClang 14.0.0, and I'm stuck on the following error.

In file included from /Users/jerin/code/bergamot-translator/3rd_party/marian-dev/src/3rd_party/simd_utils/sse2neon_wrapper.h:11:
/Users/jerin/code/bergamot-translator/3rd_party/marian-dev/src/3rd_party/simd_utils/sse2neon.h:6212:33: error: cannot initialize a parameter of type 'float32x4_t' (vector of 4 'float32_t' values) with an lvalue of type '__m128d' (aka 'float64x2_t')
    __builtin_nontemporal_store(a, (float32x4_t *) p);
                                ^
1 error generated.
make[3]: *** [3rd_party/marian-dev/src/CMakeFiles/marian.dir/common/binary.cpp.o] Error 1

The relevant part in code is:

https://github.com/JishinMaster/simd_utils/blob/f94de48b010fdf8b2e8115c44031dc7608125542/sse2neon.h#L6204-L6218

I'm not sure what's going on, but changing the cast to float64x2_t* lets compilation proceed. My final executable also seems to work, though I'm not sure whether it actually exercises this code path. The M1 is arm64 / aarch64.

Any help is much appreciated, thank you.

JishinMaster commented 1 year ago

Hi, thank you for your interest in simd_utils! The problem seems to come from sse2neon; I have patched it in the latest commit. I do not see the problem with gcc 12, but gcc might not have __builtin_nontemporal_store. Could you please open a pull request on the sse2neon repo? (https://github.com/DLTcollab/sse2neon)

Please let me know if you encounter other problems on your Apple M1.
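
For readers hitting the same error, here is a minimal, portable sketch of the type rule involved. It is not the sse2neon code itself. The names f64x2 and stream_store_f64x2 are hypothetical, and the vector type uses the GCC/Clang vector_size extension as a stand-in for float64x2_t so that the example compiles outside of arm64. The idea is that the pointer passed to __builtin_nontemporal_store should point to the same vector type as the value being stored, and that compilers without the builtin (as suggested above for gcc) need a plain-store fallback.

```c
#include <string.h>

/* Older compilers may not provide __has_builtin at all. */
#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

/* Hypothetical stand-in for float64x2_t: two doubles in a 16-byte vector. */
typedef double f64x2 __attribute__((vector_size(16)));

/* Hypothetical helper: store two doubles to p, which is assumed to be
 * 16-byte aligned. The pointer is cast to the same vector type as the
 * value; casting it to a 4 x float vector type instead is the kind of
 * mismatch Clang rejected in the error above. */
static inline void stream_store_f64x2(double *p, f64x2 a)
{
#if __has_builtin(__builtin_nontemporal_store)
    /* Non-temporal (cache-bypassing) store where the builtin exists. */
    __builtin_nontemporal_store(a, (f64x2 *) p);
#else
    /* Fallback: an ordinary store with the same visible result. */
    memcpy(p, &a, sizeof a);
#endif
}
```

Whether the non-temporal path is taken depends on the compiler. The stored values are the same either way, which is consistent with the executable working after the cast was changed.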