DLTcollab / sse2neon

A translator from Intel SSE intrinsics to Arm/Aarch64 NEON implementation
MIT License
1.3k stars 208 forks source link

2 Tests fail on M1 chip #233

Closed wangxiao1254 closed 3 years ago

wangxiao1254 commented 3 years ago

Two of the tests fail on apple M1 chip. mm_hadds_epi16 and mm_hsubs_epi16. Seems to be fine on CI. I'm happy to provide more debugging information if needed. make_check.txt

jserv commented 3 years ago

Thank @wangxiao1254 for reporting! To reduce the scope for trouble shooting, can you help the following experiments?

  1. Change line 3735 in sse2neon.h from #if defined(__aarch64__) to #if 0
  2. Change line 3758 in sse2neon.h from #if defined(__aarch64__) to #if 0

I would like to check if the Aarch64 fastpath affected or not.

wangxiao1254 commented 3 years ago

Seems that the slow path is also reporting failed.

git_diff.txt make_check.txt

jserv commented 3 years ago

Seems that the slow path is also reporting failed. git_diff.txt make_check.txt

It is weird that option -mfpu=neon was passed to g++. For Apple M1, it is definitely Aarch64 architecture rather than ARMv7-A. I thought there must be something wrong. Can you check Makefile for architecture detection (uname -m)?

jserv commented 3 years ago

@wangxiao1254, can you try with the latest master branch?

wangxiao1254 commented 3 years ago

uname -m is actually arm64 I had to modify MakeFile otherwise it shows unsupported. make_check.txt

jserv commented 3 years ago

uname -m is actually arm64 I had to modify MakeFile otherwise it shows unsupported. make_check.txt

I just modify Makefile to reflect Apple M1. Here are the failing intrinsics:

marktwtn commented 3 years ago

I believe that the failing intrinsics _mm_hadds_epi16 and _mm_hsubs_epi16 might be caused by the test. I forgot to check the minimum in _mm_hadds_epi16 and maximum in _mm_hsubs_epi16.

jserv commented 3 years ago

I believe that the failing intrinsics _mm_hadds_epi16 and _mm_hsubs_epi16 might be caused by the test. I forgot to check the minimum in _mm_hadds_epi16 and maximum in _mm_hsubs_epi16.

@wangxiao1254, please get the latest master branch and test with Apple M1.

wangxiao1254 commented 3 years ago

Here is the result. Seems that there are only 4 fails now: Test mm_srai_epi32 failed Test mm_srli_epi16 failed Test mm_srli_epi32 failed Test mm_srli_epi64 failed

make_check.txt

marktwtn commented 3 years ago

The wrong intrinsic implementation has been fixed. Please check 3b6b93bb4aa383a632fa355a1d31c46c96764cad for reference.

Thanks @wangxiao1254 for pointing out the errors and @HowJMay for code contribution.

wangxiao1254 commented 3 years ago

Thanks, I just tested again and all works well!