Closed wangxiao1254 closed 3 years ago
Thank @wangxiao1254 for reporting! To reduce the scope for trouble shooting, can you help the following experiments?
sse2neon.h
from #if defined(__aarch64__)
to #if 0
sse2neon.h
from #if defined(__aarch64__)
to #if 0
I would like to check if the Aarch64 fastpath affected or not.
Seems that the slow path is also reporting failed.
Seems that the slow path is also reporting failed. git_diff.txt make_check.txt
It is weird that option -mfpu=neon
was passed to g++. For Apple M1, it is definitely Aarch64 architecture rather than ARMv7-A. I thought there must be something wrong. Can you check Makefile
for architecture detection (uname -m
)?
@wangxiao1254, can you try with the latest master branch?
uname -m is actually arm64 I had to modify MakeFile otherwise it shows unsupported. make_check.txt
uname -m is actually arm64 I had to modify MakeFile otherwise it shows unsupported. make_check.txt
I just modify Makefile
to reflect Apple M1. Here are the failing intrinsics:
mm_srai_epi32
mm_srli_epi16
mm_srli_epi32
mm_srli_epi64
mm_hadds_epi16
mm_hsubs_epi16
I believe that the failing intrinsics _mm_hadds_epi16
and _mm_hsubs_epi16
might be caused by the test.
I forgot to check the minimum in _mm_hadds_epi16
and maximum in _mm_hsubs_epi16
.
I believe that the failing intrinsics
_mm_hadds_epi16
and_mm_hsubs_epi16
might be caused by the test. I forgot to check the minimum in_mm_hadds_epi16
and maximum in_mm_hsubs_epi16
.
@wangxiao1254, please get the latest master
branch and test with Apple M1.
Here is the result. Seems that there are only 4 fails now: Test mm_srai_epi32 failed Test mm_srli_epi16 failed Test mm_srli_epi32 failed Test mm_srli_epi64 failed
The wrong intrinsic implementation has been fixed. Please check 3b6b93bb4aa383a632fa355a1d31c46c96764cad for reference.
Thanks @wangxiao1254 for pointing out the errors and @HowJMay for code contribution.
Thanks, I just tested again and all works well!
Two of the tests fail on apple M1 chip.
mm_hadds_epi16
andmm_hsubs_epi16
. Seems to be fine on CI. I'm happy to provide more debugging information if needed. make_check.txt