JishinMaster / simd_utils

A header only library implementing common mathematical functions using SIMD intrinsics
BSD 2-Clause "Simplified" License

simd_test.c fails to compile on clang-10 (armv8-a + simd) #1

Closed: jerinphilip closed this issue 2 years ago

jerinphilip commented 2 years ago

Thank you for maintaining this library. I was trying to use it in a larger codebase and encountered the following error:

In file included from ../../src/3rd_party/simd_utils/sse2neon_wrapper.h:11:
../../src/3rd_party/simd_utils/sse2neon.h:5992:33: error: cannot initialize a parameter of type 'float32x4_t' (vector of 4 'float32_t' values) with an lvalue of type '__m128d' (aka 'float64x2_t')
    __builtin_nontemporal_store(a, (float32x4_t *) p);

I am cross-compiling with the Android NDK (which uses clang) for a device with armv8-a+simd. I also have a machine with the following specs:

$ lscpu
Architecture:                    aarch64
CPU op-mode(s):                  32-bit, 64-bit
Vendor ID:                       ARM
Model name:                      Neoverse-N1
Flags:                           fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp ssbs

Using this ARM machine, I have isolated the error to simd_utils and can confirm the following behaviour. Compilation works with g++ 9.4.0:

$ g++ --version
g++ (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
$ g++ -DARM -DFMA -DSSE -flax-vector-conversions -c simd_test.c -l . 
In file included from simd_utils.h:489,
                 from simd_test.c:13:
simd_utils_sse_float.h:572:2: warning: #warning "src2 should have no 0.0f values!" [-Wcpp]
  572 | #warning "src2 should have no 0.0f values!"
      |  ^~~~~~~

With clang 10.0.0, however, the following error is generated:

$ clang --version
clang version 10.0.0-4ubuntu1 
Target: aarch64-unknown-linux-gnu
$ clang++ -DARM -DFMA -DSSE -flax-vector-conversions -c simd_test.c -l .
clang: warning: treating 'c' input as 'c++' when in C++ mode, this behavior is deprecated [-Wdeprecated]
clang: warning: -l.: 'linker' input unused [-Wunused-command-line-argument]
In file included from simd_test.c:13:
In file included from ./simd_utils.h:163:
In file included from ./neon_mathfun.h:35:
In file included from ./sse2neon_wrapper.h:11:
./sse2neon.h:5995:33: error: cannot initialize a parameter of type 'float32x4_t' (vector of 4 'float32_t' values) with an lvalue of type '__m128d' (aka 'float64x2_t')
    __builtin_nontemporal_store(a, (float32x4_t *) p);
                                ^
In file included from simd_test.c:13:
In file included from ./simd_utils.h:163:
./neon_mathfun.h:540:24: warning: pragma diagnostic pop could not pop, no matching push [-Wunknown-pragmas]
#pragma GCC diagnostic pop
                       ^
In file included from simd_test.c:13:
In file included from ./simd_utils.h:489:
./simd_utils_sse_float.h:572:2: warning: "src2 should have no 0.0f values!" [-W#warnings]
#warning "src2 should have no 0.0f values!"
 ^
2 warnings and 1 error generated.

The above error is identical to the one I get in the larger source integration. If you have a moment, could you take a look? I can help with testing a fix.

Thanks again,

jerinphilip commented 2 years ago

It appears I have managed to resolve my issue with: 11bc6dc4495ba71bdd530a544d4990e5c8c0048f...72a657b72f95620f9b3cb19aa15a92ae59eec91c#diff-4e3d6805ba

I am not sure whether this is the right fix, but the build works now. Leaving this open until confirmation.
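
For anyone landing here with the same error: the diagnostic says the value passed to `__builtin_nontemporal_store` is a `float64x2_t` while the destination pointer is cast to `float32x4_t *`. Below is a minimal sketch of two ways to make the two types agree. It is an illustration under assumptions and not necessarily what the commit range above does: `f32x4`, `f64x2`, and both function names are hypothetical, and the types are compiler vector-extension stand-ins for the NEON types so that the sketch also builds on non-ARM hosts.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Older compilers may lack __has_builtin; treat every builtin as absent there. */
#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

/* Hypothetical stand-ins for NEON's float32x4_t and float64x2_t. */
typedef float  f32x4 __attribute__((vector_size(16), may_alias));
typedef double f64x2 __attribute__((vector_size(16), may_alias));

/* Option 1: cast the pointer to the vector type of the value being stored. */
static void stream_store_f64(double *p, f64x2 a)
{
    assert(((uintptr_t) p % 16) == 0); /* the vector store assumes 16-byte alignment */
#if __has_builtin(__builtin_nontemporal_store)
    __builtin_nontemporal_store(a, (f64x2 *) p);
#else
    memcpy(p, &a, sizeof a); /* plain store where the builtin is unavailable */
#endif
}

/* Option 2: keep the float pointer cast and reinterpret the value to match it. */
static void stream_store_f64_as_f32(double *p, f64x2 a)
{
    assert(((uintptr_t) p % 16) == 0);
    f32x4 bits = (f32x4) a; /* same-size vector cast: bits are reinterpreted, not converted */
#if __has_builtin(__builtin_nontemporal_store)
    __builtin_nontemporal_store(bits, (f32x4 *) p);
#else
    memcpy(p, &bits, sizeof bits);
#endif
}
```

Both variants write the same 16 bytes to memory. On a real NEON target the second variant would presumably use `vreinterpretq_f32_f64` for the reinterpretation instead of the vector cast.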