ermig1979 / Simd

C++ image processing and machine learning library using SIMD: SSE, AVX, AVX-512, AMX for x86/x64, VMX(Altivec) and VSX(Power7) for PowerPC, NEON for ARM.
http://ermig1979.github.io/Simd
MIT License

Can not compile the SIMD project in Windows for ARM64, using Cmake. #266

Open binhpht opened 6 months ago

binhpht commented 6 months ago

Issue Description

I encountered an issue while trying to compile the library for ARM64 on Windows 11 using Microsoft Visual Studio 2022. I received the following error:

SimdExtract.h(328,32): error C2084: function 'float32x4_t Simd::Neon::Extract4Sums(const float32x4_t &,const float32x4_t &,const float32x4_t &,const float32x4_t &)' already has a body
SimdExtract.h(319,33): see previous definition of 'Extract4Sums'

Issue Details

Root cause: SimdExtract.h defines two functions named Extract4Sums. The first definition, at line 319:

        SIMD_INLINE float32x4_t Extract4Sums(const float32x4_t & a0, const float32x4_t & a1, const float32x4_t & a2, const float32x4_t & a3)
        {
            float32x4x2_t b0 = vzipq_f32(a0, a2);
            float32x4x2_t b1 = vzipq_f32(a1, a3);
            float32x4x2_t c0 = vzipq_f32(b0.val[0], b1.val[0]);
            float32x4x2_t c1 = vzipq_f32(b0.val[1], b1.val[1]);
            return vaddq_f32(vaddq_f32(c0.val[0], c0.val[1]), vaddq_f32(c1.val[0], c1.val[1]));
        }

The second definition, at line 328:

        SIMD_INLINE uint32x4_t Extract4Sums(const uint32x4_t& a0, const uint32x4_t& a1, const uint32x4_t& a2, const uint32x4_t& a3)
        {
            uint32x4x2_t b0 = vzipq_u32(a0, a2);
            uint32x4x2_t b1 = vzipq_u32(a1, a3);
            uint32x4x2_t c0 = vzipq_u32(b0.val[0], b1.val[0]);
            uint32x4x2_t c1 = vzipq_u32(b0.val[1], b1.val[1]);
            return vaddq_u32(vaddq_u32(c0.val[0], c0.val[1]), vaddq_u32(c1.val[0], c1.val[1]));
        }

Additional Information

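In the arm_neon.h file shipped with Visual Studio, both uint32x4_t and float32x4_t are defined as the same type, __n128. The two Extract4Sums definitions therefore have identical signatures, and the compiler treats the second one as a redefinition rather than an overload.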
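The collision can be reproduced without an ARM toolchain. The sketch below is only an illustration under stated assumptions: the struct N128 is a hypothetical stand-in for __n128, F32x4 and U32x4 are hypothetical aliases playing the roles of float32x4_t and uint32x4_t, and Tag32f, Tag32u, and CheckAliases are made-up names. It shows that two typedefs of one type are indistinguishable to the compiler, so only differently named functions can coexist.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for __n128: a single type behind every 128-bit vector alias.
struct N128 { unsigned char bytes[16]; };

// Hypothetical aliases, mirroring the situation described for arm_neon.h.
typedef N128 F32x4; // plays the role of float32x4_t
typedef N128 U32x4; // plays the role of uint32x4_t

// A typedef does not create a new type, so both aliases name N128.
static_assert(std::is_same<F32x4, U32x4>::value, "both aliases are the same type");

// Defining both of the following would fail to compile, because the
// parameter lists are identical once the aliases are resolved:
//   char Tag(const F32x4&) { return 'f'; }
//   char Tag(const U32x4&) { return 'u'; } // redefinition of Tag(const N128&)

// Distinct names sidestep the problem.
inline char Tag32f(const F32x4&) { return 'f'; }
inline char Tag32u(const U32x4&) { return 'u'; }

// Small self-check: the same object is accepted by both functions.
inline void CheckAliases()
{
    N128 v = {};
    assert(Tag32f(v) == 'f');
    assert(Tag32u(v) == 'u');
}
```

On compilers where the two vector types are distinct, the commented-out pair would be an ordinary overload set, which is presumably why the original code builds elsewhere.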

Thank you for your assistance in resolving this issue.

ermig1979 commented 6 months ago

Hi. I renamed the two Extract4Sums functions to Extract4Sums32f and Extract4Sums32u. I hope this resolves the issue. P.S. I can't check it because I have no access to Windows on ARM64.
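For readers without a NEON target, here is a portable sketch of what a type-suffixed pair might look like. It is not the library's code: std::array stands in for the NEON vector types, and the helper Extract4SumsScalar and the function CheckExtract4Sums are hypothetical names introduced for illustration. The scalar loop computes the same result as the zip-and-add sequence quoted in the issue: lane i of the output is the sum of the four lanes of input i.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Assumption: plain arrays stand in for float32x4_t and uint32x4_t.
using F32x4 = std::array<float, 4>;
using U32x4 = std::array<std::uint32_t, 4>;

// Hypothetical scalar helper: out[i] is the horizontal sum of input vector i.
template <class V>
V Extract4SumsScalar(const V& a0, const V& a1, const V& a2, const V& a3)
{
    const V* in[4] = { &a0, &a1, &a2, &a3 };
    V out{}; // value-initialized, so every lane starts at zero
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            out[i] += (*in[i])[j];
    return out;
}

// Type-suffixed names, following the rename described in the reply above.
inline F32x4 Extract4Sums32f(const F32x4& a0, const F32x4& a1, const F32x4& a2, const F32x4& a3)
{
    return Extract4SumsScalar(a0, a1, a2, a3);
}

inline U32x4 Extract4Sums32u(const U32x4& a0, const U32x4& a1, const U32x4& a2, const U32x4& a3)
{
    return Extract4SumsScalar(a0, a1, a2, a3);
}

// Usage example with small integer inputs.
inline void CheckExtract4Sums()
{
    const U32x4 a0 = { 1u, 2u, 3u, 4u };
    const U32x4 a1 = { 5u, 6u, 7u, 8u };
    const U32x4 a2 = { 0u, 0u, 0u, 0u };
    const U32x4 a3 = { 1u, 1u, 1u, 1u };
    const U32x4 s = Extract4Sums32u(a0, a1, a2, a3);
    assert(s[0] == 10u && s[1] == 26u && s[2] == 0u && s[3] == 4u);
}
```

Because the element type is part of the function name, the call stays unambiguous even where both vector types resolve to one underlying type.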

binhpht commented 6 months ago

Thanks, it compiles now.