ermig1979 / Simd

C++ image processing and machine learning library with using of SIMD: SSE, AVX, AVX-512, AMX for x86/x64, VMX(Altivec) and VSX(Power7) for PowerPC, NEON for ARM.
http://ermig1979.github.io/Simd
MIT License
2.04k stars 407 forks source link

Static construction (via list initialization) of NEON data types is non-portable #155

Closed ngzhian closed 3 years ago

ngzhian commented 3 years ago

Please see https://developer.arm.com/architectures/system-architectures/software-standards/acle Chapter 13.1.8:

ACLE does not define static construction of vector types. E.g. int32x4_t x = { 1, 2, 3, 4 }; Is not portable. Use the vcreate or vdup intrinsics to construct values from scalars.

This idiom is used to initialize constants in SimdConst.h:

https://github.com/ermig1979/Simd/blob/a51408c74f93c22652d57506b7909c0f3c59328a/src/Simd/SimdInit.h#L490-L491

It works fine now because GCC and Clang supports the same vector extensions.

For context: I encountered this issue while trying to port this library to Wasm, using SIMDe as the intrinsics emulation layer to run NEON on Wasm SIMD. See https://github.com/simd-everywhere/simde/issues/751#issuecomment-828868231 for an issue I filed, and the suggestion from SIMDe maintainer. What do you think of his suggestion?

ermig1979 commented 3 years ago

The macros defined in file SimdInit.h are used to initialize constant vectors which has global visibility. And they are initialized at library loading before checking of availability of corresponding CPU extension. So these macros must not use any CPU extension otherwise it cause crash on CPU without these extension. This is a main reason why I use compiler specific vector definition in order to initialize them. If any compiler does not supported such vector defenition you can add your own macros. At now there are two implementation: for MSVS and GCC(Clang). If you will offer compiler independed way to initialize constant vector in global namespace without of using any CPU extension I will glad to add it to Simd.

ngzhian commented 3 years ago

Thank you for explaining. I don't really understand this:

And they are initialized at library loading before checking of availability of corresponding CPU extension.

It seems to me that the definitions of the global constants are guarded by SIMD_NEON_ENABLE:

https://github.com/ermig1979/Simd/blob/a51408c74f93c22652d57506b7909c0f3c59328a/src/Simd/SimdConst.h#L735-L749

So shouldn't it be safe to use NEON intrinsics at this point?

Also in SimdInit.h we do a check for SIMD_NEON_ENABLE https://github.com/ermig1979/Simd/blob/a51408c74f93c22652d57506b7909c0f3c59328a/src/Simd/SimdInit.h#L443

and we also included "Simd/SimdDefs.h", which would have included "" if SIMD_NEON_ENABLE was set.

Maybe the ifdefs would need to change to something like:

I'm not knowledgeable about the different compilers and all the intrinsics, so really appreciate your guidance. Thanks!

ermig1979 commented 3 years ago

The macro SIMD_NEON_ENABLE is defined at compiler time after checking of compiler options and availability of compiler to compile code with NEON. So it only guarantees that the code may be built. There are no guaranty that CPU which run this code has corresponding extension. The lastone is checked in runtime.

If we use global constants:

#include <iostream>

inline int init(const char* desc)
{
    std::cout << desc << std::endl;
    return 1;
}

const int global = init("global");

void func()
{
    static const int local = init("local");
}

int main()
{
    int enable = init("enable");

    if(enable)
        func();

    return 0;
}

output:

global
enable
local

They are initialized before checking of availability of CPU extension and may cause crash if CPU is not supported them.

P.S. Of course it has not reason if you code works only on single type of CPU and you can guarantee that CPU has corresponding extensions.

ngzhian commented 3 years ago

Got it, thank you for explaining!