Enable System.Runtime.Intrinsics intrinsics

MichalStrehovsky commented 6 years ago

Since RyuJIT already supports this, I think we just need these things:

[x] Type layout work for Vector64/Vector128/Vector256. This is basically a port of dotnet/coreclr#15961. It might look similar to #3678 (where I enabled Vector<T>) [edit: This is in progress at #6675, we need one more thing after that]
[x] Report the SIMD types as intrinsic types to RyuJIT
[x] Pass flags to RyuJIT to enable generation of SIMD code
[ ] Either a command line switch to enable various intrinsics (program will fail if run on a CPU that doesn't support them), or ideally, function multi-versioning so that we can use the appropriate code for the CPU the program is running on at runtime

MichalStrehovsky commented 6 years ago

@Alan-FGR this is up-for-grabs :)

jkotas commented 6 years ago

> Pass flags to RyuJIT to enable generation of SIMD code

We need a command line switch that describes the minimum hardware you expect to be running on (ideally, this would be combined with a runtime check during startup that verifies the minimum requirements - it will save us from debugging mysterious crashes).
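To make the startup check concrete, here is a minimal sketch in C++ (the language of the GCC sample later in this thread), not CoreRT code. The `CpuFeature` bits and the names `DetectCpuFeatures`, `MissingFeatures`, and `VerifyMinimumHardware` are hypothetical; the only real API assumed is the GCC/Clang x86 builtin `__builtin_cpu_supports`. On other compilers or architectures the sketch simply reports no features.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstdlib>

// Hypothetical feature bits; a real compiler would define its own set.
enum CpuFeature : std::uint32_t {
    kSse2  = 1u << 0,
    kSse42 = 1u << 1,
    kAvx2  = 1u << 2,
};

// Features the current CPU reports. Uses the GCC/Clang x86 builtins where
// available and reports nothing elsewhere.
inline std::uint32_t DetectCpuFeatures() {
    std::uint32_t found = 0;
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse2"))   found |= kSse2;
    if (__builtin_cpu_supports("sse4.2")) found |= kSse42;
    if (__builtin_cpu_supports("avx2"))   found |= kAvx2;
#endif
    return found;
}

// Bits that were required at compile time but are absent at run time.
inline std::uint32_t MissingFeatures(std::uint32_t required, std::uint32_t available) {
    return required & ~available;
}

// Called once at startup: fail fast with a readable message instead of
// crashing later on an unsupported instruction.
inline void VerifyMinimumHardware(std::uint32_t required) {
    assert((required & ~(kSse2 | kSse42 | kAvx2)) == 0 && "unknown feature bit");
    const std::uint32_t missing = MissingFeatures(required, DetectCpuFeatures());
    if (missing != 0) {
        std::fprintf(stderr, "CPU lacks required features (mask 0x%x)\n",
                     static_cast<unsigned>(missing));
        std::abort();
    }
}
```

A program would call something like `VerifyMinimumHardware(kSse2 | kSse42)` as the first step of startup, with the mask derived from whatever the command line switch specified at compile time.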

MichalStrehovsky commented 6 years ago

Cc @tannergooding who might be able to give us more pointers.

tannergooding commented 6 years ago

> We need a command line switch that describes the minimum hardware you expect to be running on (ideally, this would be combined with a runtime check during startup that verifies the minimum requirements - it will save us from debugging mysterious crashes).

I agree that this is a good baseline. However, it may also be interesting to have the CoreRT startup code perform and cache the CPUID checks as a one-time cost. This allows AOT code to support higher-level hardware than whatever the baseline is decided to be. The C runtime library (both glibc and msvcrt) does this for many of the math functions, for example.
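A rough sketch of that idea in C++, under stated assumptions: the CPU check runs once, the chosen function pointer is cached, and later calls go through it. The names `Sum`, `SumScalar`, `SumAvx2`, and `SelectSum` are made up for illustration, and the AVX2 path is only compiled on x86-64 with GCC or Clang; everywhere else the baseline version is used. This is not how CoreRT or any C runtime actually implements it.

```cpp
#include <cassert>
#include <cstddef>

#if (defined(__GNUC__) || defined(__clang__)) && defined(__x86_64__)
#include <immintrin.h>
#define HAVE_AVX2_PATH 1
#else
#define HAVE_AVX2_PATH 0
#endif

// Baseline version: runs on any CPU.
static int SumScalar(const int* data, std::size_t count) {
    int total = 0;
    for (std::size_t i = 0; i < count; ++i) total += data[i];
    return total;
}

#if HAVE_AVX2_PATH
// Compiled for AVX2 even if the rest of the file targets a lower baseline.
__attribute__((target("avx2")))
static int SumAvx2(const int* data, std::size_t count) {
    __m256i acc = _mm256_setzero_si256();
    std::size_t i = 0;
    for (; i + 8 <= count; i += 8) {
        __m256i v = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(data + i));
        acc = _mm256_add_epi32(acc, v);  // eight 32-bit adds per iteration
    }
    alignas(32) int lanes[8];
    _mm256_store_si256(reinterpret_cast<__m256i*>(lanes), acc);
    int total = 0;
    for (int lane : lanes) total += lane;
    for (; i < count; ++i) total += data[i];  // leftover elements
    return total;
}
#endif

using SumFn = int (*)(const int*, std::size_t);

// The CPU check itself; only ever called once (see Sum below).
static SumFn SelectSum() {
#if HAVE_AVX2_PATH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2")) return SumAvx2;
#endif
    return SumScalar;
}

// Public entry point: the selection is cached in a function-local static,
// which C++11 initializes exactly once, even with multiple threads.
int Sum(const int* data, std::size_t count) {
    assert(data != nullptr || count == 0);
    static const SumFn cached = SelectSum();
    return cached(data, count);
}
```

Callers only ever see `Sum`; whether the scalar or the AVX2 body runs is decided on the first call and costs one indirect call afterwards.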

The other tracked work items look correct as well.

As part of the process, it may be interesting to document that Vector64<T>, Vector128<T>, and Vector256<T> correspond to the __m64, __m128, and __m256 primitive types defined by most ABIs.

4creators commented 6 years ago

> This allows AOT code to support higher-level hardware than whatever the baseline is decided to be. The C runtime library (both glibc and msvcrt) does this for many of the math functions, for example.

In GCC this feature is called Function Multi-Versioning (FMV) and has been supported, in evolving form, since GCC 4.8 (C++ only). Essentially, it compiles a version of the function for each architecture named in a target attribute, and at run time the C runtime fixes up the RVAs based on an architecture test.

The very same mechanism was already proposed earlier, during the discussion of HW intrinsics for R2R assemblies. IMHO it would be one of the most important features to implement in .NET Core to fully exploit SIMD potential. It could support both System.Numerics.Vector<T> and HW intrinsics.

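    #include <cassert>

    // GCC requires a "default" version when a function is multi-versioned.
    __attribute__ ((target ("default")))
    int foo() {
        // default version of foo
        return 0;
    }

    __attribute__ ((target ("sse4.2")))
    int foo() {
        // foo version for SSE4.2
        return 1;
    }

    __attribute__ ((target ("arch=atom")))
    int foo() {
        // foo version for the Intel Atom processor
        return 2;
    }

    int main() {
        // Taking the address goes through the same runtime dispatch as a direct call.
        int (*p)() = &foo;
        assert((*p)() == foo());
        return 0;
    }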

Looks pretty similar to Sse.IsSupported

dotnet / corert

Enable System.Runtime.Intrinsics intrinsics #6173