Open MichalStrehovsky opened 6 years ago
@Alan-FGR this is up-for-grabs :)
Pass flags to RyuJIT to enable generation of SIMD code
We need a command-line switch that describes the minimum hardware you expect to be running on (ideally, this would be combined with a runtime check during startup that verifies the minimum requirements; it will save us from debugging mysterious crashes).
Cc @tannergooding who might be able to give us more pointers.
> We need a command-line switch that describes the minimum hardware you expect to be running on (ideally, this would be combined with a runtime check during startup that verifies the minimum requirements; it will save us from debugging mysterious crashes).
I agree that this is a good baseline. However, it may also be interesting to have the CoreRT startup code perform and cache the CPUID checks as a one-time cost. This allows AOT code to support higher-level hardware than whatever the baseline is decided to be. The C runtime library (both glibc and msvcrt) does this for many of the math functions, for example.
The other tracked work items look correct as well.
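For illustration, here is a minimal C++ sketch of the "check once at startup, cache, and verify the baseline" idea. It is not CoreRT code: `CpuFeatures`, `CachedCpuFeatures`, `MeetsBaseline`, and `VerifyBaselineOrExit` are hypothetical names, and the detection assumes the GCC/Clang `__builtin_cpu_supports` builtins on x86 (on other targets the sketch reports no features).

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical feature set; the fields are illustrative, not CoreRT's.
struct CpuFeatures {
    bool sse2 = false;
    bool sse42 = false;
    bool avx2 = false;
};

// Query the CPU. Assumes GCC/Clang builtins on x86; elsewhere all flags stay false.
inline CpuFeatures DetectCpuFeatures() {
    CpuFeatures f;
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    f.sse2  = __builtin_cpu_supports("sse2");
    f.sse42 = __builtin_cpu_supports("sse4.2");
    f.avx2  = __builtin_cpu_supports("avx2");
#endif
    return f;
}

// One-time cost: the function-local static is initialized on first use only.
inline const CpuFeatures& CachedCpuFeatures() {
    static const CpuFeatures cached = DetectCpuFeatures();
    return cached;
}

// True if every feature set in `required` is also set in `actual`.
inline bool MeetsBaseline(const CpuFeatures& actual, const CpuFeatures& required) {
    return (!required.sse2  || actual.sse2) &&
           (!required.sse42 || actual.sse42) &&
           (!required.avx2  || actual.avx2);
}

// Startup check: fail fast with a clear message instead of crashing later
// on an unsupported instruction.
inline void VerifyBaselineOrExit(const CpuFeatures& required) {
    if (!MeetsBaseline(CachedCpuFeatures(), required)) {
        std::fprintf(stderr, "CPU does not meet the minimum requirements of this binary\n");
        std::exit(EXIT_FAILURE);
    }
}
```

Code compiled above the baseline would then consult `CachedCpuFeatures()` before taking a faster path, while the baseline itself is verified once at startup.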
It may be interesting to document that Vector64<T>, Vector128<T>, and Vector256<T> correspond to the __m64, __m128, and __m256 primitive types defined by most ABIs as part of the process.
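As a rough illustration of that correspondence, the C++ sketch below checks that fixed-size vector wrappers of 8, 16, and 32 bytes match the sizes of `__m64`, `__m128`, and `__m256`. `FixedVector` and the `Vec64`/`Vec128`/`Vec256` aliases are hypothetical stand-ins, not the .NET types, and the comparison against the intrinsic types is only compiled with GCC/Clang on x86-64. It shows the size match only, not how the ABI passes these values.

```cpp
#include <cstddef>
#include <cstdint>

#if (defined(__GNUC__) || defined(__clang__)) && defined(__x86_64__)
#include <immintrin.h>  // defines __m64, __m128, __m256
#define HAVE_X86_VECTOR_TYPES 1
#else
#define HAVE_X86_VECTOR_TYPES 0
#endif

// Hypothetical stand-in for a fixed-width vector of `Bytes` bytes holding lanes of T.
template <typename T, std::size_t Bytes>
struct alignas(Bytes) FixedVector {
    static_assert(Bytes % sizeof(T) == 0, "vector width must be a multiple of the lane size");
    T lanes[Bytes / sizeof(T)];
    static constexpr std::size_t Count() { return Bytes / sizeof(T); }
};

template <typename T> using Vec64  = FixedVector<T, 8>;
template <typename T> using Vec128 = FixedVector<T, 16>;
template <typename T> using Vec256 = FixedVector<T, 32>;

#if HAVE_X86_VECTOR_TYPES
// The wrappers have the same size as the ABI-level primitive vector types.
static_assert(sizeof(Vec64<float>)  == sizeof(__m64),  "64-bit vector size mismatch");
static_assert(sizeof(Vec128<float>) == sizeof(__m128), "128-bit vector size mismatch");
static_assert(sizeof(Vec256<float>) == sizeof(__m256), "256-bit vector size mismatch");
#endif
```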
> This allows AOT code to support higher-level hardware than whatever the baseline is decided to be. The C runtime library (both glibc and msvcrt) does this for many of the math functions, for example.
In GCC this feature is named Function Multiversioning (FMV) and has been supported, in evolving form, since GCC 4.8 (C++ only). Essentially, it compiles a version of the function for each architecture indicated in the attributes, and at run time the C runtime fixes up the RVAs based on an architecture test.
The very same mechanism was already proposed earlier during the discussion of HW intrinsics for R2R assemblies. IMHO it would be one of the most important features to implement in .NET Core to fully exploit SIMD potential. This may support both System.Numerics.Vector<T> and HW intrinsics.
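#include <cassert>

__attribute__ ((target ("default")))
int foo () {
    // default version of foo; GCC requires one for multiversioned functions
    return 0;
}

__attribute__ ((target ("sse4.2")))
int foo () {
    // foo version for SSE4.2
    return 1;
}

__attribute__ ((target ("arch=atom")))
int foo () {
    // foo version for the Intel Atom processor
    return 2;
}

int main () {
    // both the pointer and the direct call resolve to the same version at run time
    int (*p)() = &foo;
    assert ((*p) () == foo ());
    return 0;
}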
Looks pretty similar to Sse.IsSupported
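To make the comparison concrete, here is a hedged C++ sketch of doing the same dispatch by hand, in the spirit of an `IsSupported` check: test the CPU once, then route calls to a hardware-specific or a portable implementation. All names (`PopCountPortable`, `PopCountHw`, `ResolvePopCount`, `PopCount`, `HAS_X86_DISPATCH`) are made up for this example, and the hardware path assumes GCC/Clang on x86 with the `target` attribute and `__builtin_cpu_supports`.

```cpp
#include <cstdint>

#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
#define HAS_X86_DISPATCH 1
#else
#define HAS_X86_DISPATCH 0
#endif

// Portable fallback, valid on any CPU: clears the lowest set bit per iteration.
inline int PopCountPortable(std::uint64_t x) {
    int n = 0;
    while (x != 0) {
        x &= x - 1;
        ++n;
    }
    return n;
}

#if HAS_X86_DISPATCH
// Compiled with POPCNT enabled for this one function only, so the rest of
// the program still targets the baseline instruction set.
__attribute__((target("popcnt")))
static int PopCountHw(std::uint64_t x) {
    return __builtin_popcountll(x);
}
#endif

using PopCountFn = int (*)(std::uint64_t);

// Choose an implementation based on a run-time CPU test.
inline PopCountFn ResolvePopCount() {
#if HAS_X86_DISPATCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("popcnt")) {
        return &PopCountHw;
    }
#endif
    return &PopCountPortable;
}

// Public entry point: the choice is made once and cached in a local static.
inline int PopCount(std::uint64_t x) {
    static const PopCountFn impl = ResolvePopCount();
    return impl(x);
}
```

The difference from FMV is who writes the resolver: here it is explicit in user code, whereas with FMV the compiler and runtime generate the equivalent of `ResolvePopCount` from the attributes.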
Since RyuJIT already supports this, I think we just need these things: