Closed CarolEidt closed 4 years ago
cc @eerhardt @fiigii @mikedn @4creators @tannergooding
Copying my response from the mentioned thread...
The original reasons for splitting this, as I recall, were:
There were several discussions around how to make these APIs the "most usable"/"friendly". In the end, it was originally decided that some things (like software fallbacks) could be provided separately. I believe a shared wrapper API could be provided in the same way (or even as part of the same software fallback library).
I do think that merging the APIs may make it "easier to use", but so would merging "AVX" and "AVX2". However, we can't merge "AVX" and "AVX2" because there are still plenty of CPUs and operating systems that only support one of them. Additionally, users can trivially "merge" these APIs themselves with a using static Sse; using static Sse2;
I prefer to merge SSE and SSE2 into one class.
From the hardware product perspective, all the 64-bit x86 processors support SSE2 as the SIMD ISA bottom-line. I do not think any existing MSIL platform will consistently support the hardware released before 2001 (right?)
From the user experience perspective, merging SSE and SSE2 makes a lot of 128-bit APIs generic and more convenient to use. using static Sse; using static Sse2;
just solve parts of the problem.
One of my concern of merging SSE and SSE2 is that finding a good name of the class may be hard, Sse
, Sse2
, or SseSse2
?
SetZeroVector128 that provides support for float vectors in SSE, but supports the full range of generic types in SSE2.
Actually, I do not think this is a problem. The codegen of helper intrinsic does not need to be constrained to ISAs. Sse2.SetZeroVector128<float>
can generate any code to provide a "zero vector". Similarly, Sse2.SetAllVector128
also can be generic to support the full range of generic types in SSE2.
Merging SSE and SSE2 can avoid this kind of "duplicated" APIs, such as Sse2.SetZeroVector128<T>
and Sse.SetZeroVector128
.
It may be worth noting that other languages/runtimes that are also bringing up HWIntrinsic support are also keeping the ISAs separated.
For example:
other languages/runtimes that are also bringing up HWIntrinsic support are also keeping the ISAs separated
It was main argument behind design of HW APIs. We have done split based on hardware support detection to give fine grained control over implementation.
all the 64-bit x86 processors support SSE2 as the SIMD ISA bottom-line
I do not think it is true as after some additional data checks I have verified that latest Celeron N, Pentium N and Atom C series processors like:
https://ark.intel.com/products/95592/Intel-Pentium-Processor-N4200-2M-Cache-up-to-2_5-GHz or https://ark.intel.com/products/97935/Intel-Atom-Processor-C3308-4M-Cache-up-to-2_10-GHz
do not support any type of SIMD instructions except for AES NI.
I do not think it is true as after some additional data checks I have verified that latest Celeron N, Pentium N and Atom C series processors like: do not support any type of SIMD instructions except for AES NI.
No, I am pretty sure that Goldmont architecture has SSE -SSE4.2 instructions.
No, I am pretty sure that Goldmont architecture has SSE -SSE4.2 instructions.
Can you point us into any docs confirming SIMD support in Atom.
Another problem is with Celeron N and Pentium N which are also known for lack of SIMD support.
Please look for the docs of micro-architecture instead of processor product. For example, the newest ATOM/Celeron/Pentium should base on the architecture code-named Goldmont or Goldmont Plus.
One of my concern of merging SSE and SSE2 is that finding a good name of the class may be hard,
Sse
,Sse2
, orSseSse2
?
SseX
?
For usage it is way more convenient to have these merged to one class.
SseX
?
No thanks 😀
To keep the both worlds happy, can we have class SseX
in addition to those for separate ISAs?
Made a PR https://github.com/dotnet/corefx/pull/29447 to show the effect of merging SSE and SSE2 classes, which may help the discussion here.
@CarolEidt @tannergooding @terrajobst - I've updated the original comment to follow the API proposal process, and to capture the proposal to use inheritance in the X86 classes.
All - Please take a look at the proposal and provide feedback.
I like the proposal and think it works well. It allows easy access to the various methods from a given class while still keeping the IsSupported checks logical/etc. It also becomes easy to discover the relationship via source control (i.e. Is using Sse okay if Sse3.IsSupported returns true).
Props to @terrajobst, who I believe was the original proposer.
For the SetZeroVector128
issue. I am in favor of 1 (rename and explode the APIs) or 3 (move them to a shared class).
3 may be slightly better as it is also applicable to ARM (cc. @sdmaclea) and may simplify some things (such as StaticCast
implementation, where the plan is to just fold it away in the importer).
For the SetZeroVector128 issue, I prefer 2, but there is 4th solution.
SetZeroVector128<T>
in Sse class, and generate different code for underlying hardware support
xorps
for all base type that provides correct semantics but maybe with a bit data-bypass penalty on integer base types.Note: we (and C++ compilers) already use the similar approach for optimizing some other helpers (e.g., Avx.SetAllVector256).
For SetZero
there's also a variant of the 3rd option - just rely on new Vector128<T>()
and default
to do the right thing. Requiring a method call to produce a 0 value is counterintuitive in provides no advantages.
Thanks @fiigii and @mikedn, I've added those options to the list.
Closing this as resolved, since we are now using inheritance to expose the intrinsics.
Any other issues tracked here should be moved to their own new issue.
A good portion of X86 ISAs all follow a hierarchy of support. If
SSE2
is supported, then implicitlySSE
is supported. Similarly, ifAVX2
is supported, thenAVX
is supported as well.We should design our HW Intrinsics classes in a similar fashion. This allows for simplicity in coding calling functions from 2 (or more) instruction sets:
This means we should have our classes inherit from each other to model this inheritance in the ISAs:
The
IsSupported
hierarchy checks are doc'd in https://software.intel.com/en-us/articles/intel-sdm, for example:The hierarchy is also checked in the .NET runtime: https://github.com/dotnet/coreclr/blob/6bf04a47badd74646e21e70f4e9267c71b7bfd08/src/vm/codeman.cpp#L1307.
There is a bit of an open question: What should
AVX
inherit from? Should it inherit fromSSE/SSE2/SSE4.1/SSE4.2
? The Intel docs don't seem to indicate any SSE is necessary to support AVX, but the .NET runtime will only support AVX if SSE2 is present.NOTE Taking this proposal would mean we will need to drop
static class
from all our classes, sincestatic class
cannot form an inheritance hierarchy.For the original problem of
SetZeroVector128
, we have the following options:SetZeroVector128Float
SetZeroVector128Double
,SetZeroVector128Int32
, etc.Sse2.SetZeroVector128<float>()
, which would match the behavior ofSse.SetZeroVector128()
.SetAllVector128
,SetVector128
, etc.Vector128<T>.Zero;
,new Vector128<T>();
, ordefault
.Original proposal:
This was discussed during the API design (https://github.com/dotnet/corefx/pull/23489#issuecomment-326433485) but has come under discussion again in the context of https://github.com/dotnet/coreclr/pull/17691.
A related issue is, if the APIs remain separate, how to deal with an API, e.g.
SetZeroVector128
that provides support for float vectors in SSE, but supports the full range of generic types in SSE2.