Open DeepakRajendrakumaran opened 1 month ago
Tagging subscribers to this area: @dotnet/area-system-runtime-intrinsics See info in area-owners.md if you want to be subscribed.
The following instructions, which are part of AVX10.2, are not mentioned above. These mostly fall under two groups: 16-bit floating-point and FMA instructions.
Instructions skipped:
Entire section 7 in AVX10.2 manual
Parts of Section 8 in AVX10.2 manual
- VCOMXSH
- VUCOMXSH
Entire Section 9 in AVX10.2 manual
Parts of Section 10 in AVX10.2 manual
- VDPPHPS
Parts of Section 11 in AVX10.2 manual
- VMINMAXNEPBF16
Parts of Section 12 in AVX10.2 manual
- VADDPH
- VCMPPH
- VCVTDQ2PH
- VCVTPD2PH
- VCVTPH2DQ
- VCVTPH2PD
- VCVTPH2PS
- VCVTPH2PSX
- VCVTPH2QQ
- VCVTPH2UDQ
- VCVTPH2UQQ
- VCVTPH2UW
- VCVTPH2W
- VCVTPS2PH
- VCVTPS2PHX
- VCVTQQ2PH
- VCVTTPH2DQ
- VCVTTPH2QQ
- VCVTTPH2UDQ
- VCVTTPH2UQQ
- VCVTTPH2UW
- VCVTTPH2W
- VCVTUDQ2PH
- VCVTUQQ2PH
- VCVTUW2PH
- VCVTW2PH
- VDIVPH
- VFCMADDCPH
- VFCMULCPH
- VFMADD132PD - Prior instructions don't exist
- VFMADD132PH
- VFMADD132PS - Prior instructions don't exist
- VFMADD213PD - Prior instructions don't exist
- VFMADD213PH
- VFMADD213PS - Prior instructions don't exist
- VFMADD231PD - Prior instructions don't exist
- VFMADD231PH
- VFMADD231PS - Prior instructions don't exist
- VFMADDCPH
- VFMADDSUB132PD - Prior instructions don't exist
- VFMADDSUB132PH
- VFMADDSUB132PS - Prior instructions don't exist
- VFMADDSUB213PD - Prior instructions don't exist
- VFMADDSUB213PH
- VFMADDSUB213PS - Prior instructions don't exist
- VFMADDSUB231PD - Prior instructions don't exist
- VFMADDSUB231PH
- VFMADDSUB231PS - Prior instructions don't exist
- VFMSUB132PD - Prior instructions don't exist
- VFMSUB132PH
- VFMSUB132PS - Prior instructions don't exist
- VFMSUB213PD - Prior instructions don't exist
- VFMSUB213PH
- VFMSUB213PS - Prior instructions don't exist
- VFMSUB231PD - Prior instructions don't exist
- VFMSUB231PH
- VFMSUB231PS - Prior instructions don't exist
- VFMSUBADD132PD - Prior instructions don't exist
- VFMSUBADD132PH
- VFMSUBADD132PS - Prior instructions don't exist
- VFMSUBADD213PD - Prior instructions don't exist
- VFMSUBADD213PH
- VFMSUBADD213PS - Prior instructions don't exist
- VFMSUBADD231PD - Prior instructions don't exist
- VFMSUBADD231PH
- VFMSUBADD231PS - Prior instructions don't exist
- VFMULCPH
- VFNMADD132PD - Prior instructions don't exist
- VFNMADD132PH
- VFNMADD132PS - Prior instructions don't exist
- VFNMADD213PD - Prior instructions don't exist
- VFNMADD213PH
- VFNMADD213PS - Prior instructions don't exist
- VFNMADD231PD - Prior instructions don't exist
- VFNMADD231PH
- VFNMADD231PS - Prior instructions don't exist
- VFNMSUB132PD - Prior instructions don't exist
- VFNMSUB132PH
- VFNMSUB132PS - Prior instructions don't exist
- VFNMSUB213PD - Prior instructions don't exist
- VFNMSUB213PH
- VFNMSUB213PS - Prior instructions don't exist
- VFNMSUB231PD - Prior instructions don't exist
- VFNMSUB231PH
- VFNMSUB231PS - Prior instructions don't exist
- VGETEXPPH
- VGETMANTPH
- VMAXPH
- VMINPH
- VMULPH
- VREDUCEPH
- VRNDSCALEPH
- VSQRTPH
- VSUBPH
Parts of Section 13 in AVX10.2 manual
- VCVT[,T]NEBF162I[,U]BS
- VCVT[,T]PH2I[,U]BS
Haven't finished going through the list, but as initial feedback:

- `MinMaxVector` should be named just `MinMax`; `MinMax` should instead be named `MinMaxScalar`.
- The various `Compare*Enhanced` APIs are unnecessary; we can implicitly use these instructions for the existing `Compare*` APIs, since it's simply setting different flags allowing more optimal codegen for subsequent branches or conditional moves.
- It'd be helpful to separate out (such as via a separate code block or proposal) the "new instruction forms" where they aren't new concepts, but rather just new overloads of existing APIs (typically taking `V256<T>` and `FloatRoundingMode`).
- For APIs like `ConvertWithSaturationPackedFloatToSignedByteInteger`, we want to use the .NET type names, so `Single` is preferred over `Float`, `SByte` over `SignedByte`, etc.
  - for signed integers we have `SByte`, `Int16`, `Int32`, `Int64`
  - for unsigned integers we have `Byte`, `UInt16`, `UInt32`, `UInt64`
  - for floating-point we have `Half`, `Single`, `Double`
- For APIs like `ConvertWithSaturationPackedFloatToSignedByteInteger`, we probably want to more closely parity the existing names like `ConvertToVector128Int32WithTruncation`, and so would call it `ConvertToVector128ByteWithSaturation`.
Thank you. I will leave you a comment when I have made all required changes.
@tannergooding Thanks for the review. About the nomenclature for the Convert APIs: for something like `// VCVTPS2IBS ymm1{k1}{z}, ymm2/m256/m32bcst {er}`, should we use ConvertToVector128ByteWithSaturation or ConvertToVector128SByteInt16WithSaturation? Because the instruction description is something like:

These instructions convert four, eight or sixteen packed single-precision floating-point values in the source operand to four, eight or sixteen signed or unsigned byte integers in the destination operand. The downconverted 8-bit result is written in place at the lower 8 bits of the corresponding 32-bit element. The upper 3 bytes are zeroed. VCVTPS2IBS converts single-precision floating-point elements into signed byte integer elements.

Let me know what you think.
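As a side note, the quoted per-element behavior can be modeled in plain Python (a sketch of the semantics under the default round-to-nearest mode, not the proposed C# API; the function name is illustrative):

```python
def cvtps2ibs_lane(x: float) -> int:
    """Model one lane of VCVTPS2IBS per the description above: round a
    float to the nearest integer, saturate to the signed-byte range, then
    place the 8-bit result in the low byte of a 32-bit element with the
    upper 3 bytes zeroed."""
    i = round(x)                # Python rounds half to even, matching the default mode
    i = max(-128, min(127, i))  # saturate to [-128, 127]
    return i & 0xFF             # low 8 bits; upper 3 bytes of the lane are zero

print(cvtps2ibs_lane(300.0))   # saturates to 127
print(cvtps2ibs_lane(-300.0))  # saturates to -128, stored as 0x80 == 128
```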
I'll need to think about it more.
It is important we document the behavior which is conversion to byte
so that users understand what the API is doing.
It is then important we document the return type of Vector128<int>
so that it doesn't cause issues with overload resolution, since you cannot overload by return type.
It's functionally doing a ConvertToVector128ByteWithSaturationAndWidenToVector128Int32
, which is a very verbose name.
True. I was thinking along similar lines with ConvertToVector128ByteWithSaturationAndWidenToVector128Int32 but wanted to keep it a little shorter and also describe that it widens to Int32. How about ConvertToVector128SByteWithSaturationWidenToInt32? At least we can remove the Vector128 after Widen.
I have updated the names and made the other changes. For the Widen ones, let me know how you want us to update those. The ones this might apply to are the Accumulated*DotProduct* APIs and the convert intrinsics where widening is happening.
Hi Tanner - have you decided on how you want the 'widen' APIs to be named?
I think we should default to the verbose name, which is the most consistent with our other APIs and the least problematic.
We'll likely discuss some of the alternatives in API review and it wouldn't hurt to have them listed.
Notably we have Vector128<int> ConvertToInt32(Vector128<float> value) in SSE2 (and similar for Int64/UInt32/UInt64 in other ISAs), so something like ConvertToByteWithSaturationAndWidenToInt32 might be a feasible shorter name that won't conflict, but I don't think we could get much shorter otherwise.
I've updated the names for these. Do you think it makes sense to have 'Widen' in the name for the accumulated dot product ones as well?
@tannergooding What are the next steps for this?
I've filtered out the VPDPB[SS,SU,UU]D[,S] instructions from the initial review as the names aren't correct
namespace System.Runtime.Intrinsics.X86
{
/// <summary>Provides access to X86 AVX10.2 hardware instructions via intrinsics</summary>
[Intrinsic]
[CLSCompliant(false)]
public abstract class Avx10v2 : Avx10v1
{
// VPDPBSSD xmm1{k1}{z}, xmm2, xmm3/m128/m32bcst
public static Vector128<int> AccumulatedByteDotProduct(Vector128<sbyte> left, Vector128<sbyte> right, Vector128<int> acc) => AccumulatedByteDotProduct(left, right, acc);
// VPDPBSUD xmm1{k1}{z}, xmm2, xmm3/m128/m32bcst
public static Vector128<int> AccumulatedByteDotProduct(Vector128<sbyte> left, Vector128<byte> right, Vector128<int> acc) => AccumulatedByteDotProduct(left, right, acc);
// VPDPBUUD xmm1{k1}{z}, xmm2, xmm3/m128/m32bcst
public static Vector128<int> AccumulatedByteDotProduct(Vector128<byte> left, Vector128<byte> right, Vector128<int> acc) => AccumulatedByteDotProduct(left, right, acc);
// VPDPBSSD ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst
public static Vector256<int> AccumulatedByteDotProduct(Vector256<sbyte> left, Vector256<sbyte> right, Vector256<int> acc) => AccumulatedByteDotProduct(left, right, acc);
// VPDPBSUD ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst
public static Vector256<int> AccumulatedByteDotProduct(Vector256<sbyte> left, Vector256<byte> right, Vector256<int> acc) => AccumulatedByteDotProduct(left, right, acc);
// VPDPBUUD ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst
public static Vector256<int> AccumulatedByteDotProduct(Vector256<byte> left, Vector256<byte> right, Vector256<int> acc) => AccumulatedByteDotProduct(left, right, acc);
// VPDPBSSDS xmm1{k1}{z}, xmm2, xmm3/m128/m32bcst
public static Vector128<int> AccumulatedByteDotProductWithSaturation(Vector128<sbyte> left, Vector128<sbyte> right, Vector128<int> acc) => AccumulatedByteDotProductWithSaturation(left, right, acc);
// VPDPBSUDS xmm1{k1}{z}, xmm2, xmm3/m128/m32bcst
public static Vector128<int> AccumulatedByteDotProductWithSaturation(Vector128<sbyte> left, Vector128<byte> right, Vector128<int> acc) => AccumulatedByteDotProductWithSaturation(left, right, acc);
// VPDPBUUDS xmm1{k1}{z}, xmm2, xmm3/m128/m32bcst
public static Vector128<int> AccumulatedByteDotProductWithSaturation(Vector128<byte> left, Vector128<byte> right, Vector128<int> acc) => AccumulatedByteDotProductWithSaturation(left, right, acc);
// VPDPBSSDS ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst
public static Vector256<int> AccumulatedByteDotProductWithSaturation(Vector256<sbyte> left, Vector256<sbyte> right, Vector256<int> acc) => AccumulatedByteDotProductWithSaturation(left, right, acc);
// VPDPBSUDS ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst
public static Vector256<int> AccumulatedByteDotProductWithSaturation(Vector256<sbyte> left, Vector256<byte> right, Vector256<int> acc) => AccumulatedByteDotProductWithSaturation(left, right, acc);
// VPDPBUUDS ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst
public static Vector256<int> AccumulatedByteDotProductWithSaturation(Vector256<byte> left, Vector256<byte> right, Vector256<int> acc) => AccumulatedByteDotProductWithSaturation(left, right, acc);
// VPDPWSUD xmm1{k1}{z}, xmm2, xmm3/m128/m32bcst
public static Vector128<int> AccumulatedInt16DotProduct(Vector128<short> left, Vector128<ushort> right, Vector128<int> acc) => AccumulatedInt16DotProduct(left, right, acc);
// VPDPWUSD xmm1{k1}{z}, xmm2, xmm3/m128/m32bcst
public static Vector128<int> AccumulatedInt16DotProduct(Vector128<ushort> left, Vector128<short> right, Vector128<int> acc) => AccumulatedInt16DotProduct(left, right, acc);
// VPDPWUUD xmm1{k1}{z}, xmm2, xmm3/m128/m32bcst
public static Vector128<int> AccumulatedInt16DotProduct(Vector128<ushort> left, Vector128<ushort> right, Vector128<int> acc) => AccumulatedInt16DotProduct(left, right, acc);
// VPDPWSUD ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst
public static Vector256<int> AccumulatedInt16DotProduct(Vector256<short> left, Vector256<ushort> right, Vector256<int> acc) => AccumulatedInt16DotProduct(left, right, acc);
// VPDPWUSD ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst
public static Vector256<int> AccumulatedInt16DotProduct(Vector256<ushort> left, Vector256<short> right, Vector256<int> acc) => AccumulatedInt16DotProduct(left, right, acc);
// VPDPWUUD ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst
public static Vector256<int> AccumulatedInt16DotProduct(Vector256<ushort> left, Vector256<ushort> right, Vector256<int> acc) => AccumulatedInt16DotProduct(left, right, acc);
// VPDPWSUDS xmm1{k1}{z}, xmm2, xmm3/m128/m32bcst
public static Vector128<int> AccumulatedInt16DotProductWithSaturation(Vector128<short> left, Vector128<ushort> right, Vector128<int> acc) => AccumulatedInt16DotProductWithSaturation(left, right, acc);
// VPDPWUSDS xmm1{k1}{z}, xmm2, xmm3/m128/m32bcst
public static Vector128<int> AccumulatedInt16DotProductWithSaturation(Vector128<ushort> left, Vector128<short> right, Vector128<int> acc) => AccumulatedInt16DotProductWithSaturation(left, right, acc);
// VPDPWUUDS xmm1{k1}{z}, xmm2, xmm3/m128/m32bcst
public static Vector128<int> AccumulatedInt16DotProductWithSaturation(Vector128<ushort> left, Vector128<ushort> right, Vector128<int> acc) => AccumulatedInt16DotProductWithSaturation(left, right, acc);
// VPDPWSUDS ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst
public static Vector256<int> AccumulatedInt16DotProductWithSaturation(Vector256<short> left, Vector256<ushort> right, Vector256<int> acc) => AccumulatedInt16DotProductWithSaturation(left, right, acc);
// VPDPWUSDS ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst
public static Vector256<int> AccumulatedInt16DotProductWithSaturation(Vector256<ushort> left, Vector256<short> right, Vector256<int> acc) => AccumulatedInt16DotProductWithSaturation(left, right, acc);
// VPDPWUUDS ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst
public static Vector256<int> AccumulatedInt16DotProductWithSaturation(Vector256<ushort> left, Vector256<ushort> right, Vector256<int> acc) => AccumulatedInt16DotProductWithSaturation(left, right, acc);
[Intrinsic]
public abstract class V512 : Avx10v1.V512
{
// VPDPWSUD zmm1{k1}{z}, zmm2, zmm3/m512/m32bcst
public static Vector512<int> AccumulatedInt16DotProduct(Vector512<short> left, Vector512<ushort> right, Vector512<int> acc) => AccumulatedInt16DotProduct(left, right, acc);
// VPDPWUSD zmm1{k1}{z}, zmm2, zmm3/m512/m32bcst
public static Vector512<int> AccumulatedInt16DotProduct(Vector512<ushort> left, Vector512<short> right, Vector512<int> acc) => AccumulatedInt16DotProduct(left, right, acc);
// VPDPWUUD zmm1{k1}{z}, zmm2, zmm3/m512/m32bcst
public static Vector512<int> AccumulatedInt16DotProduct(Vector512<ushort> left, Vector512<ushort> right, Vector512<int> acc) => AccumulatedInt16DotProduct(left, right, acc);
// VPDPWSUDS zmm1{k1}{z}, zmm2, zmm3/m512/m32bcst
public static Vector512<int> AccumulatedInt16DotProductWithSaturation(Vector512<short> left, Vector512<ushort> right, Vector512<int> acc) => AccumulatedInt16DotProductWithSaturation(left, right, acc);
// VPDPWUSDS zmm1{k1}{z}, zmm2, zmm3/m512/m32bcst
public static Vector512<int> AccumulatedInt16DotProductWithSaturation(Vector512<ushort> left, Vector512<short> right, Vector512<int> acc) => AccumulatedInt16DotProductWithSaturation(left, right, acc);
// VPDPWUUDS zmm1{k1}{z}, zmm2, zmm3/m512/m32bcst
public static Vector512<int> AccumulatedInt16DotProductWithSaturation(Vector512<ushort> left, Vector512<ushort> right, Vector512<int> acc) => AccumulatedInt16DotProductWithSaturation(left, right, acc);
// VPDPBSSD zmm1{k1}{z}, zmm2, zmm3/m512/m32bcst
public static Vector512<int> AccumulatedByteDotProduct(Vector512<sbyte> left, Vector512<sbyte> right, Vector512<int> acc) => AccumulatedByteDotProduct(left, right, acc);
// VPDPBSUD zmm1{k1}{z}, zmm2, zmm3/m512/m32bcst
public static Vector512<int> AccumulatedByteDotProduct(Vector512<sbyte> left, Vector512<byte> right, Vector512<int> acc) => AccumulatedByteDotProduct(left, right, acc);
// VPDPBUUD zmm1{k1}{z}, zmm2, zmm3/m512/m32bcst
public static Vector512<int> AccumulatedByteDotProduct(Vector512<byte> left, Vector512<byte> right, Vector512<int> acc) => AccumulatedByteDotProduct(left, right, acc);
// VPDPBSSDS zmm1{k1}{z}, zmm2, zmm3/m512/m32bcst
public static Vector512<int> AccumulatedByteDotProductWithSaturation(Vector512<sbyte> left, Vector512<sbyte> right, Vector512<int> acc) => AccumulatedByteDotProductWithSaturation(left, right, acc);
// VPDPBSUDS zmm1{k1}{z}, zmm2, zmm3/m512/m32bcst
public static Vector512<int> AccumulatedByteDotProductWithSaturation(Vector512<sbyte> left, Vector512<byte> right, Vector512<int> acc) => AccumulatedByteDotProductWithSaturation(left, right, acc);
// VPDPBUUDS zmm1{k1}{z}, zmm2, zmm3/m512/m32bcst
public static Vector512<int> AccumulatedByteDotProductWithSaturation(Vector512<byte> left, Vector512<byte> right, Vector512<int> acc) => AccumulatedByteDotProductWithSaturation(left, right, acc);
}
}
}
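For reference, the per-element operation these VPDPB* APIs map to (four byte-pair products summed into each 32-bit accumulator element) can be sketched as a scalar Python model. The function name and the wrapping behavior shown are illustrative assumptions, not part of the proposal; the 'S'-suffixed forms saturate instead of wrapping:

```python
def vpdpbssd_element(acc: int, a_bytes, b_bytes) -> int:
    """Model one 32-bit element of VPDPBSSD: multiply four pairs of
    signed bytes, sum the products, and add the sum to the 32-bit
    accumulator, wrapping on overflow."""
    assert len(a_bytes) == len(b_bytes) == 4
    total = acc + sum(a * b for a, b in zip(a_bytes, b_bytes))
    total &= 0xFFFFFFFF  # wrap to 32 bits
    return total - 0x100000000 if total >= 0x80000000 else total

# 10 + (1*5 + -2*6 + 3*7 + -4*8) = 10 + (5 - 12 + 21 - 32) = -8
print(vpdpbssd_element(10, [1, -2, 3, -4], [5, 6, 7, 8]))
```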
Vector128<uint> ConvertToVector128UInt32(Vector128<uint> value) and similar: change to "ConvertScalarTo..."

namespace System.Runtime.Intrinsics.X86
{
/// <summary>Provides access to X86 AVX10.2 hardware instructions via intrinsics</summary>
[Intrinsic]
[CLSCompliant(false)]
public abstract class Avx10v2 : Avx10v1
{
internal Avx10v2() { }
public static new bool IsSupported { get => IsSupported; }
// VMINMAXPD xmm1{k1}{z}, xmm2, xmm3/m128/m64bcst, imm8
public static Vector128<double> MinMax(Vector128<double> left, Vector128<double> right, [ConstantExpected] byte control) => MinMax(left, right, control);
// VMINMAXPD ymm1{k1}{z}, ymm2, ymm3/m256/m64bcst {sae}, imm8
public static Vector256<double> MinMax(Vector256<double> left, Vector256<double> right, [ConstantExpected] byte control) => MinMax(left, right, control);
// VMINMAXPS xmm1{k1}{z}, xmm2, xmm3/m128/m32bcst, imm8
public static Vector128<float> MinMax(Vector128<float> left, Vector128<float> right, [ConstantExpected] byte control) => MinMax(left, right, control);
// VMINMAXPS ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst {sae}, imm8
public static Vector256<float> MinMax(Vector256<float> left, Vector256<float> right, [ConstantExpected] byte control) => MinMax(left, right, control);
// VMINMAXSD xmm1{k1}{z}, xmm2, xmm3/m64 {sae}, imm8
public static Vector128<double> MinMaxScalar(Vector128<double> left, Vector128<double> right, [ConstantExpected] byte control) => MinMaxScalar(left, right, control);
// VMINMAXSS xmm1{k1}{z}, xmm2, xmm3/m32 {sae}, imm8
public static Vector128<float> MinMaxScalar(Vector128<float> left, Vector128<float> right, [ConstantExpected] byte control) => MinMaxScalar(left, right, control);
// VADDPD ymm1{k1}{z}, ymm2, ymm3/m256/m64bcst {er}
public static Vector256<double> Add(Vector256<double> left, Vector256<double> right, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => Add(left, right, mode);
// VADDPS ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst {er}
public static Vector256<float> Add(Vector256<float> left, Vector256<float> right, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => Add(left, right, mode);
// VDIVPD ymm1{k1}{z}, ymm2, ymm3/m256/m64bcst {er}
public static Vector256<double> Divide(Vector256<double> left, Vector256<double> right, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => Divide(left, right, mode);
// VDIVPS ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst {er}
public static Vector256<float> Divide(Vector256<float> left, Vector256<float> right, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => Divide(left, right, mode);
// VCVTPS2IBS xmm1{k1}{z}, xmm2/m128/m32bcst
public static Vector128<int> ConvertToByteWithSaturationAndWidenToInt32(Vector128<float> value) => ConvertToByteWithSaturationAndWidenToInt32(value);
// VCVTPS2IBS ymm1{k1}{z}, ymm2/m256/m32bcst {er}
public static Vector256<int> ConvertToByteWithSaturationAndWidenToInt32(Vector256<float> value) => ConvertToByteWithSaturationAndWidenToInt32(value);
// VCVTPS2IBS ymm1{k1}{z}, ymm2/m256/m32bcst {er}
public static Vector256<int> ConvertToByteWithSaturationAndWidenToInt32(Vector256<float> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToByteWithSaturationAndWidenToInt32(value, mode);
// VCVTPS2IUBS xmm1{k1}{z}, xmm2/m128/m32bcst
public static Vector128<uint> ConvertToByteWithSaturationAndWidenToUInt32(Vector128<float> value) => ConvertToByteWithSaturationAndWidenToUInt32(value);
// VCVTPS2IUBS ymm1{k1}{z}, ymm2/m256/m32bcst {er}
public static Vector256<uint> ConvertToByteWithSaturationAndWidenToUInt32(Vector256<float> value) => ConvertToByteWithSaturationAndWidenToUInt32(value);
// VCVTPS2IUBS ymm1{k1}{z}, ymm2/m256/m32bcst {er}
public static Vector256<uint> ConvertToByteWithSaturationAndWidenToUInt32(Vector256<float> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToByteWithSaturationAndWidenToUInt32(value, mode);
// VCVTTPS2IBS xmm1{k1}{z}, xmm2/m128/m32bcst
public static Vector128<int> ConvertToByteWithTruncatedSaturationAndWidenToInt32(Vector128<float> value) => ConvertToByteWithTruncatedSaturationAndWidenToInt32(value);
// VCVTTPS2IBS ymm1{k1}{z}, ymm2/m256/m32bcst {sae}
public static Vector256<int> ConvertToByteWithTruncatedSaturationAndWidenToInt32(Vector256<float> value) => ConvertToByteWithTruncatedSaturationAndWidenToInt32(value);
// VCVTTPS2IUBS xmm1{k1}{z}, xmm2/m128/m32bcst
public static Vector128<uint> ConvertToByteWithTruncatedSaturationAndWidenToUInt32(Vector128<float> value) => ConvertToByteWithTruncatedSaturationAndWidenToUInt32(value);
// VCVTTPS2IUBS ymm1{k1}{z}, ymm2/m256/m32bcst {sae}
public static Vector256<uint> ConvertToByteWithTruncatedSaturationAndWidenToUInt32(Vector256<float> value) => ConvertToByteWithTruncatedSaturationAndWidenToUInt32(value);
// VMOVD xmm1, xmm2/m32
public static Vector128<uint> ConvertScalarToVector128UInt32(Vector128<uint> value) => ConvertScalarToVector128UInt32(value);
// VMOVW xmm1, xmm2/m16
public static Vector128<ushort> ConvertScalarToVector128UInt16(Vector128<ushort> value) => ConvertScalarToVector128UInt16(value);
// The below instructions are those where
// embedded rounding support has been added
// to the existing API
// VCVTDQ2PS ymm1{k1}{z}, ymm2/m256/m32bcst {er}
public static Vector256<float> ConvertToVector256Single(Vector256<int> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToVector256Single(value, mode);
// VCVTPD2DQ xmm1{k1}{z}, ymm2/m256/m64bcst {er}
public static Vector128<int> ConvertToVector128Int32(Vector256<double> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToVector128Int32(value, mode);
// VCVTPD2PS xmm1{k1}{z}, ymm2/m256/m64bcst {er}
public static Vector128<float> ConvertToVector128Single(Vector256<double> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToVector128Single(value, mode);
// VCVTPD2QQ ymm1{k1}{z}, ymm2/m256/m64bcst {er}
public static Vector256<long> ConvertToVector256Int64(Vector256<double> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToVector256Int64(value, mode);
// VCVTPD2UDQ xmm1{k1}{z}, ymm2/m256/m64bcst {er}
public static Vector128<uint> ConvertToVector128UInt32(Vector256<double> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToVector128UInt32(value, mode);
// VCVTPD2UQQ ymm1{k1}{z}, ymm2/m256/m64bcst {er}
public static Vector256<ulong> ConvertToVector256UInt64(Vector256<double> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToVector256UInt64(value, mode);
// VCVTPS2DQ ymm1{k1}{z}, ymm2/m256/m32bcst {er}
public static Vector256<int> ConvertToVector256Int32(Vector256<float> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToVector256Int32(value, mode);
// VCVTPS2QQ ymm1{k1}{z}, xmm2/m128/m32bcst {er}
public static Vector256<long> ConvertToVector256Int64(Vector128<float> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToVector256Int64(value, mode);
// VCVTPS2UDQ ymm1{k1}{z}, ymm2/m256/m32bcst {er}
public static Vector256<uint> ConvertToVector256UInt32(Vector256<float> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToVector256UInt32(value, mode);
// VCVTPS2UQQ ymm1{k1}{z}, xmm2/m128/m32bcst {er}
public static Vector256<ulong> ConvertToVector256UInt64(Vector128<float> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToVector256UInt64(value, mode);
// VCVTQQ2PS xmm1{k1}{z}, ymm2/m256/m64bcst {er}
public static Vector128<float> ConvertToVector128Single(Vector256<long> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToVector128Single(value, mode);
// VCVTQQ2PD ymm1{k1}{z}, ymm2/m256/m64bcst {er}
public static Vector256<double> ConvertToVector256Double(Vector256<long> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToVector256Double(value, mode);
// VCVTUDQ2PS ymm1{k1}{z}, ymm2/m256/m32bcst {er}
public static Vector256<float> ConvertToVector256Single(Vector256<uint> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToVector256Single(value, mode);
// VCVTUQQ2PS xmm1{k1}{z}, ymm2/m256/m64bcst {er}
public static Vector128<float> ConvertToVector128Single(Vector256<ulong> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToVector128Single(value, mode);
// VCVTUQQ2PD ymm1{k1}{z}, ymm2/m256/m64bcst {er}
public static Vector256<double> ConvertToVector256Double(Vector256<ulong> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToVector256Double(value, mode);
// VMULPD ymm1{k1}{z}, ymm2, ymm3/m256/m64bcst {er}
public static Vector256<double> Multiply(Vector256<double> left, Vector256<double> right, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => Multiply(left, right, mode);
// VMULPS ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst {er}
public static Vector256<float> Multiply(Vector256<float> left, Vector256<float> right, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => Multiply(left, right, mode);
// VSCALEFPD ymm1{k1}{z}, ymm2, ymm3/m256/m64bcst {er}
public static Vector256<double> Scale(Vector256<double> left, Vector256<double> right, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => Scale(left, right, mode);
// VSCALEFPS ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst {er}
public static Vector256<float> Scale(Vector256<float> left, Vector256<float> right, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => Scale(left, right, mode);
// VSQRTPD ymm1{k1}{z}, ymm2/m256/m64bcst {er}
public static Vector256<double> Sqrt(Vector256<double> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => Sqrt(value, mode);
// VSQRTPS ymm1{k1}{z}, ymm2/m256/m32bcst {er}
public static Vector256<float> Sqrt(Vector256<float> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => Sqrt(value, mode);
// VSUBPD ymm1{k1}{z}, ymm2, ymm3/m256/m64bcst {er}
public static Vector256<double> Subtract(Vector256<double> left, Vector256<double> right, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => Subtract(left, right, mode);
// VSUBPS ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst {er}
public static Vector256<float> Subtract(Vector256<float> left, Vector256<float> right, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => Subtract(left, right, mode);
[Intrinsic]
public new abstract class X64 : Avx10v1.X64
{
internal X64() { }
public static new bool IsSupported { get => IsSupported; }
}
[Intrinsic]
public abstract class V512 : Avx10v1.V512
{
internal V512() { }
public static new bool IsSupported { get => IsSupported; }
// VMINMAXPD zmm1{k1}{z}, zmm2, zmm3/m512/m64bcst {sae}, imm8
public static Vector512<double> MinMax(Vector512<double> left, Vector512<double> right, [ConstantExpected] byte control) => MinMax(left, right, control);
// VMINMAXPS zmm1{k1}{z}, zmm2, zmm3/m512/m32bcst {sae}, imm8
public static Vector512<float> MinMax(Vector512<float> left, Vector512<float> right, [ConstantExpected] byte control) => MinMax(left, right, control);
// VCVTPS2IBS zmm1{k1}{z}, zmm2/m512/m32bcst {er}
public static Vector512<int> ConvertToByteWithSaturationAndWidenToInt32(Vector512<float> value) => ConvertToByteWithSaturationAndWidenToInt32(value);
// VCVTPS2IBS zmm1{k1}{z}, zmm2/m512/m32bcst {er}
public static Vector512<int> ConvertToByteWithSaturationAndWidenToInt32(Vector512<float> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToByteWithSaturationAndWidenToInt32(value, mode);
// VCVTPS2IUBS zmm1{k1}{z}, zmm2/m512/m32bcst {er}
public static Vector512<uint> ConvertToByteWithSaturationAndWidenToUInt32(Vector512<float> value) => ConvertToByteWithSaturationAndWidenToUInt32(value);
// VCVTPS2IUBS zmm1{k1}{z}, zmm2/m512/m32bcst {er}
public static Vector512<uint> ConvertToByteWithSaturationAndWidenToUInt32(Vector512<float> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToByteWithSaturationAndWidenToUInt32(value, mode);
// VCVTTPS2IBS zmm1{k1}{z}, zmm2/m512/m32bcst {sae}
public static Vector512<int> ConvertToByteWithTruncatedSaturationAndWidenToInt32(Vector512<float> value) => ConvertToByteWithTruncatedSaturationAndWidenToInt32(value);
// VCVTTPS2IUBS zmm1{k1}{z}, zmm2/m512/m32bcst {sae}
public static Vector512<uint> ConvertToByteWithTruncatedSaturationAndWidenToUInt32(Vector512<float> value) => ConvertToByteWithTruncatedSaturationAndWidenToUInt32(value);
// This is a 512-bit extension of the previously existing 128/256-bit intrinsic
// VMPSADBW zmm1{k1}{z}, zmm2, zmm3/m512, imm8
public static Vector512<ushort> MultipleSumAbsoluteDifferences(Vector512<byte> left, Vector512<byte> right, [ConstantExpected] byte mask) => MultipleSumAbsoluteDifferences(left, right, mask);
[Intrinsic]
public new abstract class X64 : Avx10v1.V512.X64
{
internal X64() { }
public static new bool IsSupported { get => IsSupported; }
}
}
}
}
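To illustrate what the new FloatRoundingMode overloads above control, here is a scalar Python model of two rounding behaviors. The mode strings are stand-ins for illustration; of the enum members, only ToZero actually appears in the proposal text above:

```python
import math

def convert_to_int(x: float, mode: str) -> int:
    """Toy model of a float-to-integer conversion under an embedded
    rounding mode, as the {er} instruction forms allow per instruction
    instead of via the global MXCSR state."""
    if mode == "ToNearest":
        return round(x)       # Python rounds half to even, like the default mode
    if mode == "ToZero":
        return math.trunc(x)  # truncate toward zero
    raise ValueError(f"unknown mode: {mode}")

print(convert_to_int(-1.7, "ToNearest"))  # rounds to -2
print(convert_to_int(-1.7, "ToZero"))     # truncates to -1
```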
I would also like to discuss the following APIs
// VMOVD xmm1, xmm2/m32
public static Vector128<uint> ConvertToVector128UInt32(Vector128<uint> value) => ConvertToVector128UInt32(value);
// VMOVW xmm1, xmm2/m16
public static Vector128<ushort> ConvertToVector128UInt16(Vector128<ushort> value) => ConvertToVector128UInt16(value);
and would like to change them to
public static unsafe void StoreLowDWord(byte* address, Vector128<uint> source) => StoreLowDWord(address, source);
public static unsafe void StoreLowWord(byte* address, Vector128<ushort> source) => StoreLowWord(address, source);
public static unsafe Vector128<uint> RetrieveLowDWord(byte* address) => RetrieveLowDWord(address);
public static unsafe Vector128<ushort> RetrieveLowWord(byte* address) => RetrieveLowWord(address);
@khushal1996 the proposed signatures don't match the .NET naming conventions (we'd still use UInt32
) but also don't cover all the functionality the underlying API supports
In particular we already expose the existing movd
/movq
variants that deal with general-purpose to/from SIMD
and so which can already work with loading from or storing to memory. These notably have a signature similar to static Vector128<int> ConvertScalarToVector128Int32(int value)
Likewise while we expose some instructions like movss
, where the managed signature is static Vector128<float> MoveScalar(Vector128<float> upper, Vector128<float> value)
, these preserve the upper bits.
The new movd
/movw
variants are most similar to the existing movq
variant which deals with SIMD to SIMD and zero the upper bits. The latter is exposed in two ways today: MoveScalar
for SIMD to/from SIMD
and ConvertToVector128UInt64
which is general-purpose to/from SIMD
. So it might be goodness for these ones to similarly be MoveScalar
rather than ConvertToVector128*
Thanks @tannergooding. To conclude, this is what you are proposing:
public static Vector128<uint> MoveScalarUInt32(Vector128<uint>)
In this case, since it's a move and no conversions are possible, it'd just be 2-4 MoveScalar overloads (depending on whether we only want unsigned or also signed overloads).
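The distinction being discussed, zeroing the upper elements versus preserving them, can be modeled with plain Python lists standing in for vector registers (illustrative names, not the proposed API):

```python
def move_scalar_zero_upper(value):
    """Model of the movd/movw-style SIMD-to-SIMD move under discussion:
    copy the lowest element and zero the remaining elements."""
    return [value[0]] + [0] * (len(value) - 1)

def move_scalar_preserve_upper(upper, value):
    """Model of the existing movss-style MoveScalar overload: take the
    lowest element from `value` and the remaining elements from `upper`."""
    return [value[0]] + upper[1:]

print(move_scalar_zero_upper([7, 8, 9, 10]))                    # [7, 0, 0, 0]
print(move_scalar_preserve_upper([1, 2, 3, 4], [7, 8, 9, 10]))  # [7, 2, 3, 4]
```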
Background and motivation
Intel has announced the features available in the next version of Avx10 (10.2). In order to support this, .NET needs to expand the Avx10 library to include the new APIs. Sections 7 to 14 of the Avx10.2 spec go over the newly added instructions. A couple of interesting features here are MinMax and saturating conversions.
As part of the original API proposal, the proposed design was for future Avx10 versions to have their own classes which inherit from Avx10v1.
API Proposal
API Usage
Alternative Designs
No response
Risks
No response