dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

[API Proposal]: Add AVX10v2 API to add Avx10.2 support #109083

Open DeepakRajendrakumaran opened 1 month ago

DeepakRajendrakumaran commented 1 month ago

Background and motivation

Intel has announced the features available in the next version of Avx10 (10.2). To support this, .NET needs to expand the Avx10 library to include the new APIs.

Avx10.2 spec: Sections 7 - 14 of this spec go over the newly added instructions. A couple of interesting features here are MinMax and saturating conversions.

As part of the original API Proposal, the proposed design was for future Avx10 versions to have their own classes which inherit from Avx10v1.

API Proposal

namespace System.Runtime.Intrinsics.X86
{
    /// <summary>Provides access to X86 AVX10.2 hardware instructions via intrinsics</summary>
    [Intrinsic]
    [CLSCompliant(false)]
    public abstract class Avx10v2 : Avx10v1
    {
        internal Avx10v2() { }

        public static new bool IsSupported { get => IsSupported; }

        // VMINMAXPD xmm1{k1}{z}, xmm2, xmm3/m128/m64bcst, imm8
        public static Vector128<double> MinMax(Vector128<double> left, Vector128<double> right, [ConstantExpected] byte control) => MinMax(left, right, control);

        // VMINMAXPD ymm1{k1}{z}, ymm2, ymm3/m256/m64bcst {sae}, imm8
        public static Vector256<double> MinMax(Vector256<double> left, Vector256<double> right, [ConstantExpected] byte control) => MinMax(left, right, control);

        // VMINMAXPS xmm1{k1}{z}, xmm2, xmm3/m128/m32bcst, imm8
        public static Vector128<float> MinMax(Vector128<float> left, Vector128<float> right, [ConstantExpected] byte control) => MinMax(left, right, control);

        // VMINMAXPS ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst {sae}, imm8
        public static Vector256<float> MinMax(Vector256<float> left, Vector256<float> right, [ConstantExpected] byte control) => MinMax(left, right, control);

        // VMINMAXSD xmm1{k1}{z}, xmm2, xmm3/m64 {sae}, imm8
        public static double MinMaxScalar(Vector128<double> left, Vector128<double> right, [ConstantExpected] byte control) => MinMaxScalar(left, right, control);

        // VMINMAXSS xmm1{k1}{z}, xmm2, xmm3/m32 {sae}, imm8
        public static float MinMaxScalar(Vector128<float> left, Vector128<float> right, [ConstantExpected] byte control) => MinMaxScalar(left, right, control);

        // VADDPD ymm1{k1}{z}, ymm2, ymm3/m256/m64bcst {er}
        public static Vector256<double> Add(Vector256<double> left, Vector256<double> right, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => Add(left, right, mode);

        // VADDPS ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst {er}
        public static Vector256<float> Add(Vector256<float> left, Vector256<float> right, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => Add(left, right, mode);

        // VDIVPD ymm1{k1}{z}, ymm2, ymm3/m256/m64bcst {er}
        public static Vector256<double> Divide(Vector256<double> left, Vector256<double> right, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => Divide(left, right, mode);

        // VDIVPS ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst {er}
        public static Vector256<float> Divide(Vector256<float> left, Vector256<float> right, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => Divide(left, right, mode);

        // VCVTPS2IBS xmm1{k1}{z}, xmm2/m128/m32bcst
        public static Vector128<int> ConvertToByteWithSaturationAndWidenToInt32(Vector128<float> value) => ConvertToByteWithSaturationAndWidenToInt32(value);

        // VCVTPS2IBS ymm1{k1}{z}, ymm2/m256/m32bcst {er}
        public static Vector256<int> ConvertToByteWithSaturationAndWidenToInt32(Vector256<float> value) => ConvertToByteWithSaturationAndWidenToInt32(value);

        // VCVTPS2IBS ymm1{k1}{z}, ymm2/m256/m32bcst {er}
        public static Vector256<int> ConvertToByteWithSaturationAndWidenToInt32(Vector256<float> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToByteWithSaturationAndWidenToInt32(value, mode);

        // VCVTPS2IUBS xmm1{k1}{z}, xmm2/m128/m32bcst
        public static Vector128<uint> ConvertToByteWithSaturationAndWidenToUInt32(Vector128<float> value) => ConvertToByteWithSaturationAndWidenToUInt32(value);

        // VCVTPS2IUBS ymm1{k1}{z}, ymm2/m256/m32bcst {er}
        public static Vector256<uint> ConvertToByteWithSaturationAndWidenToUInt32(Vector256<float> value) => ConvertToByteWithSaturationAndWidenToUInt32(value);

        // VCVTPS2IUBS ymm1{k1}{z}, ymm2/m256/m32bcst {er}
        public static Vector256<uint> ConvertToByteWithSaturationAndWidenToUInt32(Vector256<float> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToByteWithSaturationAndWidenToUInt32(value, mode);

        // VCVTTPS2IBS xmm1{k1}{z}, xmm2/m128/m32bcst
        public static Vector128<int> ConvertToByteWithTruncationSaturationAndWidenToInt32(Vector128<float> value) => ConvertToByteWithTruncationSaturationAndWidenToInt32(value);

        // VCVTTPS2IBS ymm1{k1}{z}, ymm2/m256/m32bcst {sae}
        public static Vector256<int> ConvertToByteWithTruncationSaturationAndWidenToInt32(Vector256<float> value) => ConvertToByteWithTruncationSaturationAndWidenToInt32(value);

        // VCVTTPS2IUBS xmm1{k1}{z}, xmm2/m128/m32bcst
        public static Vector128<uint> ConvertToByteWithTruncationSaturationAndWidenToUInt32(Vector128<float> value) => ConvertToByteWithTruncationSaturationAndWidenToUInt32(value);

        // VCVTTPS2IUBS ymm1{k1}{z}, ymm2/m256/m32bcst {sae}
        public static Vector256<uint> ConvertToByteWithTruncationSaturationAndWidenToUInt32(Vector256<float> value) => ConvertToByteWithTruncationSaturationAndWidenToUInt32(value);

        // VMOVD xmm1, xmm2/m32
        public static Vector128<uint> ConvertToVector128UInt32(Vector128<uint> value) => ConvertToVector128UInt32(value);

        // VMOVW xmm1, xmm2/m16
        public static Vector128<ushort> ConvertToVector128UInt16(Vector128<ushort> value) => ConvertToVector128UInt16(value);

        // The instructions below are those where
        // embedded rounding support has been added
        // to the existing API

        // VCVTDQ2PS ymm1{k1}{z}, ymm2/m256/m32bcst {er}
        public static Vector256<float> ConvertToVector256Single(Vector256<int> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToVector256Single(value, mode);

        // VCVTPD2DQ xmm1{k1}{z}, ymm2/m256/m64bcst {er}
        public static Vector128<int> ConvertToVector128Int32(Vector256<double> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToVector128Int32(value, mode);

        // VCVTPD2PS xmm1{k1}{z}, ymm2/m256/m64bcst {er}
        public static Vector128<float> ConvertToVector128Single(Vector256<double> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToVector128Single(value, mode);

        // VCVTPD2QQ ymm1{k1}{z}, ymm2/m256/m64bcst {er}
        public static Vector256<long> ConvertToVector256Int64(Vector256<double> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToVector256Int64(value, mode);

        // VCVTPD2UDQ xmm1{k1}{z}, ymm2/m256/m64bcst {er}
        public static Vector128<uint> ConvertToVector128UInt32(Vector256<double> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToVector128UInt32(value, mode);

        // VCVTPD2UQQ ymm1{k1}{z}, ymm2/m256/m64bcst {er}
        public static Vector256<ulong> ConvertToVector256UInt64(Vector256<double> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToVector256UInt64(value, mode);

        // VCVTPS2DQ ymm1{k1}{z}, ymm2/m256/m32bcst {er}
        public static Vector256<int> ConvertToVector256Int32(Vector256<float> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToVector256Int32(value, mode);

        // VCVTPS2QQ ymm1{k1}{z}, xmm2/m128/m32bcst {er}
        public static Vector256<long> ConvertToVector256Int64(Vector128<float> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToVector256Int64(value, mode);

        // VCVTPS2UDQ ymm1{k1}{z}, ymm2/m256/m32bcst {er}
        public static Vector256<uint> ConvertToVector256UInt32(Vector256<float> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToVector256UInt32(value, mode);

        // VCVTPS2UQQ ymm1{k1}{z}, xmm2/m128/m32bcst {er}
        public static Vector256<ulong> ConvertToVector256UInt64(Vector128<float> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToVector256UInt64(value, mode);

        // VCVTQQ2PS xmm1{k1}{z}, ymm2/m256/m64bcst {er}
        public static Vector128<float> ConvertToVector128Single(Vector256<long> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToVector128Single(value, mode);

        // VCVTQQ2PD ymm1{k1}{z}, ymm2/m256/m64bcst {er}
        public static Vector256<double> ConvertToVector256Double(Vector256<long> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToVector256Double(value, mode);

        // VCVTUDQ2PS ymm1{k1}{z}, ymm2/m256/m32bcst {er}
        public static Vector256<float> ConvertToVector256Single(Vector256<uint> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToVector256Single(value, mode);

        // VCVTUQQ2PS xmm1{k1}{z}, ymm2/m256/m64bcst {er}
        public static Vector128<float> ConvertToVector128Single(Vector256<ulong> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToVector128Single(value, mode);

        // VCVTUQQ2PD ymm1{k1}{z}, ymm2/m256/m64bcst {er}
        public static Vector256<double> ConvertToVector256Double(Vector256<ulong> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToVector256Double(value, mode);

        // VMULPD ymm1{k1}{z}, ymm2, ymm3/m256/m64bcst {er}
        public static Vector256<double> Multiply(Vector256<double> left, Vector256<double> right, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => Multiply(left, right, mode);

        // VMULPS ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst {er}
        public static Vector256<float> Multiply(Vector256<float> left, Vector256<float> right, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => Multiply(left, right, mode);

        // VSCALEFPD ymm1{k1}{z}, ymm2, ymm3/m256/m64bcst {er}
        public static Vector256<double> Scale(Vector256<double> left, Vector256<double> right, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => Scale(left, right, mode);

        // VSCALEFPS ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst {er}
        public static Vector256<float> Scale(Vector256<float> left, Vector256<float> right, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => Scale(left, right, mode);

        // VSQRTPD ymm1{k1}{z}, ymm2/m256/m64bcst {er}
        public static Vector256<double> Sqrt(Vector256<double> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => Sqrt(value, mode);

        // VSQRTPS ymm1{k1}{z}, ymm2/m256/m32bcst {er}
        public static Vector256<float> Sqrt(Vector256<float> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => Sqrt(value, mode);

        // VSUBPD ymm1{k1}{z}, ymm2, ymm3/m256/m64bcst {er}
        public static Vector256<double> Subtract(Vector256<double> left, Vector256<double> right, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => Subtract(left, right, mode);

        // VSUBPS ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst {er}
        public static Vector256<float> Subtract(Vector256<float> left, Vector256<float> right, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => Subtract(left, right, mode);

        [Intrinsic]
        public new abstract class X64 : Avx10v1.X64
        {
            internal X64() { }

            public static new bool IsSupported { get => IsSupported; }
        }

        [Intrinsic]
        public new abstract class V512 : Avx10v1.V512
        {
            internal V512() { }

            public static new bool IsSupported { get => IsSupported; }

            // VMINMAXPD zmm1{k1}{z}, zmm2, zmm3/m512/m64bcst {sae}, imm8
            public static Vector512<double> MinMax(Vector512<double> left, Vector512<double> right, [ConstantExpected] byte control) => MinMax(left, right, control);

            // VMINMAXPS zmm1{k1}{z}, zmm2, zmm3/m512/m32bcst {sae}, imm8
            public static Vector512<float> MinMax(Vector512<float> left, Vector512<float> right, [ConstantExpected] byte control) => MinMax(left, right, control);

            // VCVTPS2IBS zmm1{k1}{z}, zmm2/m512/m32bcst {er}
            public static Vector512<int> ConvertToByteWithSaturationAndWidenToInt32(Vector512<float> value) => ConvertToByteWithSaturationAndWidenToInt32(value);

            // VCVTPS2IBS zmm1{k1}{z}, zmm2/m512/m32bcst {er}
            public static Vector512<int> ConvertToByteWithSaturationAndWidenToInt32(Vector512<float> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToByteWithSaturationAndWidenToInt32(value, mode);

            // VCVTPS2IUBS zmm1{k1}{z}, zmm2/m512/m32bcst {er}
            public static Vector512<uint> ConvertToByteWithSaturationAndWidenToUInt32(Vector512<float> value) => ConvertToByteWithSaturationAndWidenToUInt32(value);

            // VCVTPS2IUBS zmm1{k1}{z}, zmm2/m512/m32bcst {er}
            public static Vector512<uint> ConvertToByteWithSaturationAndWidenToUInt32(Vector512<float> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToByteWithSaturationAndWidenToUInt32(value, mode);

            // VCVTTPS2IBS zmm1{k1}{z}, zmm2/m512/m32bcst {sae}
            public static Vector512<int> ConvertToByteWithTruncationSaturationAndWidenToInt32(Vector512<float> value) => ConvertToByteWithTruncationSaturationAndWidenToInt32(value);

            // VCVTTPS2IUBS zmm1{k1}{z}, zmm2/m512/m32bcst {sae}
            public static Vector512<uint> ConvertToByteWithTruncationSaturationAndWidenToUInt32(Vector512<float> value) => ConvertToByteWithTruncationSaturationAndWidenToUInt32(value);

            // This is a 512-bit extension of the previously existing 128/256 intrinsics
            // VMPSADBW zmm1{k1}{z}, zmm2, zmm3/m512, imm8
            public static Vector512<ushort> MultipleSumAbsoluteDifferences(Vector512<byte> left, Vector512<byte> right, [ConstantExpected] byte mask) => MultipleSumAbsoluteDifferences(left, right, mask);

            [Intrinsic]
            public new abstract class X64 : Avx10v1.V512.X64
            {
                internal X64() { }

                public static new bool IsSupported { get => IsSupported; }
            }
        }
    }
}

API Usage

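// someParam1 and someParam2 are placeholder scalars supplied by the caller
Vector128<float> v1 = Vector128.Create((float)someParam1);
Vector128<float> v2 = Vector128.Create((float)someParam2);
if (Avx10v2.IsSupported) {
  // The control byte selects the min/max variant; see the Avx10.2 spec for its encoding
  Vector128<float> v3 = Avx10v2.MinMax(v1, v2, 0b00000000);
  // etc
}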
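
The embedded-rounding overloads proposed above mirror a pattern that already ships at 512 bits. As a rough sketch of how such an overload behaves, the example below uses the existing Avx512F.Add overload that takes a FloatRoundingMode, since Avx10v2 itself is only a proposal here. The class and method names (EmbeddedRoundingSketch, TryAddRoundingUp) are made up for illustration, and the method simply reports failure on hardware without AVX-512F.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

// Hypothetical helper, not part of the proposal.
public static class EmbeddedRoundingSketch
{
    // Adds 1.0 and 2^-60 in every lane. The exact sum is not representable
    // as a double, so the rounding mode decides which neighbour is returned.
    public static bool TryAddRoundingUp(out double lane0)
    {
        if (!Avx512F.IsSupported)
        {
            // No AVX-512F: nothing to demonstrate on this machine.
            lane0 = double.NaN;
            return false;
        }

        Vector512<double> a = Vector512.Create(1.0);
        Vector512<double> b = Vector512.Create(Math.ScaleB(1.0, -60));

        // The rounding mode must be a constant; it is encoded in the instruction.
        Vector512<double> sum = Avx512F.Add(a, b, FloatRoundingMode.ToPositiveInfinity);

        lane0 = sum.GetElement(0);
        return true;
    }
}
```

With round-to-nearest the sum would collapse back to 1.0; rounding toward positive infinity instead yields the next double above 1.0, which is the kind of per-instruction control the 256-bit overloads in this proposal are meant to expose.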

Alternative Designs

No response

Risks

No response

dotnet-policy-service[bot] commented 1 month ago

Tagging subscribers to this area: @dotnet/area-system-runtime-intrinsics See info in area-owners.md if you want to be subscribed.

DeepakRajendrakumaran commented 1 month ago

The following instructions, which are part of Avx10.2, are not mentioned above. They fall mostly into two groups: 16-bit floating point and FMA instructions.

Instructions skipped:
Entire section 7 in AVX10.2 manual

Parts of Section 8 in AVX10.2 manual
- VCOMXSH
- VUCOMXSH

Entire Section 9 in AVX10.2 manual

Parts of Section 10 in AVX10.2 manual
- VDPPHPS

Parts of Section 11 in AVX10.2 manual
- VMINMAXNEPBF16

Parts of Section 12 in AVX10.2 manual
- VADDPH
- VCMPPH
- VCVTDQ2PH
- VCVTPD2PH
- VCVTPH2DQ
- VCVTPH2PD
- VCVTPH2PS
- VCVTPH2PSX
- VCVTPH2QQ
- VCVTPH2UDQ
- VCVTPH2UQQ
- VCVTPH2UW
- VCVTPH2W
- VCVTPS2PH
- VCVTPS2PHX
- VCVTQQ2PH
- VCVTTPH2DQ
- VCVTTPH2QQ
- VCVTTPH2UDQ
- VCVTTPH2UQQ
- VCVTTPH2UW
- VCVTTPH2W
- VCVTUDQ2PH
- VCVTUQQ2PH
- VCVTUW2PH
- VCVTW2PH
- VDIVPH
- VFCMADDCPH
- VFCMULCPH
- VFMADD132PD - Prior instructions don't exist
- VFMADD132PH
- VFMADD132PS - Prior instructions don't exist
- VFMADD213PD - Prior instructions don't exist
- VFMADD213PH
- VFMADD213PS - Prior instructions don't exist
- VFMADD231PD - Prior instructions don't exist
- VFMADD231PH
- VFMADD231PS - Prior instructions don't exist
- VFMADDCPH
- VFMADDSUB132PD - Prior instructions don't exist
- VFMADDSUB132PH
- VFMADDSUB132PS - Prior instructions don't exist
- VFMADDSUB213PD - Prior instructions don't exist
- VFMADDSUB213PH
- VFMADDSUB213PS - Prior instructions don't exist
- VFMADDSUB231PD - Prior instructions don't exist
- VFMADDSUB231PH
- VFMADDSUB231PS - Prior instructions don't exist
- VFMSUB132PD - Prior instructions don't exist
- VFMSUB132PH
- VFMSUB132PS - Prior instructions don't exist
- VFMSUB213PD - Prior instructions don't exist
- VFMSUB213PH
- VFMSUB213PS - Prior instructions don't exist
- VFMSUB231PD - Prior instructions don't exist
- VFMSUB231PH
- VFMSUB231PS - Prior instructions don't exist
- VFMSUBADD132PD - Prior instructions don't exist
- VFMSUBADD132PH
- VFMSUBADD132PS - Prior instructions don't exist
- VFMSUBADD213PD - Prior instructions don't exist
- VFMSUBADD213PH
- VFMSUBADD213PS - Prior instructions don't exist
- VFMSUBADD231PD - Prior instructions don't exist
- VFMSUBADD231PH
- VFMSUBADD231PS - Prior instructions don't exist
- VFMULCPH
- VFNMADD132PD - Prior instructions don't exist
- VFNMADD132PH
- VFNMADD132PS - Prior instructions don't exist
- VFNMADD213PD - Prior instructions don't exist
- VFNMADD213PH
- VFNMADD213PS - Prior instructions don't exist
- VFNMADD231PD - Prior instructions don't exist
- VFNMADD231PH
- VFNMADD231PS - Prior instructions don't exist
- VFNMSUB132PD - Prior instructions don't exist
- VFNMSUB132PH
- VFNMSUB132PS - Prior instructions don't exist
- VFNMSUB213PD - Prior instructions don't exist
- VFNMSUB213PH
- VFNMSUB213PS - Prior instructions don't exist
- VFNMSUB231PD - Prior instructions don't exist
- VFNMSUB231PH
- VFNMSUB231PS - Prior instructions don't exist
- VGETEXPPH
- VGETMANTPH
- VMAXPH
- VMINPH
- VMULPH
- VREDUCEPH
- VRNDSCALEPH
- VSQRTPH
- VSUBPH

Parts of Section 13 in AVX10.2 manual
- VCVT[,T]NEBF162I[,U]BS
- VCVT[,T]PH2I[,U]BS

tannergooding commented 1 month ago

Haven't finished going through the list, but as initial feedback:

  • MinMaxVector should be named just MinMax
  • MinMax should instead be named MinMaxScalar
  • The various Compare*Enhanced APIs are unnecessary; we can implicitly use these instructions for the existing Compare* APIs, since it's simply setting different flags, allowing more optimal codegen for subsequent branches or conditional moves
  • It'd be helpful to separate out (such as via a separate code block or proposal) the "new instruction forms" where they aren't new concepts, but rather just new overloads of existing APIs (typically taking V256<T> and FloatRoundingMode)
  • For APIs like ConvertWithSaturationPackedFloatToSignedByteInteger, we want to use the .NET type names, so Single is preferred over Float, SByte over SignedByte, etc.

    • for signed integers we have SByte, Int16, Int32, Int64
    • for unsigned integers we have Byte, UInt16, UInt32, UInt64
    • for floating-point we have Half, Single, Double
  • For APIs like ConvertWithSaturationPackedFloatToSignedByteInteger, we probably want to more closely mirror the existing names like ConvertToVector128Int32WithTruncation and so would call it ConvertToVector128ByteWithSaturation

DeepakRajendrakumaran commented 1 month ago

Thank you. I will leave you a comment when I have made all required changes.

khushal1996 commented 1 month ago

@tannergooding Thanks for the review. About the nomenclature for Convert APIs, for something like // VCVTPS2IBS ymm1{k1}{z}, ymm2/m256/m32bcst {er}, should we use ConvertToVector128ByteWithSaturation or ConvertToVector128SByteInt16WithSaturation? I ask because the instruction description reads:

These instructions convert four, eight or sixteen packed single-precision floating-point values in the source operand to four, eight or sixteen signed or unsigned byte integers in the destination operand. The downconverted 8-bit result is written inplace at the lower 8-bit of the corresponding 32-bit element. The upper 3 bytes are zeroed. VCVTPS2IBS converts single-precision floating point elements into signed byte integer elements.

Let me know what you think.
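
To make the quoted description concrete, here is a minimal scalar sketch of what one 32-bit lane of the signed form appears to do: saturate to the signed-byte range, keep the result in the low 8 bits, and zero the upper 3 bytes. The helper name ConvertLane is hypothetical. Two behaviours are assumptions not stated in the quoted text: NaN is mapped to 0, and rounding is to nearest even.

```csharp
using System;

// Hypothetical scalar model of a single lane; not the proposed API.
public static class SaturatingConvertSketch
{
    public static int ConvertLane(float value)
    {
        // Assumption: NaN produces 0 (the quoted description does not say).
        if (float.IsNaN(value)) return 0;

        // Assumption: round to nearest even, as under the default rounding mode.
        float rounded = MathF.Round(value, MidpointRounding.ToEven);

        // Saturate to the signed 8-bit range instead of wrapping.
        float clamped = Math.Clamp(rounded, -128f, 127f);
        sbyte narrow = (sbyte)clamped;

        // The byte sits in the low 8 bits; the upper 3 bytes are zero,
        // so a negative byte is NOT sign-extended into the 32-bit lane.
        return (byte)narrow;
    }
}
```

Under this model, ConvertLane(-1f) yields 0x000000FF rather than -1, which is why the return type and the "byte" in the name describe two different things and both end up in the discussion below.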

tannergooding commented 1 month ago

I'll need to think about it more.

It is important we document the behavior which is conversion to byte so that users understand what the API is doing. It is then important we document the return type of Vector128<int> so that it doesn't cause issues with overload resolution, since you cannot overload by return type.

It's functionally doing a ConvertToVector128ByteWithSaturationAndWidenToVector128Int32, which is a very verbose name.
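
The overload-resolution point can be illustrated with scalar stand-ins. This is only a sketch with hypothetical names: both methods take the same parameter type, so C# cannot tell them apart by return type alone, and the distinction has to live in the method name.

```csharp
using System;

// Hypothetical scalar stand-ins for the vector APIs under discussion.
public static class NamingSketch
{
    // Declaring `static byte Convert(float)` next to `static int Convert(float)`
    // is a compile-time error, because return type is not part of the signature.
    // Distinct names avoid the clash.
    public static byte ConvertToByteWithSaturation(float value)
    {
        if (float.IsNaN(value)) return 0; // assumption: NaN maps to 0
        float rounded = MathF.Round(value, MidpointRounding.ToEven);
        return (byte)Math.Clamp(rounded, 0f, 255f);
    }

    // Same conversion, but the result is delivered in a 32-bit integer.
    public static int ConvertToByteWithSaturationAndWidenToInt32(float value)
        => ConvertToByteWithSaturation(value);
}
```

The longer name is verbose, but it states both the conversion that happens (to byte, with saturation) and the element type the caller gets back.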

khushal1996 commented 1 month ago

True. I was thinking along similar lines with ConvertToVector128ByteWithSaturationAndWidenToVector128Int32, but wanted to keep it a little shorter while still describing that it widens to Int32. How about ConvertToVector128SByteWithSaturationWidenToInt32? At least we can remove the Vector128 after Widen.

DeepakRajendrakumaran commented 1 month ago

I have updated the names and made the other changes. For the Widen ones, let me know how you want us to update those. The ones this might apply to are Accumulated*DotProduct* and the convert intrinsics where widening is happening.

DeepakRajendrakumaran commented 1 month ago

Hi Tanner, have you decided how you want the 'widen' APIs to be named?

tannergooding commented 1 month ago

I think we should default to the verbose name, which is the most consistent with our other APIs and the least problematic.

We'll likely discuss some of the alternatives in API review and it wouldn't hurt to have them listed.

Notably we have Vector128<int> ConvertToInt32(Vector128<float> value) in SSE2 (and similar for Int64/UInt32/UInt64 in other ISAs), so something like ConvertToByteWithSaturationAndWidenToInt32 might be a feasible shorter name that won't conflict, but I don't think we could get much shorter otherwise.

DeepakRajendrakumaran commented 1 month ago

I've updated the names for these. Do you think it makes sense to have 'Widen' in the name for the accumulated dot product ones as well?

DeepakRajendrakumaran commented 3 weeks ago

@tannergooding What are the next steps for this?

tannergooding commented 2 weeks ago

I've filtered out the VPDPB[SU,UU,SS]D[,S] instructions from the initial review as the names aren't correct.

namespace System.Runtime.Intrinsics.X86
{
    /// <summary>Provides access to X86 AVX10.2 hardware instructions via intrinsics</summary>
    [Intrinsic]
    [CLSCompliant(false)]
    public abstract class Avx10v2 : Avx10v1
    {
        // VPDPBSSD xmm1{k1}{z}, xmm2, xmm3/m128/m32bcst
        public static Vector128<int> AccumulatedByteDotProduct(Vector128<sbyte> left, Vector128<sbyte> right) => AccumulatedByteDotProduct(left, right, acc);

        // VPDPBSUD xmm1{k1}{z}, xmm2, xmm3/m128/m32bcst
        public static Vector128<int> AccumulatedByteDotProduct(Vector128<sbyte> left, Vector128<byte> right) => AccumulatedByteDotProduct(left, right, acc);

        // VPDPBUUD xmm1{k1}{z}, xmm2, xmm3/m128/m32bcst
        public static Vector128<int> AccumulatedByteDotProduct(Vector128<byte> left, Vector128<byte> right) => AccumulatedByteDotProduct(left, right, acc);

        // VPDPBSSD ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst
        public static Vector256<int> AccumulatedByteDotProduct(Vector256<sbyte> left, Vector256<sbyte> right) => AccumulatedByteDotProduct(left, right, acc);

        // VPDPBSUD ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst
        public static Vector256<int> AccumulatedByteDotProduct(Vector256<sbyte> left, Vector256<byte> right) => AccumulatedByteDotProduct(left, right, acc);

        // VPDPBUUD ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst
        public static Vector256<int> AccumulatedByteDotProduct(Vector256<byte> left, Vector256<byte> right) => AccumulatedByteDotProduct(left, right, acc);

        // VPDPBSSDS xmm1{k1}{z}, xmm2, xmm3/m128/m32bcst
        public static Vector128<int> AccumulatedByteDotProductWithSaturation(Vector128<sbyte> left, Vector128<sbyte> right) => AccumulatedByteDotProductWithSaturation(left, right, acc);

        // VPDPBSUDS xmm1{k1}{z}, xmm2, xmm3/m128/m32bcst
        public static Vector128<int> AccumulatedByteDotProductWithSaturation(Vector128<sbyte> left, Vector128<byte> right) => AccumulatedByteDotProductWithSaturation(left, right, acc);

        // VPDPBUUDS xmm1{k1}{z}, xmm2, xmm3/m128/m32bcst
        public static Vector128<int> AccumulatedByteDotProductWithSaturation(Vector128<byte> left, Vector128<byte> right) => AccumulatedByteDotProductWithSaturation(left, right, acc);

        // VPDPBSSDS ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst
        public static Vector256<int> AccumulatedByteDotProductWithSaturation(Vector256<sbyte> left, Vector256<sbyte> right) => AccumulatedByteDotProductWithSaturation(left, right, acc);

        // VPDPBSUDS ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst
        public static Vector256<int> AccumulatedByteDotProductWithSaturation(Vector256<sbyte> left, Vector256<byte> right) => AccumulatedByteDotProductWithSaturation(left, right, acc);

        // VPDPBUUDS ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst
        public static Vector256<int> AccumulatedByteDotProductWithSaturation(Vector256<byte> left, Vector256<byte> right) => AccumulatedByteDotProductWithSaturation(left, right, acc);

        // VPDPWSUD xmm1{k1}{z}, xmm2, xmm3/m128/m32bcst
        public static Vector128<int> AccumulatedInt16DotProduct(Vector128<short> left, Vector128<ushort> right) => AccumulatedInt16DotProduct(left, right, acc);

        // VPDPWUSD xmm1{k1}{z}, xmm2, xmm3/m128/m32bcst
        public static Vector128<int> AccumulatedInt16DotProduct(Vector128<ushort> left, Vector128<short> right) => AccumulatedInt16DotProduct(left, right, acc);

        // VPDPWUUD xmm1{k1}{z}, xmm2, xmm3/m128/m32bcst
        public static Vector128<int> AccumulatedInt16DotProduct(Vector128<ushort> left, Vector128<ushort> right) => AccumulatedInt16DotProduct(left, right, acc);

        // VPDPWSUD ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst
        public static Vector256<int> AccumulatedInt16DotProduct(Vector256<short> left, Vector256<ushort> right) => AccumulatedInt16DotProduct(left, right, acc);

        // VPDPWUSD ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst
        public static Vector256<int> AccumulatedInt16DotProduct(Vector256<ushort> left, Vector256<short> right) => AccumulatedInt16DotProduct(left, right, acc);

        // VPDPWUUD ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst
        public static Vector256<int> AccumulatedInt16DotProduct(Vector256<ushort> left, Vector256<ushort> right) => AccumulatedInt16DotProduct(left, right, acc);

        // VPDPWSUDS xmm1{k1}{z}, xmm2, xmm3/m128/m32bcst
        public static Vector128<int> AccumulatedInt16DotProductWithSaturation(Vector128<short> left, Vector128<ushort> right) => AccumulatedInt16DotProductWithSaturation(left, right, acc);

        // VPDPWUSDS xmm1{k1}{z}, xmm2, xmm3/m128/m32bcst
        public static Vector128<int> AccumulatedInt16DotProductWithSaturation(Vector128<ushort> left, Vector128<short> right) => AccumulatedInt16DotProductWithSaturation(left, right, acc);

        // VPDPWUUDS xmm1{k1}{z}, xmm2, xmm3/m128/m32bcst
        public static Vector128<int> AccumulatedInt16DotProductWithSaturation(Vector128<ushort> left, Vector128<ushort> right) => AccumulatedInt16DotProductWithSaturation(left, right, acc);

        // VPDPWSUDS ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst
        public static Vector256<int> AccumulatedInt16DotProductWithSaturation(Vector256<short> left, Vector256<ushort> right) => AccumulatedInt16DotProductWithSaturation(left, right, acc);

        // VPDPWUSDS ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst
        public static Vector256<int> AccumulatedInt16DotProductWithSaturation(Vector256<ushort> left, Vector256<short> right) => AccumulatedInt16DotProductWithSaturation(left, right, acc);

        // VPDPWUUDS ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst
        public static Vector256<int> AccumulatedInt16DotProductWithSaturation(Vector256<ushort> left, Vector256<ushort> right) => AccumulatedInt16DotProductWithSaturation(left, right, acc);

        [Intrinsic]
        public abstract class V512 : Avx10v1.V512
        {   
            // VPDPWSUD zmm1{k1}{z}, zmm2, zmm3/m512/m32bcst
            public static Vector512<int> AccumulatedInt16DotProduct(Vector512<short> left, Vector512<ushort> right) => AccumulatedInt16DotProduct(left, right);

            // VPDPWUSD zmm1{k1}{z}, zmm2, zmm3/m512/m32bcst
            public static Vector512<int> AccumulatedInt16DotProduct(Vector512<ushort> left, Vector512<short> right) => AccumulatedInt16DotProduct(left, right);

            // VPDPWUUD zmm1{k1}{z}, zmm2, zmm3/m512/m32bcst
            public static Vector512<int> AccumulatedInt16DotProduct(Vector512<ushort> left, Vector512<ushort> right) => AccumulatedInt16DotProduct(left, right);

            // VPDPWSUDS zmm1{k1}{z}, zmm2, zmm3/m512/m32bcst
            public static Vector512<int> AccumulatedInt16DotProductWithSaturation(Vector512<short> left, Vector512<ushort> right) => AccumulatedInt16DotProductWithSaturation(left, right);

            // VPDPWUSDS zmm1{k1}{z}, zmm2, zmm3/m512/m32bcst
            public static Vector512<int> AccumulatedInt16DotProductWithSaturation(Vector512<ushort> left, Vector512<short> right) => AccumulatedInt16DotProductWithSaturation(left, right);

            // VPDPWUUDS zmm1{k1}{z}, zmm2, zmm3/m512/m32bcst
            public static Vector512<int> AccumulatedInt16DotProductWithSaturation(Vector512<ushort> left, Vector512<ushort> right) => AccumulatedInt16DotProductWithSaturation(left, right);

            // VPDPBSSD zmm1{k1}{z}, zmm2, zmm3/m512/m32bcst
            public static Vector512<int> AccumulatedByteDotProduct(Vector512<sbyte> left, Vector512<sbyte> right) => AccumulatedByteDotProduct(left, right);

            // VPDPBSUD zmm1{k1}{z}, zmm2, zmm3/m512/m32bcst
            public static Vector512<int> AccumulatedByteDotProduct(Vector512<sbyte> left, Vector512<byte> right) => AccumulatedByteDotProduct(left, right);

            // VPDPBUUD zmm1{k1}{z}, zmm2, zmm3/m512/m32bcst
            public static Vector512<int> AccumulatedByteDotProduct(Vector512<byte> left, Vector512<byte> right) => AccumulatedByteDotProduct(left, right);

            // VPDPBSSDS zmm1{k1}{z}, zmm2, zmm3/m512/m32bcst
            public static Vector512<int> AccumulatedByteDotProductWithSaturation(Vector512<sbyte> left, Vector512<sbyte> right) => AccumulatedByteDotProductWithSaturation(left, right);

            // VPDPBSUDS zmm1{k1}{z}, zmm2, zmm3/m512/m32bcst
            public static Vector512<int> AccumulatedByteDotProductWithSaturation(Vector512<sbyte> left, Vector512<byte> right) => AccumulatedByteDotProductWithSaturation(left, right);

            // VPDPBUUDS zmm1{k1}{z}, zmm2, zmm3/m512/m32bcst
            public static Vector512<int> AccumulatedByteDotProductWithSaturation(Vector512<byte> left, Vector512<byte> right) => AccumulatedByteDotProductWithSaturation(left, right);
        }
    }
}
bartonjs commented 2 weeks ago

Video

namespace System.Runtime.Intrinsics.X86
{
    /// <summary>Provides access to X86 AVX10.2 hardware instructions via intrinsics</summary>
    [Intrinsic]
    [CLSCompliant(false)]
    public abstract class Avx10v2 : Avx10v1
    {
        internal Avx10v2() { }

        public static new bool IsSupported { get => IsSupported; }

        // VMINMAXPD xmm1{k1}{z}, xmm2, xmm3/m128/m64bcst, imm8
        public static Vector128<double> MinMax(Vector128<double> left, Vector128<double> right, [ConstantExpected] byte control) => MinMax(left, right, control);

        // VMINMAXPD ymm1{k1}{z}, ymm2, ymm3/m256/m64bcst {sae}, imm8
        public static Vector256<double> MinMax(Vector256<double> left, Vector256<double> right, [ConstantExpected] byte control) => MinMax(left, right, control);

        // VMINMAXPS xmm1{k1}{z}, xmm2, xmm3/m128/m32bcst, imm8
        public static Vector128<float> MinMax(Vector128<float> left, Vector128<float> right, [ConstantExpected] byte control) => MinMax(left, right, control);

        // VMINMAXPS ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst {sae}, imm8
        public static Vector256<float> MinMax(Vector256<float> left, Vector256<float> right, [ConstantExpected] byte control) => MinMax(left, right, control);

        // VMINMAXSD xmm1{k1}{z}, xmm2, xmm3/m64 {sae}, imm8
        public static Vector128<double> MinMaxScalar(Vector128<double> left, Vector128<double> right, [ConstantExpected] byte control) => MinMaxScalar(left, right, control);

        // VMINMAXSS xmm1{k1}{z}, xmm2, xmm3/m32 {sae}, imm8
        public static Vector128<float> MinMaxScalar(Vector128<float> left, Vector128<float> right, [ConstantExpected] byte control) => MinMaxScalar(left, right, control);

        // VADDPD ymm1{k1}{z}, ymm2, ymm3/m256/m64bcst {er}
        public static Vector256<double> Add(Vector256<double> left, Vector256<double> right, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => Add(left, right, mode);

        // VADDPS ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst {er}
        public static Vector256<float> Add(Vector256<float> left, Vector256<float> right, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => Add(left, right, mode);

        // VDIVPD ymm1{k1}{z}, ymm2, ymm3/m256/m64bcst {er}
        public static Vector256<double> Divide(Vector256<double> left, Vector256<double> right, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => Divide(left, right, mode);

        // VDIVPS ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst {er}
        public static Vector256<float> Divide(Vector256<float> left, Vector256<float> right, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => Divide(left, right, mode);

        // VCVTPS2IBS xmm1{k1}{z}, xmm2/m128/m32bcst
        public static Vector128<int> ConvertToByteWithSaturationAndWidenToInt32(Vector128<float> value) => ConvertToByteWithSaturationAndWidenToInt32(value);

        // VCVTPS2IBS ymm1{k1}{z}, ymm2/m256/m32bcst {er}
        public static Vector256<int> ConvertToByteWithSaturationAndWidenToInt32(Vector256<float> value) => ConvertToByteWithSaturationAndWidenToInt32(value);

        // VCVTPS2IBS ymm1{k1}{z}, ymm2/m256/m32bcst {er}
        public static Vector256<int> ConvertToByteWithSaturationAndWidenToInt32(Vector256<float> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToByteWithSaturationAndWidenToInt32(value, mode);

        // VCVTPS2IUBS xmm1{k1}{z}, xmm2/m128/m32bcst
        public static Vector128<uint> ConvertToByteWithSaturationAndWidenToUInt32(Vector128<float> value) => ConvertToByteWithSaturationAndWidenToUInt32(value);

        // VCVTPS2IUBS ymm1{k1}{z}, ymm2/m256/m32bcst {er}
        public static Vector256<uint> ConvertToByteWithSaturationAndWidenToUInt32(Vector256<float> value) => ConvertToByteWithSaturationAndWidenToUInt32(value);

        // VCVTPS2IUBS ymm1{k1}{z}, ymm2/m256/m32bcst {er}
        public static Vector256<uint> ConvertToByteWithSaturationAndWidenToUInt32(Vector256<float> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToByteWithSaturationAndWidenToUInt32(value, mode);

        // VCVTTPS2IBS xmm1{k1}{z}, xmm2/m128/m32bcst
        public static Vector128<int> ConvertToByteWithTruncatedSaturationAndWidenToInt32(Vector128<float> value) => ConvertToByteWithTruncatedSaturationAndWidenToInt32(value);

        // VCVTTPS2IBS ymm1{k1}{z}, ymm2/m256/m32bcst {sae}
        public static Vector256<int> ConvertToByteWithTruncatedSaturationAndWidenToInt32(Vector256<float> value) => ConvertToByteWithTruncatedSaturationAndWidenToInt32(value);

        // VCVTTPS2IUBS xmm1{k1}{z}, xmm2/m128/m32bcst
        public static Vector128<uint> ConvertToByteWithTruncatedSaturationAndWidenToUInt32(Vector128<float> value) => ConvertToByteWithTruncatedSaturationAndWidenToUInt32(value);

        // VCVTTPS2IUBS ymm1{k1}{z}, ymm2/m256/m32bcst {sae}
        public static Vector256<uint> ConvertToByteWithTruncatedSaturationAndWidenToUInt32(Vector256<float> value) => ConvertToByteWithTruncatedSaturationAndWidenToUInt32(value);

        // VMOVD xmm1, xmm2/m32
        public static Vector128<uint> ConvertScalarToVector128UInt32(Vector128<uint> value) => ConvertScalarToVector128UInt32(value);

        // VMOVW xmm1, xmm2/m16
        public static Vector128<ushort> ConvertScalarToVector128UInt16(Vector128<ushort> value) => ConvertScalarToVector128UInt16(value);

        // The instructions below are existing APIs to which
        // embedded rounding support has been added

        // VCVTDQ2PS ymm1{k1}{z}, ymm2/m256/m32bcst {er}
        public static Vector256<float> ConvertToVector256Single(Vector256<int> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToVector256Single(value, mode);

        // VCVTPD2DQ xmm1{k1}{z}, ymm2/m256/m64bcst {er}
        public static Vector128<int> ConvertToVector128Int32(Vector256<double> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToVector128Int32(value, mode);

        // VCVTPD2PS xmm1{k1}{z}, ymm2/m256/m64bcst {er}
        public static Vector128<float> ConvertToVector128Single(Vector256<double> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToVector128Single(value, mode);

        // VCVTPD2QQ ymm1{k1}{z}, ymm2/m256/m64bcst {er}
        public static Vector256<long> ConvertToVector256Int64(Vector256<double> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToVector256Int64(value, mode);

        // VCVTPD2UDQ xmm1{k1}{z}, ymm2/m256/m64bcst {er}
        public static Vector128<uint> ConvertToVector128UInt32(Vector256<double> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToVector128UInt32(value, mode);

        // VCVTPD2UQQ ymm1{k1}{z}, ymm2/m256/m64bcst {er}
        public static Vector256<ulong> ConvertToVector256UInt64(Vector256<double> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToVector256UInt64(value, mode);

        // VCVTPS2DQ ymm1{k1}{z}, ymm2/m256/m32bcst {er}
        public static Vector256<int> ConvertToVector256Int32(Vector256<float> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToVector256Int32(value, mode);

        // VCVTPS2QQ ymm1{k1}{z}, xmm2/m128/m32bcst {er}
        public static Vector256<long> ConvertToVector256Int64(Vector128<float> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToVector256Int64(value, mode);

        // VCVTPS2UDQ ymm1{k1}{z}, ymm2/m256/m32bcst {er}
        public static Vector256<uint> ConvertToVector256UInt32(Vector256<float> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToVector256UInt32(value, mode);

        // VCVTPS2UQQ ymm1{k1}{z}, xmm2/m128/m32bcst {er}
        public static Vector256<ulong> ConvertToVector256UInt64(Vector128<float> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToVector256UInt64(value, mode);

        // VCVTQQ2PS xmm1{k1}{z}, ymm2/m256/m64bcst {er}
        public static Vector128<float> ConvertToVector128Single(Vector256<long> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToVector128Single(value, mode);

        // VCVTQQ2PD ymm1{k1}{z}, ymm2/m256/m64bcst {er}
        public static Vector256<double> ConvertToVector256Double(Vector256<long> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToVector256Double(value, mode);

        // VCVTUDQ2PS ymm1{k1}{z}, ymm2/m256/m32bcst {er}
        public static Vector256<float> ConvertToVector256Single(Vector256<uint> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToVector256Single(value, mode);

        // VCVTUQQ2PS xmm1{k1}{z}, ymm2/m256/m64bcst {er}
        public static Vector128<float> ConvertToVector128Single(Vector256<ulong> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToVector128Single(value, mode);

        // VCVTUQQ2PD ymm1{k1}{z}, ymm2/m256/m64bcst {er}
        public static Vector256<double> ConvertToVector256Double(Vector256<ulong> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToVector256Double(value, mode);

        // VMULPD ymm1{k1}{z}, ymm2, ymm3/m256/m64bcst {er}
        public static Vector256<double> Multiply(Vector256<double> left, Vector256<double> right, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => Multiply(left, right, mode);

        // VMULPS ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst {er}
        public static Vector256<float> Multiply(Vector256<float> left, Vector256<float> right, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => Multiply(left, right, mode);

        // VSCALEFPD ymm1{k1}{z}, ymm2, ymm3/m256/m64bcst {er}
        public static Vector256<double> Scale(Vector256<double> left, Vector256<double> right, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => Scale(left, right, mode);

        // VSCALEFPS ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst {er}
        public static Vector256<float> Scale(Vector256<float> left, Vector256<float> right, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => Scale(left, right, mode);

        // VSQRTPD ymm1{k1}{z}, ymm2/m256/m64bcst {er}
        public static Vector256<double> Sqrt(Vector256<double> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => Sqrt(value, mode);

        // VSQRTPS ymm1{k1}{z}, ymm2/m256/m32bcst {er}
        public static Vector256<float> Sqrt(Vector256<float> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => Sqrt(value, mode);

        // VSUBPD ymm1{k1}{z}, ymm2, ymm3/m256/m64bcst {er}
        public static Vector256<double> Subtract(Vector256<double> left, Vector256<double> right, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => Subtract(left, right, mode);

        // VSUBPS ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst {er}
        public static Vector256<float> Subtract(Vector256<float> left, Vector256<float> right, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => Subtract(left, right, mode);

        [Intrinsic]
        public new abstract class X64 : Avx10v1.X64
        {
            internal X64() { }

            public static new bool IsSupported { get => IsSupported; }
        }

        [Intrinsic]
        public new abstract class V512 : Avx10v1.V512
        {
            internal V512() { }

            public static new bool IsSupported { get => IsSupported; }

            // VMINMAXPD zmm1{k1}{z}, zmm2, zmm3/m512/m64bcst {sae}, imm8
            public static Vector512<double> MinMax(Vector512<double> left, Vector512<double> right, [ConstantExpected] byte control) => MinMax(left, right, control);

            // VMINMAXPS zmm1{k1}{z}, zmm2, zmm3/m512/m32bcst {sae}, imm8
            public static Vector512<float> MinMax(Vector512<float> left, Vector512<float> right, [ConstantExpected] byte control) => MinMax(left, right, control);

            // VCVTPS2IBS zmm1{k1}{z}, zmm2/m512/m32bcst {er}
            public static Vector512<int> ConvertToByteWithSaturationAndWidenToInt32(Vector512<float> value) => ConvertToByteWithSaturationAndWidenToInt32(value);

            // VCVTPS2IBS zmm1{k1}{z}, zmm2/m512/m32bcst {er}
            public static Vector512<int> ConvertToByteWithSaturationAndWidenToInt32(Vector512<float> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToByteWithSaturationAndWidenToInt32(value, mode);

            // VCVTPS2IUBS zmm1{k1}{z}, zmm2/m512/m32bcst {er}
            public static Vector512<uint> ConvertToByteWithSaturationAndWidenToUInt32(Vector512<float> value) => ConvertToByteWithSaturationAndWidenToUInt32(value);

            // VCVTPS2IUBS zmm1{k1}{z}, zmm2/m512/m32bcst {er}
            public static Vector512<uint> ConvertToByteWithSaturationAndWidenToUInt32(Vector512<float> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToByteWithSaturationAndWidenToUInt32(value, mode);

            // VCVTTPS2IBS zmm1{k1}{z}, zmm2/m512/m32bcst {sae}
            public static Vector512<int> ConvertToByteWithTruncatedSaturationAndWidenToInt32(Vector512<float> value) => ConvertToByteWithTruncatedSaturationAndWidenToInt32(value);

            // VCVTTPS2IUBS zmm1{k1}{z}, zmm2/m512/m32bcst {sae}
            public static Vector512<uint> ConvertToByteWithTruncatedSaturationAndWidenToUInt32(Vector512<float> value) => ConvertToByteWithTruncatedSaturationAndWidenToUInt32(value);

            // This is a 512-bit extension of the previously existing 128/256-bit intrinsic
            // VMPSADBW zmm1{k1}{z}, zmm2, zmm3/m512, imm8
            public static Vector512<ushort> MultipleSumAbsoluteDifferences(Vector512<byte> left, Vector512<byte> right, [ConstantExpected] byte mask) => MultipleSumAbsoluteDifferences(left, right, mask);

            [Intrinsic]
            public new abstract class X64 : Avx10v1.V512.X64
            {
                internal X64() { }

                public static new bool IsSupported { get => IsSupported; }
            }
        }
    }
}
khushal1996 commented 1 week ago

I would also like to discuss the following APIs

        // VMOVD xmm1, xmm2/m32
        public static Vector128<uint> ConvertToVector128UInt32(Vector128<uint> value) => ConvertToVector128UInt32(value);

        // VMOVW xmm1, xmm2/m16
        public static Vector128<ushort> ConvertToVector128UInt16(Vector128<ushort> value) => ConvertToVector128UInt16(value);

and would like to change them to

        public static unsafe void StoreLowDWord(byte* address, Vector128<uint> source) => StoreLowDWord(address, source);
        public static unsafe void StoreLowWord(byte* address, Vector128<ushort> source) => StoreLowWord(address, source);
        public static unsafe Vector128<uint> RetrieveLowDWord(byte* address) => RetrieveLowDWord(address);
        public static unsafe Vector128<ushort> RetrieveLowWord(byte* address) => RetrieveLowWord(address);


tannergooding commented 1 week ago

@khushal1996 the proposed signatures don't match the .NET naming conventions (we'd still use UInt32), and they also don't cover all the functionality the underlying instructions support.

In particular, we already expose the existing movd/movq variants that move between general-purpose and SIMD registers, and these can already be used for loading from or storing to memory. They notably have a signature similar to static Vector128<int> ConvertScalarToVector128Int32(int value)

Likewise while we expose some instructions like movss, where the managed signature is static Vector128<float> MoveScalar(Vector128<float> upper, Vector128<float> value), these preserve the upper bits.

The new movd/movw variants are most similar to the existing movq variant which deals with SIMD to SIMD and zero the upper bits. The latter is exposed in two ways today: MoveScalar for SIMD to/from SIMD and ConvertToVector128UInt64 which is general-purpose to/from SIMD. So it might be goodness for these ones to similarly be MoveScalar rather than ConvertToVector128*
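To make the three existing shapes described above concrete, here is a minimal sketch that wraps intrinsics already present in `Sse` and `Sse2`. The wrapper class and method names (`ExistingScalarMoves`, `FromGeneralPurpose`, `MergeLow`, `KeepLowZeroUpper`) are invented for illustration only; the underlying calls throw `PlatformNotSupportedException` on hardware without SSE2, so callers are assumed to check `Sse2.IsSupported` first.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

// Illustrative wrappers only; the names are hypothetical, the wrapped intrinsics exist today.
public static class ExistingScalarMoves
{
    // movd (general-purpose or memory to SIMD): element 0 is 'value', the rest are zero.
    public static Vector128<int> FromGeneralPurpose(int value) =>
        Sse2.ConvertScalarToVector128Int32(value);

    // movss (SIMD to SIMD): element 0 comes from 'value', elements 1..3 are preserved from 'upper'.
    public static Vector128<float> MergeLow(Vector128<float> upper, Vector128<float> value) =>
        Sse.MoveScalar(upper, value);

    // movq (SIMD to SIMD): the low 64 bits are kept and the upper 64 bits are zeroed.
    public static Vector128<ulong> KeepLowZeroUpper(Vector128<ulong> value) =>
        Sse2.MoveScalar(value);
}
```

The last wrapper is the closest analogue to the new movd/movw forms: a single SIMD input, the low element kept, and everything above it cleared.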

khushal1996 commented 1 week ago

Thanks @tannergooding. To conclude, this is what you are proposing:

public static Vector128<uint> MoveScalarUInt32(Vector128<uint> value)
tannergooding commented 1 week ago

In this case, since it's a move and no conversions are possible, it'd just be 2-4 MoveScalar overloads (depending on if we only want unsigned or also signed overloads).
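As a rough sketch of what that overload set could look like, the following portable emulation mirrors the "keep the low element, zero the rest" behavior using `Vector128.CreateScalar` and `ToScalar`. The class name `MoveScalarSketch` is hypothetical, the signed overloads are included only to show the 4-overload option, and none of this is the approved API or the way the JIT would implement the intrinsics.

```csharp
using System;
using System.Runtime.Intrinsics;

// Hypothetical software stand-in for the MoveScalar overloads discussed above.
public static class MoveScalarSketch
{
    // Emulates VMOVD xmm1, xmm2: keep the low 32 bits, zero the remaining elements.
    public static Vector128<uint> MoveScalar(Vector128<uint> value) =>
        Vector128.CreateScalar(value.ToScalar());

    // Emulates VMOVW xmm1, xmm2: keep the low 16 bits, zero the remaining elements.
    public static Vector128<ushort> MoveScalar(Vector128<ushort> value) =>
        Vector128.CreateScalar(value.ToScalar());

    // Optional signed counterparts (the "also signed" case).
    public static Vector128<int> MoveScalar(Vector128<int> value) =>
        Vector128.CreateScalar(value.ToScalar());

    public static Vector128<short> MoveScalar(Vector128<short> value) =>
        Vector128.CreateScalar(value.ToScalar());
}
```

For example, `MoveScalarSketch.MoveScalar(Vector128.Create(1u, 2u, 3u, 4u))` yields a vector whose element 0 is 1 and whose other elements are 0.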