dotnet / runtime


AVX512 masking support #87097

Open tannergooding opened 1 year ago

tannergooding commented 1 year ago

Summary

While implementing the API surface for "Expose VectorMask<T> to support generic masking for Vector<T>", several considerations came up that made it necessary to step back and reconsider how the feature works.

Most of these issues stemmed from the additional complexity and throughput hit that the JIT would incur to integrate the type. However, the type also affected how users would interact with it and the public API surface we would expose. Namely, existing user code would not benefit, and the API surface we currently expose for the XArch and cross-platform intrinsics would nearly double.

These considerations were raised with @dotnet/avx512-contrib and an alternative design was proposed where the JIT would do pattern recognition in lowering instead to limit the throughput hit and provide light-up to existing user code. This does not preclude the ability to expose VectorMask in the future and we can revisit the type and its design as appropriate.

Conceptual Differences

Previously, we would have defined the following, and the same expansion would have applied to effectively every existing intrinsic. This would double or nearly triple our API surface, taking us from the ~1900 APIs we have today up to at least ~3800 APIs. Arm64, as a point of comparison, currently has ~2100 APIs.

namespace System.Runtime.Intrinsics.X86;

public static partial class Avx512F
{
    // Existing API
    public static Vector512<float> Add(Vector512<float> left, Vector512<float> right);

    // New mask API
    public static Vector512<float> Add(Vector512<float> mergeValues, Vector512Mask<float> mergeMask, Vector512<float> left, Vector512<float> right);

    // Potentially handled by just the above overload where `mergeValues: Vector512<float>.Zero`
    public static Vector512<float> Add(Vector512Mask<float> zeroMask, Vector512<float> left, Vector512<float> right);

    public static partial class VL
    {
        // New mask API
        public static Vector128<float> Add(Vector128<float> mergeValues, Vector128Mask<float> mergeMask, Vector128<float> left, Vector128<float> right);
        public static Vector256<float> Add(Vector256<float> mergeValues, Vector256Mask<float> mergeMask, Vector256<float> left, Vector256<float> right);

        // Potentially handled by just the above overloads where `mergeValues` is `Vector128<float>.Zero` or `Vector256<float>.Zero`
        public static Vector128<float> Add(Vector128Mask<float> zeroMask, Vector128<float> left, Vector128<float> right);
        public static Vector256<float> Add(Vector256Mask<float> zeroMask, Vector256<float> left, Vector256<float> right);
    }
}
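
To make the difference between the merge-mask and zero-mask overloads concrete, here is a minimal scalar sketch of their per-element semantics. MaskingModel, AddMerge, and AddZero are hypothetical names used only for illustration, and the mask is modeled as a plain ulong bitfield rather than the Vector512Mask<float> type discussed above. None of this is an actual runtime API, and it assumes all arrays have the same length of at most 64 elements.

```csharp
using System;

// Hypothetical scalar model of per-element masking; not a runtime API.
public static class MaskingModel
{
    // Merge-masking: lanes whose mask bit is set receive left + right;
    // every other lane keeps the corresponding element of mergeValues.
    public static float[] AddMerge(float[] mergeValues, ulong mergeMask, float[] left, float[] right)
    {
        var result = new float[left.Length];

        for (int i = 0; i < left.Length; i++)
        {
            // Bit i of the mask controls lane i.
            bool selected = ((mergeMask >> i) & 1UL) != 0;
            result[i] = selected ? left[i] + right[i] : mergeValues[i];
        }

        return result;
    }

    // Zero-masking: the same operation with an all-zero merge source,
    // so unselected lanes become 0.
    public static float[] AddZero(ulong zeroMask, float[] left, float[] right)
        => AddMerge(new float[left.Length], zeroMask, left, right);
}
```

With a mask of 0b0101, lanes 0 and 2 receive the sum while lanes 1 and 3 keep their mergeValues element (or become zero with AddZero), which is why the zero-mask overload can be expressed through the merge-mask one.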

Pattern Recognition

Rather than exposing these overloads of APIs that take VectorMask<T> and allowing users to explicitly utilize masking, we will instead recognize a few key patterns and transform those in the JIT instead.

Under the VectorMask<T> design, we would also have had intrinsics such as public static Vector512Mask<float> CompareEqual(Vector512<float> left, Vector512<float> right) that produce a mask, along with various other ways to produce one. Developers would then have consumed the mask by passing it to the masked API. For example, the following finds all additions involving NaN and ensures those elements become 0 in the result.

Vector512Mask<float> nanMask = Avx512F.CompareNotEqual(left, left) | Avx512F.CompareNotEqual(right, right);
return Avx512F.Add(Vector512<float>.Zero, ~nanMask, left, right);

If a user wanted to do that today, where masking doesn't exist, they would write something functionally similar:

Vector256<float> nanMask = Avx.CompareNotEqual(left, left) | Avx.CompareNotEqual(right, right);
Vector256<float> result = Avx.Add(left, right);
return Vector256.ConditionalSelect(~nanMask, result, Vector256<float>.Zero);

Thus, by recognizing these patterns instead, we can light up existing code and avoid exploding the API surface, while also ensuring that the code users write is the same regardless of whether the hardware supports native masking.
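
As a runnable illustration of the pattern above, the following sketch writes the NaN-zeroing add with the cross-platform Vector128 APIs so that it executes on any hardware. NanZeroingAdd is a hypothetical name. The sketch builds the not-NaN mask directly with Vector128.Equals rather than negating a CompareNotEqual result. Whether the JIT folds the ConditionalSelect into a masked instruction depends on the pattern recognition this issue proposes, so treat that part as an assumption rather than a guarantee.

```csharp
using System.Runtime.Intrinsics;

// Hypothetical helper showing the compare + compute + select shape.
public static class NanZeroingAdd
{
    // Adds two vectors and forces every lane that involved a NaN input to zero.
    public static Vector128<float> Add(Vector128<float> left, Vector128<float> right)
    {
        // A lane compares equal to itself only when it is not NaN, so this mask
        // is all-ones in exactly the lanes where both inputs are not NaN.
        Vector128<float> notNanMask = Vector128.Equals(left, left) & Vector128.Equals(right, right);

        // Compute the full result first; NaN lanes are discarded below.
        Vector128<float> sum = left + right;

        // Keep the sum where the mask is set and substitute zero elsewhere.
        return Vector128.ConditionalSelect(notNanMask, sum, Vector128<float>.Zero);
    }
}
```

The same source works on hardware without mask registers, where the select is performed with ordinary bitwise operations.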

A sampling of the patterns we want to recognize includes, but is not limited to:

API Proposal

namespace System.Runtime.Intrinsics.X86;

public enum IntComparisonMode : byte
{
    Equals = 0,
    LessThan = 1,
    LessThanOrEqual = 2,
    False = 3,

    NotEquals = 4,
    GreaterThanOrEqual = 5,
    GreaterThan = 6,
    True = 7,

    // Additional names for parity
    //
    // FloatComparisonMode has similar names, but they are necessary there since
    // `!(x > y)` is not the same as `(x <= y)` due to the existence of NaN
    //
    // The architecture manual formally uses NotLessThan and NotLessThanOrEqual

    NotGreaterThanOrEqual = 1,
    NotGreaterThan = 2,

    NotLessThan = 5,
    NotLessThanOrEqual = 6,
}

public static partial class Avx512F
{
    public static Vector512<double> BlendVariable(Vector512<double> left, Vector512<double> right, Vector512<double> mask);
    public static Vector512<int>    BlendVariable(Vector512<int>    left, Vector512<int>    right, Vector512<int>    mask);
    public static Vector512<long>   BlendVariable(Vector512<long>   left, Vector512<long>   right, Vector512<long>   mask);
    public static Vector512<float>  BlendVariable(Vector512<float>  left, Vector512<float>  right, Vector512<float>  mask);
    public static Vector512<uint>   BlendVariable(Vector512<uint>   left, Vector512<uint>   right, Vector512<uint>   mask);
    public static Vector512<ulong>  BlendVariable(Vector512<ulong>  left, Vector512<ulong>  right, Vector512<ulong>  mask);

    public static Vector512<double> Compare                     (Vector512<double> left, Vector512<double> right, [ConstantExpected(Max = FloatComparisonMode.UnorderedTrueSignaling)] FloatComparisonMode mode);
    public static Vector512<double> CompareEqual                (Vector512<double> left, Vector512<double> right);
    public static Vector512<double> CompareGreaterThan          (Vector512<double> left, Vector512<double> right);
    public static Vector512<double> CompareGreaterThanOrEqual   (Vector512<double> left, Vector512<double> right);
    public static Vector512<double> CompareLessThan             (Vector512<double> left, Vector512<double> right);
    public static Vector512<double> CompareLessThanOrEqual      (Vector512<double> left, Vector512<double> right);
    public static Vector512<double> CompareNotEqual             (Vector512<double> left, Vector512<double> right);
    public static Vector512<double> CompareNotGreaterThan       (Vector512<double> left, Vector512<double> right);
    public static Vector512<double> CompareNotGreaterThanOrEqual(Vector512<double> left, Vector512<double> right);
    public static Vector512<double> CompareNotLessThan          (Vector512<double> left, Vector512<double> right);
    public static Vector512<double> CompareNotLessThanOrEqual   (Vector512<double> left, Vector512<double> right);
    public static Vector512<double> CompareOrdered              (Vector512<double> left, Vector512<double> right);
    public static Vector512<double> CompareUnordered            (Vector512<double> left, Vector512<double> right);

    public static Vector512<float> Compare                     (Vector512<float> left, Vector512<float> right, [ConstantExpected(Max = FloatComparisonMode.UnorderedTrueSignaling)] FloatComparisonMode mode);
    public static Vector512<float> CompareEqual                (Vector512<float> left, Vector512<float> right);
    public static Vector512<float> CompareGreaterThan          (Vector512<float> left, Vector512<float> right);
    public static Vector512<float> CompareGreaterThanOrEqual   (Vector512<float> left, Vector512<float> right);
    public static Vector512<float> CompareLessThan             (Vector512<float> left, Vector512<float> right);
    public static Vector512<float> CompareLessThanOrEqual      (Vector512<float> left, Vector512<float> right);
    public static Vector512<float> CompareNotEqual             (Vector512<float> left, Vector512<float> right);
    public static Vector512<float> CompareNotGreaterThan       (Vector512<float> left, Vector512<float> right);
    public static Vector512<float> CompareNotGreaterThanOrEqual(Vector512<float> left, Vector512<float> right);
    public static Vector512<float> CompareNotLessThan          (Vector512<float> left, Vector512<float> right);
    public static Vector512<float> CompareNotLessThanOrEqual   (Vector512<float> left, Vector512<float> right);
    public static Vector512<float> CompareOrdered              (Vector512<float> left, Vector512<float> right);
    public static Vector512<float> CompareUnordered            (Vector512<float> left, Vector512<float> right);

    public static Vector512<int> Compare                  (Vector512<int> left, Vector512<int> right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
    public static Vector512<int> CompareEqual             (Vector512<int> left, Vector512<int> right);
    public static Vector512<int> CompareGreaterThan       (Vector512<int> left, Vector512<int> right);
    public static Vector512<int> CompareGreaterThanOrEqual(Vector512<int> left, Vector512<int> right);
    public static Vector512<int> CompareLessThan          (Vector512<int> left, Vector512<int> right);
    public static Vector512<int> CompareLessThanOrEqual   (Vector512<int> left, Vector512<int> right);
    public static Vector512<int> CompareNotEqual          (Vector512<int> left, Vector512<int> right);

    public static Vector512<long> Compare                  (Vector512<long> left, Vector512<long> right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
    public static Vector512<long> CompareEqual             (Vector512<long> left, Vector512<long> right);
    public static Vector512<long> CompareGreaterThan       (Vector512<long> left, Vector512<long> right);
    public static Vector512<long> CompareGreaterThanOrEqual(Vector512<long> left, Vector512<long> right);
    public static Vector512<long> CompareLessThan          (Vector512<long> left, Vector512<long> right);
    public static Vector512<long> CompareLessThanOrEqual   (Vector512<long> left, Vector512<long> right);
    public static Vector512<long> CompareNotEqual          (Vector512<long> left, Vector512<long> right);

    public static Vector512<uint> Compare                  (Vector512<uint> left, Vector512<uint> right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
    public static Vector512<uint> CompareEqual             (Vector512<uint> left, Vector512<uint> right);
    public static Vector512<uint> CompareGreaterThan       (Vector512<uint> left, Vector512<uint> right);
    public static Vector512<uint> CompareGreaterThanOrEqual(Vector512<uint> left, Vector512<uint> right);
    public static Vector512<uint> CompareLessThan          (Vector512<uint> left, Vector512<uint> right);
    public static Vector512<uint> CompareLessThanOrEqual   (Vector512<uint> left, Vector512<uint> right);
    public static Vector512<uint> CompareNotEqual          (Vector512<uint> left, Vector512<uint> right);

    public static Vector512<ulong> Compare                  (Vector512<ulong> left, Vector512<ulong> right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
    public static Vector512<ulong> CompareEqual             (Vector512<ulong> left, Vector512<ulong> right);
    public static Vector512<ulong> CompareGreaterThan       (Vector512<ulong> left, Vector512<ulong> right);
    public static Vector512<ulong> CompareGreaterThanOrEqual(Vector512<ulong> left, Vector512<ulong> right);
    public static Vector512<ulong> CompareLessThan          (Vector512<ulong> left, Vector512<ulong> right);
    public static Vector512<ulong> CompareLessThanOrEqual   (Vector512<ulong> left, Vector512<ulong> right);
    public static Vector512<ulong> CompareNotEqual          (Vector512<ulong> left, Vector512<ulong> right);

    public static Vector512<double> Compress(Vector512<double> value, Vector512<double> mask);
    public static Vector512<int>    Compress(Vector512<int>    value, Vector512<int>    mask);
    public static Vector512<long>   Compress(Vector512<long>   value, Vector512<long>   mask);
    public static Vector512<float>  Compress(Vector512<float>  value, Vector512<float>  mask);
    public static Vector512<uint>   Compress(Vector512<uint>   value, Vector512<uint>   mask);
    public static Vector512<ulong>  Compress(Vector512<ulong>  value, Vector512<ulong>  mask);

    public static Vector512<double> Expand(Vector512<double> value, Vector512<double> mask);
    public static Vector512<int>    Expand(Vector512<int>    value, Vector512<int>    mask);
    public static Vector512<long>   Expand(Vector512<long>   value, Vector512<long>   mask);
    public static Vector512<float>  Expand(Vector512<float>  value, Vector512<float>  mask);
    public static Vector512<uint>   Expand(Vector512<uint>   value, Vector512<uint>   mask);
    public static Vector512<ulong>  Expand(Vector512<ulong>  value, Vector512<ulong>  mask);

    public static unsafe Vector512<double> GatherMaskVector512(Vector512<double> source, double* baseAddress, Vector512<int> index, Vector512<double> mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<int>    GatherMaskVector512(Vector512<int>    source, int*    baseAddress, Vector512<int> index, Vector512<int>    mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<long>   GatherMaskVector512(Vector512<long>   source, long*   baseAddress, Vector512<int> index, Vector512<long>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<float>  GatherMaskVector512(Vector512<float>  source, float*  baseAddress, Vector512<int> index, Vector512<float>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<uint>   GatherMaskVector512(Vector512<uint>   source, uint*   baseAddress, Vector512<int> index, Vector512<uint>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<ulong>  GatherMaskVector512(Vector512<ulong>  source, ulong*  baseAddress, Vector512<int> index, Vector512<ulong>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);

    public static unsafe Vector512<double> GatherMaskVector512(Vector512<double> source, double* baseAddress, Vector512<long> index, Vector512<double> mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<int>    GatherMaskVector512(Vector512<int>    source, int*    baseAddress, Vector512<long> index, Vector512<int>    mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<long>   GatherMaskVector512(Vector512<long>   source, long*   baseAddress, Vector512<long> index, Vector512<long>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<uint>   GatherMaskVector512(Vector512<uint>   source, uint*   baseAddress, Vector512<long> index, Vector512<uint>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<float>  GatherMaskVector512(Vector512<float>  source, float*  baseAddress, Vector512<long> index, Vector512<float>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<ulong>  GatherMaskVector512(Vector512<ulong>  source, ulong*  baseAddress, Vector512<long> index, Vector512<ulong>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);

    public static unsafe Vector512<double> GatherVector512(double* baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<int>    GatherVector512(int*    baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<long>   GatherVector512(long*   baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<float>  GatherVector512(float*  baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<uint>   GatherVector512(uint*   baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<ulong>  GatherVector512(ulong*  baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);

    public static unsafe Vector512<double> GatherVector512(double* baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<int>    GatherVector512(int*    baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<long>   GatherVector512(long*   baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<float>  GatherVector512(float*  baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<uint>   GatherVector512(uint*   baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<ulong>  GatherVector512(ulong*  baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);

    public static unsafe Vector512<double> MaskLoad(double* address, Vector512<double> mask);
    public static unsafe Vector512<int>    MaskLoad(int*    address, Vector512<int>    mask);
    public static unsafe Vector512<long>   MaskLoad(long*   address, Vector512<long>   mask);
    public static unsafe Vector512<float>  MaskLoad(float*  address, Vector512<float>  mask);
    public static unsafe Vector512<uint>   MaskLoad(uint*   address, Vector512<uint>   mask);
    public static unsafe Vector512<ulong>  MaskLoad(ulong*  address, Vector512<ulong>  mask);

    public static unsafe void MaskStore(double* address, Vector512<double> mask, Vector512<double> source);
    public static unsafe void MaskStore(int*    address, Vector512<int>    mask, Vector512<int>    source);
    public static unsafe void MaskStore(long*   address, Vector512<long>   mask, Vector512<long>   source);
    public static unsafe void MaskStore(float*  address, Vector512<float>  mask, Vector512<float>  source);
    public static unsafe void MaskStore(uint*   address, Vector512<uint>   mask, Vector512<uint>   source);
    public static unsafe void MaskStore(ulong*  address, Vector512<ulong>  mask, Vector512<ulong>  source);

    public static int MoveMask(Vector256<short>  value);
    public static int MoveMask(Vector256<ushort> value);
    public static int MoveMask(Vector512<int>    value);
    public static int MoveMask(Vector512<float>  value);
    public static int MoveMask(Vector512<uint>   value);

    public static unsafe void ScatterMaskVector512(Vector512<double> value, double* baseAddress, Vector512<int> index, Vector512<double> mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterMaskVector512(Vector512<int>    value, int*    baseAddress, Vector512<int> index, Vector512<int>    mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterMaskVector512(Vector512<long>   value, long*   baseAddress, Vector512<int> index, Vector512<long>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterMaskVector512(Vector512<float>  value, float*  baseAddress, Vector512<int> index, Vector512<float>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterMaskVector512(Vector512<uint>   value, uint*   baseAddress, Vector512<int> index, Vector512<uint>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterMaskVector512(Vector512<ulong>  value, ulong*  baseAddress, Vector512<int> index, Vector512<ulong>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);

    public static unsafe void ScatterMaskVector512(Vector512<double> value, double* baseAddress, Vector512<long> index, Vector512<double> mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterMaskVector512(Vector512<int>    value, int*    baseAddress, Vector512<long> index, Vector512<int>    mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterMaskVector512(Vector512<long>   value, long*   baseAddress, Vector512<long> index, Vector512<long>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterMaskVector512(Vector512<float>  value, float*  baseAddress, Vector512<long> index, Vector512<float>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterMaskVector512(Vector512<uint>   value, uint*   baseAddress, Vector512<long> index, Vector512<uint>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterMaskVector512(Vector512<ulong>  value, ulong*  baseAddress, Vector512<long> index, Vector512<ulong>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);

    public static unsafe void ScatterVector512(Vector512<double> value, double* baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterVector512(Vector512<int>    value, int*    baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterVector512(Vector512<long>   value, long*   baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterVector512(Vector512<float>  value, float*  baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterVector512(Vector512<uint>   value, uint*   baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterVector512(Vector512<ulong>  value, ulong*  baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);

    public static unsafe void ScatterVector512(Vector512<double> value, double* baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterVector512(Vector512<int>    value, int*    baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterVector512(Vector512<long>   value, long*   baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterVector512(Vector512<float>  value, float*  baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterVector512(Vector512<uint>   value, uint*   baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterVector512(Vector512<ulong>  value, ulong*  baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);

    public static bool TestC(Vector512<double> left, Vector512<double> right);
    public static bool TestC(Vector512<int>    left, Vector512<int>    right);
    public static bool TestC(Vector512<long>   left, Vector512<long>   right);
    public static bool TestC(Vector512<float>  left, Vector512<float>  right);
    public static bool TestC(Vector512<uint>   left, Vector512<uint>   right);
    public static bool TestC(Vector512<ulong>  left, Vector512<ulong>  right);

    public static bool TestNotZAndNotC(Vector512<double> left, Vector512<double> right);
    public static bool TestNotZAndNotC(Vector512<int>    left, Vector512<int>    right);
    public static bool TestNotZAndNotC(Vector512<long>   left, Vector512<long>   right);
    public static bool TestNotZAndNotC(Vector512<float>  left, Vector512<float>  right);
    public static bool TestNotZAndNotC(Vector512<uint>   left, Vector512<uint>   right);
    public static bool TestNotZAndNotC(Vector512<ulong>  left, Vector512<ulong>  right);

    public static bool TestZ(Vector512<double> left, Vector512<double> right);
    public static bool TestZ(Vector512<int>    left, Vector512<int>    right);
    public static bool TestZ(Vector512<long>   left, Vector512<long>   right);
    public static bool TestZ(Vector512<float>  left, Vector512<float>  right);
    public static bool TestZ(Vector512<uint>   left, Vector512<uint>   right);
    public static bool TestZ(Vector512<ulong>  left, Vector512<ulong>  right);

    public static partial class VL
    {
        public static Vector128<int> Compare                  (Vector128<int> left, Vector128<int> right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
        public static Vector128<int> CompareGreaterThanOrEqual(Vector128<int> left, Vector128<int> right);
        public static Vector128<int> CompareLessThan          (Vector128<int> left, Vector128<int> right);
        public static Vector128<int> CompareLessThanOrEqual   (Vector128<int> left, Vector128<int> right);
        public static Vector128<int> CompareNotEqual          (Vector128<int> left, Vector128<int> right);

        public static Vector256<int> Compare                  (Vector256<int> left, Vector256<int> right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
        public static Vector256<int> CompareGreaterThanOrEqual(Vector256<int> left, Vector256<int> right);
        public static Vector256<int> CompareLessThan          (Vector256<int> left, Vector256<int> right);
        public static Vector256<int> CompareLessThanOrEqual   (Vector256<int> left, Vector256<int> right);
        public static Vector256<int> CompareNotEqual          (Vector256<int> left, Vector256<int> right);

        public static Vector128<long> Compare                  (Vector128<long> left, Vector128<long> right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
        public static Vector128<long> CompareGreaterThanOrEqual(Vector128<long> left, Vector128<long> right);
        public static Vector128<long> CompareLessThan          (Vector128<long> left, Vector128<long> right);
        public static Vector128<long> CompareLessThanOrEqual   (Vector128<long> left, Vector128<long> right);
        public static Vector128<long> CompareNotEqual          (Vector128<long> left, Vector128<long> right);
        public static Vector256<long> Compare                  (Vector256<long> left, Vector256<long> right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
        public static Vector256<long> CompareGreaterThanOrEqual(Vector256<long> left, Vector256<long> right);
        public static Vector256<long> CompareLessThan          (Vector256<long> left, Vector256<long> right);
        public static Vector256<long> CompareLessThanOrEqual   (Vector256<long> left, Vector256<long> right);
        public static Vector256<long> CompareNotEqual          (Vector256<long> left, Vector256<long> right);

        public static Vector128<uint> Compare                  (Vector128<uint> left, Vector128<uint> right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
        public static Vector128<uint> CompareGreaterThan       (Vector128<uint> left, Vector128<uint> right);
        public static Vector128<uint> CompareGreaterThanOrEqual(Vector128<uint> left, Vector128<uint> right);
        public static Vector128<uint> CompareLessThan          (Vector128<uint> left, Vector128<uint> right);
        public static Vector128<uint> CompareLessThanOrEqual   (Vector128<uint> left, Vector128<uint> right);
        public static Vector128<uint> CompareNotEqual          (Vector128<uint> left, Vector128<uint> right);
        public static Vector256<uint> Compare                  (Vector256<uint> left, Vector256<uint> right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
        public static Vector256<uint> CompareGreaterThan       (Vector256<uint> left, Vector256<uint> right);
        public static Vector256<uint> CompareGreaterThanOrEqual(Vector256<uint> left, Vector256<uint> right);
        public static Vector256<uint> CompareLessThan          (Vector256<uint> left, Vector256<uint> right);
        public static Vector256<uint> CompareLessThanOrEqual   (Vector256<uint> left, Vector256<uint> right);
        public static Vector256<uint> CompareNotEqual          (Vector256<uint> left, Vector256<uint> right);

        public static Vector128<ulong> Compare                  (Vector128<ulong> left, Vector128<ulong> right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
        public static Vector128<ulong> CompareGreaterThan       (Vector128<ulong> left, Vector128<ulong> right);
        public static Vector128<ulong> CompareGreaterThanOrEqual(Vector128<ulong> left, Vector128<ulong> right);
        public static Vector128<ulong> CompareLessThan          (Vector128<ulong> left, Vector128<ulong> right);
        public static Vector128<ulong> CompareLessThanOrEqual   (Vector128<ulong> left, Vector128<ulong> right);
        public static Vector128<ulong> CompareNotEqual          (Vector128<ulong> left, Vector128<ulong> right);
        public static Vector256<ulong> Compare                  (Vector256<ulong> left, Vector256<ulong> right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
        public static Vector256<ulong> CompareGreaterThan       (Vector256<ulong> left, Vector256<ulong> right);
        public static Vector256<ulong> CompareGreaterThanOrEqual(Vector256<ulong> left, Vector256<ulong> right);
        public static Vector256<ulong> CompareLessThan          (Vector256<ulong> left, Vector256<ulong> right);
        public static Vector256<ulong> CompareLessThanOrEqual   (Vector256<ulong> left, Vector256<ulong> right);
        public static Vector256<ulong> CompareNotEqual          (Vector256<ulong> left, Vector256<ulong> right);

        public static Vector128<double> Compress(Vector128<double> value, Vector128<double> mask);
        public static Vector128<int>    Compress(Vector128<int>    value, Vector128<int>    mask);
        public static Vector128<long>   Compress(Vector128<long>   value, Vector128<long>   mask);
        public static Vector128<float>  Compress(Vector128<float>  value, Vector128<float>  mask);
        public static Vector128<uint>   Compress(Vector128<uint>   value, Vector128<uint>   mask);
        public static Vector128<ulong>  Compress(Vector128<ulong>  value, Vector128<ulong>  mask);
        public static Vector256<double> Compress(Vector256<double> value, Vector256<double> mask);
        public static Vector256<int>    Compress(Vector256<int>    value, Vector256<int>    mask);
        public static Vector256<long>   Compress(Vector256<long>   value, Vector256<long>   mask);
        public static Vector256<float>  Compress(Vector256<float>  value, Vector256<float>  mask);
        public static Vector256<uint>   Compress(Vector256<uint>   value, Vector256<uint>   mask);
        public static Vector256<ulong>  Compress(Vector256<ulong>  value, Vector256<ulong>  mask);

        public static Vector128<double> Expand(Vector128<double> value, Vector128<double> mask);
        public static Vector128<int>    Expand(Vector128<int>    value, Vector128<int>    mask);
        public static Vector128<long>   Expand(Vector128<long>   value, Vector128<long>   mask);
        public static Vector128<float>  Expand(Vector128<float>  value, Vector128<float>  mask);
        public static Vector128<uint>   Expand(Vector128<uint>   value, Vector128<uint>   mask);
        public static Vector128<ulong>  Expand(Vector128<ulong>  value, Vector128<ulong>  mask);
        public static Vector256<double> Expand(Vector256<double> value, Vector256<double> mask);
        public static Vector256<int>    Expand(Vector256<int>    value, Vector256<int>    mask);
        public static Vector256<long>   Expand(Vector256<long>   value, Vector256<long>   mask);
        public static Vector256<float>  Expand(Vector256<float>  value, Vector256<float>  mask);
        public static Vector256<uint>   Expand(Vector256<uint>   value, Vector256<uint>   mask);
        public static Vector256<ulong>  Expand(Vector256<ulong>  value, Vector256<ulong>  mask);

        public static unsafe void ScatterMaskVector128(Vector128<double> value, double* baseAddress, Vector128<int> index, Vector128<double> mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector128(Vector128<int>    value, int*    baseAddress, Vector128<int> index, Vector128<int>    mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector128(Vector128<long>   value, long*   baseAddress, Vector128<int> index, Vector128<long>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector128(Vector128<float>  value, float*  baseAddress, Vector128<int> index, Vector128<float>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector128(Vector128<uint>   value, uint*   baseAddress, Vector128<int> index, Vector128<uint>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector128(Vector128<ulong>  value, ulong*  baseAddress, Vector128<int> index, Vector128<ulong>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<double> value, double* baseAddress, Vector256<int> index, Vector256<double> mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<int>    value, int*    baseAddress, Vector256<int> index, Vector256<int>    mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<long>   value, long*   baseAddress, Vector256<int> index, Vector256<long>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<float>  value, float*  baseAddress, Vector256<int> index, Vector256<float>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<uint>   value, uint*   baseAddress, Vector256<int> index, Vector256<uint>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<ulong>  value, ulong*  baseAddress, Vector256<int> index, Vector256<ulong>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);

        public static unsafe void ScatterMaskVector128(Vector128<double> value, double* baseAddress, Vector128<long> index, Vector128<double> mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector128(Vector128<int>    value, int*    baseAddress, Vector128<long> index, Vector128<int>    mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector128(Vector128<long>   value, long*   baseAddress, Vector128<long> index, Vector128<long>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector128(Vector128<float>  value, float*  baseAddress, Vector128<long> index, Vector128<float>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector128(Vector128<uint>   value, uint*   baseAddress, Vector128<long> index, Vector128<uint>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector128(Vector128<ulong>  value, ulong*  baseAddress, Vector128<long> index, Vector128<ulong>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<double> value, double* baseAddress, Vector256<long> index, Vector256<double> mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<int>    value, int*    baseAddress, Vector256<long> index, Vector256<int>    mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<long>   value, long*   baseAddress, Vector256<long> index, Vector256<long>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<float>  value, float*  baseAddress, Vector256<long> index, Vector256<float>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<uint>   value, uint*   baseAddress, Vector256<long> index, Vector256<uint>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<ulong>  value, ulong*  baseAddress, Vector256<long> index, Vector256<ulong>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);

        public static unsafe void ScatterVector128(Vector128<double> value, double* baseAddress, Vector128<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector128(Vector128<int>    value, int*    baseAddress, Vector128<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector128(Vector128<long>   value, long*   baseAddress, Vector128<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector128(Vector128<float>  value, float*  baseAddress, Vector128<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector128(Vector128<uint>   value, uint*   baseAddress, Vector128<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector128(Vector128<ulong>  value, ulong*  baseAddress, Vector128<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<double> value, double* baseAddress, Vector256<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<int>    value, int*    baseAddress, Vector256<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<long>   value, long*   baseAddress, Vector256<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<float>  value, float*  baseAddress, Vector256<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<uint>   value, uint*   baseAddress, Vector256<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<ulong>  value, ulong*  baseAddress, Vector256<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);

        public static unsafe void ScatterVector128(Vector128<double> value, double* baseAddress, Vector128<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector128(Vector128<int>    value, int*    baseAddress, Vector128<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector128(Vector128<long>   value, long*   baseAddress, Vector128<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector128(Vector128<float>  value, float*  baseAddress, Vector128<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector128(Vector128<uint>   value, uint*   baseAddress, Vector128<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector128(Vector128<ulong>  value, ulong*  baseAddress, Vector128<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<double> value, double* baseAddress, Vector256<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<int>    value, int*    baseAddress, Vector256<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<long>   value, long*   baseAddress, Vector256<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<float>  value, float*  baseAddress, Vector256<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<uint>   value, uint*   baseAddress, Vector256<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<ulong>  value, ulong*  baseAddress, Vector256<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    }
}

public static partial class Avx512BW
{
    public static Vector512<byte>   BlendVariable(Vector512<byte>   left, Vector512<byte>   right, Vector512<byte>   mask);
    public static Vector512<short>  BlendVariable(Vector512<short>  left, Vector512<short>  right, Vector512<short>  mask);
    public static Vector512<sbyte>  BlendVariable(Vector512<sbyte>  left, Vector512<sbyte>  right, Vector512<sbyte>  mask);
    public static Vector512<ushort> BlendVariable(Vector512<ushort> left, Vector512<ushort> right, Vector512<ushort> mask);

    public static Vector512<byte> Compare                  (Vector512<byte> left, Vector512<byte> right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
    public static Vector512<byte> CompareEqual             (Vector512<byte> left, Vector512<byte> right);
    public static Vector512<byte> CompareGreaterThan       (Vector512<byte> left, Vector512<byte> right);
    public static Vector512<byte> CompareGreaterThanOrEqual(Vector512<byte> left, Vector512<byte> right);
    public static Vector512<byte> CompareLessThan          (Vector512<byte> left, Vector512<byte> right);
    public static Vector512<byte> CompareLessThanOrEqual   (Vector512<byte> left, Vector512<byte> right);
    public static Vector512<byte> CompareNotEqual          (Vector512<byte> left, Vector512<byte> right);

    public static Vector512<short> Compare                  (Vector512<short> left, Vector512<short> right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
    public static Vector512<short> CompareEqual             (Vector512<short> left, Vector512<short> right);
    public static Vector512<short> CompareGreaterThan       (Vector512<short> left, Vector512<short> right);
    public static Vector512<short> CompareGreaterThanOrEqual(Vector512<short> left, Vector512<short> right);
    public static Vector512<short> CompareLessThan          (Vector512<short> left, Vector512<short> right);
    public static Vector512<short> CompareLessThanOrEqual   (Vector512<short> left, Vector512<short> right);
    public static Vector512<short> CompareNotEqual          (Vector512<short> left, Vector512<short> right);

    public static Vector512<sbyte> Compare                  (Vector512<sbyte> left, Vector512<sbyte> right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
    public static Vector512<sbyte> CompareEqual             (Vector512<sbyte> left, Vector512<sbyte> right);
    public static Vector512<sbyte> CompareGreaterThan       (Vector512<sbyte> left, Vector512<sbyte> right);
    public static Vector512<sbyte> CompareGreaterThanOrEqual(Vector512<sbyte> left, Vector512<sbyte> right);
    public static Vector512<sbyte> CompareLessThan          (Vector512<sbyte> left, Vector512<sbyte> right);
    public static Vector512<sbyte> CompareLessThanOrEqual   (Vector512<sbyte> left, Vector512<sbyte> right);
    public static Vector512<sbyte> CompareNotEqual          (Vector512<sbyte> left, Vector512<sbyte> right);

    public static Vector512<ushort> Compare                  (Vector512<ushort> left, Vector512<ushort> right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
    public static Vector512<ushort> CompareEqual             (Vector512<ushort> left, Vector512<ushort> right);
    public static Vector512<ushort> CompareGreaterThan       (Vector512<ushort> left, Vector512<ushort> right);
    public static Vector512<ushort> CompareGreaterThanOrEqual(Vector512<ushort> left, Vector512<ushort> right);
    public static Vector512<ushort> CompareLessThan          (Vector512<ushort> left, Vector512<ushort> right);
    public static Vector512<ushort> CompareLessThanOrEqual   (Vector512<ushort> left, Vector512<ushort> right);
    public static Vector512<ushort> CompareNotEqual          (Vector512<ushort> left, Vector512<ushort> right);

    public static int MoveMask(Vector512<short>  value);
    public static int MoveMask(Vector512<ushort> value);

    public static long MoveMask(Vector512<byte>  value);
    public static long MoveMask(Vector512<sbyte> value);

    public static bool TestC(Vector512<byte>   left, Vector512<byte>   right);
    public static bool TestC(Vector512<short>  left, Vector512<short>  right);
    public static bool TestC(Vector512<sbyte>  left, Vector512<sbyte>  right);
    public static bool TestC(Vector512<ushort> left, Vector512<ushort> right);

    public static bool TestNotZAndNotC(Vector512<byte>   left, Vector512<byte>   right);
    public static bool TestNotZAndNotC(Vector512<short>  left, Vector512<short>  right);
    public static bool TestNotZAndNotC(Vector512<sbyte>  left, Vector512<sbyte>  right);
    public static bool TestNotZAndNotC(Vector512<ushort> left, Vector512<ushort> right);

    public static bool TestZ(Vector512<byte>   left, Vector512<byte>   right);
    public static bool TestZ(Vector512<short>  left, Vector512<short>  right);
    public static bool TestZ(Vector512<sbyte>  left, Vector512<sbyte>  right);
    public static bool TestZ(Vector512<ushort> left, Vector512<ushort> right);

    public static partial class VL
    {
        public static Vector128<byte> Compare                  (Vector128<byte> left, Vector128<byte> right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
        public static Vector128<byte> CompareGreaterThan       (Vector128<byte> left, Vector128<byte> right);
        public static Vector128<byte> CompareGreaterThanOrEqual(Vector128<byte> left, Vector128<byte> right);
        public static Vector128<byte> CompareLessThan          (Vector128<byte> left, Vector128<byte> right);
        public static Vector128<byte> CompareLessThanOrEqual   (Vector128<byte> left, Vector128<byte> right);
        public static Vector128<byte> CompareNotEqual          (Vector128<byte> left, Vector128<byte> right);
        public static Vector256<byte> Compare                  (Vector256<byte> left, Vector256<byte> right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
        public static Vector256<byte> CompareGreaterThan       (Vector256<byte> left, Vector256<byte> right);
        public static Vector256<byte> CompareGreaterThanOrEqual(Vector256<byte> left, Vector256<byte> right);
        public static Vector256<byte> CompareLessThan          (Vector256<byte> left, Vector256<byte> right);
        public static Vector256<byte> CompareLessThanOrEqual   (Vector256<byte> left, Vector256<byte> right);
        public static Vector256<byte> CompareNotEqual          (Vector256<byte> left, Vector256<byte> right);

        public static Vector128<short> Compare                  (Vector128<short> left, Vector128<short> right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
        public static Vector128<short> CompareGreaterThanOrEqual(Vector128<short> left, Vector128<short> right);
        public static Vector128<short> CompareLessThan          (Vector128<short> left, Vector128<short> right);
        public static Vector128<short> CompareLessThanOrEqual   (Vector128<short> left, Vector128<short> right);
        public static Vector128<short> CompareNotEqual          (Vector128<short> left, Vector128<short> right);
        public static Vector256<short> Compare                  (Vector256<short> left, Vector256<short> right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
        public static Vector256<short> CompareGreaterThanOrEqual(Vector256<short> left, Vector256<short> right);
        public static Vector256<short> CompareLessThan          (Vector256<short> left, Vector256<short> right);
        public static Vector256<short> CompareLessThanOrEqual   (Vector256<short> left, Vector256<short> right);
        public static Vector256<short> CompareNotEqual          (Vector256<short> left, Vector256<short> right);

        public static Vector128<sbyte> Compare                  (Vector128<sbyte> left, Vector128<sbyte> right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
        public static Vector128<sbyte> CompareGreaterThanOrEqual(Vector128<sbyte> left, Vector128<sbyte> right);
        public static Vector128<sbyte> CompareLessThan          (Vector128<sbyte> left, Vector128<sbyte> right);
        public static Vector128<sbyte> CompareLessThanOrEqual   (Vector128<sbyte> left, Vector128<sbyte> right);
        public static Vector128<sbyte> CompareNotEqual          (Vector128<sbyte> left, Vector128<sbyte> right);
        public static Vector256<sbyte> Compare                  (Vector256<sbyte> left, Vector256<sbyte> right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
        public static Vector256<sbyte> CompareGreaterThanOrEqual(Vector256<sbyte> left, Vector256<sbyte> right);
        public static Vector256<sbyte> CompareLessThan          (Vector256<sbyte> left, Vector256<sbyte> right);
        public static Vector256<sbyte> CompareLessThanOrEqual   (Vector256<sbyte> left, Vector256<sbyte> right);
        public static Vector256<sbyte> CompareNotEqual          (Vector256<sbyte> left, Vector256<sbyte> right);

        public static Vector128<ushort> Compare                  (Vector128<ushort> left, Vector128<ushort> right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
        public static Vector128<ushort> CompareGreaterThan       (Vector128<ushort> left, Vector128<ushort> right);
        public static Vector128<ushort> CompareGreaterThanOrEqual(Vector128<ushort> left, Vector128<ushort> right);
        public static Vector128<ushort> CompareLessThan          (Vector128<ushort> left, Vector128<ushort> right);
        public static Vector128<ushort> CompareLessThanOrEqual   (Vector128<ushort> left, Vector128<ushort> right);
        public static Vector128<ushort> CompareNotEqual          (Vector128<ushort> left, Vector128<ushort> right);
        public static Vector256<ushort> Compare                  (Vector256<ushort> left, Vector256<ushort> right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
        public static Vector256<ushort> CompareGreaterThan       (Vector256<ushort> left, Vector256<ushort> right);
        public static Vector256<ushort> CompareGreaterThanOrEqual(Vector256<ushort> left, Vector256<ushort> right);
        public static Vector256<ushort> CompareLessThan          (Vector256<ushort> left, Vector256<ushort> right);
        public static Vector256<ushort> CompareLessThanOrEqual   (Vector256<ushort> left, Vector256<ushort> right);
        public static Vector256<ushort> CompareNotEqual          (Vector256<ushort> left, Vector256<ushort> right);
    }
}

public static partial class Avx512DQ
{
    public static Vector512<double> Classify(Vector512<double> value, [ConstantExpected] byte control);
    public static Vector512<float>  Classify(Vector512<float>  value, [ConstantExpected] byte control);

    public static Vector128<double> ClassifyScalar(Vector128<double> value, [ConstantExpected] byte control);
    public static Vector128<float>  ClassifyScalar(Vector128<float>  value, [ConstantExpected] byte control);

    public static int MoveMask(Vector128<short>  value);
    public static int MoveMask(Vector128<ushort> value);
    public static int MoveMask(Vector256<int>    value);
    public static int MoveMask(Vector256<uint>   value);
    public static int MoveMask(Vector512<double> value);
    public static int MoveMask(Vector512<long>   value);
    public static int MoveMask(Vector512<ulong>  value);

    public static partial class VL
    {
        public static Vector128<double> Classify(Vector128<double> value, [ConstantExpected] byte control);
        public static Vector128<float>  Classify(Vector128<float>  value, [ConstantExpected] byte control);
        public static Vector256<double> Classify(Vector256<double> value, [ConstantExpected] byte control);
        public static Vector256<float>  Classify(Vector256<float>  value, [ConstantExpected] byte control);
    }
}

public abstract class Avx512Vbmi2 : Avx512BW
{
    public static new bool IsSupported { get; }

    public static Vector512<byte>   Compress(Vector512<byte>   value, Vector512<byte>   mask);
    public static Vector512<short>  Compress(Vector512<short>  value, Vector512<short>  mask);
    public static Vector512<sbyte>  Compress(Vector512<sbyte>  value, Vector512<sbyte>  mask);
    public static Vector512<ushort> Compress(Vector512<ushort> value, Vector512<ushort> mask);

    public static Vector512<byte>   Expand(Vector512<byte>   value, Vector512<byte>   mask);
    public static Vector512<short>  Expand(Vector512<short>  value, Vector512<short>  mask);
    public static Vector512<sbyte>  Expand(Vector512<sbyte>  value, Vector512<sbyte>  mask);
    public static Vector512<ushort> Expand(Vector512<ushort> value, Vector512<ushort> mask);

    public abstract class VL : Avx512BW.VL
    {
        public static new bool IsSupported { get; }

        public static Vector128<byte>   Compress(Vector128<byte>   value, Vector128<byte>   mask);
        public static Vector128<short>  Compress(Vector128<short>  value, Vector128<short>  mask);
        public static Vector128<sbyte>  Compress(Vector128<sbyte>  value, Vector128<sbyte>  mask);
        public static Vector128<ushort> Compress(Vector128<ushort> value, Vector128<ushort> mask);
        public static Vector256<byte>   Compress(Vector256<byte>   value, Vector256<byte>   mask);
        public static Vector256<short>  Compress(Vector256<short>  value, Vector256<short>  mask);
        public static Vector256<sbyte>  Compress(Vector256<sbyte>  value, Vector256<sbyte>  mask);
        public static Vector256<ushort> Compress(Vector256<ushort> value, Vector256<ushort> mask);

        public static Vector128<byte>   Expand(Vector128<byte>   value, Vector128<byte>   mask);
        public static Vector128<short>  Expand(Vector128<short>  value, Vector128<short>  mask);
        public static Vector128<sbyte>  Expand(Vector128<sbyte>  value, Vector128<sbyte>  mask);
        public static Vector128<ushort> Expand(Vector128<ushort> value, Vector128<ushort> mask);
        public static Vector256<byte>   Expand(Vector256<byte>   value, Vector256<byte>   mask);
        public static Vector256<short>  Expand(Vector256<short>  value, Vector256<short>  mask);
        public static Vector256<sbyte>  Expand(Vector256<sbyte>  value, Vector256<sbyte>  mask);
        public static Vector256<ushort> Expand(Vector256<ushort> value, Vector256<ushort> mask);
    }

    public abstract class X64 : Avx512BW.X64
    {
        public static new bool IsSupported { get; }
    }
}
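
The listing above passes masks as ordinary vectors (`Vector512<T> mask`) rather than as a dedicated mask type, so user code keeps the compare-then-select shape it already has today. A minimal sketch of that shape follows, using only the cross-platform `Vector256` helpers that ship in .NET 7 and later. The class name `MaskedAddSketch` and the method `AddOrZeroOnNaN` are hypothetical names chosen for illustration, and whether the JIT folds this into a masked instruction on AVX-512 hardware depends on the runtime; the sketch only shows the source pattern.

```csharp
using System;
using System.Runtime.Intrinsics;

public static class MaskedAddSketch
{
    // Hypothetical helper (not part of the proposal): adds two vectors and
    // zeroes every lane where either input is NaN.
    public static Vector256<float> AddOrZeroOnNaN(Vector256<float> left, Vector256<float> right)
    {
        // x == x is false only for NaN, so each comparison yields an
        // all-bits-set lane for ordinary values and a zero lane for NaN.
        Vector256<float> notNaN = Vector256.Equals(left, left) & Vector256.Equals(right, right);

        Vector256<float> sum = left + right;

        // Keep the sum where the mask is set, otherwise take zero.
        return Vector256.ConditionalSelect(notNaN, sum, Vector256<float>.Zero);
    }

    public static void Main()
    {
        Vector256<float> a = Vector256.Create(1f, 2f, float.NaN, 4f, 5f, 6f, 7f, 8f);
        Vector256<float> b = Vector256.Create(10f, float.NaN, 30f, 40f, 50f, 60f, 70f, 80f);

        // Lanes 1 and 2 contain a NaN input, so they come back as zero.
        Console.WriteLine(AddOrZeroOnNaN(a, b));
    }
}
```

The same source runs on hardware without AVX-512, where the select is performed with ordinary vector bitwise operations instead of a mask register.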
ghost commented 1 year ago

Tagging subscribers to this area: @dotnet/area-system-runtime-intrinsics. See info in area-owners.md if you want to be subscribed.

static Vector512 Compress(Vector512 value, Vector512 mask); public static Vector512 Compress(Vector512 value, Vector512 mask); public static Vector512 Compress(Vector512 value, Vector512 mask); public static Vector512 Compress(Vector512 value, Vector512 mask); public static Vector512 Compress(Vector512 value, Vector512 mask); public static Vector512 Compress(Vector512 value, Vector512 mask); public static Vector512 Expand(Vector512 value, Vector512 mask); public static Vector512 Expand(Vector512 value, Vector512 mask); public static Vector512 Expand(Vector512 value, Vector512 mask); public static Vector512 Expand(Vector512 value, Vector512 mask); public static Vector512 Expand(Vector512 value, Vector512 mask); public static Vector512 Expand(Vector512 value, Vector512 mask); public static unsafe Vector512 GatherMaskVector512(Vector512 source, double* baseAddress, Vector512 index, Vector512 mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe Vector512 GatherMaskVector512(Vector512 source, int* baseAddress, Vector512 index, Vector512 mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe Vector512 GatherMaskVector512(Vector512 source, long* baseAddress, Vector512 index, Vector512 mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe Vector512 GatherMaskVector512(Vector512 source, float* baseAddress, Vector512 index, Vector512 mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe Vector512 GatherMaskVector512(Vector512 source, uint* baseAddress, Vector512 index, Vector512 mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe Vector512 GatherMaskVector512(Vector512 source, ulong* baseAddress, Vector512 index, Vector512 mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe Vector512 GatherMaskVector512(Vector512 source, double* baseAddress, 
Vector512 index, Vector512 mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe Vector512 GatherMaskVector512(Vector512 source, int* baseAddress, Vector512 index, Vector512 mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe Vector512 GatherMaskVector512(Vector512 source, long* baseAddress, Vector512 index, Vector512 mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe Vector512 GatherMaskVector512(Vector512 source, uint* baseAddress, Vector512 index, Vector512 mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe Vector512 GatherMaskVector512(Vector512 source, float* baseAddress, Vector512 index, Vector512 mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe Vector512 GatherMaskVector512(Vector512 source, ulong* baseAddress, Vector512 index, Vector512 mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe Vector512 GatherVector512(double* baseAddress, Vector512 index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe Vector512 GatherVector512(int* baseAddress, Vector512 index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe Vector512 GatherVector512(long* baseAddress, Vector512 index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe Vector512 GatherVector512(float* baseAddress, Vector512 index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe Vector512 GatherVector512(uint* baseAddress, Vector512 index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe Vector512 GatherVector512(ulong* baseAddress, Vector512 index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe Vector512 GatherVector512(double* baseAddress, 
Vector512 index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe Vector512 GatherVector512(int* baseAddress, Vector512 index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe Vector512 GatherVector512(long* baseAddress, Vector512 index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe Vector512 GatherVector512(float* baseAddress, Vector512 index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe Vector512 GatherVector512(uint* baseAddress, Vector512 index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe Vector512 GatherVector512(ulong* baseAddress, Vector512 index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe Vector512 MaskLoad(double* address, Vector512 mask); public static unsafe Vector512 MaskLoad(int* address, Vector512 mask); public static unsafe Vector512 MaskLoad(long* address, Vector512 mask); public static unsafe Vector512 MaskLoad(float* address, Vector512 mask); public static unsafe Vector512 MaskLoad(uint* address, Vector512 mask); public static unsafe Vector512 MaskLoad(ulong* address, Vector512 mask); public static unsafe void MaskStore(double* address, Vector512 mask, Vector512 source); public static unsafe void MaskStore(int* address, Vector512 mask, Vector512 source); public static unsafe void MaskStore(long* address, Vector512 mask, Vector512 source); public static unsafe void MaskStore(float* address, Vector512 mask, Vector512 source); public static unsafe void MaskStore(uint* address, Vector512 mask, Vector512 source); public static unsafe void MaskStore(ulong* address, Vector512 mask, Vector512 source); public static int MoveMask(Vector256 value); public static int MoveMask(Vector256 value); public static int MoveMask(Vector512 value); public static int MoveMask(Vector512 value); public static int MoveMask(Vector512 
value); public static unsafe void ScatterMaskVector512(Vector512 value, double* baseAddress, Vector512 index, Vector512 mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe void ScatterMaskVector512(Vector512 value, int* baseAddress, Vector512 index, Vector512 mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe void ScatterMaskVector512(Vector512 value, long* baseAddress, Vector512 index, Vector512 mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe void ScatterMaskVector512(Vector512 value, float* baseAddress, Vector512 index, Vector512 mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe void ScatterMaskVector512(Vector512 value, uint* baseAddress, Vector512 index, Vector512 mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe void ScatterMaskVector512(Vector512 value, ulong* baseAddress, Vector512 index, Vector512 mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe void ScatterMaskVector512(Vector512 value, double* baseAddress, Vector512 index, Vector512 mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe void ScatterMaskVector512(Vector512 value, int* baseAddress, Vector512 index, Vector512 mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe void ScatterMaskVector512(Vector512 value, long* baseAddress, Vector512 index, Vector512 mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe void ScatterMaskVector512(Vector512 value, uint* baseAddress, Vector512 index, Vector512 mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe void ScatterMaskVector512(Vector512 value, float* baseAddress, Vector512 index, Vector512 mask, [ConstantExpected(Min = (byte)(1), Max = 
(byte)(8))] byte scale); public static unsafe void ScatterMaskVector512(Vector512 value, ulong* baseAddress, Vector512 index, Vector512 mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe void ScatterVector512(Vector512 value, double* baseAddress, Vector512 index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe void ScatterVector512(Vector512 value, int* baseAddress, Vector512 index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe void ScatterVector512(Vector512 value, long* baseAddress, Vector512 index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe void ScatterVector512(Vector512 value, float* baseAddress, Vector512 index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe void ScatterVector512(Vector512 value, uint* baseAddress, Vector512 index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe void ScatterVector512(Vector512 value, ulong* baseAddress, Vector512 index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe void ScatterVector512(Vector512 value, double* baseAddress, Vector512 index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe void ScatterVector512(Vector512 value, int* baseAddress, Vector512 index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe void ScatterVector512(Vector512 value, long* baseAddress, Vector512 index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe void ScatterVector512(Vector512 value, float* baseAddress, Vector512 index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe void ScatterVector512(Vector512 value, uint* baseAddress, Vector512 index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); 
public static unsafe void ScatterVector512(Vector512 value, ulong* baseAddress, Vector512 index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static bool TestC(Vector512 left, Vector512 right); public static bool TestC(Vector512 left, Vector512 right); public static bool TestC(Vector512 left, Vector512 right); public static bool TestC(Vector512 left, Vector512 right); public static bool TestC(Vector512 left, Vector512 right); public static bool TestC(Vector512 left, Vector512 right); public static bool TestNotZAndNotC(Vector512 left, Vector512 right); public static bool TestNotZAndNotC(Vector512 left, Vector512 right); public static bool TestNotZAndNotC(Vector512 left, Vector512 right); public static bool TestNotZAndNotC(Vector512 left, Vector512 right); public static bool TestNotZAndNotC(Vector512 left, Vector512 right); public static bool TestNotZAndNotC(Vector512 left, Vector512 right); public static bool TestZ(Vector512 left, Vector512 right); public static bool TestZ(Vector512 left, Vector512 right); public static bool TestZ(Vector512 left, Vector512 right); public static bool TestZ(Vector512 left, Vector512 right); public static bool TestZ(Vector512 left, Vector512 right); public static bool TestZ(Vector512 left, Vector512 right); public static partial class VL { public static Vector128 Compare (Vector128 left, Vector128 right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode); public static Vector128 CompareGreaterThanOrEqual(Vector128 left, Vector128 right); public static Vector128 CompareLessThan (Vector128 left, Vector128 right); public static Vector128 CompareLessThanOrEqual (Vector128 left, Vector128 right); public static Vector128 CompareNotEqual (Vector128 left, Vector128 right); public static Vector256 Compare (Vector256 left, Vector256 right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode); public static Vector256 CompareGreaterThanOrEqual(Vector256 left, Vector256 right); 
public static Vector256 CompareLessThan (Vector256 left, Vector256 right); public static Vector256 CompareLessThanOrEqual (Vector256 left, Vector256 right); public static Vector256 CompareNotEqual (Vector256 left, Vector256 right); public static Vector128 Compare (Vector128 left, Vector128 right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode); public static Vector128 CompareGreaterThanOrEqual(Vector128 left, Vector128 right); public static Vector128 CompareLessThan (Vector128 left, Vector128 right); public static Vector128 CompareLessThanOrEqual (Vector128 left, Vector128 right); public static Vector128 CompareNotEqual (Vector128 left, Vector128 right); public static Vector256 Compare (Vector256 left, Vector256 right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode); public static Vector256 CompareGreaterThanOrEqual(Vector256 left, Vector256 right); public static Vector256 CompareLessThan (Vector256 left, Vector256 right); public static Vector256 CompareLessThanOrEqual (Vector256 left, Vector256 right); public static Vector256 CompareNotEqual (Vector256 left, Vector256 right); public static Vector128 Compare (Vector128 left, Vector128 right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode); public static Vector128 CompareGreaterThan (Vector128 left, Vector128 right); public static Vector128 CompareGreaterThanOrEqual(Vector128 left, Vector128 right); public static Vector128 CompareLessThan (Vector128 left, Vector128 right); public static Vector128 CompareLessThanOrEqual (Vector128 left, Vector128 right); public static Vector128 CompareNotEqual (Vector128 left, Vector128 right); public static Vector256 Compare (Vector256 left, Vector256 right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode); public static Vector256 CompareGreaterThan (Vector256 left, Vector256 right); public static Vector256 CompareGreaterThanOrEqual(Vector256 left, Vector256 right); public static 
Vector256 CompareLessThan (Vector256 left, Vector256 right); public static Vector256 CompareLessThanOrEqual (Vector256 left, Vector256 right); public static Vector256 CompareNotEqual (Vector256 left, Vector256 right); public static Vector128 Compare (Vector128 left, Vector128 right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode); public static Vector128 CompareGreaterThan (Vector128 left, Vector128 right); public static Vector128 CompareGreaterThanOrEqual(Vector128 left, Vector128 right); public static Vector128 CompareLessThan (Vector128 left, Vector128 right); public static Vector128 CompareLessThanOrEqual (Vector128 left, Vector128 right); public static Vector128 CompareNotEqual (Vector128 left, Vector128 right); public static Vector256 Compare (Vector256 left, Vector256 right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode); public static Vector256 CompareGreaterThan (Vector256 left, Vector256 right); public static Vector256 CompareGreaterThanOrEqual(Vector256 left, Vector256 right); public static Vector256 CompareLessThan (Vector256 left, Vector256 right); public static Vector256 CompareLessThanOrEqual (Vector256 left, Vector256 right); public static Vector256 CompareNotEqual (Vector256 left, Vector256 right); public static Vector128 Compress(Vector128 value, Vector128 mask); public static Vector128 Compress(Vector128 value, Vector128 mask); public static Vector128 Compress(Vector128 value, Vector128 mask); public static Vector128 Compress(Vector128 value, Vector128 mask); public static Vector128 Compress(Vector128 value, Vector128 mask); public static Vector128 Compress(Vector128 value, Vector128 mask); public static Vector256 Compress(Vector256 value, Vector256 mask); public static Vector256 Compress(Vector256 value, Vector256 mask); public static Vector256 Compress(Vector256 value, Vector256 mask); public static Vector256 Compress(Vector256 value, Vector256 mask); public static Vector256 Compress(Vector256 
value, Vector256 mask); public static Vector256 Compress(Vector256 value, Vector256 mask); public static Vector128 Expand(Vector128 value, Vector128 mask); public static Vector128 Expand(Vector128 value, Vector128 mask); public static Vector128 Expand(Vector128 value, Vector128 mask); public static Vector128 Expand(Vector128 value, Vector128 mask); public static Vector128 Expand(Vector128 value, Vector128 mask); public static Vector128 Expand(Vector128 value, Vector128 mask); public static Vector256 Expand(Vector256 value, Vector256 mask); public static Vector256 Expand(Vector256 value, Vector256 mask); public static Vector256 Expand(Vector256 value, Vector256 mask); public static Vector256 Expand(Vector256 value, Vector256 mask); public static Vector256 Expand(Vector256 value, Vector256 mask); public static Vector256 Expand(Vector256 value, Vector256 mask); public static unsafe void ScatterMaskVector128(Vector128 value, double* baseAddress, Vector128 index, Vector128 mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe void ScatterMaskVector128(Vector128 value, int* baseAddress, Vector128 index, Vector128 mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe void ScatterMaskVector128(Vector128 value, long* baseAddress, Vector128 index, Vector128 mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe void ScatterMaskVector128(Vector128 value, float* baseAddress, Vector128 index, Vector128 mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe void ScatterMaskVector128(Vector128 value, uint* baseAddress, Vector128 index, Vector128 mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe void ScatterMaskVector128(Vector128 value, ulong* baseAddress, Vector128 index, Vector128 mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe void 
ScatterMaskVector256(Vector256 value, double* baseAddress, Vector256 index, Vector256 mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe void ScatterMaskVector256(Vector256 value, int* baseAddress, Vector256 index, Vector256 mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe void ScatterMaskVector256(Vector256 value, long* baseAddress, Vector256 index, Vector256 mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe void ScatterMaskVector256(Vector256 value, float* baseAddress, Vector256 index, Vector256 mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe void ScatterMaskVector256(Vector256 value, uint* baseAddress, Vector256 index, Vector256 mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe void ScatterMaskVector256(Vector256 value, ulong* baseAddress, Vector256 index, Vector256 mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe void ScatterMaskVector128(Vector128 value, double* baseAddress, Vector128 index, Vector128 mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe void ScatterMaskVector128(Vector128 value, int* baseAddress, Vector128 index, Vector128 mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe void ScatterMaskVector128(Vector128 value, long* baseAddress, Vector128 index, Vector128 mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe void ScatterMaskVector128(Vector128 value, uint* baseAddress, Vector128 index, Vector128 mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe void ScatterMaskVector128(Vector128 value, float* baseAddress, Vector128 index, Vector128 mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static 
unsafe void ScatterMaskVector128(Vector128 value, ulong* baseAddress, Vector128 index, Vector128 mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe void ScatterMaskVector256(Vector256 value, double* baseAddress, Vector256 index, Vector256 mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe void ScatterMaskVector256(Vector256 value, int* baseAddress, Vector256 index, Vector256 mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe void ScatterMaskVector256(Vector256 value, long* baseAddress, Vector256 index, Vector256 mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe void ScatterMaskVector256(Vector256 value, uint* baseAddress, Vector256 index, Vector256 mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe void ScatterMaskVector256(Vector256 value, float* baseAddress, Vector256 index, Vector256 mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe void ScatterMaskVector256(Vector256 value, ulong* baseAddress, Vector256 index, Vector256 mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe void ScatterVector128(Vector128 value, double* baseAddress, Vector128 index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe void ScatterVector128(Vector128 value, int* baseAddress, Vector128 index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe void ScatterVector128(Vector128 value, long* baseAddress, Vector128 index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe void ScatterVector128(Vector128 value, float* baseAddress, Vector128 index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe void ScatterVector128(Vector128 value, uint* baseAddress, 
Vector128 index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe void ScatterVector128(Vector128 value, ulong* baseAddress, Vector128 index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe void ScatterVector256(Vector256 value, double* baseAddress, Vector256 index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe void ScatterVector256(Vector256 value, int* baseAddress, Vector256 index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe void ScatterVector256(Vector256 value, long* baseAddress, Vector256 index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe void ScatterVector256(Vector256 value, float* baseAddress, Vector256 index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe void ScatterVector256(Vector256 value, uint* baseAddress, Vector256 index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe void ScatterVector256(Vector256 value, ulong* baseAddress, Vector256 index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe void ScatterVector128(Vector128 value, double* baseAddress, Vector128 index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe void ScatterVector128(Vector128 value, int* baseAddress, Vector128 index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe void ScatterVector128(Vector128 value, long* baseAddress, Vector128 index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe void ScatterVector128(Vector128 value, float* baseAddress, Vector128 index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe void ScatterVector128(Vector128 value, uint* baseAddress, Vector128 index, [ConstantExpected(Min = 
(byte)(1), Max = (byte)(8))] byte scale); public static unsafe void ScatterVector128(Vector128 value, ulong* baseAddress, Vector128 index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe void ScatterVector256(Vector256 value, double* baseAddress, Vector256 index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe void ScatterVector256(Vector256 value, int* baseAddress, Vector256 index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe void ScatterVector256(Vector256 value, long* baseAddress, Vector256 index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe void ScatterVector256(Vector256 value, float* baseAddress, Vector256 index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe void ScatterVector256(Vector256 value, uint* baseAddress, Vector256 index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); public static unsafe void ScatterVector256(Vector256 value, ulong* baseAddress, Vector256 index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale); } } public static partial class Avx512BW { public static Vector512 BlendVariable(Vector512 left, Vector512 right, Vector512 mask); public static Vector512 BlendVariable(Vector512 left, Vector512 right, Vector512 mask); public static Vector512 BlendVariable(Vector512 left, Vector512 right, Vector512 mask); public static Vector512 BlendVariable(Vector512 left, Vector512 right, Vector512 mask); public static Vector512 Compare (Vector512 left, Vector512 right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode); public static Vector512 CompareEqual (Vector512 left, Vector512 right); public static Vector512 CompareGreaterThan (Vector512 left, Vector512 right); public static Vector512 CompareGreaterThanOrEqual(Vector512 left, Vector512 right); public static Vector512 CompareLessThan (Vector512 
left, Vector512 right);
    public static Vector512 CompareLessThanOrEqual   (Vector512 left, Vector512 right);
    public static Vector512 CompareNotEqual          (Vector512 left, Vector512 right);

    public static Vector512 Compare                  (Vector512 left, Vector512 right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
    public static Vector512 CompareEqual             (Vector512 left, Vector512 right);
    public static Vector512 CompareGreaterThan       (Vector512 left, Vector512 right);
    public static Vector512 CompareGreaterThanOrEqual(Vector512 left, Vector512 right);
    public static Vector512 CompareLessThan          (Vector512 left, Vector512 right);
    public static Vector512 CompareLessThanOrEqual   (Vector512 left, Vector512 right);
    public static Vector512 CompareNotEqual          (Vector512 left, Vector512 right);

    public static Vector512 Compare                  (Vector512 left, Vector512 right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
    public static Vector512 CompareEqual             (Vector512 left, Vector512 right);
    public static Vector512 CompareGreaterThan       (Vector512 left, Vector512 right);
    public static Vector512 CompareGreaterThanOrEqual(Vector512 left, Vector512 right);
    public static Vector512 CompareLessThan          (Vector512 left, Vector512 right);
    public static Vector512 CompareLessThanOrEqual   (Vector512 left, Vector512 right);
    public static Vector512 CompareNotEqual          (Vector512 left, Vector512 right);

    public static Vector512 Compare                  (Vector512 left, Vector512 right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
    public static Vector512 CompareEqual             (Vector512 left, Vector512 right);
    public static Vector512 CompareGreaterThan       (Vector512 left, Vector512 right);
    public static Vector512 CompareGreaterThanOrEqual(Vector512 left, Vector512 right);
    public static Vector512 CompareLessThan          (Vector512 left, Vector512 right);
    public static Vector512 CompareLessThanOrEqual   (Vector512 left, Vector512 right);
    public static Vector512 CompareNotEqual          (Vector512 left, Vector512 right);

    public static int  MoveMask(Vector512 value);
    public static int  MoveMask(Vector512 value);
    public static long MoveMask(Vector512 value);
    public static long MoveMask(Vector512 value);

    public static bool TestC(Vector512 left, Vector512 right);
    public static bool TestC(Vector512 left, Vector512 right);
    public static bool TestC(Vector512 left, Vector512 right);
    public static bool TestC(Vector512 left, Vector512 right);

    public static bool TestNotZAndNotC(Vector512 left, Vector512 right);
    public static bool TestNotZAndNotC(Vector512 left, Vector512 right);
    public static bool TestNotZAndNotC(Vector512 left, Vector512 right);
    public static bool TestNotZAndNotC(Vector512 left, Vector512 right);

    public static bool TestZ(Vector512 left, Vector512 right);
    public static bool TestZ(Vector512 left, Vector512 right);
    public static bool TestZ(Vector512 left, Vector512 right);
    public static bool TestZ(Vector512 left, Vector512 right);

    public static partial class VL
    {
        public static Vector128 Compare                  (Vector128 left, Vector128 right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
        public static Vector128 CompareGreaterThan       (Vector128 left, Vector128 right);
        public static Vector128 CompareGreaterThanOrEqual(Vector128 left, Vector128 right);
        public static Vector128 CompareLessThan          (Vector128 left, Vector128 right);
        public static Vector128 CompareLessThanOrEqual   (Vector128 left, Vector128 right);
        public static Vector128 CompareNotEqual          (Vector128 left, Vector128 right);
        public static Vector256 Compare                  (Vector256 left, Vector256 right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
        public static Vector256 CompareGreaterThan       (Vector256 left, Vector256 right);
        public static Vector256 CompareGreaterThanOrEqual(Vector256 left, Vector256 right);
        public static Vector256 CompareLessThan          (Vector256 left, Vector256 right);
        public static Vector256 CompareLessThanOrEqual   (Vector256 left, Vector256 right);
        public static Vector256 CompareNotEqual          (Vector256 left, Vector256 right);

        public static Vector128 Compare                  (Vector128 left, Vector128 right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
        public static Vector128 CompareGreaterThanOrEqual(Vector128 left, Vector128 right);
        public static Vector128 CompareLessThan          (Vector128 left, Vector128 right);
        public static Vector128 CompareLessThanOrEqual   (Vector128 left, Vector128 right);
        public static Vector128 CompareNotEqual          (Vector128 left, Vector128 right);
        public static Vector256 Compare                  (Vector256 left, Vector256 right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
        public static Vector256 CompareGreaterThanOrEqual(Vector256 left, Vector256 right);
        public static Vector256 CompareLessThan          (Vector256 left, Vector256 right);
        public static Vector256 CompareLessThanOrEqual   (Vector256 left, Vector256 right);
        public static Vector256 CompareNotEqual          (Vector256 left, Vector256 right);

        public static Vector128 Compare                  (Vector128 left, Vector128 right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
        public static Vector128 CompareGreaterThanOrEqual(Vector128 left, Vector128 right);
        public static Vector128 CompareLessThan          (Vector128 left, Vector128 right);
        public static Vector128 CompareLessThanOrEqual   (Vector128 left, Vector128 right);
        public static Vector128 CompareNotEqual          (Vector128 left, Vector128 right);
        public static Vector256 Compare                  (Vector256 left, Vector256 right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
        public static Vector256 CompareGreaterThanOrEqual(Vector256 left, Vector256 right);
        public static Vector256 CompareLessThan          (Vector256 left, Vector256 right);
        public static Vector256 CompareLessThanOrEqual   (Vector256 left, Vector256 right);
        public static Vector256 CompareNotEqual          (Vector256 left, Vector256 right);

        public static Vector128 Compare                  (Vector128 left, Vector128 right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
        public static Vector128 CompareGreaterThan       (Vector128 left, Vector128 right);
        public static Vector128 CompareGreaterThanOrEqual(Vector128 left, Vector128 right);
        public static Vector128 CompareLessThan          (Vector128 left, Vector128 right);
        public static Vector128 CompareLessThanOrEqual   (Vector128 left, Vector128 right);
        public static Vector128 CompareNotEqual          (Vector128 left, Vector128 right);
        public static Vector256 Compare                  (Vector256 left, Vector256 right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
        public static Vector256 CompareGreaterThan       (Vector256 left, Vector256 right);
        public static Vector256 CompareGreaterThanOrEqual(Vector256 left, Vector256 right);
        public static Vector256 CompareLessThan          (Vector256 left, Vector256 right);
        public static Vector256 CompareLessThanOrEqual   (Vector256 left, Vector256 right);
        public static Vector256 CompareNotEqual          (Vector256 left, Vector256 right);
    }
}

public static partial class Avx512DQ
{
    public static Vector512 Classify(Vector512 value, [ConstantExpected] byte control);
    public static Vector512 Classify(Vector512 value, [ConstantExpected] byte control);

    public static Vector128 ClassifyScalar(Vector128 value, [ConstantExpected] byte control);
    public static Vector128 ClassifyScalar(Vector128 value, [ConstantExpected] byte control);

    public static int MoveMask(Vector128 value);
    public static int MoveMask(Vector128 value);
    public static int MoveMask(Vector256 value);
    public static int MoveMask(Vector256 value);
    public static int MoveMask(Vector512 value);
    public static int MoveMask(Vector512 value);
    public static int MoveMask(Vector512 value);

    public static partial class VL
    {
        public static Vector128 Classify(Vector128 value, [ConstantExpected] byte control);
        public static Vector128 Classify(Vector128 value, [ConstantExpected] byte control);
        public static Vector256 Classify(Vector256 value, [ConstantExpected] byte control);
        public static Vector256 Classify(Vector256 value, [ConstantExpected] byte control);
    }
}

public abstract class Avx512Vbmi2 : Avx512BW
{
    public static new bool IsSupported { get; }

    public static Vector512 Compress(Vector512 value, Vector512 mask);
    public static Vector512 Compress(Vector512 value, Vector512 mask);
    public static Vector512 Compress(Vector512 value, Vector512 mask);
    public static Vector512 Compress(Vector512 value, Vector512 mask);

    public static Vector512 Expand(Vector512 value, Vector512 mask);
    public static Vector512 Expand(Vector512 value, Vector512 mask);
    public static Vector512 Expand(Vector512 value, Vector512 mask);
    public static Vector512 Expand(Vector512 value, Vector512 mask);

    public abstract class VL : Avx512BW.VL
    {
        public static new bool IsSupported { get; }

        public static Vector128 Compress(Vector128 value, Vector128 mask);
        public static Vector128 Compress(Vector128 value, Vector128 mask);
        public static Vector128 Compress(Vector128 value, Vector128 mask);
        public static Vector128 Compress(Vector128 value, Vector128 mask);
        public static Vector256 Compress(Vector256 value, Vector256 mask);
        public static Vector256 Compress(Vector256 value, Vector256 mask);
        public static Vector256 Compress(Vector256 value, Vector256 mask);
        public static Vector256 Compress(Vector256 value, Vector256 mask);

        public static Vector128 Expand(Vector128 value, Vector128 mask);
        public static Vector128 Expand(Vector128 value, Vector128 mask);
        public static Vector128 Expand(Vector128 value, Vector128 mask);
        public static Vector128 Expand(Vector128 value, Vector128 mask);
        public static Vector256 Expand(Vector256 value, Vector256 mask);
        public static Vector256 Expand(Vector256 value, Vector256 mask);
        public static Vector256 Expand(Vector256 value, Vector256 mask);
        public static Vector256 Expand(Vector256 value, Vector256 mask);
    }

    public abstract class X64 : Avx512BW.X64
    {
        public static new bool IsSupported { get; }
    }
}
```
Author: tannergooding
Assignees: -
Labels: `area-System.Runtime.Intrinsics`, `blocking`, `api-ready-for-review`, `arch-avx512`
Milestone: 8.0.0
tannergooding commented 1 year ago

This replaces https://github.com/dotnet/runtime/issues/74613 which should no longer be marked as api-approved if this is approved instead.

This does not remove the ability to still do https://github.com/dotnet/runtime/issues/74613 in the future if the outlook changes.

terrajobst commented 1 year ago

namespace System.Runtime.Intrinsics.X86;

public static partial class Avx512F
{
    public static Vector512<double> BlendVariable(Vector512<double> left, Vector512<double> right, Vector512<double> mask);
    public static Vector512<int>    BlendVariable(Vector512<int>    left, Vector512<int>    right, Vector512<int>    mask);
    public static Vector512<long>   BlendVariable(Vector512<long>   left, Vector512<long>   right, Vector512<long>   mask);
    public static Vector512<float>  BlendVariable(Vector512<float>  left, Vector512<float>  right, Vector512<float>  mask);
    public static Vector512<uint>   BlendVariable(Vector512<uint>   left, Vector512<uint>   right, Vector512<uint>   mask);
    public static Vector512<ulong>  BlendVariable(Vector512<ulong>  left, Vector512<ulong>  right, Vector512<ulong>  mask);

    public static Vector512<double> Compare                     (Vector512<double> left, Vector512<double> right, [ConstantExpected(Max = FloatComparisonMode.UnorderedTrueSignaling)] FloatComparisonMode mode);
    public static Vector512<double> CompareEqual                (Vector512<double> left, Vector512<double> right);
    public static Vector512<double> CompareGreaterThan          (Vector512<double> left, Vector512<double> right);
    public static Vector512<double> CompareGreaterThanOrEqual   (Vector512<double> left, Vector512<double> right);
    public static Vector512<double> CompareLessThan             (Vector512<double> left, Vector512<double> right);
    public static Vector512<double> CompareLessThanOrEqual      (Vector512<double> left, Vector512<double> right);
    public static Vector512<double> CompareNotEqual             (Vector512<double> left, Vector512<double> right);
    public static Vector512<double> CompareNotGreaterThan       (Vector512<double> left, Vector512<double> right);
    public static Vector512<double> CompareNotGreaterThanOrEqual(Vector512<double> left, Vector512<double> right);
    public static Vector512<double> CompareNotLessThan          (Vector512<double> left, Vector512<double> right);
    public static Vector512<double> CompareNotLessThanOrEqual   (Vector512<double> left, Vector512<double> right);
    public static Vector512<double> CompareOrdered              (Vector512<double> left, Vector512<double> right);
    public static Vector512<double> CompareUnordered            (Vector512<double> left, Vector512<double> right);

    public static Vector512<float> Compare                     (Vector512<float> left, Vector512<float> right, [ConstantExpected(Max = FloatComparisonMode.UnorderedTrueSignaling)] FloatComparisonMode mode);
    public static Vector512<float> CompareEqual                (Vector512<float> left, Vector512<float> right);
    public static Vector512<float> CompareGreaterThan          (Vector512<float> left, Vector512<float> right);
    public static Vector512<float> CompareGreaterThanOrEqual   (Vector512<float> left, Vector512<float> right);
    public static Vector512<float> CompareLessThan             (Vector512<float> left, Vector512<float> right);
    public static Vector512<float> CompareLessThanOrEqual      (Vector512<float> left, Vector512<float> right);
    public static Vector512<float> CompareNotEqual             (Vector512<float> left, Vector512<float> right);
    public static Vector512<float> CompareNotGreaterThan       (Vector512<float> left, Vector512<float> right);
    public static Vector512<float> CompareNotGreaterThanOrEqual(Vector512<float> left, Vector512<float> right);
    public static Vector512<float> CompareNotLessThan          (Vector512<float> left, Vector512<float> right);
    public static Vector512<float> CompareNotLessThanOrEqual   (Vector512<float> left, Vector512<float> right);
    public static Vector512<float> CompareOrdered              (Vector512<float> left, Vector512<float> right);
    public static Vector512<float> CompareUnordered            (Vector512<float> left, Vector512<float> right);

    public static Vector512<int> CompareEqual             (Vector512<int> left, Vector512<int> right);
    public static Vector512<int> CompareGreaterThan       (Vector512<int> left, Vector512<int> right);
    public static Vector512<int> CompareGreaterThanOrEqual(Vector512<int> left, Vector512<int> right);
    public static Vector512<int> CompareLessThan          (Vector512<int> left, Vector512<int> right);
    public static Vector512<int> CompareLessThanOrEqual   (Vector512<int> left, Vector512<int> right);
    public static Vector512<int> CompareNotEqual          (Vector512<int> left, Vector512<int> right);

    public static Vector512<long> CompareEqual             (Vector512<long> left, Vector512<long> right);
    public static Vector512<long> CompareGreaterThan       (Vector512<long> left, Vector512<long> right);
    public static Vector512<long> CompareGreaterThanOrEqual(Vector512<long> left, Vector512<long> right);
    public static Vector512<long> CompareLessThan          (Vector512<long> left, Vector512<long> right);
    public static Vector512<long> CompareLessThanOrEqual   (Vector512<long> left, Vector512<long> right);
    public static Vector512<long> CompareNotEqual          (Vector512<long> left, Vector512<long> right);

    public static Vector512<uint> CompareEqual             (Vector512<uint> left, Vector512<uint> right);
    public static Vector512<uint> CompareGreaterThan       (Vector512<uint> left, Vector512<uint> right);
    public static Vector512<uint> CompareGreaterThanOrEqual(Vector512<uint> left, Vector512<uint> right);
    public static Vector512<uint> CompareLessThan          (Vector512<uint> left, Vector512<uint> right);
    public static Vector512<uint> CompareLessThanOrEqual   (Vector512<uint> left, Vector512<uint> right);
    public static Vector512<uint> CompareNotEqual          (Vector512<uint> left, Vector512<uint> right);

    public static Vector512<ulong> CompareEqual             (Vector512<ulong> left, Vector512<ulong> right);
    public static Vector512<ulong> CompareGreaterThan       (Vector512<ulong> left, Vector512<ulong> right);
    public static Vector512<ulong> CompareGreaterThanOrEqual(Vector512<ulong> left, Vector512<ulong> right);
    public static Vector512<ulong> CompareLessThan          (Vector512<ulong> left, Vector512<ulong> right);
    public static Vector512<ulong> CompareLessThanOrEqual   (Vector512<ulong> left, Vector512<ulong> right);
    public static Vector512<ulong> CompareNotEqual          (Vector512<ulong> left, Vector512<ulong> right);

    public static Vector512<double> Compress(Vector512<double> value, Vector512<double> mask);
    public static Vector512<int>    Compress(Vector512<int>    value, Vector512<int>    mask);
    public static Vector512<long>   Compress(Vector512<long>   value, Vector512<long>   mask);
    public static Vector512<float>  Compress(Vector512<float>  value, Vector512<float>  mask);
    public static Vector512<uint>   Compress(Vector512<uint>   value, Vector512<uint>   mask);
    public static Vector512<ulong>  Compress(Vector512<ulong>  value, Vector512<ulong>  mask);

    public static Vector512<double> Expand(Vector512<double> value, Vector512<double> mask);
    public static Vector512<int>    Expand(Vector512<int>    value, Vector512<int>    mask);
    public static Vector512<long>   Expand(Vector512<long>   value, Vector512<long>   mask);
    public static Vector512<float>  Expand(Vector512<float>  value, Vector512<float>  mask);
    public static Vector512<uint>   Expand(Vector512<uint>   value, Vector512<uint>   mask);
    public static Vector512<ulong>  Expand(Vector512<ulong>  value, Vector512<ulong>  mask);

    public static unsafe Vector512<double> GatherMaskVector512(Vector512<double> source, double* baseAddress, Vector512<int> index, Vector512<double> mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<int>    GatherMaskVector512(Vector512<int>    source, int*    baseAddress, Vector512<int> index, Vector512<int>    mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<long>   GatherMaskVector512(Vector512<long>   source, long*   baseAddress, Vector512<int> index, Vector512<long>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<float>  GatherMaskVector512(Vector512<float>  source, float*  baseAddress, Vector512<int> index, Vector512<float>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<uint>   GatherMaskVector512(Vector512<uint>   source, uint*   baseAddress, Vector512<int> index, Vector512<uint>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<ulong>  GatherMaskVector512(Vector512<ulong>  source, ulong*  baseAddress, Vector512<int> index, Vector512<ulong>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);

    public static unsafe Vector512<double> GatherMaskVector512(Vector512<double> source, double* baseAddress, Vector512<long> index, Vector512<double> mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<int>    GatherMaskVector512(Vector512<int>    source, int*    baseAddress, Vector512<long> index, Vector512<int>    mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<long>   GatherMaskVector512(Vector512<long>   source, long*   baseAddress, Vector512<long> index, Vector512<long>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<uint>   GatherMaskVector512(Vector512<uint>   source, uint*   baseAddress, Vector512<long> index, Vector512<uint>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<float>  GatherMaskVector512(Vector512<float>  source, float*  baseAddress, Vector512<long> index, Vector512<float>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<ulong>  GatherMaskVector512(Vector512<ulong>  source, ulong*  baseAddress, Vector512<long> index, Vector512<ulong>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);

    public static unsafe Vector512<double> GatherVector512(double* baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<int>    GatherVector512(int*    baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<long>   GatherVector512(long*   baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<float>  GatherVector512(float*  baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<uint>   GatherVector512(uint*   baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<ulong>  GatherVector512(ulong*  baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);

    public static unsafe Vector512<double> GatherVector512(double* baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<int>    GatherVector512(int*    baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<long>   GatherVector512(long*   baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<float>  GatherVector512(float*  baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<uint>   GatherVector512(uint*   baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<ulong>  GatherVector512(ulong*  baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);

    public static unsafe Vector512<double> MaskLoad(double* address, Vector512<double> mask);
    public static unsafe Vector512<int>    MaskLoad(int*    address, Vector512<int>    mask);
    public static unsafe Vector512<long>   MaskLoad(long*   address, Vector512<long>   mask);
    public static unsafe Vector512<float>  MaskLoad(float*  address, Vector512<float>  mask);
    public static unsafe Vector512<uint>   MaskLoad(uint*   address, Vector512<uint>   mask);
    public static unsafe Vector512<ulong>  MaskLoad(ulong*  address, Vector512<ulong>  mask);

    public static unsafe void MaskStore(double* address, Vector512<double> mask, Vector512<double> source);
    public static unsafe void MaskStore(int*    address, Vector512<int>    mask, Vector512<int>    source);
    public static unsafe void MaskStore(long*   address, Vector512<long>   mask, Vector512<long>   source);
    public static unsafe void MaskStore(float*  address, Vector512<float>  mask, Vector512<float>  source);
    public static unsafe void MaskStore(uint*   address, Vector512<uint>   mask, Vector512<uint>   source);
    public static unsafe void MaskStore(ulong*  address, Vector512<ulong>  mask, Vector512<ulong>  source);

    public static int MoveMask(Vector256<short>  value);
    public static int MoveMask(Vector256<ushort> value);
    public static int MoveMask(Vector512<int>    value);
    public static int MoveMask(Vector512<float>  value);
    public static int MoveMask(Vector512<uint>   value);

    public static unsafe void ScatterMaskVector512(Vector512<double> value, double* baseAddress, Vector512<int> index, Vector512<double> mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterMaskVector512(Vector512<int>    value, int*    baseAddress, Vector512<int> index, Vector512<int>    mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterMaskVector512(Vector512<long>   value, long*   baseAddress, Vector512<int> index, Vector512<long>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterMaskVector512(Vector512<float>  value, float*  baseAddress, Vector512<int> index, Vector512<float>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterMaskVector512(Vector512<uint>   value, uint*   baseAddress, Vector512<int> index, Vector512<uint>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterMaskVector512(Vector512<ulong>  value, ulong*  baseAddress, Vector512<int> index, Vector512<ulong>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);

    public static unsafe void ScatterMaskVector512(Vector512<double> value, double* baseAddress, Vector512<long> index, Vector512<double> mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterMaskVector512(Vector512<int>    value, int*    baseAddress, Vector512<long> index, Vector512<int>    mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterMaskVector512(Vector512<long>   value, long*   baseAddress, Vector512<long> index, Vector512<long>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterMaskVector512(Vector512<float>  value, float*  baseAddress, Vector512<long> index, Vector512<float>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterMaskVector512(Vector512<uint>   value, uint*   baseAddress, Vector512<long> index, Vector512<uint>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterMaskVector512(Vector512<ulong>  value, ulong*  baseAddress, Vector512<long> index, Vector512<ulong>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);

    public static unsafe void ScatterVector512(Vector512<double> value, double* baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterVector512(Vector512<int>    value, int*    baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterVector512(Vector512<long>   value, long*   baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterVector512(Vector512<float>  value, float*  baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterVector512(Vector512<uint>   value, uint*   baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterVector512(Vector512<ulong>  value, ulong*  baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);

    public static unsafe void ScatterVector512(Vector512<double> value, double* baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterVector512(Vector512<int>    value, int*    baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterVector512(Vector512<long>   value, long*   baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterVector512(Vector512<float>  value, float*  baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterVector512(Vector512<uint>   value, uint*   baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterVector512(Vector512<ulong>  value, ulong*  baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);

    public static bool TestC(Vector512<double> left, Vector512<double> right);
    public static bool TestC(Vector512<int>    left, Vector512<int>    right);
    public static bool TestC(Vector512<long>   left, Vector512<long>   right);
    public static bool TestC(Vector512<float>  left, Vector512<float>  right);
    public static bool TestC(Vector512<uint>   left, Vector512<uint>   right);
    public static bool TestC(Vector512<ulong>  left, Vector512<ulong>  right);

    public static bool TestNotZAndNotC(Vector512<double> left, Vector512<double> right);
    public static bool TestNotZAndNotC(Vector512<int>    left, Vector512<int>    right);
    public static bool TestNotZAndNotC(Vector512<long>   left, Vector512<long>   right);
    public static bool TestNotZAndNotC(Vector512<float>  left, Vector512<float>  right);
    public static bool TestNotZAndNotC(Vector512<uint>   left, Vector512<uint>   right);
    public static bool TestNotZAndNotC(Vector512<ulong>  left, Vector512<ulong>  right);

    public static bool TestZ(Vector512<double> left, Vector512<double> right);
    public static bool TestZ(Vector512<int>    left, Vector512<int>    right);
    public static bool TestZ(Vector512<long>   left, Vector512<long>   right);
    public static bool TestZ(Vector512<float>  left, Vector512<float>  right);
    public static bool TestZ(Vector512<uint>   left, Vector512<uint>   right);
    public static bool TestZ(Vector512<ulong>  left, Vector512<ulong>  right);

    public static partial class VL
    {
        public static Vector128<int> CompareGreaterThanOrEqual(Vector128<int> left, Vector128<int> right);
        public static Vector128<int> CompareLessThan          (Vector128<int> left, Vector128<int> right);
        public static Vector128<int> CompareLessThanOrEqual   (Vector128<int> left, Vector128<int> right);
        public static Vector128<int> CompareNotEqual          (Vector128<int> left, Vector128<int> right);

        public static Vector256<int> CompareGreaterThanOrEqual(Vector256<int> left, Vector256<int> right);
        public static Vector256<int> CompareLessThan          (Vector256<int> left, Vector256<int> right);
        public static Vector256<int> CompareLessThanOrEqual   (Vector256<int> left, Vector256<int> right);
        public static Vector256<int> CompareNotEqual          (Vector256<int> left, Vector256<int> right);

        public static Vector128<long> CompareGreaterThanOrEqual(Vector128<long> left, Vector128<long> right);
        public static Vector128<long> CompareLessThan          (Vector128<long> left, Vector128<long> right);
        public static Vector128<long> CompareLessThanOrEqual   (Vector128<long> left, Vector128<long> right);
        public static Vector128<long> CompareNotEqual          (Vector128<long> left, Vector128<long> right);
        public static Vector256<long> CompareGreaterThanOrEqual(Vector256<long> left, Vector256<long> right);
        public static Vector256<long> CompareLessThan          (Vector256<long> left, Vector256<long> right);
        public static Vector256<long> CompareLessThanOrEqual   (Vector256<long> left, Vector256<long> right);
        public static Vector256<long> CompareNotEqual          (Vector256<long> left, Vector256<long> right);

        public static Vector128<uint> CompareGreaterThan       (Vector128<uint> left, Vector128<uint> right);
        public static Vector128<uint> CompareGreaterThanOrEqual(Vector128<uint> left, Vector128<uint> right);
        public static Vector128<uint> CompareLessThan          (Vector128<uint> left, Vector128<uint> right);
        public static Vector128<uint> CompareLessThanOrEqual   (Vector128<uint> left, Vector128<uint> right);
        public static Vector128<uint> CompareNotEqual          (Vector128<uint> left, Vector128<uint> right);
        public static Vector256<uint> CompareGreaterThan       (Vector256<uint> left, Vector256<uint> right);
        public static Vector256<uint> CompareGreaterThanOrEqual(Vector256<uint> left, Vector256<uint> right);
        public static Vector256<uint> CompareLessThan          (Vector256<uint> left, Vector256<uint> right);
        public static Vector256<uint> CompareLessThanOrEqual   (Vector256<uint> left, Vector256<uint> right);
        public static Vector256<uint> CompareNotEqual          (Vector256<uint> left, Vector256<uint> right);

        public static Vector128<ulong> CompareGreaterThan       (Vector128<ulong> left, Vector128<ulong> right);
        public static Vector128<ulong> CompareGreaterThanOrEqual(Vector128<ulong> left, Vector128<ulong> right);
        public static Vector128<ulong> CompareLessThan          (Vector128<ulong> left, Vector128<ulong> right);
        public static Vector128<ulong> CompareLessThanOrEqual   (Vector128<ulong> left, Vector128<ulong> right);
        public static Vector128<ulong> CompareNotEqual          (Vector128<ulong> left, Vector128<ulong> right);
        public static Vector256<ulong> CompareGreaterThan       (Vector256<ulong> left, Vector256<ulong> right);
        public static Vector256<ulong> CompareGreaterThanOrEqual(Vector256<ulong> left, Vector256<ulong> right);
        public static Vector256<ulong> CompareLessThan          (Vector256<ulong> left, Vector256<ulong> right);
        public static Vector256<ulong> CompareLessThanOrEqual   (Vector256<ulong> left, Vector256<ulong> right);
        public static Vector256<ulong> CompareNotEqual          (Vector256<ulong> left, Vector256<ulong> right);

        public static Vector128<double> Compress(Vector128<double> value, Vector128<double> mask);
        public static Vector128<int>    Compress(Vector128<int>    value, Vector128<int>    mask);
        public static Vector128<long>   Compress(Vector128<long>   value, Vector128<long>   mask);
        public static Vector128<float>  Compress(Vector128<float>  value, Vector128<float>  mask);
        public static Vector128<uint>   Compress(Vector128<uint>   value, Vector128<uint>   mask);
        public static Vector128<ulong>  Compress(Vector128<ulong>  value, Vector128<ulong>  mask);
        public static Vector256<double> Compress(Vector256<double> value, Vector256<double> mask);
        public static Vector256<int>    Compress(Vector256<int>    value, Vector256<int>    mask);
        public static Vector256<long>   Compress(Vector256<long>   value, Vector256<long>   mask);
        public static Vector256<float>  Compress(Vector256<float>  value, Vector256<float>  mask);
        public static Vector256<uint>   Compress(Vector256<uint>   value, Vector256<uint>   mask);
        public static Vector256<ulong>  Compress(Vector256<ulong>  value, Vector256<ulong>  mask);

        public static Vector128<double> Expand(Vector128<double> value, Vector128<double> mask);
        public static Vector128<int>    Expand(Vector128<int>    value, Vector128<int>    mask);
        public static Vector128<long>   Expand(Vector128<long>   value, Vector128<long>   mask);
        public static Vector128<float>  Expand(Vector128<float>  value, Vector128<float>  mask);
        public static Vector128<uint>   Expand(Vector128<uint>   value, Vector128<uint>   mask);
        public static Vector128<ulong>  Expand(Vector128<ulong>  value, Vector128<ulong>  mask);
        public static Vector256<double> Expand(Vector256<double> value, Vector256<double> mask);
        public static Vector256<int>    Expand(Vector256<int>    value, Vector256<int>    mask);
        public static Vector256<long>   Expand(Vector256<long>   value, Vector256<long>   mask);
        public static Vector256<float>  Expand(Vector256<float>  value, Vector256<float>  mask);
        public static Vector256<uint>   Expand(Vector256<uint>   value, Vector256<uint>   mask);
        public static Vector256<ulong>  Expand(Vector256<ulong>  value, Vector256<ulong>  mask);

        public static unsafe void ScatterMaskVector128(Vector128<double> value, double* baseAddress, Vector128<int> index, Vector128<double> mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector128(Vector128<int>    value, int*    baseAddress, Vector128<int> index, Vector128<int>    mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector128(Vector128<long>   value, long*   baseAddress, Vector128<int> index, Vector128<long>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector128(Vector128<float>  value, float*  baseAddress, Vector128<int> index, Vector128<float>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector128(Vector128<uint>   value, uint*   baseAddress, Vector128<int> index, Vector128<uint>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector128(Vector128<ulong>  value, ulong*  baseAddress, Vector128<int> index, Vector128<ulong>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<double> value, double* baseAddress, Vector256<int> index, Vector256<double> mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<int>    value, int*    baseAddress, Vector256<int> index, Vector256<int>    mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<long>   value, long*   baseAddress, Vector256<int> index, Vector256<long>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<float>  value, float*  baseAddress, Vector256<int> index, Vector256<float>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<uint>   value, uint*   baseAddress, Vector256<int> index, Vector256<uint>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<ulong>  value, ulong*  baseAddress, Vector256<int> index, Vector256<ulong>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);

        public static unsafe void ScatterMaskVector128(Vector128<double> value, double* baseAddress, Vector128<long> index, Vector128<double> mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector128(Vector128<int>    value, int*    baseAddress, Vector128<long> index, Vector128<int>    mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector128(Vector128<long>   value, long*   baseAddress, Vector128<long> index, Vector128<long>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector128(Vector128<float>  value, float*  baseAddress, Vector128<long> index, Vector128<float>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector128(Vector128<uint>   value, uint*   baseAddress, Vector128<long> index, Vector128<uint>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector128(Vector128<ulong>  value, ulong*  baseAddress, Vector128<long> index, Vector128<ulong>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<double> value, double* baseAddress, Vector256<long> index, Vector256<double> mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<int>    value, int*    baseAddress, Vector256<long> index, Vector256<int>    mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<long>   value, long*   baseAddress, Vector256<long> index, Vector256<long>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<float>  value, float*  baseAddress, Vector256<long> index, Vector256<float>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<uint>   value, uint*   baseAddress, Vector256<long> index, Vector256<uint>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<ulong>  value, ulong*  baseAddress, Vector256<long> index, Vector256<ulong>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);

        public static unsafe void ScatterVector128(Vector128<double> value, double* baseAddress, Vector128<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector128(Vector128<int>    value, int*    baseAddress, Vector128<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector128(Vector128<long>   value, long*   baseAddress, Vector128<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector128(Vector128<float>  value, float*  baseAddress, Vector128<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector128(Vector128<uint>   value, uint*   baseAddress, Vector128<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector128(Vector128<ulong>  value, ulong*  baseAddress, Vector128<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<double> value, double* baseAddress, Vector256<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<int>    value, int*    baseAddress, Vector256<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<long>   value, long*   baseAddress, Vector256<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<float>  value, float*  baseAddress, Vector256<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<uint>   value, uint*   baseAddress, Vector256<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<ulong>  value, ulong*  baseAddress, Vector256<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);

        public static unsafe void ScatterVector128(Vector128<double> value, double* baseAddress, Vector128<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector128(Vector128<int>    value, int*    baseAddress, Vector128<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector128(Vector128<long>   value, long*   baseAddress, Vector128<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector128(Vector128<float>  value, float*  baseAddress, Vector128<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector128(Vector128<uint>   value, uint*   baseAddress, Vector128<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector128(Vector128<ulong>  value, ulong*  baseAddress, Vector128<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<double> value, double* baseAddress, Vector256<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<int>    value, int*    baseAddress, Vector256<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<long>   value, long*   baseAddress, Vector256<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<float>  value, float*  baseAddress, Vector256<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<uint>   value, uint*   baseAddress, Vector256<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<ulong>  value, ulong*  baseAddress, Vector256<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    }
}

public static partial class Avx512BW
{
    public static Vector512<byte>   BlendVariable(Vector512<byte>   left, Vector512<byte>   right, Vector512<byte>   mask);
    public static Vector512<short>  BlendVariable(Vector512<short>  left, Vector512<short>  right, Vector512<short>  mask);
    public static Vector512<sbyte>  BlendVariable(Vector512<sbyte>  left, Vector512<sbyte>  right, Vector512<sbyte>  mask);
    public static Vector512<ushort> BlendVariable(Vector512<ushort> left, Vector512<ushort> right, Vector512<ushort> mask);

    public static Vector512<byte> CompareEqual             (Vector512<byte> left, Vector512<byte> right);
    public static Vector512<byte> CompareGreaterThan       (Vector512<byte> left, Vector512<byte> right);
    public static Vector512<byte> CompareGreaterThanOrEqual(Vector512<byte> left, Vector512<byte> right);
    public static Vector512<byte> CompareLessThan          (Vector512<byte> left, Vector512<byte> right);
    public static Vector512<byte> CompareLessThanOrEqual   (Vector512<byte> left, Vector512<byte> right);
    public static Vector512<byte> CompareNotEqual          (Vector512<byte> left, Vector512<byte> right);

    public static Vector512<short> CompareEqual             (Vector512<short> left, Vector512<short> right);
    public static Vector512<short> CompareGreaterThan       (Vector512<short> left, Vector512<short> right);
    public static Vector512<short> CompareGreaterThanOrEqual(Vector512<short> left, Vector512<short> right);
    public static Vector512<short> CompareLessThan          (Vector512<short> left, Vector512<short> right);
    public static Vector512<short> CompareLessThanOrEqual   (Vector512<short> left, Vector512<short> right);
    public static Vector512<short> CompareNotEqual          (Vector512<short> left, Vector512<short> right);

    public static Vector512<sbyte> CompareEqual             (Vector512<sbyte> left, Vector512<sbyte> right);
    public static Vector512<sbyte> CompareGreaterThan       (Vector512<sbyte> left, Vector512<sbyte> right);
    public static Vector512<sbyte> CompareGreaterThanOrEqual(Vector512<sbyte> left, Vector512<sbyte> right);
    public static Vector512<sbyte> CompareLessThan          (Vector512<sbyte> left, Vector512<sbyte> right);
    public static Vector512<sbyte> CompareLessThanOrEqual   (Vector512<sbyte> left, Vector512<sbyte> right);
    public static Vector512<sbyte> CompareNotEqual          (Vector512<sbyte> left, Vector512<sbyte> right);

    public static Vector512<ushort> CompareEqual             (Vector512<ushort> left, Vector512<ushort> right);
    public static Vector512<ushort> CompareGreaterThan       (Vector512<ushort> left, Vector512<ushort> right);
    public static Vector512<ushort> CompareGreaterThanOrEqual(Vector512<ushort> left, Vector512<ushort> right);
    public static Vector512<ushort> CompareLessThan          (Vector512<ushort> left, Vector512<ushort> right);
    public static Vector512<ushort> CompareLessThanOrEqual   (Vector512<ushort> left, Vector512<ushort> right);
    public static Vector512<ushort> CompareNotEqual          (Vector512<ushort> left, Vector512<ushort> right);

    public static int MoveMask(Vector512<short>  value);
    public static int MoveMask(Vector512<ushort> value);

    public static long MoveMask(Vector512<byte>  value);
    public static long MoveMask(Vector512<sbyte> value);

    public static bool TestC(Vector512<byte>   left, Vector512<byte>   right);
    public static bool TestC(Vector512<short>  left, Vector512<short>  right);
    public static bool TestC(Vector512<sbyte>  left, Vector512<sbyte>  right);
    public static bool TestC(Vector512<ushort> left, Vector512<ushort> right);

    public static bool TestNotZAndNotC(Vector512<byte>   left, Vector512<byte>   right);
    public static bool TestNotZAndNotC(Vector512<short>  left, Vector512<short>  right);
    public static bool TestNotZAndNotC(Vector512<sbyte>  left, Vector512<sbyte>  right);
    public static bool TestNotZAndNotC(Vector512<ushort> left, Vector512<ushort> right);

    public static bool TestZ(Vector512<byte>   left, Vector512<byte>   right);
    public static bool TestZ(Vector512<short>  left, Vector512<short>  right);
    public static bool TestZ(Vector512<sbyte>  left, Vector512<sbyte>  right);
    public static bool TestZ(Vector512<ushort> left, Vector512<ushort> right);

    public static partial class VL
    {
        public static Vector128<byte> CompareGreaterThan       (Vector128<byte> left, Vector128<byte> right);
        public static Vector128<byte> CompareGreaterThanOrEqual(Vector128<byte> left, Vector128<byte> right);
        public static Vector128<byte> CompareLessThan          (Vector128<byte> left, Vector128<byte> right);
        public static Vector128<byte> CompareLessThanOrEqual   (Vector128<byte> left, Vector128<byte> right);
        public static Vector128<byte> CompareNotEqual          (Vector128<byte> left, Vector128<byte> right);
        public static Vector256<byte> CompareGreaterThan       (Vector256<byte> left, Vector256<byte> right);
        public static Vector256<byte> CompareGreaterThanOrEqual(Vector256<byte> left, Vector256<byte> right);
        public static Vector256<byte> CompareLessThan          (Vector256<byte> left, Vector256<byte> right);
        public static Vector256<byte> CompareLessThanOrEqual   (Vector256<byte> left, Vector256<byte> right);
        public static Vector256<byte> CompareNotEqual          (Vector256<byte> left, Vector256<byte> right);

        public static Vector128<short> CompareGreaterThanOrEqual(Vector128<short> left, Vector128<short> right);
        public static Vector128<short> CompareLessThan          (Vector128<short> left, Vector128<short> right);
        public static Vector128<short> CompareLessThanOrEqual   (Vector128<short> left, Vector128<short> right);
        public static Vector128<short> CompareNotEqual          (Vector128<short> left, Vector128<short> right);
        public static Vector256<short> CompareGreaterThanOrEqual(Vector256<short> left, Vector256<short> right);
        public static Vector256<short> CompareLessThan          (Vector256<short> left, Vector256<short> right);
        public static Vector256<short> CompareLessThanOrEqual   (Vector256<short> left, Vector256<short> right);
        public static Vector256<short> CompareNotEqual          (Vector256<short> left, Vector256<short> right);

        public static Vector128<sbyte> CompareGreaterThanOrEqual(Vector128<sbyte> left, Vector128<sbyte> right);
        public static Vector128<sbyte> CompareLessThan          (Vector128<sbyte> left, Vector128<sbyte> right);
        public static Vector128<sbyte> CompareLessThanOrEqual   (Vector128<sbyte> left, Vector128<sbyte> right);
        public static Vector128<sbyte> CompareNotEqual          (Vector128<sbyte> left, Vector128<sbyte> right);
        public static Vector256<sbyte> CompareGreaterThanOrEqual(Vector256<sbyte> left, Vector256<sbyte> right);
        public static Vector256<sbyte> CompareLessThan          (Vector256<sbyte> left, Vector256<sbyte> right);
        public static Vector256<sbyte> CompareLessThanOrEqual   (Vector256<sbyte> left, Vector256<sbyte> right);
        public static Vector256<sbyte> CompareNotEqual          (Vector256<sbyte> left, Vector256<sbyte> right);

        public static Vector128<ushort> CompareGreaterThan       (Vector128<ushort> left, Vector128<ushort> right);
        public static Vector128<ushort> CompareGreaterThanOrEqual(Vector128<ushort> left, Vector128<ushort> right);
        public static Vector128<ushort> CompareLessThan          (Vector128<ushort> left, Vector128<ushort> right);
        public static Vector128<ushort> CompareLessThanOrEqual   (Vector128<ushort> left, Vector128<ushort> right);
        public static Vector128<ushort> CompareNotEqual          (Vector128<ushort> left, Vector128<ushort> right);
        public static Vector256<ushort> CompareGreaterThan       (Vector256<ushort> left, Vector256<ushort> right);
        public static Vector256<ushort> CompareGreaterThanOrEqual(Vector256<ushort> left, Vector256<ushort> right);
        public static Vector256<ushort> CompareLessThan          (Vector256<ushort> left, Vector256<ushort> right);
        public static Vector256<ushort> CompareLessThanOrEqual   (Vector256<ushort> left, Vector256<ushort> right);
        public static Vector256<ushort> CompareNotEqual          (Vector256<ushort> left, Vector256<ushort> right);
    }
}

public static partial class Avx512DQ
{
    public static Vector512<double> Classify(Vector512<double> value, [ConstantExpected] byte control);
    public static Vector512<float>  Classify(Vector512<float>  value, [ConstantExpected] byte control);

    public static Vector128<double> ClassifyScalar(Vector128<double> value, [ConstantExpected] byte control);
    public static Vector128<float>  ClassifyScalar(Vector128<float>  value, [ConstantExpected] byte control);

    public static int MoveMask(Vector128<short>  value);
    public static int MoveMask(Vector128<ushort> value);
    public static int MoveMask(Vector256<int>    value);
    public static int MoveMask(Vector256<uint>   value);
    public static int MoveMask(Vector512<double> value);
    public static int MoveMask(Vector512<long>   value);
    public static int MoveMask(Vector512<ulong>  value);

    public static partial class VL
    {
        public static Vector128<double> Classify(Vector128<double> value, [ConstantExpected] byte control);
        public static Vector128<float>  Classify(Vector128<float>  value, [ConstantExpected] byte control);
        public static Vector256<double> Classify(Vector256<double> value, [ConstantExpected] byte control);
        public static Vector256<float>  Classify(Vector256<float>  value, [ConstantExpected] byte control);
    }
}

public abstract class Avx512Vbmi2 : Avx512BW
{
    public static new bool IsSupported { get; }

    public static Vector512<byte>   Compress(Vector512<byte>   value, Vector512<byte>   mask);
    public static Vector512<short>  Compress(Vector512<short>  value, Vector512<short>  mask);
    public static Vector512<sbyte>  Compress(Vector512<sbyte>  value, Vector512<sbyte>  mask);
    public static Vector512<ushort> Compress(Vector512<ushort> value, Vector512<ushort> mask);

    public static Vector512<byte>   Expand(Vector512<byte>   value, Vector512<byte>   mask);
    public static Vector512<short>  Expand(Vector512<short>  value, Vector512<short>  mask);
    public static Vector512<sbyte>  Expand(Vector512<sbyte>  value, Vector512<sbyte>  mask);
    public static Vector512<ushort> Expand(Vector512<ushort> value, Vector512<ushort> mask);

    public abstract class VL : Avx512BW.VL
    {
        public static new bool IsSupported { get; }

        public static Vector128<byte>   Compress(Vector128<byte>   value, Vector128<byte>   mask);
        public static Vector128<short>  Compress(Vector128<short>  value, Vector128<short>  mask);
        public static Vector128<sbyte>  Compress(Vector128<sbyte>  value, Vector128<sbyte>  mask);
        public static Vector128<ushort> Compress(Vector128<ushort> value, Vector128<ushort> mask);
        public static Vector256<byte>   Compress(Vector256<byte>   value, Vector256<byte>   mask);
        public static Vector256<short>  Compress(Vector256<short>  value, Vector256<short>  mask);
        public static Vector256<sbyte>  Compress(Vector256<sbyte>  value, Vector256<sbyte>  mask);
        public static Vector256<ushort> Compress(Vector256<ushort> value, Vector256<ushort> mask);

        public static Vector128<byte>   Expand(Vector128<byte>   value, Vector128<byte>   mask);
        public static Vector128<short>  Expand(Vector128<short>  value, Vector128<short>  mask);
        public static Vector128<sbyte>  Expand(Vector128<sbyte>  value, Vector128<sbyte>  mask);
        public static Vector128<ushort> Expand(Vector128<ushort> value, Vector128<ushort> mask);
        public static Vector256<byte>   Expand(Vector256<byte>   value, Vector256<byte>   mask);
        public static Vector256<short>  Expand(Vector256<short>  value, Vector256<short>  mask);
        public static Vector256<sbyte>  Expand(Vector256<sbyte>  value, Vector256<sbyte>  mask);
        public static Vector256<ushort> Expand(Vector256<ushort> value, Vector256<ushort> mask);
    }

    public abstract class X64 : Avx512BW.X64
    {
        public static new bool IsSupported { get; }
    }
}
MineCake147E commented 1 year ago

I have some questions about this.

First, in this design, how could I construct a mask register for variable integers representing mask bits?

Second, would

static Vector512<byte> A(Vector512<byte> zmm0, Vector512<byte> zmm1)
{
    var mask = Avx512BW.CompareEqual(zmm0, zmm1);
    return mask & zmm1;
}

be optimized like the code below?

vpcmpeqb k1, zmm0, zmm1
vmovdqu8 zmm0 {k1}{z}, zmm1
ret

Third, could I pass a mask register to a method within either a general-purpose register, or more preferably, a mask register?

tannergooding commented 1 year ago

Not everything will land in .NET 8 due to time constraints. Some of the mask related support and handling is going to land in .NET 9 instead.

Just to be clear, this would've been true regardless of whether we kept with VectorMask<T> or we went with the new approach. AVX-512 is a very large set of functionality and squeezing it all into 1 release just wasn't possible.

First, in this design, how could I construct a mask register for variable integers representing mask bits?

AVX-512 does not have any "built-in" functionality for creating a mask from a constant. So for most scenarios, you want to get a mask from an instruction that produces a mask, such as a comparison instruction.
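
As a minimal sketch of getting a mask from a comparison, the example below uses only the cross-platform `Vector512` APIs. The class and method names are hypothetical, and whether the JIT keeps the comparison result in a mask register is an assumption that depends on the runtime version and hardware; the managed code only ever sees a vector.

```csharp
using System.Numerics;
using System.Runtime.Intrinsics;

public static class CompareMaskExample
{
    // Counts how many lanes of `values` are greater than `threshold`.
    public static int CountGreaterThan(Vector512<int> values, int threshold)
    {
        // The comparison yields a vector whose lanes are all-bits-set (true) or zero (false).
        Vector512<int> mask = Vector512.GreaterThan(values, Vector512.Create(threshold));

        // Collapse the per-lane results into one bit per lane, then count the set bits.
        ulong bits = mask.ExtractMostSignificantBits();
        return BitOperations.PopCount(bits);
    }
}
```

The same shape works for `Vector128`/`Vector256`, which is the point of exposing masks as ordinary vectors.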

If you really have to create it from a constant, you're at best getting either of the following bits of codegen:

; Load literal into general purpose register, then move into mask register
mov rax, imm64
kmovq k1, rax

-or-

; Load constant into simd register, then convert to a mask register
vmovups zmm0, [addr]
vpmovb2m k1, zmm0

APIs that expect specific parameters to be a mask already emit the relevant conversion and so using Vector512.Create(cns, ..., cns) is the way to go. It matches how you'd write the same algorithm for Vector128/Vector256. In the future we may have additional recognition for specific patterns and try to optimize them further if possible.
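
A rough sketch of the constant-mask approach, assuming the usual convention that an all-bits-set lane means "selected" and a zero lane means "not selected". The helper name is hypothetical; `Vector512.Create` and `Vector512.ConditionalSelect` are the existing cross-platform APIs.

```csharp
using System.Runtime.Intrinsics;

public static class ConstantMaskExample
{
    // Keeps the even-indexed lanes of `source` and zeroes the odd-indexed ones.
    public static Vector512<int> KeepEvenLanes(Vector512<int> source)
    {
        // Constant mask written as a vector: -1 (all bits set) selects a lane, 0 rejects it.
        Vector512<int> mask = Vector512.Create(
            -1, 0, -1, 0, -1, 0, -1, 0,
            -1, 0, -1, 0, -1, 0, -1, 0);

        // Lanes where the mask is set come from `source`; the rest come from zero.
        return Vector512.ConditionalSelect(mask, source, Vector512<int>.Zero);
    }
}
```

How the JIT materializes the constant (a load followed by a conversion, or something better) is left to the code generator and is not something this sketch controls.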

Second, would ... be optimized like the code below?

For .NET 8 it will generate something like:

vpcmpeqb k1, zmm0, zmm1
vpmovm2b zmm0, k1
vpandd zmm0, zmm0, zmm1

Noting that this is taking into account the default Windows x64 calling convention where the first arg is passed in rcx, the second in rdx, and the third in r8. There is a hidden first argument being the return buffer since this is a large struct return.

For .NET 9, we'll likely get it to:

vpcmpeqb k1, zmm0, zmm1
vpandd zmm0 {k1}{z}, zmm1, zmm1

This just removes the one vpmovm2b instruction that converts from "mask register" back to "vector register". It would save 6 bytes of codegen and 3 cycles.

Third, could I pass a mask register to a method within either a general-purpose register, or more preferably, a mask register?

No platform passes arguments in kmask registers; they are all considered callee trash and must be saved by the caller. The only way to pass them is in memory or by converting to an int/vector. This means you're ultimately paying a conversion price of 1-3 cycles on each side, depending on the base type used (byte/short are 3 cycles, int/long are 1 cycle).

With this new design, everything is just exposed to the user as a vector, so passing it as a vector, much as you would have done for Vector128/Vector256, is the way to go. We may look at providing an API that is functionally the inverse of ExtractMostSignificantBits and creates a vector from a bitmask. Such an API would end up being 2-3 instructions on older hardware, however, and it serves a much more rarely needed scenario.
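
Until such an API exists, the inverse of ExtractMostSignificantBits can be approximated in user code. The following is only a sketch with hypothetical names, not a proposed or existing API: it broadcasts the bitmask to every lane, isolates one bit per lane, and compares, which is roughly the 2-3 instruction sequence mentioned above.

```csharp
using System.Runtime.Intrinsics;

public static class MaskFromBitsExample
{
    // Hypothetical helper: lane i becomes all-bits-set when bit i of `bits` is 1, otherwise zero.
    public static Vector512<int> CreateMaskFromBits(ushort bits)
    {
        // Lane i holds the single bit (1 << i).
        Vector512<int> laneBit = Vector512.Create(
            1 << 0,  1 << 1,  1 << 2,  1 << 3,
            1 << 4,  1 << 5,  1 << 6,  1 << 7,
            1 << 8,  1 << 9,  1 << 10, 1 << 11,
            1 << 12, 1 << 13, 1 << 14, 1 << 15);

        // Broadcast the bitmask, keep only this lane's bit, and test whether it survived.
        Vector512<int> broadcast = Vector512.Create((int)bits);
        return Vector512.Equals(broadcast & laneBit, laneBit);
    }
}
```

Round-tripping the result through `ExtractMostSignificantBits()` should give back the original bits, which makes the helper easy to check. Narrower element types such as byte need a different lane-to-bit mapping and are not covered here.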

MineCake147E commented 1 year ago

AVX-512 does not have any "built-in" functionality for creating a mask from a constant.

Sorry for the confusion. I meant that I want a way to emit a kmovq k1, rax instruction for a variable rax. For example,

static Vector512<byte> A(Vector512<byte> source, Random r)
{
    long a = r.NextInt64(int.MinValue, int.MaxValue + 1L);
    a |= r.NextInt64(int.MinValue, int.MaxValue + 1L) << 32;
    var b = Avx512BW.CreateMask((ulong)a).AsByte();
    return Avx512BW.BlendVariable(Vector512<byte>.Zero, source, b);
}
tannergooding commented 1 year ago

For .NET 8 we did land BlendVariable and the Compare APIs, which were the most critical. We also have the general functionality for MoveMask available via the xplat ExtractMostSignificantBits APIs, and the general pattern recognition that will emit vptestm. We didn't, however, land Compress/Expand, Gather/Scatter, MaskLoad/MaskStore, or the platform-specific MoveMask and Test APIs:

namespace System.Runtime.Intrinsics.X86;

public static partial class Avx512F
{

    public static Vector512<double> Compress(Vector512<double> value, Vector512<double> mask);
    public static Vector512<int>    Compress(Vector512<int>    value, Vector512<int>    mask);
    public static Vector512<long>   Compress(Vector512<long>   value, Vector512<long>   mask);
    public static Vector512<float>  Compress(Vector512<float>  value, Vector512<float>  mask);
    public static Vector512<uint>   Compress(Vector512<uint>   value, Vector512<uint>   mask);
    public static Vector512<ulong>  Compress(Vector512<ulong>  value, Vector512<ulong>  mask);

    public static Vector512<double> Expand(Vector512<double> value, Vector512<double> mask);
    public static Vector512<int>    Expand(Vector512<int>    value, Vector512<int>    mask);
    public static Vector512<long>   Expand(Vector512<long>   value, Vector512<long>   mask);
    public static Vector512<float>  Expand(Vector512<float>  value, Vector512<float>  mask);
    public static Vector512<uint>   Expand(Vector512<uint>   value, Vector512<uint>   mask);
    public static Vector512<ulong>  Expand(Vector512<ulong>  value, Vector512<ulong>  mask);

    public static unsafe Vector512<double> GatherMaskVector512(Vector512<double> source, double* baseAddress, Vector512<int> index, Vector512<double> mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<int>    GatherMaskVector512(Vector512<int>    source, int*    baseAddress, Vector512<int> index, Vector512<int>    mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<long>   GatherMaskVector512(Vector512<long>   source, long*   baseAddress, Vector512<int> index, Vector512<long>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<float>  GatherMaskVector512(Vector512<float>  source, float*  baseAddress, Vector512<int> index, Vector512<float>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<uint>   GatherMaskVector512(Vector512<uint>   source, uint*   baseAddress, Vector512<int> index, Vector512<uint>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<ulong>  GatherMaskVector512(Vector512<ulong>  source, ulong*  baseAddress, Vector512<int> index, Vector512<ulong>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);

    public static unsafe Vector512<double> GatherMaskVector512(Vector512<double> source, double* baseAddress, Vector512<long> index, Vector512<double> mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<int>    GatherMaskVector512(Vector512<int>    source, int*    baseAddress, Vector512<long> index, Vector512<int>    mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<long>   GatherMaskVector512(Vector512<long>   source, long*   baseAddress, Vector512<long> index, Vector512<long>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<uint>   GatherMaskVector512(Vector512<uint>   source, uint*   baseAddress, Vector512<long> index, Vector512<uint>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<float>  GatherMaskVector512(Vector512<float>  source, float*  baseAddress, Vector512<long> index, Vector512<float>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<ulong>  GatherMaskVector512(Vector512<ulong>  source, ulong*  baseAddress, Vector512<long> index, Vector512<ulong>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);

    public static unsafe Vector512<double> GatherVector512(double* baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<int>    GatherVector512(int*    baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<long>   GatherVector512(long*   baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<float>  GatherVector512(float*  baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<uint>   GatherVector512(uint*   baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<ulong>  GatherVector512(ulong*  baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);

    public static unsafe Vector512<double> GatherVector512(double* baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<int>    GatherVector512(int*    baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<long>   GatherVector512(long*   baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<float>  GatherVector512(float*  baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<uint>   GatherVector512(uint*   baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<ulong>  GatherVector512(ulong*  baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);

    public static unsafe Vector512<double> MaskLoad(double* address, Vector512<double> mask);
    public static unsafe Vector512<int>    MaskLoad(int*    address, Vector512<int>    mask);
    public static unsafe Vector512<long>   MaskLoad(long*   address, Vector512<long>   mask);
    public static unsafe Vector512<float>  MaskLoad(float*  address, Vector512<float>  mask);
    public static unsafe Vector512<uint>   MaskLoad(uint*   address, Vector512<uint>   mask);
    public static unsafe Vector512<ulong>  MaskLoad(ulong*  address, Vector512<ulong>  mask);

    public static unsafe void MaskStore(double* address, Vector512<double> mask, Vector512<double> source);
    public static unsafe void MaskStore(int*    address, Vector512<int>    mask, Vector512<int>    source);
    public static unsafe void MaskStore(long*   address, Vector512<long>   mask, Vector512<long>   source);
    public static unsafe void MaskStore(float*  address, Vector512<float>  mask, Vector512<float>  source);
    public static unsafe void MaskStore(uint*   address, Vector512<uint>   mask, Vector512<uint>   source);
    public static unsafe void MaskStore(ulong*  address, Vector512<ulong>  mask, Vector512<ulong>  source);

    public static int MoveMask(Vector256<short>  value);
    public static int MoveMask(Vector256<ushort> value);
    public static int MoveMask(Vector512<int>    value);
    public static int MoveMask(Vector512<float>  value);
    public static int MoveMask(Vector512<uint>   value);

    public static unsafe void ScatterMaskVector512(Vector512<double> value, double* baseAddress, Vector512<int> index, Vector512<double> mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterMaskVector512(Vector512<int>    value, int*    baseAddress, Vector512<int> index, Vector512<int>    mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterMaskVector512(Vector512<long>   value, long*   baseAddress, Vector512<int> index, Vector512<long>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterMaskVector512(Vector512<float>  value, float*  baseAddress, Vector512<int> index, Vector512<float>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterMaskVector512(Vector512<uint>   value, uint*   baseAddress, Vector512<int> index, Vector512<uint>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterMaskVector512(Vector512<ulong>  value, ulong*  baseAddress, Vector512<int> index, Vector512<ulong>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);

    public static unsafe void ScatterMaskVector512(Vector512<double> value, double* baseAddress, Vector512<long> index, Vector512<double> mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterMaskVector512(Vector512<int>    value, int*    baseAddress, Vector512<long> index, Vector512<int>    mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterMaskVector512(Vector512<long>   value, long*   baseAddress, Vector512<long> index, Vector512<long>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterMaskVector512(Vector512<float>  value, float*  baseAddress, Vector512<long> index, Vector512<float>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterMaskVector512(Vector512<uint>   value, uint*   baseAddress, Vector512<long> index, Vector512<uint>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterMaskVector512(Vector512<ulong>  value, ulong*  baseAddress, Vector512<long> index, Vector512<ulong>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);

    public static unsafe void ScatterVector512(Vector512<double> value, double* baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterVector512(Vector512<int>    value, int*    baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterVector512(Vector512<long>   value, long*   baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterVector512(Vector512<float>  value, float*  baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterVector512(Vector512<uint>   value, uint*   baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterVector512(Vector512<ulong>  value, ulong*  baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);

    public static unsafe void ScatterVector512(Vector512<double> value, double* baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterVector512(Vector512<int>    value, int*    baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterVector512(Vector512<long>   value, long*   baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterVector512(Vector512<float>  value, float*  baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterVector512(Vector512<uint>   value, uint*   baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterVector512(Vector512<ulong>  value, ulong*  baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);

    public static bool TestC(Vector512<double> left, Vector512<double> right);
    public static bool TestC(Vector512<int>    left, Vector512<int>    right);
    public static bool TestC(Vector512<long>   left, Vector512<long>   right);
    public static bool TestC(Vector512<float>  left, Vector512<float>  right);
    public static bool TestC(Vector512<uint>   left, Vector512<uint>   right);
    public static bool TestC(Vector512<ulong>  left, Vector512<ulong>  right);

    public static bool TestNotZAndNotC(Vector512<double> left, Vector512<double> right);
    public static bool TestNotZAndNotC(Vector512<int>    left, Vector512<int>    right);
    public static bool TestNotZAndNotC(Vector512<long>   left, Vector512<long>   right);
    public static bool TestNotZAndNotC(Vector512<float>  left, Vector512<float>  right);
    public static bool TestNotZAndNotC(Vector512<uint>   left, Vector512<uint>   right);
    public static bool TestNotZAndNotC(Vector512<ulong>  left, Vector512<ulong>  right);

    public static bool TestZ(Vector512<double> left, Vector512<double> right);
    public static bool TestZ(Vector512<int>    left, Vector512<int>    right);
    public static bool TestZ(Vector512<long>   left, Vector512<long>   right);
    public static bool TestZ(Vector512<float>  left, Vector512<float>  right);
    public static bool TestZ(Vector512<uint>   left, Vector512<uint>   right);
    public static bool TestZ(Vector512<ulong>  left, Vector512<ulong>  right);

    public static partial class VL
    {
        public static Vector128<double> Compress(Vector128<double> value, Vector128<double> mask);
        public static Vector128<int>    Compress(Vector128<int>    value, Vector128<int>    mask);
        public static Vector128<long>   Compress(Vector128<long>   value, Vector128<long>   mask);
        public static Vector128<float>  Compress(Vector128<float>  value, Vector128<float>  mask);
        public static Vector128<uint>   Compress(Vector128<uint>   value, Vector128<uint>   mask);
        public static Vector128<ulong>  Compress(Vector128<ulong>  value, Vector128<ulong>  mask);
        public static Vector256<double> Compress(Vector256<double> value, Vector256<double> mask);
        public static Vector256<int>    Compress(Vector256<int>    value, Vector256<int>    mask);
        public static Vector256<long>   Compress(Vector256<long>   value, Vector256<long>   mask);
        public static Vector256<float>  Compress(Vector256<float>  value, Vector256<float>  mask);
        public static Vector256<uint>   Compress(Vector256<uint>   value, Vector256<uint>   mask);
        public static Vector256<ulong>  Compress(Vector256<ulong>  value, Vector256<ulong>  mask);

        public static Vector128<double> Expand(Vector128<double> value, Vector128<double> mask);
        public static Vector128<int>    Expand(Vector128<int>    value, Vector128<int>    mask);
        public static Vector128<long>   Expand(Vector128<long>   value, Vector128<long>   mask);
        public static Vector128<float>  Expand(Vector128<float>  value, Vector128<float>  mask);
        public static Vector128<uint>   Expand(Vector128<uint>   value, Vector128<uint>   mask);
        public static Vector128<ulong>  Expand(Vector128<ulong>  value, Vector128<ulong>  mask);
        public static Vector256<double> Expand(Vector256<double> value, Vector256<double> mask);
        public static Vector256<int>    Expand(Vector256<int>    value, Vector256<int>    mask);
        public static Vector256<long>   Expand(Vector256<long>   value, Vector256<long>   mask);
        public static Vector256<float>  Expand(Vector256<float>  value, Vector256<float>  mask);
        public static Vector256<uint>   Expand(Vector256<uint>   value, Vector256<uint>   mask);
        public static Vector256<ulong>  Expand(Vector256<ulong>  value, Vector256<ulong>  mask);

        public static unsafe void ScatterMaskVector128(Vector128<double> value, double* baseAddress, Vector128<int> index, Vector128<double> mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector128(Vector128<int>    value, int*    baseAddress, Vector128<int> index, Vector128<int>    mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector128(Vector128<long>   value, long*   baseAddress, Vector128<int> index, Vector128<long>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector128(Vector128<float>  value, float*  baseAddress, Vector128<int> index, Vector128<float>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector128(Vector128<uint>   value, uint*   baseAddress, Vector128<int> index, Vector128<uint>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector128(Vector128<ulong>  value, ulong*  baseAddress, Vector128<int> index, Vector128<ulong>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<double> value, double* baseAddress, Vector256<int> index, Vector256<double> mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<int>    value, int*    baseAddress, Vector256<int> index, Vector256<int>    mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<long>   value, long*   baseAddress, Vector256<int> index, Vector256<long>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<float>  value, float*  baseAddress, Vector256<int> index, Vector256<float>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<uint>   value, uint*   baseAddress, Vector256<int> index, Vector256<uint>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<ulong>  value, ulong*  baseAddress, Vector256<int> index, Vector256<ulong>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);

        public static unsafe void ScatterMaskVector128(Vector128<double> value, double* baseAddress, Vector128<long> index, Vector128<double> mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector128(Vector128<int>    value, int*    baseAddress, Vector128<long> index, Vector128<int>    mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector128(Vector128<long>   value, long*   baseAddress, Vector128<long> index, Vector128<long>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector128(Vector128<float>  value, float*  baseAddress, Vector128<long> index, Vector128<float>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector128(Vector128<uint>   value, uint*   baseAddress, Vector128<long> index, Vector128<uint>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector128(Vector128<ulong>  value, ulong*  baseAddress, Vector128<long> index, Vector128<ulong>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<double> value, double* baseAddress, Vector256<long> index, Vector256<double> mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<int>    value, int*    baseAddress, Vector256<long> index, Vector256<int>    mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<long>   value, long*   baseAddress, Vector256<long> index, Vector256<long>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<float>  value, float*  baseAddress, Vector256<long> index, Vector256<float>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<uint>   value, uint*   baseAddress, Vector256<long> index, Vector256<uint>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<ulong>  value, ulong*  baseAddress, Vector256<long> index, Vector256<ulong>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);

        public static unsafe void ScatterVector128(Vector128<double> value, double* baseAddress, Vector128<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector128(Vector128<int>    value, int*    baseAddress, Vector128<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector128(Vector128<long>   value, long*   baseAddress, Vector128<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector128(Vector128<float>  value, float*  baseAddress, Vector128<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector128(Vector128<uint>   value, uint*   baseAddress, Vector128<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector128(Vector128<ulong>  value, ulong*  baseAddress, Vector128<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<double> value, double* baseAddress, Vector256<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<int>    value, int*    baseAddress, Vector256<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<long>   value, long*   baseAddress, Vector256<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<float>  value, float*  baseAddress, Vector256<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<uint>   value, uint*   baseAddress, Vector256<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<ulong>  value, ulong*  baseAddress, Vector256<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);

        public static unsafe void ScatterVector128(Vector128<double> value, double* baseAddress, Vector128<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector128(Vector128<int>    value, int*    baseAddress, Vector128<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector128(Vector128<long>   value, long*   baseAddress, Vector128<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector128(Vector128<float>  value, float*  baseAddress, Vector128<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector128(Vector128<uint>   value, uint*   baseAddress, Vector128<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector128(Vector128<ulong>  value, ulong*  baseAddress, Vector128<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<double> value, double* baseAddress, Vector256<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<int>    value, int*    baseAddress, Vector256<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<long>   value, long*   baseAddress, Vector256<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<float>  value, float*  baseAddress, Vector256<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<uint>   value, uint*   baseAddress, Vector256<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<ulong>  value, ulong*  baseAddress, Vector256<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    }
}

public static partial class Avx512BW
{
    public static int MoveMask(Vector512<short>  value);
    public static int MoveMask(Vector512<ushort> value);

    public static long MoveMask(Vector512<byte>  value);
    public static long MoveMask(Vector512<sbyte> value);

    public static bool TestC(Vector512<byte>   left, Vector512<byte>   right);
    public static bool TestC(Vector512<short>  left, Vector512<short>  right);
    public static bool TestC(Vector512<sbyte>  left, Vector512<sbyte>  right);
    public static bool TestC(Vector512<ushort> left, Vector512<ushort> right);

    public static bool TestNotZAndNotC(Vector512<byte>   left, Vector512<byte>   right);
    public static bool TestNotZAndNotC(Vector512<short>  left, Vector512<short>  right);
    public static bool TestNotZAndNotC(Vector512<sbyte>  left, Vector512<sbyte>  right);
    public static bool TestNotZAndNotC(Vector512<ushort> left, Vector512<ushort> right);

    public static bool TestZ(Vector512<byte>   left, Vector512<byte>   right);
    public static bool TestZ(Vector512<short>  left, Vector512<short>  right);
    public static bool TestZ(Vector512<sbyte>  left, Vector512<sbyte>  right);
    public static bool TestZ(Vector512<ushort> left, Vector512<ushort> right);
}

public static partial class Avx512DQ
{
    public static Vector512<double> Classify(Vector512<double> value, [ConstantExpected] byte control);
    public static Vector512<float>  Classify(Vector512<float>  value, [ConstantExpected] byte control);

    public static Vector128<double> ClassifyScalar(Vector128<double> value, [ConstantExpected] byte control);
    public static Vector128<float>  ClassifyScalar(Vector128<float>  value, [ConstantExpected] byte control);

    public static int MoveMask(Vector128<short>  value);
    public static int MoveMask(Vector128<ushort> value);
    public static int MoveMask(Vector256<int>    value);
    public static int MoveMask(Vector256<uint>   value);
    public static int MoveMask(Vector512<double> value);
    public static int MoveMask(Vector512<long>   value);
    public static int MoveMask(Vector512<ulong>  value);

    public static partial class VL
    {
        public static Vector128<double> Classify(Vector128<double> value, [ConstantExpected] byte control);
        public static Vector128<float>  Classify(Vector128<float>  value, [ConstantExpected] byte control);
        public static Vector256<double> Classify(Vector256<double> value, [ConstantExpected] byte control);
        public static Vector256<float>  Classify(Vector256<float>  value, [ConstantExpected] byte control);
    }
}

public abstract class Avx512Vbmi2 : Avx512BW
{
    public static new bool IsSupported { get; }

    public static Vector512<byte>   Compress(Vector512<byte>   value, Vector512<byte>   mask);
    public static Vector512<short>  Compress(Vector512<short>  value, Vector512<short>  mask);
    public static Vector512<sbyte>  Compress(Vector512<sbyte>  value, Vector512<sbyte>  mask);
    public static Vector512<ushort> Compress(Vector512<ushort> value, Vector512<ushort> mask);

    public static Vector512<byte>   Expand(Vector512<byte>   value, Vector512<byte>   mask);
    public static Vector512<short>  Expand(Vector512<short>  value, Vector512<short>  mask);
    public static Vector512<sbyte>  Expand(Vector512<sbyte>  value, Vector512<sbyte>  mask);
    public static Vector512<ushort> Expand(Vector512<ushort> value, Vector512<ushort> mask);

    public abstract class VL : Avx512BW.VL
    {
        public static new bool IsSupported { get; }

        public static Vector128<byte>   Compress(Vector128<byte>   value, Vector128<byte>   mask);
        public static Vector128<short>  Compress(Vector128<short>  value, Vector128<short>  mask);
        public static Vector128<sbyte>  Compress(Vector128<sbyte>  value, Vector128<sbyte>  mask);
        public static Vector128<ushort> Compress(Vector128<ushort> value, Vector128<ushort> mask);
        public static Vector256<byte>   Compress(Vector256<byte>   value, Vector256<byte>   mask);
        public static Vector256<short>  Compress(Vector256<short>  value, Vector256<short>  mask);
        public static Vector256<sbyte>  Compress(Vector256<sbyte>  value, Vector256<sbyte>  mask);
        public static Vector256<ushort> Compress(Vector256<ushort> value, Vector256<ushort> mask);

        public static Vector128<byte>   Expand(Vector128<byte>   value, Vector128<byte>   mask);
        public static Vector128<short>  Expand(Vector128<short>  value, Vector128<short>  mask);
        public static Vector128<sbyte>  Expand(Vector128<sbyte>  value, Vector128<sbyte>  mask);
        public static Vector128<ushort> Expand(Vector128<ushort> value, Vector128<ushort> mask);
        public static Vector256<byte>   Expand(Vector256<byte>   value, Vector256<byte>   mask);
        public static Vector256<short>  Expand(Vector256<short>  value, Vector256<short>  mask);
        public static Vector256<sbyte>  Expand(Vector256<sbyte>  value, Vector256<sbyte>  mask);
        public static Vector256<ushort> Expand(Vector256<ushort> value, Vector256<ushort> mask);
    }

    public abstract class X64 : Avx512BW.X64
    {
        public static new bool IsSupported { get; }
    }
}

The remaining ones can be implemented any time after main opens for .NET 9 changes next month.

MineCake147E commented 11 months ago

Without kmovq or kmovd for variable input, how could I utilize something generated by patterns below with ConditionalSelect?

kadd k1, k2 - mask1.ExtractMostSignificantBits() + mask2.ExtractMostSignificantBits()
kshiftl k1, k2, imm8 - mask1.ExtractMostSignificantBits() << amount

Both expressions return ulong, not Vector512<T>.

tannergooding commented 11 months ago

As per the above, not everything landed in .NET 8.

Right now, Vector512.Equals(x, y) + Vector512.Equals(z, w) would produce a kadd instruction, but the support around ExtractMostSignificantBits isn't there. We'll get to it in .NET 9 instead. We likewise don't have the support to generate kshift for such cases at the moment.

If you were to do something like:

Avx512F.BlendVariable(x, y, Vector512.Equals(x, y) + Vector512.Equals(z, w))

You would get the generally expected codegen:

vpcmpeqd k1, zmm0, zmm1
vpcmpeqd k2, zmm2, zmm3
kaddw    k1, k1, k2
vpblendmd zmm0 {k1}, zmm0, zmm1

MineCake147E commented 11 months ago

Right now, Vector512.Equals(x, y) + Vector512.Equals(z, w) would produce a kadd instruction

Will the same apply to Vector256? If so, it'll be a huge breaking change.

If you were to do something like:

var ymm4 = Vector256.Equals(ymm0, ymm1);
var ymm5 = Vector256.Equals(ymm2, ymm3);
ymm4 += ymm5;
return ymm4 & ymm0;

.NET 7 would generate the code that I expect, which would be something like:

vpcmpeqd ymm4, ymm0, ymm1
vpcmpeqd ymm5, ymm2, ymm3
vpaddd ymm4, ymm4, ymm5
vpand ymm0, ymm0, ymm4

But if I understand correctly, .NET 8 on CPUs with AVX-512 support may generate something like:

vpcmpeqd k1, ymm0, ymm1
vpcmpeqd k2, ymm2, ymm3
kaddb k1, k1, k2
vpmovm2d ymm4, k1
vpand ymm0, ymm0, ymm4

which produces completely different results.

EDIT: As of .NET 8.0 RC 1, this turns out not to be the case for Vector256<byte>, even though I used Avx2.BlendVariable instead of Avx2.And.

[MethodImpl(MethodImplOptions.AggressiveOptimization)]
private static Vector256<byte> Test1(Vector256<byte> ymm0, Vector256<byte> ymm1, Vector256<byte> ymm2, Vector256<byte> ymm3)
    => Avx2.BlendVariable(ymm0, ymm2, Vector256.Equals(ymm0, ymm1) + Vector256.Equals(ymm2, ymm3));
00007FFBABF4D8F8  vmovups     ymm0,ymmword ptr [rdx]  
00007FFBABF4D8FC  vmovups     ymm1,ymmword ptr [r9]  
00007FFBABF4D901  vpcmpeqb    ymm2,ymm0,ymmword ptr [r8]  
00007FFBABF4D906  vpcmpeqb    ymm3,ymm1,ymmword ptr [rax]  
00007FFBABF4D90A  vpaddb      ymm2,ymm2,ymm3  
00007FFBABF4D90E  vpblendvb   ymm0,ymm0,ymm1,ymm2  
00007FFBABF4D914  vmovups     ymmword ptr [rcx],ymm0  
00007FFBABF4D918  mov         rax,rcx  

But it is the case for Avx512BW.BlendVariable:

[MethodImpl(MethodImplOptions.AggressiveOptimization)]
private static Vector512<byte> Test512(Vector512<byte> zmm0, Vector512<byte> zmm1, Vector512<byte> zmm2, Vector512<byte> zmm3)
    => Avx512BW.BlendVariable(zmm0, zmm2, Vector512.Equals(zmm0, zmm1) + Vector512.Equals(zmm2, zmm3));
00007FFBABF607B8  vmovups     zmm0,zmmword ptr [rdx]  
00007FFBABF607BE  vmovups     zmm1,zmmword ptr [r9]  
00007FFBABF607C4  vpcmpeqb    k1,zmm0,zmmword ptr [r8]  
00007FFBABF607CA  vpcmpeqb    k2,zmm1,zmmword ptr [rax]  
; My expectation
****************  vpmovm2b    zmm2,k1
****************  vpmovm2b    zmm3,k2
****************  vpaddb      zmm2,zmm2,zmm3
****************  vpmovb2m    k1,zmm2
****************  vpblendmb   zmm0{k1},zmm0,zmm1  
****************  vmovups     zmmword ptr [rcx],zmm0  
****************  mov         rax,rcx  
; Actual
00007FFBABF607D0  kaddq       k1,k1,k2
00007FFBABF607D5  vpblendmb   zmm0{k1},zmm0,zmm1  
00007FFBABF607DB  vmovups     zmmword ptr [rcx],zmm0  
00007FFBABF607E1  mov         rax,rcx  

And it introduced a behavioral inconsistency between Vector256<byte> and Vector512<byte>.
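
The gap between the expected and actual sequences comes down to where the addition happens: on the per-element vector values, or on the one-bit-per-element mask. The following is a minimal sketch of that difference using only the cross-platform Vector128 APIs. The class and method names (MaskAddDemo, AddAsVectors, AddAsBitmasks) are hypothetical and only serve the illustration; the second method emulates in scalar code what a 16-bit kadd would compute and is not what the JIT emits.

```csharp
using System;
using System.Runtime.Intrinsics;

// Hypothetical demo class; the names are illustrative and not from this thread.
public static class MaskAddDemo
{
    // Element-wise semantics: add the two comparison results as vectors,
    // then read the most significant bit of every byte.
    public static uint AddAsVectors(Vector128<byte> m1, Vector128<byte> m2)
        => (m1 + m2).ExtractMostSignificantBits();

    // Mask-register semantics (emulating a 16-bit kadd): collapse each comparison
    // to one bit per element first, then add the two bitmasks as integers.
    public static uint AddAsBitmasks(Vector128<byte> m1, Vector128<byte> m2)
        => (m1.ExtractMostSignificantBits() + m2.ExtractMostSignificantBits()) & 0xFFFFu;

    public static void Main()
    {
        Vector128<byte> x = Vector128.Create((byte)1);
        Vector128<byte> allTrue = Vector128.Equals(x, x); // every byte is 0xFF

        // 0xFF + 0xFF wraps to 0xFE in every byte, so every MSB stays set.
        Console.WriteLine($"{AddAsVectors(allTrue, allTrue):X4}");

        // 0xFFFF + 0xFFFF carries across element boundaries and clears bit 0.
        Console.WriteLine($"{AddAsBitmasks(allTrue, allTrue):X4}");
    }
}
```

With both comparisons all-true, the element-wise form keeps all 16 lanes selected, while the bitmask form drops lane 0, which is the kind of divergence described above.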

I personally think that AVX-512 masking support could be provided in a slightly more machine-friendly way, such as exposing the size of the mask register: for example, Avx512VectorMask<ulong> for Vector512<byte>, Avx512VectorMask<uint> for Vector512<ushort> and Vector256<byte>, and so on.

MadProbe commented 8 months ago

Is there a way to convert the integer that I have computed for use as a mask to a vector to pass it into methods which accept a mask represented as VectorXXX<XXX>?

tannergooding commented 8 months ago

There isn't currently a way to create a mask from an integer value (and no instruction to do this either, it at best would be mov rax, imm; kmov k0, rax)

Such an API would be reasonable to define and expose in Avx512F.

MadProbe commented 8 months ago

There isn't currently a way to create a mask from an integer value (and no instruction to do this either, it at best would be mov rax, imm; kmov k0, rax)

Such an API would be reasonable to define and expose in Avx512F.

It would be best to expose this; otherwise it would severely limit the usability of the new mask APIs and make them clunky to use in cases like conditionally loading only the last N values.

tannergooding commented 8 months ago

It notably works as expected when you simply use AllBitsSet and Zero on a per element basis, which is exactly what you'd need to do downlevel.

I'm fine with the general concept, however. Someone would need to open an API proposal covering the 4 mask variants.
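
For reference, the per-element approach can be sketched with today's cross-platform APIs. This is only an illustration under assumptions: MaskFromBits is a hypothetical helper, not an existing or proposed API, and it is shown for Vector128<int> (4 lanes) to keep the example small. Each lane becomes AllBitsSet when the matching bit of the integer is set, and Zero otherwise, so the result can be passed anywhere a vector-typed mask is accepted.

```csharp
using System;
using System.Runtime.Intrinsics;

// Hypothetical demo class; MaskFromBits is not an existing .NET API.
public static class MaskFromBitsDemo
{
    // Lane i becomes AllBitsSet when bit i of 'bits' is set, otherwise Zero.
    public static Vector128<int> MaskFromBits(int bits)
    {
        // One distinct bit per lane: lane 0 tests bit 0, lane 1 tests bit 1, ...
        Vector128<int> laneBit = Vector128.Create(1, 2, 4, 8);

        // Broadcast the integer, isolate each lane's bit, and compare against it.
        return Vector128.Equals(Vector128.Create(bits) & laneBit, laneBit);
    }

    public static void Main()
    {
        Vector128<int> a = Vector128.Create(10, 20, 30, 40);
        Vector128<int> b = Vector128.Create(-1, -2, -3, -4);

        // Bits 0 and 2 are set, so lanes 0 and 2 come from 'a', the rest from 'b'.
        Vector128<int> mask = MaskFromBits(0b0101);
        Console.WriteLine(Vector128.ConditionalSelect(mask, a, b));
    }
}
```

The same shape extends to wider vectors by using more lane constants. Whether the JIT turns this into a kmov-based sequence is not something this sketch assumes.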

MadProbe commented 8 months ago

Would this be something like the older closed one without all the overloads with zero & write masks of already existing methods?

tannergooding commented 8 months ago

It'd be an API proposal, with a signature that looks something like Vector512<byte> CreateVector512Mask(long mostSignificantBits), expanded to the full appropriate set. The APIs would be in Avx512F or another relevant ISA, depending on which is appropriate for kmovb/w/d/q.

MadProbe commented 7 months ago

I still think that vector masks should be exposed as VectorMask<T>, where T is the backing storage type for the mask. That provides much more control over what the compiler does, so one wouldn't need to rely heavily on compiler optimizations and could be sure the code does what one actually wants it to do. There would also be no need to implement all the instruction variants described in the earlier iterations of this proposal, and no need to expose VectorMaskXXX<T>, where XXX is the vector's size.

This is, I think, the closest to hardware way of doing masking and it is very minimal in its nature.

The masks would be usable in the VectorXXX.ConditionalSelect functions and in the masked vector load and store functions, just like in this proposal, but the check on the minimum backing storage size of the mask would be placed in the VectorXXX functions, so the number of elements in the mask would always be greater than or equal to the number of values in VectorXXX<T>. If the mask has more elements than needed, the unneeded upper bits would simply be ignored; checking that they are zero would only make this more complicated than it needs to be.

Also I propose to expose native functions in Avx512DQ & Avx512BW for masked loading & storing functions as well as blending functions as they also use masks in corresponding instructions.

MadProbe commented 7 months ago

Could I make the proposal I described above, with all the necessary API definitions and explainers, instead of the mask-to-vector broadcast proposal, Tanner Gooding?

tannergooding commented 7 months ago

Could I make the proposal I described above, with all the necessary API definitions and explainers, instead of the mask-to-vector broadcast proposal, Tanner Gooding?

You're free to make a proposal, but it's not going to provide what you think it will provide and I don't see it moving forward at this point in time.

While we could indeed have exposed a VectorMask64/128/256/512<T> and VectorMask<T> type and then only exposed a subset of APIs that "must" take the mask (like ConditionalSelect) or return a mask (like GreaterThan), it ultimately isn't "better". It still relies very heavily on the JIT to do pattern recognition for optimizations, still relies very heavily on users updating their code manually, and adds additional complexity/overhead that will lead users into a pit of failure.

We opted for the path we did because it massively simplifies the implementation, provides the greatest impact to both new and existing code, reduces the throughput impact this already niche feature has on the JIT, makes it more pay to play, meshes nicely with other patterns we already rely on optimizations to light up for (such as embedded loads/stores, which C/C++ also relies on compiler opts around), actively reduces the total set of pattern recognition we need to do compared to the alternative, and most importantly because we have a decently high level of confidence the JIT can be made to generate the optimal code in the vast majority of scenarios; enough so that users shouldn't even care about the nuance in practice.

The only downside was that we didn't finish all the work in .NET 8, so we will need to finish it up in .NET 9 and continue to improve it over time.

where T is the backing storage type

This won't work. Vector128<byte> requires a mask with 16 bits and therefore VectorMask128<byte> must make at least 16 bits available. If we used byte as the backing storage, we'd only have 8 bits available. This also adds significant complexity into how it would work from a cross platform perspective, complexity into the overloads exposed, overhead in the conversions between VectorMask types, and more JIT strain in it needing to handle small types.
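The arithmetic can be checked with a small sketch. `MaskBits`, `Required`, and `Available` are hypothetical names used only for illustration; the sketch assumes one mask bit per vector element and a backing store of exactly one `T`:

```csharp
using System.Runtime.CompilerServices;
using System.Runtime.Intrinsics;

static class MaskBits
{
    // One mask bit is needed per element of Vector128<T>.
    public static int Required<T>() where T : struct
        => Vector128<T>.Count;

    // Bits that a single T would provide if it were the backing storage.
    public static int Available<T>() where T : struct
        => Unsafe.SizeOf<T>() * 8;
}
```

For `byte`, `Required` is 16 while `Available` is only 8, which is the shortfall described above. For `int`, `Required` is 4 and `Available` is 32, so the problem only appears for the small element types.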


At the end of the day, we have a few considerations. But the most important two are...

1. What happens when existing code is run on the latest hardware

Just due to how software works, the considerations of targeting multiple platforms (Windows, Linux, macOS, Android, iOS; x64, Arm64, WASM, RISC-V, LoongArch, etc), and that newer ISAs are less common (both in terms of hardware support and in terms of code that has paths for it), existing code is very important. We will always have far more code that targets Vector128 than anything else. We will frequently have users who are more than willing to give up a tiny bit of perf in favor of maintainability, portability, reusability, etc.

This first consideration is one of the reasons why cross platform API surface is so important. There will always be some libraries that want to write a specific code path per platform/architecture/ISA. There may even be some that are willing to do micro-architecture specific optimizations. But those are ultimately a minority compared to the other set where they will simply write a Vector128<T> code path and have it reused across multiple platforms.

Because of that, we get the most benefit out of doing some pattern recognition for existing code and then having that light up on the latest hardware. We see this particularly where these patterns are simple to do, like ConditionalSelect(mask, left, right) and so the light-up recognition we currently rely on for masking support is effectively a must have.
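As an illustration of the kind of existing code this refers to, here is a minimal sketch using only the cross-platform APIs available since .NET 7. `SelectSketch` and `SelectGreater` are hypothetical names:

```csharp
using System.Runtime.Intrinsics;

static class SelectSketch
{
    // Per element: returns x where x > y, otherwise y.
    // The comparison produces a Vector128<float> whose lanes are either
    // all-bits-set or zero, and ConditionalSelect consumes it as the mask.
    public static Vector128<float> SelectGreater(Vector128<float> x, Vector128<float> y)
    {
        Vector128<float> mask = Vector128.GreaterThan(x, y);
        return Vector128.ConditionalSelect(mask, x, y);
    }
}
```

The code never names a mask type. On hardware with mask registers, a compare feeding `ConditionalSelect` is the sort of pattern the JIT can recognize, and the same source still runs unchanged on downlevel hardware.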

2. What happens when the newer types are used on downlevel hardware

Providing APIs that power users can utilize to fine tune is also important however. It makes .NET a first class place for users to write such code and helps drive innovation. In most cases that newer support is straightforward as we simply expose the APIs, they throw if not supported, and users don't have to think much about the support.

Where the APIs are exposed in a cross platform way becomes more of a consideration as we then have to consider how it gets used across multiple platforms. For example, we have to consider that ConditionalSelect needs to operate on a bitwise basis and that Shuffle operates on the entire vector (Vector256.Shuffle doesn't operate on 2x128 lanes). This leads to nuance in how the APIs are exposed, how users might be expected to consume them, etc.
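Both behaviors can be seen in a short sketch, assuming .NET 7 or later. `CrossPlatformSketch`, `BitwiseSelect`, and `Reverse` are hypothetical names used only for illustration:

```csharp
using System.Runtime.Intrinsics;

static class CrossPlatformSketch
{
    // ConditionalSelect is bitwise: each result bit comes from `left` where
    // the mask bit is 1 and from `right` where it is 0, so a mask that is
    // not all-ones or all-zeros per element mixes bits within an element.
    public static Vector128<int> BitwiseSelect(Vector128<int> mask, Vector128<int> left, Vector128<int> right)
        => Vector128.ConditionalSelect(mask, left, right);

    // Vector256.Shuffle indexes across the whole 256-bit vector, so a full
    // reversal can move elements between the lower and upper 128-bit halves.
    public static Vector256<int> Reverse(Vector256<int> value)
        => Vector256.Shuffle(value, Vector256.Create(7, 6, 5, 4, 3, 2, 1, 0));
}
```

With a mask of `0x0F` in every lane, selecting between `0xAA` and `0x55` gives `(0xAA & 0x0F) | (0x55 & ~0x0F) = 0x5A` per lane, which shows the bitwise rather than per-element behavior.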

In the case of something like VectorMask<T> there are cases where it can have hardware acceleration and cases where it won't. The majority of developers won't want to insert code like the following into their logic, nor would they be able to trivially define helpers for easier reuse. This is even true for devs writing micro-architecture specific opts because of how it can impact the JIT and codegen:

if (VectorMask128.IsHardwareAccelerated)
{ 
    VectorMask128<T> mask = Vector128.GreaterThanMask(x, y);
    return Vector128.MaskedConditionalSelect(mask, z, w);
}
else
{
    Vector128<T> mask = Vector128.GreaterThan(x, y);
    return Vector128.ConditionalSelect(mask, z, w);
}

So we then have to account for the fact that most users want to write only one of the two paths.

If they use Vector128<T> then all their existing code works and on newer hardware it has the capability of opportunistically lighting up to use the new mask registers. In the worst case they're writing the code they would have already written and it generates slightly suboptimal code. This is within the realm of normal for most compilers, even C/C++, where compiler opts are heavily relied on in many scenarios and where you may not get exactly what you asked for already.

However, if they use VectorMask128<T>, they need to explicitly change their code to take advantage of it. Additionally, on downlevel hardware we have to functionally treat VectorMask128<T> as Vector128<T>. This gets more complex than treating Vector128<T> as TYP_MASK because we suddenly don't have the hardware capability to do the conversions, we have to rationalize what the size and shape of this non-accelerated type is on the older hardware, and it doesn't mesh naturally with how code would otherwise be written.

xoofx commented 6 months ago

We didn't, however, land Compress/Expand, Gather/Scatter, MaskLoad/MaskStore, the platform specific MoveMask or Test APIs:

Just to confirm, as I don't see it in the list, but will .NET 9 try to expose AVX512 MaskCompressStore? (vpcompressb, vpcompressw, vpcompressd, vpcompresss, vpcompressq)

MichalPetryka commented 6 months ago

We didn't, however, land Compress/Expand, Gather/Scatter, MaskLoad/MaskStore, the platform specific MoveMask or Test APIs:

Just to confirm, as I don't see it in the list, but will .NET 9 try to expose AVX512 MaskCompressStore? (vpcompressb, vpcompressw, vpcompressd, vpcompresss, vpcompressq)

MaskCompressStore should be avoided as it's terribly slow on Zen4.

tannergooding commented 6 months ago

Just to confirm, as I don't see it in the list, but will .NET 9 try to expose AVX512 MaskCompressStore? (vpcompressb, vpcompressw, vpcompressd, vpcompresss, vpcompressq)

If there are any that were missed (and I know there were a small handful, namely around masking), we'd need an explicit API proposal requesting them. There shouldn't be anything blocking us from adding them once the proposal is up/approved, however.