[API Proposal]: Expose System.Runtime.Intrinsics.X86.Aes256 and Aes512

e4m2 commented 1 year ago

Background and motivation

On some newer x86 CPUs VAES provides wider variants of encoding/decoding included in the older AES instruction set.

The 256-bit VEX-encoded variant (effectively operating on 2 AES blocks in parallel using a single instruction) has a separate CPUID flag and is not dependent on AVX512 support. Additionally, if AVX512F is supported, a 512-bit EVEX-encoded variant is available. As expected, EVEX-encoded 128 and 256-bit variants are available if AVX512VL is supported.

API Proposal

namespace System.Runtime.Intrinsics.X86;

[Intrinsic]
[CLSCompliant(false)]
public abstract class Aes256 : Aes
{
    internal Aes256() { }

    // This would depend on the VAES CPUID bit for VEX encoding and VAES + AVX512VL for EVEX
    public static new bool IsSupported { get => IsSupported; }

    [Intrinsic]
    public new abstract class X64 : Aes.X64
    {
        internal X64() { }

        public static new bool IsSupported { get => IsSupported; }
    }

    /// <summary>
    /// __m256i _mm256_aesdec_epi128(__m256i a, __m256i RoundKey)
    ///   VAESDEC ymm1, ymm2, ymm3/m256
    /// </summary>
    public static Vector256<byte> Decrypt(Vector256<byte> value, Vector256<byte> roundKey);

    /// <summary>
    /// __m256i _mm256_aesdeclast_epi128(__m256i a, __m256i RoundKey)
    ///   VAESDECLAST ymm1, ymm2, ymm3/m256
    /// </summary>
    public static Vector256<byte> DecryptLast(Vector256<byte> value, Vector256<byte> roundKey);

    /// <summary>
    /// __m256i _mm256_aesenc_epi128(__m256i a, __m256i RoundKey)
    ///   VAESENC ymm1, ymm2, ymm3/m256
    /// </summary>
    public static Vector256<byte> Encrypt(Vector256<byte> value, Vector256<byte> roundKey);

    /// <summary>
    /// __m256i _mm256_aesenclast_epi128(__m256i a, __m256i RoundKey)
    ///   VAESENCLAST ymm1, ymm2, ymm3/m256
    /// </summary>
    public static Vector256<byte> EncryptLast(Vector256<byte> value, Vector256<byte> roundKey);
}

[Intrinsic]
[CLSCompliant(false)]
public abstract class Aes512 : Aes
{
    internal Aes512() { }

    // This would depend on the VAES + AVX512F CPUID bits
    public static new bool IsSupported { get => IsSupported; }

    [Intrinsic]
    public new abstract class X64 : Aes.X64
    {
        internal X64() { }

        public static new bool IsSupported { get => IsSupported; }
    }

    /// <summary>
    /// __m512i _mm512_aesdec_epi128(__m512i a, __m512i RoundKey)
    ///   VAESDEC zmm1, zmm2, zmm3/m512
    /// </summary>
    public static Vector512<byte> Decrypt(Vector512<byte> value, Vector512<byte> roundKey);

    /// <summary>
    /// __m512i _mm512_aesdeclast_epi128(__m512i a, __m512i RoundKey)
    ///   VAESDECLAST zmm1, zmm2, zmm3/m512
    /// </summary>
    public static Vector512<byte> DecryptLast(Vector512<byte> value, Vector512<byte> roundKey);

    /// <summary>
    /// __m512i _mm512_aesenc_epi128(__m512i a, __m512i RoundKey)
    ///   VAESENC zmm1, zmm2, zmm3/m512
    /// </summary>
    public static Vector512<byte> Encrypt(Vector512<byte> value, Vector512<byte> roundKey);

    /// <summary>
    /// __m512i _mm512_aesenclast_epi128(__m512i a, __m512i RoundKey)
    ///   VAESENCLAST zmm1, zmm2, zmm3/m512
    /// </summary>
    public static Vector512<byte> EncryptLast(Vector512<byte> value, Vector512<byte> roundKey);
}

Note VAES doesn't include round key assist or inverse mix columns instructions.

API Usage

Same as AES intrinsics, except using wider vector types.

Alternative Designs

No response

Risks

No response

References

https://en.wikichip.org/wiki/x86/vaes https://en.wikipedia.org/wiki/AVX-512#VAES https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#othertechs=VAES

ghost commented 1 year ago

Tagging subscribers to this area: @dotnet/area-system-runtime-intrinsics See info in area-owners.md if you want to be subscribed.

Issue Details

### Background and motivation On some newer x86 CPUs VAES provides wider variants of encoding/decoding included in the older AES instruction set. The 256-bit VEX-encoded variant (effectively operating on 2 AES blocks in parallel using a single instruction) has a separate CPUID flag and is not dependent on AVX512 support. Additionally, if AVX512F is supported, a 512-bit EVEX-encoded variant is available. As expected, EVEX-encoded 128 and 256-bit variants are available if AVX512VL is supported. ### API Proposal ```csharp namespace System.Runtime.Intrinsics.X86; public abstract class Vaes : Aes { public static new bool IsSupported { get; } public new abstract class X64 : Aes.X64 { public static new bool IsSupported { get; } } public static Vector256 Decrypt(Vector256 value, Vector256 roundKey); public static Vector256 DecryptLast(Vector256 value, Vector256 roundKey); public static Vector256 Encrypt(Vector256 value, Vector256 roundKey); public static Vector256 EncryptLast(Vector256 value, Vector256 roundKey); } public static abstract class Avx512Vaes : Avx512F { public static new bool IsSupported { get; } public new abstract class X64 : Avx512F.X64 { public static new bool IsSupported { get; } } public new abstract class VL : Avx512F.VL { public static new bool IsSupported { get; } public static Vector128 Decrypt(Vector128 value, Vector128 roundKey); public static Vector128 DecryptLast(Vector128 value, Vector128 roundKey); public static Vector256 Encrypt(Vector256 value, Vector256 roundKey); public static Vector256 EncryptLast(Vector256 value, Vector256 roundKey); } public static Vector512 Decrypt(Vector512 value, Vector512 roundKey); public static Vector512 DecryptLast(Vector512 value, Vector512 roundKey); public static Vector512 Encrypt(Vector512 value, Vector512 roundKey); public static Vector512 EncryptLast(Vector512 value, Vector512 roundKey); } ``` Note VAES doesn't include round key assist or inverse mix columns instructions. ### API Usage Same as AES intrinsics, except using wider vector types. # References https://en.wikichip.org/wiki/x86/vaes https://en.wikipedia.org/wiki/AVX-512#VAES https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#othertechs=VAES ### Alternative Designs _No response_ ### Risks _No response_

Author:	e4m2
Assignees:	-
Labels:	`api-suggestion`, `area-System.Runtime.Intrinsics`, `untriaged`
Milestone:	-

MichalPetryka commented 1 year ago

    public new abstract class VL : Avx512F.VL
    {
        public static new bool IsSupported { get; }

        public static Vector128<byte> Decrypt(Vector128<byte> value, Vector128<byte> roundKey);
        public static Vector128<byte> DecryptLast(Vector128<byte> value, Vector128<byte> roundKey);

        public static Vector256<byte> Encrypt(Vector256<byte> value, Vector256<byte> roundKey);
        public static Vector256<byte> EncryptLast(Vector256<byte> value, Vector256<byte> roundKey);
    }

What's the benefit of exposing the EVEX variants separately?

colejohnson66 commented 8 months ago

Technically, VAES and AVX512-F only indicate 512-bit operation; AVX512-VL is required to use 128-bit and 256-bit vectors, hence the dedicated subclass. If you're asking why they exist when the VEX forms exist, it's probably just to allow the user to choose which prefix to use, or for consistency.

tannergooding commented 8 months ago

If you're asking why they exist when the VEX forms exist, it's probably just to allow the user to choose which prefix to use, or for consistency.

Users don't get to pick the prefix, the JIT picks based on the most optimal form. For V512, it's required to use EVEX. For V128/V256 it will pick VEX if only the lower 16 SIMD registers are used. If LSRA must allocate an extended SIMD register (one of the upper 16) or decides that it can take advantage of another EVEX only feature such as embedded broadcast or embedded masking, then it may use EVEX instead (assuming the hardware is capable of course).

We intentionally do not duplicate APIs needlessly, and so we shouldn't need them under Avx512Vaes.VL

Given that, given the future for Avx10, and given what we had previously opted for with VPCLMULQDQ (https://github.com/dotnet/runtime/issues/95772), we should likely name these Aes256 and Aes512, respectively.

However, depending on how we decide to do Avx10, it may be "better" to have these in nested V256/V512 classes under Aes and Pclmulqdq instead.

tannergooding commented 8 months ago

@e4m2, could you update to follow the same general pattern as Pclmulqdq for now and then I can get this reviewed after or as part of the Avx10 work, at which point we'll know the desired pattern?

e4m2 commented 8 months ago

Thanks for the input. Updated!

terrajobst commented 7 months ago

Video

Looks good as proposed, we opted to change the surface area slightly to match the approach approved for Avx10 where we have a nested V256 and V512 class

namespace System.Runtime.Intrinsics.X86;

public abstract class Aes
{
    public abstract class V256
    {
        public static new bool IsSupported { get; }

        public static Vector256<byte> Decrypt(Vector256<byte> value, Vector256<byte> roundKey);
        public static Vector256<byte> DecryptLast(Vector256<byte> value, Vector256<byte> roundKey);
        public static Vector256<byte> Encrypt(Vector256<byte> value, Vector256<byte> roundKey);   
        public static Vector256<byte> EncryptLast(Vector256<byte> value, Vector256<byte> roundKey);
    }

    public abstract class V512
    {
        public static Vector512<byte> Decrypt(Vector512<byte> value, Vector512<byte> roundKey);   
        public static Vector512<byte> DecryptLast(Vector512<byte> value, Vector512<byte> roundKey);
        public static Vector512<byte> Encrypt(Vector512<byte> value, Vector512<byte> roundKey);
        public static Vector512<byte> EncryptLast(Vector512<byte> value, Vector512<byte> roundKey);
    }
}

dotnet / runtime