dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
14.96k stars 4.65k forks source link

Fast path for {U}Int128.operator % when the modulus is within UInt64 #106915

Open hez2010 opened 3 weeks ago

hez2010 commented 3 weeks ago

Description

The modulus operator for {U}Int128 is too slow:

public static UInt64 ModMul_1(UInt64 first, UInt64 second, UInt64 modulus)
{
    return unchecked((UInt64)((Math.BigMul(first, second) % modulus));
}

public static UInt64 ModMul_2(UInt64 first, UInt64 second, UInt64 modulus)
{
    return unchecked((UInt64)(((UInt128)first * second) % modulus));
}

Instead we can provide a fast path by using DivRem for cases where modulus falls in the range of UInt64. For example:

public static UInt128 operator %(UInt128 left, UInt128 right)
{
    if (X86Base.X64.IsSupported)
    {
        if (right._upper == 0)
        {
            return X86Base.X64.DivRem(left._lower, left._upper, right._lower).Remainder;
        }
    }

    return left % right;
}

Configuration

.NET 9 Preview 7

Regression?

No

Data

public class ModMulModule
{
    [Benchmark]
    [Arguments(0x123456789ABCDEF0, 0x123456789ABCDEF0, 0x123456789ABCDEF0)]
    public UInt64 ModMul_Slow(UInt64 first, UInt64 second, UInt64 modulus)
    {
        return unchecked((UInt64)(((UInt128)first * second) % modulus));
    }

    [Benchmark]
    [Arguments(0x123456789ABCDEF0, 0x123456789ABCDEF0, 0x123456789ABCDEF0)]
    public UInt64 ModMul_Fast(UInt64 first, UInt64 second, UInt64 modulus)
    {
        return unchecked((UInt64)Mod((UInt128)first * second, modulus));
    }

    public static UInt128 Mod(UInt128 left, UInt128 right)
    {
        if (GetUpper(in right) == 0)
        {
            if (X86Base.X64.IsSupported)
            {
                return X86Base.X64.DivRem(GetLower(in left), GetUpper(in left), GetLower(in right)).Remainder;
            }
        }

        return left % right;
    }

    [UnsafeAccessor(UnsafeAccessorKind.Field, Name = "_upper")]
    static extern ref UInt64 GetUpper(ref readonly UInt128 value);

    [UnsafeAccessor(UnsafeAccessorKind.Field, Name = "_lower")]
    static extern ref UInt64 GetLower(ref readonly UInt128 value);
}

BenchmarkDotNet v0.14.0, Windows 11 (10.0.26100.1457) 13th Gen Intel Core i7-13700K, 1 CPU, 24 logical and 16 physical cores .NET SDK 9.0.100-preview.7.24407.12 [Host] : .NET 9.0.0 (9.0.24.40507), X64 RyuJIT AVX2 DefaultJob : .NET 9.0.0 (9.0.24.40507), X64 RyuJIT AVX2

Method Mean Error StdDev
ModMul_Slow 15.2984 ns 0.0508 ns 0.0475 ns
ModMul_Fast 0.4423 ns 0.0045 ns 0.0042 ns
tannergooding commented 3 weeks ago

The issue here isn't really that there is no "fast path", it's that it's implemented as left - (left / right) * right when instead it should be computing the remainder as part of the division, where its essentially "free".

The cost is significantly higher with the multi-step process because we have to fallback to a full division and multiplication algorithm.

dotnet-policy-service[bot] commented 3 weeks ago

Tagging subscribers to this area: @dotnet/area-system-numerics See info in area-owners.md if you want to be subscribed.