dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.11k stars 4.7k forks source link

Guid.CreateVersion7 is to slow for my use case, maybe a Version8 could be introduced. #108268

Open swtrse opened 2 weeks ago

swtrse commented 2 weeks ago

Description

I work in an high performance environment and came across the implementation of Guid. This class is one of the most used class in my environment therefore performance is of high priority there. I do have a solution running for .net 8 to create sequential Guids similar to the new CreateVersion7 method.

However, for my use case the current implementation of this method is technically correct but to slow. I did create an alternative implementation for .Net9. My thought is by sharing that implementation here maybe it will give some hints to further optimize the code so I can switch to the framework version without depending on my own implementation in the future.

My solution lacks the possibility to access the internal fields of the Guid structure, so maybe if embedded in the Guid struct directly more optimization is possible.

I did create 2 Methods both have a time base component and can be ordered. CreateVersion7(): Mimics the Framework implementation but adds a counter and is slightly faster. CreateVersion8(): Twice as fast as CreateVersion7, the thing is I am unsure if it does fulfill the Unguessability. Therefore, I made it Version 8. If it fulfulls the unguessability it can replace Version7.

Here are the results of my Benchmarks image

The Benchmark code itself

namespace Guid.Benchmarks;

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Jobs;
using BenchmarkDotNetVisualizer;
using JetBrains.Annotations;
using Matching.Utils;

// ReSharper disable once ClassCanBeSealed.Global
[MaxColumn]
[MemoryDiagnoser(false)]
[MinColumn]
[RichHtmlExporter("Benchmark of UUID Creation",
                  ["Job"],
                  ["Mean", "Allocated"],
                  ["Mean", "Min", "Max", "Allocated"],
                  dividerMode: RenderTableDividerMode.SeparateTables,
                  htmlWrapMode: HtmlDocumentWrapMode.RichDataTables)]
[ShortRunJob(RuntimeMoniker.Net90)]
[UsedImplicitly]
public class BenchmarkSequentialGuidCreation
{
    [Benchmark]
    public Guid BenchmarUuidV4() => Guid.NewGuid(); // UUIDv4

    [Benchmark(Baseline = true)]
    public Guid BenchmarkUuidV7() => Guid.CreateVersion7(); // UUIDv7

    [Benchmark]
    public Guid BenchmarkUuidV7Custom() => SequentialGuid.CreateVersion7(); // UUIDv7

    [Benchmark]
    public Guid BenchmarkUuidV8Custom() => SequentialGuid.CreateVersion8(); // UUIDv7
}

And finally my implementation

namespace My.Utils;

using global::System.Runtime.CompilerServices;
using global::System.Runtime.InteropServices;
using global::System.Runtime.Intrinsics;
using global::System.Security.Cryptography;

/// <summary>
///     <see cref="SequentialGuid" /> generates instances of <see cref="Guid" /> that are in ascending order.
/// </summary>
/// <remarks>
///     It is both unique and ordered.
/// </remarks>
[SkipLocalsInit]
public static class SequentialGuid
{
    private const uint GuidVersion7 = 0x7000;
    private const uint GuidVersion8 = 0x8000;
    private const uint SequenceRollover = 0x1000;
    private const uint Variant = 0x80000000;
    private const uint VariantMask = 0xC0000000;
    private static readonly MapperCd BaseMapperCd;
    private static readonly uint C;
    private static long _lastMilliseconds;
    private static uint _sequence;
    private static SpinLock _spinLock;

    static SequentialGuid()
    {
        _spinLock = new SpinLock(false);
        int rnd = RandomNumberGenerator.GetInt32(int.MaxValue);
        rnd ^= Environment.MachineName.GetHashCode();
        BaseMapperCd.C = (uint)rnd;
        BaseMapperCd.C &= ~VariantMask;
        BaseMapperCd.C |= Variant;
        BaseMapperCd.D = (uint)(Environment.ProcessId ^ rnd);
        C = (uint)rnd;
        C &= ~VariantMask;
        C |= Variant;
    }

    public static Guid CreateVersion7()
    {
        return CreateVersion7(DateTimeOffset.UtcNow);
    }

    public static unsafe Guid CreateVersion7(in DateTimeOffset timestamp)
    {
        MapperAb mapperAb;
        GetTicksAndSequence(timestamp, &mapperAb);
        mapperAb.B |= GuidVersion7;

        var d = (uint)RandomNumberGenerator.GetInt32(int.MaxValue);
        Vector128<byte> vec = Vector128.Create(mapperAb.A, mapperAb.B, C, d).AsByte();
        if (BitConverter.IsLittleEndian)
        {
            Vector128<byte> result = Vector128.Shuffle(vec, Vector128.Create((byte)0, 1, 2, 3, 6, 7, 4, 5, 11, 10, 9, 8, 15, 14, 13, 12));
            return Unsafe.As<Vector128<byte>, Guid>(ref result);
        }

        return Unsafe.As<Vector128<byte>, Guid>(ref vec);
    }

    public static Guid CreateVersion8()
    {
        return CreateVersion8(DateTimeOffset.UtcNow);
    }

    public static unsafe Guid CreateVersion8(in DateTimeOffset timestamp)
    {
        MapperAb mapperAb;
        GetTicksAndSequence(timestamp, &mapperAb);
        mapperAb.B |= GuidVersion8;

        Vector128<byte> vec = Vector128.Create(mapperAb.A, mapperAb.B, BaseMapperCd.C, BaseMapperCd.D).AsByte();
        if (BitConverter.IsLittleEndian)
        {
            Vector128<byte> result = Vector128.Shuffle(vec, Vector128.Create((byte)0, 1, 2, 3, 6, 7, 4, 5, 11, 10, 9, 8, 15, 14, 13, 12));
            return Unsafe.As<Vector128<byte>, Guid>(ref result);
        }

        return Unsafe.As<Vector128<byte>, Guid>(ref vec);
    }

    private static unsafe void GetTicksAndSequence(in DateTimeOffset timestamp, MapperAb* mapperAb)
    {
        mapperAb->Ticks = timestamp.ToUnixTimeMilliseconds();
        ArgumentOutOfRangeException.ThrowIfNegative(mapperAb->Ticks, nameof(timestamp));

        var lockTaken = false;
        _spinLock.Enter(ref lockTaken);
        if (mapperAb->Ticks > _lastMilliseconds)
        {
            _sequence = 0;
            _lastMilliseconds = mapperAb->Ticks;
        }
        else
        {
            if (_sequence == SequenceRollover) // rollover will happen, so we increase ticks
            {
                _sequence = 0;
                ++_lastMilliseconds;
            }

            mapperAb->Ticks = _lastMilliseconds;
        }

        uint b = _sequence++;
        if (lockTaken) { _spinLock.Exit(); }

        mapperAb->Ticks <<= 16;
        mapperAb->B |= b;
    }

    [StructLayout(LayoutKind.Explicit)]
    private struct MapperAb
    {
        [FieldOffset(0)] public long Ticks;
        [FieldOffset(0)] public uint B;
        [FieldOffset(sizeof(uint))] public uint A;
    }

    [StructLayout(LayoutKind.Explicit)]
    private struct MapperCd
    {
        [FieldOffset(0)] public uint D;
        [FieldOffset(0)] public ushort ProcessId;
        [FieldOffset(sizeof(ushort))] public byte ThreadId;

        [FieldOffset(sizeof(ushort) + sizeof(byte))]
        public byte TaskId;

        // ReSharper disable once MemberHidesStaticFromOuterClass
        [FieldOffset(sizeof(uint))] public uint C;
    }
}

And here are unit tests to ensure correctness.

namespace My.Utils.Tests;

using System.Diagnostics.CodeAnalysis;
using System.Globalization;
using My.Utils;

[TestCategory("Unit")]
[TestClass]
[ExcludeFromCodeCoverage]
public class SequentialGuidTests
{
    [TestMethod]
    public void CreateVersion7()
    {
        var timestamp = DateTimeOffset.UtcNow;
        ReadOnlySpan<char> unixTsMs = timestamp.ToUnixTimeMilliseconds().ToString("x12", CultureInfo.InvariantCulture);

        var guid = SequentialGuid.CreateVersion7(timestamp);
        ReadOnlySpan<char> guidString = guid.ToString("", CultureInfo.InvariantCulture);

        Assert.AreEqual(7, guid.Version);
        Assert.IsTrue(guid.Variant == 0x8 || guid.Variant == 0x9 || guid.Variant == 0xA || guid.Variant == 0xB);

        Assert.AreEqual(unixTsMs.Slice(0, 8).ToString(), guidString.Slice(0, 8).ToString());
        Assert.AreEqual('-', guidString[8]);
        Assert.AreEqual(unixTsMs.Slice(8).ToString(), guidString.Slice(9, 4).ToString());
        Assert.AreEqual('-', guidString[13]);
        Assert.AreEqual('7', guidString[14]);
        Assert.AreEqual('-', guidString[18]);
        Assert.IsTrue(guidString[19] == '8' || guidString[19] == '9' || guidString[19] == 'a' || guidString[19] == 'b');
        Assert.AreEqual('-', guidString[23]);
    }

    [TestMethod]
    public void CreateVersion7ThrowsForPreUnixEpoch()
    {
        Assert.ThrowsException<ArgumentOutOfRangeException>(static () => SequentialGuid.CreateVersion7(DateTimeOffset.UnixEpoch - TimeSpan.FromMilliseconds(1)));
        Assert.ThrowsException<ArgumentOutOfRangeException>(static () => SequentialGuid.CreateVersion7(DateTimeOffset.MinValue));
    }

    [TestMethod]
    public void CreateVersion8()
    {
        var timestamp = DateTimeOffset.UtcNow;
        ReadOnlySpan<char> unixTsMs = timestamp.ToUnixTimeMilliseconds().ToString("x12", CultureInfo.InvariantCulture);

        var guid = SequentialGuid.CreateVersion8(timestamp);
        ReadOnlySpan<char> guidString = guid.ToString("", CultureInfo.InvariantCulture);

        Assert.AreEqual(8, guid.Version & 0xF); // Framework Bug. Version is correctly set to 8.
        Assert.IsTrue(guid.Variant == 0x8 || guid.Variant == 0x9 || guid.Variant == 0xA || guid.Variant == 0xB);

        Assert.AreEqual(unixTsMs.Slice(0, 8).ToString(), guidString.Slice(0, 8).ToString());
        Assert.AreEqual('-', guidString[8]);
        Assert.AreEqual(unixTsMs.Slice(8).ToString(), guidString.Slice(9, 4).ToString());
        Assert.AreEqual('-', guidString[13]);
        Assert.AreEqual('8', guidString[14]);
        Assert.AreEqual('-', guidString[18]);
        Assert.IsTrue(guidString[19] == '8' || guidString[19] == '9' || guidString[19] == 'a' || guidString[19] == 'b');
        Assert.AreEqual('-', guidString[23]);
    }

    [TestMethod]
    public void CreateVersion8ThrowsForPreUnixEpoch()
    {
        Assert.ThrowsException<ArgumentOutOfRangeException>(static () => SequentialGuid.CreateVersion8(DateTimeOffset.UnixEpoch - TimeSpan.FromMilliseconds(1)));
        Assert.ThrowsException<ArgumentOutOfRangeException>(static () => SequentialGuid.CreateVersion8(DateTimeOffset.MinValue));
    }

    [TestMethod]
    public void SequentialGuid_CreateVersion7_SameTime_Test()
    {
        // Arrange
        Task.Delay(TimeSpan.FromMilliseconds(1)).Wait(); // Ensure a sequence that starts at 0;
        var timestamp = DateTimeOffset.UtcNow;
        ReadOnlySpan<char> unixTsMs = timestamp.ToUnixTimeMilliseconds().ToString("x12", CultureInfo.InvariantCulture);
        // Act
        var firstId = SequentialGuid.CreateVersion7(timestamp);
        var secondId = SequentialGuid.CreateVersion7(timestamp);
        // Assert
        ReadOnlySpan<char> firstString = firstId.ToString("", CultureInfo.InvariantCulture);
        ReadOnlySpan<char> secondString = secondId.ToString("", CultureInfo.InvariantCulture);

        Assert.AreEqual(unixTsMs.Slice(0, 8).ToString(), firstString.Slice(0, 8).ToString());
        Assert.AreEqual(unixTsMs.Slice(0, 8).ToString(), secondString.Slice(0, 8).ToString());
        Assert.AreEqual(unixTsMs.Slice(8).ToString(), firstString.Slice(9, 4).ToString());
        Assert.AreEqual(unixTsMs.Slice(8).ToString(), secondString.Slice(9, 4).ToString());
        Assert.AreEqual("000", firstString.Slice(15, 3).ToString());
        Assert.AreEqual("001", secondString.Slice(15, 3).ToString());
        Assert.AreEqual(firstString.Slice(18, 10).ToString(), secondString.Slice(18, 10).ToString());
        Assert.AreNotEqual(firstString.Slice(28).ToString(), secondString.Slice(28).ToString());
    }

    [TestMethod]
    public void SequentialGuid_CreateVersion7_SameTimeSequenceOverflow_Test()
    {
        // Arrange
        Task.Delay(TimeSpan.FromMilliseconds(1)).Wait(); // Ensure a sequence that starts at 0;
        var timestamp = DateTimeOffset.UtcNow;
        ReadOnlySpan<char> expectedUnixTsMs = timestamp.AddMilliseconds(1D).ToUnixTimeMilliseconds().ToString("x12", CultureInfo.InvariantCulture);
        for (var i = 0; i < 0x1000; ++i) { SequentialGuid.CreateVersion7(timestamp); }

        // Act
        var nextId = SequentialGuid.CreateVersion7(timestamp);
        var anotherId = SequentialGuid.CreateVersion7(timestamp);
        // Assert
        ReadOnlySpan<char> nextString = nextId.ToString("", CultureInfo.InvariantCulture);
        ReadOnlySpan<char> anotherString = anotherId.ToString("", CultureInfo.InvariantCulture);

        Assert.AreEqual(expectedUnixTsMs.Slice(0, 8).ToString(), nextString.Slice(0, 8).ToString());
        Assert.AreEqual(expectedUnixTsMs.Slice(0, 8).ToString(), anotherString.Slice(0, 8).ToString());
        Assert.AreEqual(expectedUnixTsMs.Slice(8).ToString(), nextString.Slice(9, 4).ToString());
        Assert.AreEqual(expectedUnixTsMs.Slice(8).ToString(), anotherString.Slice(9, 4).ToString());
        Assert.AreEqual("000", nextString.Slice(15, 3).ToString());
        Assert.AreEqual("001", anotherString.Slice(15, 3).ToString());
    }

    [TestMethod]
    public void SequentialGuid_CreateVersion8_SameTime_Test()
    {
        // Arrange
        Task.Delay(TimeSpan.FromMilliseconds(1)).Wait(); // Ensure a sequence that starts at 0;
        var timestamp = DateTimeOffset.UtcNow;
        ReadOnlySpan<char> unixTsMs = timestamp.ToUnixTimeMilliseconds().ToString("x12", CultureInfo.InvariantCulture);
        // Act
        var firstId = SequentialGuid.CreateVersion8(timestamp);
        var secondId = SequentialGuid.CreateVersion8(timestamp);
        // Assert
        ReadOnlySpan<char> firstString = firstId.ToString("", CultureInfo.InvariantCulture);
        ReadOnlySpan<char> secondString = secondId.ToString("", CultureInfo.InvariantCulture);

        Assert.AreEqual(unixTsMs.Slice(0, 8).ToString(), firstString.Slice(0, 8).ToString());
        Assert.AreEqual(unixTsMs.Slice(0, 8).ToString(), secondString.Slice(0, 8).ToString());
        Assert.AreEqual(unixTsMs.Slice(8).ToString(), firstString.Slice(9, 4).ToString());
        Assert.AreEqual(unixTsMs.Slice(8).ToString(), secondString.Slice(9, 4).ToString());
        Assert.AreEqual("000", firstString.Slice(15, 3).ToString());
        Assert.AreEqual("001", secondString.Slice(15, 3).ToString());
        Assert.AreEqual(firstString.Slice(18).ToString(), secondString.Slice(18).ToString());
    }

    [TestMethod]
    public void SequentialGuid_CreateVersion8_SameTimeSequenceOverflow_Test()
    {
        // Arrange
        Task.Delay(TimeSpan.FromMilliseconds(1)).Wait(); // Ensure a sequence that starts at 0;
        var timestamp = DateTimeOffset.UtcNow;
        ReadOnlySpan<char> expectedUnixTsMs = timestamp.AddMilliseconds(1D).ToUnixTimeMilliseconds().ToString("x12", CultureInfo.InvariantCulture);
        for (var i = 0; i < 0x1000; ++i) { SequentialGuid.CreateVersion8(timestamp); }

        // Act
        var nextId = SequentialGuid.CreateVersion8(timestamp);
        var anotherId = SequentialGuid.CreateVersion8(timestamp);
        // Assert
        ReadOnlySpan<char> nextString = nextId.ToString("", CultureInfo.InvariantCulture);
        ReadOnlySpan<char> anotherString = anotherId.ToString("", CultureInfo.InvariantCulture);

        Assert.AreEqual(expectedUnixTsMs.Slice(0, 8).ToString(), nextString.Slice(0, 8).ToString());
        Assert.AreEqual(expectedUnixTsMs.Slice(0, 8).ToString(), anotherString.Slice(0, 8).ToString());
        Assert.AreEqual(expectedUnixTsMs.Slice(8).ToString(), nextString.Slice(9, 4).ToString());
        Assert.AreEqual(expectedUnixTsMs.Slice(8).ToString(), anotherString.Slice(9, 4).ToString());
        Assert.AreEqual("000", nextString.Slice(15, 3).ToString());
        Assert.AreEqual("001", anotherString.Slice(15, 3).ToString());
    }
}
dotnet-policy-service[bot] commented 2 weeks ago

Tagging subscribers to this area: @dotnet/area-system-runtime See info in area-owners.md if you want to be subscribed.

huoyaoyuan commented 2 weeks ago

See #106377.

aloksharma1 commented 2 weeks ago

following +1

tannergooding commented 2 weeks ago

The performance improvement you've given here is namely because you're requesting fewer random bytes. That has upsides and downsides and may not be suitable for all scenarios. -- Notably, UUIDv8 has a specific meaning, it is not a name that can be used in this context.

The "right" next improvement here would be to provide additional overloads that allow users to specify a counter -or- to generate many UUIDs in a batch.

Guid.CreateVersion7 is to slow for my use case

This is going to need a realistic repro and scenario showing why its "too slow", not simply a microbenchmark.

In the microbenchmark you've given above, the UUIDv7Custom is only 7% faster, but that 7% is 58ns instead of 62ns, which is a whole 4ns faster. In the UUIDv8Custom case, you're instead 54% faster but still taking 28ns instead of 62ns, so its only 34ns faster. Its also being skewed to look artificially faster due to various other considerations.

In practice, this 4-34ns that you're theoretically gaining likely won't be realized. This is because it will likely be lost in the overall noise of the system instead. Things like branch mispredicts (20 cycles, or 4ns on a 5GHz core, of latency), general memory access (4 cycles for L1, 15-80 cycles for L3, 10ns for 6000@30 RAM, etc), pipelining delays, disk latency, general OS level context switches every ~15ms, network latency/ping, database accesses, etc will completely hide it.

That is, you're unlikely going to be bottlenecked on the difference between generating 16.1m and 35.7m UUIDs per second, especially when you factor in everything else you need to do with those UUIDs and general overhead of touching roughly 256MB of memory. Additionally, even if the UUID generation is faster here, it is likely going to slow down the overall more important factors around security, sorting, and other factors for the scenarios that such UUIDs are typically used and which have overall more impact in their lifetime.

Things are rarely so cut and dry as "this micro-benchmark is 2x faster, so now my app is 2x faster", so providing an example of how this is impacting your real world code in the context of the entire application would be beneficial and help prioritize any work accordingly

swtrse commented 2 weeks ago

@tannergooding I cannot go to deep into details but the main setting in my case is this:

My company is hosting a trading platform. On our main market there are currently ~2 million bots that try gain advantage over the others with orders put into an order book. This order book was optimized multiple times and operates entirely in memory (<- for the sake of simplicity of this explanation). When an order is put into the order book an unique identifier that identifies the order is applied. This id is realized as a guid with a time base component, it did run through various implementations over time until I endet up with the one seeing in my first post. The only difference to current production code is, that I translated it to .net9 to be able to compare it with CreateVersion7 and made it follow RFC 9562 since the current implementation in .net8 follows RFC 4122. I do have a benchmark that simulates a whole day of business. Since everything is known we can verify performance and correctness with this benchmark. The benchmark data was taken with the first version of the OrderBook. Running this benchmark on the first Order Book it took ~23h for all orders to be processed. Which was an indicator that the OrderBook was the Bottleneck. The current implementation in .net8 can process all orders in ~3h. Just converting the code to .net9 reduces the time to ~6m (I was shocked at first and thought of some error, but results are all correct as far as verified yet, i am still in the process to do this), this is due to a massive increase of performance of the SortedDictionary<> and/or StreamReader i still have to check. Replacing my implementation of UUIDv8 with CreateVersion7 raises the benchmark time to ~45m, switching back restores the ~6m and I did try that several times because I could not believe that just by switching to .net the benchmark goes from hours to minutes. So back on track, it seems that the runtime of the Guid creation has a direct effect on my OrderBook. The one thing I am still working on is to get rid of the SpinLock since this reduces performance a little when used from different threads simultaneously.

I know that UUIDv8 has a specific meaning. However, since my implementation is experimental, I think it fits very well into the description.

UUIDv8 provides a format for experimental or vendor-specific use cases. The only requirement is that the variant and version bits MUST be set as defined in Sections 4.1 and 4.2.

I just did not dare to call it Version7 because if you know the previous guids generated by the process you can predict future guids for that process. In my environment with hundreds of processes running and consumers that are unable to generate guids by themself and send them into the system it is of no consequence.

I hope this clears the use case question up a bit. I do not have a problem on sticking to my implementation I just wanted to share my code so maybe someone more involved in the framework itself could optimize the hell out of it. I do understand that hunting for nanoseconds seams pointless when looked at it as single event, but I strongly believe a nanosecond there and a nanosecond here, still adds up.

tannergooding commented 1 week ago

The current implementation in .net8 can process all orders in ~3h. Just converting the code to .net9 reduces the time to ~6m (I was shocked at first and thought of some error, but results are all correct as far as verified yet, i am still in the process to do this), this is due to a massive increase of performance of the SortedDictionary<> and/or StreamReader i still have to check. Replacing my implementation of UUIDv8 with CreateVersion7 raises the benchmark time to ~45m, switching back restores the ~6m and I did try that several times because I could not believe that just by switching to .net the benchmark goes from hours to minutes. So back on track, it seems that the runtime of the Guid creation has a direct effect on my OrderBook.

I would recommend using a profiling utility to do hot spot analysis and see if you can root cause whether Guid creation is the actual bottleneck here and what other bottlenecks exist. Unfortunately, simply trying two pieces of code and seeing an improvement (even a significant one) isn't necessarily a good indicator that given piece of code is the bottleneck in the overall application, in many cases (especially with concurrency, work stealing, or similar) changes can give the wrong impression due to subtly impacting how the code is executing.

As for custom GUID creation, the existing Guid(ReadOnlySpan<byte>) and Guid(uint, ushort, ushort, byte, byte, byte, byte, byte, byte, byte, byte) constructors exist for that purpose and would be the recommended approach if you need to truly customize all aspects of the Guid and have it be constructed with no additional overhead.

The CreateVersion7 and related APIs need to ensure they're following the RFC and that they are filling any random bits with a cryptographically secure RNG. We can expose additional overloads for parts of the RFC that aren't covered yet, but they won't give the level of control that your customized algorithm is using.