[API Proposal]: Uuid v5 and v7 implementations into System.Guid

dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.

https://docs.microsoft.com/dotnet/core/

MIT License

14.98k stars 4.66k forks source link

[API Proposal]: Uuid v5 and v7 implementations into System.Guid #88290

Closed dariogriffo closed 2 months ago

dariogriffo commented 1 year ago

Background and motivation

With the increase of distributed storage, and extensive use of uuid in databases, uuid v4 insert times grow linearly making inefficient its use for primary keys. New versions are proven to have less impact on indexes than v4. A first benchmark with Uuid7 in my 8 core machine with docker/postgres 15 shows: inserting 1 million uuid v4 with EFCore takes 8.5 seconds with an empty table, and 16 seconds with 12750000 rows (table is only id) inserting 1 million uuid v7 with EFCore takes 8.5 seconds with an empty table, and 9 seconds with 12750000 rows (table is only id)

API Proposal

namespace System.Collections.Generic;

public static Guid NewUuid5()
{
...
}

public static Guid NewUuid7()
{
...
}

API Usage

Guid uuid5 = Guid.NewUuid5();
Guid uuid7 = Guid.NewUuid7();

Alternative Designs

No response

Risks

No response

rgroenewoudt commented 1 year ago

Please note that SQL Server needs a different ordering for unique identifier. Their sequential GUIDs work better then UUIDv7. For example: https://github.com/nhibernate/nhibernate-core/blob/master/src/NHibernate/Id/GuidCombGenerator.cs

A quick benchmark is 98% fragmentation for UUIDv7 vs 44% for sequential GUID for 100k inserted records.

dariogriffo commented 1 year ago

Please note that SQL Server needs a different ordering for unique identifier. Their sequential GUIDs work better then UUIDv7. For example: https://github.com/nhibernate/nhibernate-core/blob/master/src/NHibernate/Id/GuidCombGenerator.cs

A quick benchmark is 98% fragmentation for UUIDv7 vs 44% for sequential GUID for 100k inserted records.

Hi Robbie, yes, but that's SQL server specific that's why I didn't mention, yet still in proposal v7 hopefully can land into an official format. I guess until is approved this will remain inactive or closed... Let's see I opened now because seems like Microsoft takes time to implement new things refer to SHA3

vanbukin commented 1 year ago

The container proposed in this API Proposal could have been an excellent data type for storing Uuid v5 and v7, as well as any other versions, but it was closed. The alternative proposed option, in my opinion, does not solve the problems described here.

ghost commented 1 year ago

Tagging subscribers to this area: @dotnet/area-system-runtime See info in area-owners.md if you want to be subscribed.

Issue Details

### Background and motivation With the increase of distributed storage, and extensive use of uuid in databases, uuid v4 insert times grow linearly making inefficient its use for primary keys. New versions are proven to have less impact on indexes than v4. A first benchmark with [Uuid7](https://github.com/medo64/Medo.Uuid7) in my 8 core machine with docker/postgres 15 shows: inserting 1 million uuid v4 with EFCore takes 8.5 seconds with an empty table, and 16 seconds with 12750000 rows (table is only id) inserting 1 million uuid v7 with EFCore takes 8.5 seconds with an empty table, and 9 seconds with 12750000 rows (table is only id) ### API Proposal ```csharp namespace System.Collections.Generic; public static Guid NewUuid5() { ... } public static Guid NewUuid7() { ... } ``` ### API Usage ```csharp Guid uuid5 = Guid.NewUuid5(); Guid uuid7 = Guid.NewUuid7(); ``` ### Alternative Designs _No response_ ### Risks _No response_

Author:	dariogriffo
Assignees:	-
Labels:	`api-suggestion`, `area-System.Runtime`, `untriaged`
Milestone:	-

tannergooding commented 1 year ago

does not solve the problems described here.

UUID v5 and v7 have to do with how new values are constructed. It has little to do with the storage mechanism of the underlying bits.

Once a correct sequence of bits has been generated, it can then be serialized as required by the consumer. For some scenarios, this will be BigEndian format and for others LittleEndian format and which exactly is "correct" depends on how the consumer will interpret a raw sequence of 16-bytes.

API Proposal

It should probably be called NewGuid, rather than NewUuid, to match the existing naming and general framework design guidelines conventions. As per the UUID specification, GUID is simply an alternative name:

This specification defines a Uniform Resource Name namespace for UUIDs (Universally Unique IDentifier), also known as GUIDs (Globally Unique IDentifier).

And as per the existing NewGuid docs:

The method creates a Version 4 Universally Unique Identifier (UUID) as described in RFC 4122, Sec. 4.4

There will probably be some discussion as to whether they should be called NewGuid5, NewGuidV5, or NewGuidVersion5. There may also be some discussion on whether this should simply be NewGuid(int version) or something like NewGuid(GuidVersion version) for some enum GuidVersion.

Exposing version 5 is likely to be non-controversial as its already been formally reserved and defined by the RFC 4122 specification. Exposing methods for versions 1, 2, and 3 would probably be uncontroversial for similar reasons.

Exposing version 7 may get rejected on the basis that its not been formally ratified yet. The same would be true for versions 6 and 8. I expect that it would become uncontroversial once ratification is complete. -- This is namely because we would no longer have the flexibility to be compliant if anything changed between shipping such an API and ratification finalizing.

There may likewise need to be discussion on if the methods need any parameterization. NewGuid() itself is version 4 which is "random data". Other versions have different inputs or parameterization used to influence the generation. This is roughly:

version 1 is time based using 60-bits to represent the number of 100ns intervals in UTC since 15 OCT 1582
version 2 is for DCE Security
version 3 is from an MD5 hash over a given namespace value
version 4 is 128-bits of random data, replacing 6 bits with fixed values
version 5 is a SHA-1 hash over a given namespace value
version 6 is just version 1 with the bits stored in different positions
version 7 is similar to v1/6 but uses 48-bits representing the number of milliseconds in UTC since 1 JAN 1970 (Unix Epoch) + optional and otherwise random data for the remaining unused bits
version 8 is supposed to be set aside for vendor specific use-case and experimentation

Also noting, with regards to the parameterization, the UUID spec covers:

While Appendix A details a few interesting namespaces; implementations SHOULD provide the ability to input a custom namespace. For example, any other UUID MAY be generated and used as the desired namespace input for a given application context to ensure all names created are unique within the newly created namespace.

stephentoub commented 1 year ago

version 7 is similar to v1/6 but uses 48-bits representing the number of milliseconds in UTC since 1 JAN 1970 (Unix Epoch) + optional and otherwise random data for the remaining unused bits

Does this affect what it means to serialize as big vs little endian? The six bytes comprising that value won't be reversed on their own and will thus change where they are in the resulting byte order.

tannergooding commented 1 year ago

The actual definition of Version 7 is:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           unix_ts_ms                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          unix_ts_ms           |  ver  |       rand_a          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |var|                        rand_b                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            rand_b                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                   Figure 11: UUIDv7 Field and Bit Layout

   unix_ts_ms:
      48 bit big-endian unsigned number of Unix epoch timestamp in
      milliseconds as per Section 6.1.

   ver:
      4 bit UUIDv7 version set as per Section 4.2

   rand_a:
      12 bits pseudo-random data to provide uniqueness as per
      Section 6.8 and/or optional constructs to guarantee additional
      monotonicity as per Section 6.2.

   var:
      The 2 bit variant defined by Section 4.1.

   rand_b:
      The final 62 bits of pseudo-random data to provide uniqueness as
      per Section 6.8 and/or an optional counter to guarantee additional
      monotonicity as per Section 6.2.

And the general layout description in the spec has been simplified to:

5.  UUID Layouts
   To minimize confusion about bit assignments within octets and among
   differing versions, the UUID record definition is provided as a
   grouping of fields within bit layout consisting four octets to a row.
   The fields are presented with the most significant one first.

Thus, the latest draft spec defaults to Variant 1 (0b10x) and what is effectively "BigEndian" layout.

The Variant 2 (0b110) layout that System.Guid defaults to is simply uint, ushort, ushort, byte[8] and thus its only the first 8-bytes that are effected by endianness and which differ, the last 8 bytes are always identical to Variant 1.

So we'd serialize it correctly if the user specifies ToByteArray(bigEndian: true) and deserialize it correctly given new Guid(array, isBigEndian: true). That is, the two layouts are

Where:
  a: unix_ts_ms  48-bits
  b: ver          4-bits
  c: rand_a      12-bits
  d: var          2-bits
  e: rand_b      62-bits

Variant 1 (Big Endian)
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | a0..7         | a8..15        | a16..23       | a24..31       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | a32..39       | a40..48       | b0..3 | c0..3 | c4..11        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | d | e0..5     | e6..13        | e14..21       | e 21..29      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | e30..37       | e38..45       | e46..53       | e 54..61      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Variant 2 (Little Endian)
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | a24..31       | a16..23       | a8..15        | a0..7         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | a40..48       | a32..39       | c4..11        | b0..3 | c0..3 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | d | e0..5     | e6..13        | e14..21       | e 21..29      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | e30..37       | e38..45       | e46..53       | e 54..61      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

dariogriffo commented 1 year ago

UUID v5 and v7 have to do with how new values are constructed. It has little to do with the storage mechanism of the underlying bits.

I will quote the proposal of the new uuid versions to answer. You can read the proposal here

So is not me who says that, is the reason to have a pseudo sequential generate random IDs.

One area UUIDs have gained popularity is as database keys. This stems from the increasingly distributed nature of modern applications. In such cases, "auto increment" schemes often used by databases do not work well, as the effort required to coordinate unique numeric identifiers across a network can easily become a burden. The fact that UUIDs can be used to create unique, reasonably short values in distributed systems without requiring synchronization makes them a good alternative, but UUID versions 1-5 lack certain other desirable characteristics:

GunboatDiplomat commented 1 year ago

If you are using UUIDs for purpose of DB indexing, I would recommend https://github.com/mareek/UUIDNext. It uses v7 for most DBs and a custom v8 low endian format for MSSQL.

The only thing missing is Guid type support for formatting LE UUIDs, which we will hopefully get soon.

The container proposed in this API Proposal could have been an excellent data type for storing Uuid v5 and v7, as well as any other versions, but it was closed. The alternative proposed option, in my opinion, does not solve the problems described here.

Am I missing something? The proposal (with bigEndian boolean) completely solves the issue, it allow us to add APIs mentioned in the original post like Guid.NewUuid7().

stephentoub commented 1 year ago

So we'd serialize it correctly if the user specifies ToByteArray(bigEndian: true) and deserialize it correctly given new Guid(array, isBigEndian: true)

I'm not concerned about whether we'd roundtrip it well; we could output any order we wanted and as long as we used the same order for deserialization it would roundtrip :)

My question is really:

we'd end up taking the first 32-bits of the 48-bit unix_ts_ms and reversing the bytes in those, and then the next 16-bits of the 48-bit unix_ts_ms and reversing the bytes in those, such that we didn't actually reverse all the bytes in the 48-bit value.
the reason we're exposing the big endian APIs is for interop with other systems
so, will other systems understand this schema of reversing portions of unix_ts_ms in this way?

Or asked another way, do the new APIs we've approved make sense to use with Guids created from the APIs proposed in this issue?

tannergooding commented 1 year ago

Or asked another way, do the new APIs we've approved make sense to use with Guids created from the APIs proposed in this issue?

Yes.

For isBigEndian: true, it exactly matches the spec for these types. For isBigEndian: false, it exactly matches the only other widely used and well-defined layout.

That is, the NewGuid() methods simply create a valid new GUID/UUID of the format f81d4fae-7dec-11d0-a765-00a0c91e6bf6. This can be visualized as a binary number 11111000000111010100111110101110011111011110110000010001110100001010011101100101000000001010000011001001000111100110101111110110 or a decimal number 329800735698586629295641978511506172918. This is irrespective of the underlying field layout or storage mechanism (it's effectively just generating a new 128-bit value and replacing very specific bits).

There are then two well-defined layouts:

variant 1, which is what's defined by the RFC spec and is functionally big endian (in a big endian setup, the types of fields don't matter)
variant 2, which is what System.Guid uses as its actual field layout
variant 0 and variant 3 are both reserved for future use

variant 2 is functionally defined as uint32, uint16, uint16, uint8[8] and is generally compatible with the original UUID field layout and is what has been historically called little-endian because every individual field is in little-endian format.

It would not make sense for us to define a new format where we have some uint48, uint16, uint64 or uint48, uint4, uint12, , uint2, uin62 because that would be incompatible with the two layouts everything currently expects.

george-polevoy commented 1 year ago

As for the performance of database index, just inserting does not give the full picture. Try reading in the insertion order in the same performance testing workload. For example, add an indexed timestamp column, and query some recently inserted records. This way timestamp index seek would retrieve the keys (cheap operation), and then key lookup will end up in reading a separate page for each key. This would bring the database to it's knees.

Here is a test harness designed specifically for this problem: https://github.com/george-polevoy/uuid-load-tests/

aloraman commented 1 year ago

so, will other systems understand this schema of reversing portions of unix_ts_ms in this way?

UUIDv7 is a very novel concept, all currently existing [^1] implementations use OSF DCE-compatible layout (i.e. Big Endian), and specifically for v7 it is essential to have a natural order based on timestamp value. Reversing portions of timestamp in a binary representation will break that assumption, so it makes total sense only in isBigEndian: true case.

Note that RFC-4122 and recent IETF drafts only describe Variant 1 with Big Endian layout. System.Guid implementaion is different for compatibility reasons and is explicitly out of scope of RFC-4122. Although, the MSFT "flavor" has little endian layout, RFC-4122 states that Variant 2 is reserved for "MSFT backward compatibility", and recent IETF draft specifically state "there is a known caveat that Microsoft's Component Object Model (COM) GUIDs leverage little-endian when saving GUIDs" - it does not actually mean MSFT flavor = Little Endian = Variant 2

First of all, Variant 2 is reserved for MSFT backward compatibility, but not all MSFT Guids are Variant 2. This variant actually refers to very old type of GUID, from OLE/COM/DCOM era, way before RFC-4122. For example:

{00000000-0000-0000-C000-000000000046} // IUnknown
{00000001-0000-0000-C000-000000000046} // IClassFactory
{00000112-0000-0000-C000-000000000046} // IOleObject
{0000031b-0000-0000-C000-000000000046} // CLSID_ErrorObject

Note that the variant field here has a value of 'C' = 1100, which means Variant 2, as expected. Also, there is no valid version field here, because version is a feature exclusive to Variant 1 only. Later, MSFT moved away from these variant 2 GUIDs to Variant 1 Version 1 (time/mac based), e.g.:

{B196B28F-BAB4-101A-B69C-00AA00341D07} // IClassFactory2
{99FCFEC4-5260-101B-BBCB-00AA0021347A} // IID_IObjectExporter
{4D9F4AB8-7D1C-11CF-861E-0020AF6E7C57} // IID_IActivation

Note that the variant field has a pattern of 10xx, which means Variant 1. Also, now there's a version field, with a value of 0001, which means Version 1. And then, MSFT moved to Variant 1 Version 4 (RNG based), e.g.:

{4193A62E-F128-410D-9746-E57B5485903D} // Windows.UI.Internal.Input.IMouseCapture
{C0EFA91A-EEB7-41C7-97FA-F0ED645EFB24} // MsRDP.MsRDP.10
{214A1A2F-232B-4FEF-93B7-2F7D9AF053AD} // produced by System.Guid.NewGuid()

Note that the variant field still has a pattern of 10xx, which means Variant 1. And the version field has a value of 0100, which means Version 4. However, even after the switch from Variant 2 to Variant 1, MSFT implementation stayed being little-endian based.

So, considering the existence of such "MSFT flavor", I think the input of Windows dev team is desirable here. If v5 or v7 to be implemented in .NET, it should have the same "flavor" as v4 has now, i.e., be compatible with the implementation Windows would have, in case there will be one in the future.

[^1]: NEWSEQUENTIALID in SqlServer serves the same purpose, but it's different implementation.

tannergooding commented 1 year ago

The internal field layout used by a given runtime (for it's GUID/UUID type) has very little to do with the actual data being tracked. It only matters for scenarios where you are unsafely accessing the raw bytes of the struct, such as via pointers, other forms of reinterpret like casting, or in the case you are implementing official serialization/deserialization APIs for the type.

.NET's internal field layout is, due to historical reasons, compatible with the Microsoft GUID type defined by the Win32 header files. This is formally defined as the following, which in .NET is uint, ushort, ushort, fixed byte[8]:

typedef struct _GUID {
  unsigned long  Data1;
  unsigned short Data2;
  unsigned short Data3;
  unsigned char  Data4[8];
} GUID;

This definition has not fundamentally changed since its original introduction and its not going to change in the future either, because doing so would be massively breaking to the entire Windows ecosystem. Since these are just regular fields, the byte order observed via unsafe code matches the endianness of the host system. Since Windows does not currently support any big endian systems, the layout in practice appears to users as always being "little endian" and discussions often simplify it down to this, even if its not "technically correct".

This also means that the layout of these bytes is "functionally fixed". Given a GUID/UUID of 6B29FC40-CA47-1067-B31D-00DD010662DA, Data1 is always equal to 0x6B29FC40, Data2 is always equal to 0xCA47, Data3 is always equal to 0x1067, and Data4 is always equal to { 0xB3, 0x1D, 0x00, 0xDD, 0x01, 0x06, 0x62, 0xDA }, regardless of the endianness of the underlying system. This then translates over to every single other representable value and so the layout is always well-defined for such a struct.

The NewGuid() API that exists today generates a "spec compliant" 128-bit GUID/UUID following the variant 1, version 4 mechanism described by RFC 4122. Which is to say it generates 122-bits of random data setting the variant field to 0b10x and the version field to 0b0100. The general format of this is xxxxxxxx-xxxx-Mxxx-Nxxx-xxxxxxxxxxxx where M represents the version, N represents the variant, and x represents "random data". Since the encoding of variant 1 in the N location is itself described as 0b10xx, we then have 4 baseline masks:

xxxxxxxx-xxxx-4xxx-8xxx-xxxxxxxxxxxx
xxxxxxxx-xxxx-4xxx-9xxx-xxxxxxxxxxxx
xxxxxxxx-xxxx-4xxx-Axxx-xxxxxxxxxxxx
xxxxxxxx-xxxx-4xxx-Bxxx-xxxxxxxxxxxx

This proposal is simply asking for additional functions which would generate variant 1, version 5 and variant 1, version 7 mechanisms described by the latest RFC 4122 draft specification. This would be generally the same as the above but with the M location being 5 -or- 7, rather than 4. The x locations would likewise be generated as per the specification rather than being "random data".

it does not actually mean MSFT flavor = Little Endian = Variant 2

The general discussion was being somewhat simplified as it basically boils down to, in practice, that Windows uses "little endian" and many other (but not all) systems use "big endian". In both cases, this is regardless of the variant/version.

Ultimately, users who need to serialize a Guid as a sequence of 16-bytes should use the relevant API and pass in the appropriate true or false value to isBigEndian depending on which byte layout they need. This will typically be true for developers interacting with various Unix APIs or systems, some types of networking, and some databases. This will typically be false for developers interacting with COM and various Windows APIs or systems. The "correct" value will be scenario dependent.

There is no need to discuss hypothetical alternative layouts which are extremely unlikely to be encountered in practice and the APIs being exposed are sufficient for what is effectively the two de-facto layouts users will actually encounter.

aloraman commented 1 year ago

This proposal is simply asking for additional functions which would generate variant 1, version 5 and variant 1, version 7

I'm interested specifically in Variant 1 Version 7, specifically, whether the fundamental property of v7 - time-based ordering will be preserved with current layout, and will it be compatible with changes from #86798 I.e., Consider {t1, t2, ... tN} is a sequence of timestamps, and {g1, g2, ... gN} is a sequence of v7 Guids produced at these timestamps. Now, if t1 < t2 < ... tN, will it also be true that g1 < g2 < ... < gN. In code, in string serialized form, in little-endian binary serialized form (e.g. in SqlServer), and in big-endian serialized form (e.g. in PostgreSql)

tannergooding commented 1 year ago

Endianness makes no difference as to the value.

Just as the 32-bit unsigned integer 0x0000_0001 is always 1 regardless of big-endian ({ 0x00, 0x00, 0x00, 0x01 }) or little-endian ({ 0x01, 0x00, 0x00, 0x00 }) and therefore always compares less than 0x0000_0002.

The same is true that 6B29FC40-CA47-1067-B31D-00DD010662DA is always this exact GUID, regardless of how its stored. It is compared as if it were a 128-bit unsigned integer 0x6B29FC40_CA471067_B31D00DD_010662DA and therefore will compare less than 0x6B29FC41_CA471067_B31D00DD_010662DA (6B29FC41-CA47-1067-B31D-00DD010662DA) and so on.

osexpert commented 1 year ago

In the proposal: public static Guid NewUuid5() But v5 is namespace based Guid's. Signature for Guid v5 would be something like: public static Guid NewUuid5(Guid namespace, string name)

aloraman commented 1 year ago

But v5 is namespace based...

Yes, you're right, it is namespace based - so the signature would be different. However, it's also the reason it probably won't be implemented in BCL. Namespace based UUIDs (v3 and v5) use MD5 and SHA-1 hash functions to construct an actual binary value. These hash functions are considered vulnerable, so MSFT has an outright ban on any new usage of MD5/SHA-1, even in non-cryptographic context.

osexpert commented 1 year ago

It is compared as if it were a 128-bit unsigned integer 0x6B29FC40_CA471067_B31D00DD_010662DA and therefore will compare less than 0x6B29FC41_CA471067_B31D00DD_010662DA (6B29FC41-CA47-1067-B31D-00DD010662DA) and so on.

According to this there is no standard way to CompareTo\order a Guid: https://devblogs.microsoft.com/oldnewthing/20190426-00/?p=102450

tannergooding commented 1 year ago

That’s referring to the across all GUID types.

The .NET System.Guid type has long had a standardized sorting behavior and matches how I detailed it above.

matteocontrini commented 4 months ago

The new UUIDs specification was published a few days ago with RFC 9562: https://datatracker.ietf.org/doc/rfc9562/

Any chance this could now be prioritized?

memark commented 3 months ago

Looking forward to this as well!

In the meantime, my praise goes to @tannergooding for his thorough and insightful posts in this thread.

tannergooding commented 3 months ago

I expect this won't happen in .NET 9, it's a bit late in the cycle with about 1-2 months before the time we normally snap for RC1.

There's a lot of other work across the stack that is already the focus and a priority, but this can potentially be included for .NET 10. In order for that to happen (or to even attempt getting it in for .NET 9 if there's community members willing to do the work), the proposed surface needs to be updated to account for some of the feedback given here.

Namely we should ensure that signatures exist that make sense given the existing APIs on System.Guid and which take in any necessary parameters. The proposal should likely include a link to the spec and a small explanation of the layout for the given version and how the parameters are used in constructing the value.

For example, UUIDv5 is generated using bits from the SHA-1 of a string namespaceId + string name, but there may be considerations based on it being SHA-1 based as well, since SHA-1 is no longer recommended for use. While for UUIDv7 there likely needs to be consideration about how the Unix timestamp is seeded (is it automatic, can the user provide one explicitly, etc). There should likely also be a consideration on whether support for other stable UUID versions should be provided simultaneously (noting that we have UUIDv4 already via NewGuid())

neuecc commented 3 months ago

I have created a library for Ulid (890 stars) that has similar characteristics to UUIDv7. https://github.com/Cysharp/Ulid

While its characteristics are desirable, the fact that it is a custom type (struct Ulid) has caused significant frustration because the .NET ecosystem and databases require GUIDs. Therefore, I strongly believe it is very desirable for UUIDv7 to be provided as a standard and for its type to be GUID. I hope for its early adoption (in .NET 9!).

tannergooding commented 2 months ago

I opened #103658 which covers this for UUIDv7 and opens the path to also do it for UUIDv5 in the future if that's desired