dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.47k stars 4.76k forks source link

Read/Write Big/LittleEndian support for Guid #86798

Closed tannergooding closed 1 year ago

tannergooding commented 1 year ago

Rationale

System.Guid represents .NETs support for Globally Unique Identifiers or GUIDs (sometimes also referred to as Universally Unique Identifiers or UUIDs).

This type represents a 128-bit value in the general format of xxxxxxxx-xxxx-Mxxx-Nxxx-xxxxxxxxxxxx where each x represents a hexadecimal digit, where the 4 bits of M represent the version and the 4 bits of N represent the variant number. This is sometimes referred to as the 8-4-4-4-12 format string. And while the string representation is "well-defined", the actual underlying order of these bytes has a few different representation and there are several variants of the general RFC 4122 that may require a specific ordering or even may limit specific bytes to have a particular meaning.

.NET's System.Guid follows a field layout best matching variant 2 which is identical to variant 1 outside the endianness. In particular, variant 1 is "big endian" and variant 2 is "little endian". Variant 1 and 2 are used by the current UUID specification and are by far the most prominent while variant 0 is largely considered deprecated. Outside of the endianness these variants are differented by minor bit pattern requirements.

Given this is largely an endianness difference and is otherwise just a minor difference in the bits used for M and N, we would prefer to not introduce a new type just to handle this and would instead prefer to introduce explicit APIs and overloads on Guid that help identify and handle these differences.

Proposed APIs

namespace System
{
    public partial struct Guid
    {
        // public Guid(byte[] value);
        // public Guid(ReadOnlySpan<byte> value);
        public Guid(ReadOnlySpan<byte> value, bool isBigEndian);

        // public byte[] ToByteArray();
        public byte[] ToByteArray(bool isBigEndian);

        // public bool TryWriteBytes(Span<byte> destination);
        // public bool TryWriteBytes(Span<byte> destination, out int bytesWritten); -- new in .NET 8
        public bool TryWriteBytes(Span<byte> destination, bool isBigEndian, out int bytesWritten);
    }
}

namespace System.Buffers.Binary
{
    public static partial class BinaryPrimitives
    {
        public static Guid ReadGuidBigEndian(ReadOnlySpan<byte> source);
        public static Guid ReadGuidLittleEndian(ReadOnlySpan<byte> source);

        public static bool TryReadGuidBigEndian(ReadOnlySpan<byte> source, out Guid value);
        public static bool TryReadGuidLittleEndian(ReadOnlySpan<byte> source, out Guid value);

        public static bool TryWriteGuidBigEndian(ReadOnlySpan<byte> destination, Guid value);
        public static bool TryWriteGuidLittleEndian(ReadOnlySpan<byte> destination, Guid value);

        public static void WriteGuidBigEndian(ReadOnlySpan<byte> destination, Guid value);
        public static void WriteGuidLittleEndian(ReadOnlySpan<byte> destination, Guid value);
    }

Drawbacks

As discussed on https://github.com/dotnet/runtime/issues/86084, there is a general concern that users may not be aware that these other overloads exist -or- may not be aware that the difference between variant 1 and variant 2 is endianness and that .NET defaults to variant 2.

However, the same general considerations exists from exposing a new type such as System.Uuid. There are then additional considerations on top in that it further bifurcates the type system, doesn't easily allow polyfilling the support downlevel without shipping a new OOB package, and may further confuse users due to the frequent interchange of the GUID and UUID terminology.

After discussion with a few other API review team members, the general consensus was that shipping a new type is undesirable and we should prefer fixing this via new APIs/overloads and potentially looking into additional ways to surface the difference to users (such as analyzers, API documentation, etc).

Additional Considerations

Given the above, we may want to consider how to help point users towards their desired APIs given the overloads on Guid that do not require specifying endianness.

We can clearly update the documentation, but an analyzer seems like a desired addition that can help point devs towards specifying the endianness explicitly. Obsoleting the existing overloads was also proposed, but may be undesirable since the current behavior isn't "wrong", it just may be the undesired behavior in some scenarios.

We may also want to consider whether a static Guid NewGuid() overload that allows conforming to Version 4, Variant 1 is desired. The docs only indicate it is version 4 and calls into the underlying System APIs. It does not indicate if it produces Variant 1, Variant 2, or truly random bits for N.

stephentoub commented 1 year ago

@tannergooding has done a good job of representing the viewpoints of those of us who maintain the .NET APIs. I understand not everyone is happy with those viewpoints, but at this point no additional arguments are being presented and the discussion has run its course. I'll mark this as api-ready-for-review again so that we can discuss the specifics of this proposal again at the next API review meeting opportunity and decide whether to add the proposed APIs or a subset of them or none at all. Thanks to all who shared their perspectives on the matter.

aloraman commented 1 year ago

no additional arguments are being presented

Well, this discussion is now mostly an essay exchange, with timezone differences, job and sleep it takes a lot of time to produce another argument, so it mostly halted to a stop. Though

discussion has run its course

is probably correct. For single post here, there's one weighted and thought-out reply by @tannergooding here, but there are ten giggles and insults from the crowd on Discord as well. This is no longer maintainable. So, I'll address the latest arguments and will eagerly await this issue to be closed.


Bool parameters exist and are used in a plethora of places

At the cost of readability. Consider the aforementioned BigInteger:

var bytes = bigInt.ToByteArray(true, false); // is it signed little-endian? unsigned big-endian?

It's no surprise in the fact top search results for "bool method parameter" are generally negative, e.g., "What is wrong with boolean parameters?", "Is it wrong to use a boolean parameter to determine behavior", "Clean code: The curse of a boolean parameter", "Do Boolean Parameters Make You a Bad Programmer?" From the top of my head, there are three general scenarios of using boolean parameters where readability doesn't suffer:

Proposed overloads are to be used outside Guid.cs file, so it's not the first case. Second case is largely absent in C# - due to property syntax availability. For third case, there are named parameter syntax in C#, but you can't actually enforce the usage (Though creating analyzer for such scenarios is a good idea). Inlay hints with parameter names alleviate the issue, though they are present only in IDE.


RFC-4122 explicitly defines itself as a sequence of named fields Per the RFC, the field layout is the same between variant 1 (0b10x) and variant 2 (0b110)

Even the original RFC-4122 document from 2005 claims to describe only variant 1, and even specifically requires big-endianness by stating in the same section:

The fields are encoded as 16 octets, with the sizes and order of the fields defined above, and with each field encoded with the Most Significant Byte first (known as network byte order)

But newer drafts define a different structure for v7 and v8, and even redefine the field structure for versions 2-5. Compare the v1 structure

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           time_low                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           time_mid            |  ver  |       time_high       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |var|         clock_seq         |             node              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                              node                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

with v4

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           random_a                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          random_a             |  ver  |       random_b        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |var|                       random_c                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           random_c                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

or v7

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           unix_ts_ms                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          unix_ts_ms           |  ver  |       rand_a          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |var|                        rand_b                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            rand_b                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Also, System.Guid is not just a Variant 2 implementation. Guid.NewGuid() produces values that claims to be Variant 1, which can be easily checked with the following code snippet:

Console.WriteLine(Convert.ToString((Guid.NewGuid().ToByteArray()[8] & 0xF0) >> 4,2))

It produces the 10xx pattern of Variant 1, not 110x of Variant 2

This issue notwithstanding, I hope Runtime and BCL will be prepared for eventual popularization of UUID v6/v7/v8


As for the "Why don't you write your own library" argument. It was addressed already, actually. And these libraries already exist. The problem is with integrating these libraries with other libraries. In .NET Community there's a general preconception against taking a hard dependency on another third-party library (fueled not only by recent drama with IdentityServer of ImageSharp, but the whole 20 years of history with open source in .NET circles). The simple, blittable (big-endian. Machines are generally little-endian, yes, but network is generally not. At least middle-endian is dead and forgotten) primitive type in BCL would help a lot in easing cross-integration and operation between these libraries. System.Uuid is the easiest one to implement. If it is not to be, there are alternatives, but they are much farther away, beyond the horizon. That is, support for configurable size value type fixed buffers,

public struct Binary<int n>
{
     byte[n] _buffer;
}

with the support from EntityFrameworkCore in mapping these types (which would promote support for these types in other libraries) would, I hope, alleviate all the issues current UUID libraries have.

vanbukin commented 1 year ago

Before everything is completed, I would like to highlight a few points. Developers have been facing the non-obvious behavior of this API for a very long time. Here are a few examples:

Creating your own primitive to solve the issues described here is not difficult. Moreover, I have been maintaining one such package myself since 2019. As correctly noted above, the problem is integrating it somewhere. As a developer, it disappoints me that I have to solve such issues and related problems myself. Primitives are the work that BCL should take on. In any other popular technology stack, there is no such issue at all simply because their roots do not go deep into Windows and its API.

Ilchert commented 1 year ago

Made quick search in Githib of Guid.ToByteArray usage and there are 3 common usage:

  1. Just as random byte generator/identifier link, link. In this case it does not matter what ToByteArray returns.
  2. .NET <-> .NET serialization serialize - deserialize serialize - deserialize serialize - deserialize. In this case consistence between ToByteArray and new Guid(byte[]) is enought.
  3. Cross platform serialization serialize - deserialize serialize - deserialize serialize - deserialize. In this case authors already made workaround around Guid.

Also, some libs use own serialization mechanism for Guid npgsql.

I would like to suggest do not change Guid api to not affect case 1, 2 and add only proposed BinaryPrimitives api to provide standardized api for Guid ->Big/Little endian conversion that can be consumed in case 3 or replace own serialization mechanizm. Also, BinaryPrimitives distributes via nuget System.Memory and user can use new api without additional preprocessor conditions link.

bartonjs commented 1 year ago

Video

namespace System
{
    public partial struct Guid
    {
        // public Guid(byte[] value);
        // public Guid(ReadOnlySpan<byte> value);
        public Guid(ReadOnlySpan<byte> value, bool bigEndian);

        // public byte[] ToByteArray();
        public byte[] ToByteArray(bool bigEndian);

        // public bool TryWriteBytes(Span<byte> destination);
        // public bool TryWriteBytes(Span<byte> destination, out int bytesWritten); -- new in .NET 8
        public bool TryWriteBytes(Span<byte> destination, bool bigEndian, out int bytesWritten);
    }
}
AlexRadch commented 1 year ago

Can you assign this issue to me?

AlexRadch commented 1 year ago

public bool TryWriteBytes(Span<byte> destination, out int bytesWritten); -- new in .NET 8 does not exist yet. Should it be created also?

huoyaoyuan commented 1 year ago

Should this supersede #53354 ?

AlexRadch commented 1 year ago

Should this supersede #53354 ?

Yes. #53354 can be closed.

AlexRadch commented 1 year ago

30940 issue can be closed also.