jogibear9988 commented 1 week ago

Describe the bug

If GuidsV7 are added to NET9, SQLServer inserts them in a way, so the index still gets fragmented (see issue: https://github.com/dotnet/runtime/issues/103658)

There should be a new setting in the SQLClient to switch GUID serialization to "TryWriteBytes(destination, bigEndian: true)" so also SQLServer can profit from the advantages of the new guid format.

jogibear9988 commented 1 week ago

see also: https://github.com/dotnet/runtime/issues/103658#issuecomment-2181837369 with a comment from @tannergooding

    This would then be something to raise with the SQL Server team, which is fully external to the .NET Libraries team and has their own management, customers (including beyond .NET), developers, back-compat bars, API review process, and other considerations.

jogibear9988 commented 1 week ago

https://feedback.azure.com/d365community/idea/69d13f8a-b9fb-ed11-a81c-000d3adb7ffd https://feedback.azure.com/d365community/idea/cb67ad60-2433-ef11-a296-6045bdbc5330

MichelZ commented 1 week ago

2946 is a prereq for this

edwardneal commented 1 week ago

It's also worth referencing dotnet/efcore#33579, which covers a similar problem within EF Core.

jogibear9988 commented 1 week ago

It's also worth referencing dotnet/efcore#33579, which covers a similar problem within EF Core.

I thought maybe add a Parameter to ConnectionString, like "SerializeGuidsBigEndian:true" could work. So also EFCore or any other ORM could benefit.

ErikEJ commented 1 week ago

@jogibear9988 Should it not be possible to simply read the new Version property? If so, a much better UX I think.

jogibear9988 commented 1 week ago

@jogibear9988 Should it not be possible to simply read the new Version property? If so, a much better UX I think.

??? Wich Version? You could also create a GUID V7 before NET9, and stor it already in the Database. If you now change the byte order, you maybe break old Programs. I would create a setting, and the read/and write all GUID's as BigEndian.

Wraith2 commented 1 week ago

If the setting to change how guids are handled is in the connection string then you can't migrate a database from using one type to another without losing the ability to use guids that already exist and that doesn't seem like an acceptable change to me.

Can you provide some simple tests for 1) "normal" guids and 2) v7 guids that show how they should be written and read back? I suspect that we need to have dedicated read and write methods because the type does not contain information about whether the bytes should be interpreted in a specific way or not. Consider how you would want to have both old and new guids used as parameter values. How do you specify that the contents are one or the other?

If we need to extend only Sql* types like SqlDataReader with a method like GetGuidV7 then that's ok but if we need to push that up to DbDataReader then that requires a framework change and I don't know what the opinion of other providers (postgres, mysql, sqlite) would be about that change. /cc @roji

jogibear9988 commented 1 week ago

Second thought about my suggestion, it will not work. It will, but then the guids will have another value when written with dotnet as when you read them without the connection parameter. I think the only correct solution would be a new Guid Type in Sqlserver, that serializes the same as all other databases do.

edwardneal commented 1 week ago

Now that I've got a bit of time to respond properly!

I think you're right; a blanket change to the way GUIDs are written to the network wouldn't work. It'd mean that every primary key value would be interpreted differently, and potentially that a developer could run a query from SQL Server Management Studio/Azure Data Studio and see a GUID with one value, but run the same query via M.D.S and see another value.

The real problem isn't the way that a TDS library transports GUIDs to the server, it's the way that a client generates GUIDs in the first place. The EF Core issue I linked is designed to address that for one library.

A better approach may be to create a utility method similar to SqlGuid.NewGuid(), which wraps Guid.NewGuid() and makes the result sortable by SQL Server. I'm not sure where this'd need to go to though. If that's a practical addition to something in the SqlTypes namespace, it'd be an API request in dotnet/runtime. If the SqlTypes namespace isn't accepting changes, maybe we could have a SqlGuidExtensions utility class in our SqlTypes namespace.

Wraith2 commented 1 week ago

The real problem isn't the way that a TDS library transports GUIDs to the server, it's the way that a client generates GUIDs in the first place.

Guids are just data. SqlClient doesn't generate them at all. We're currently in a situation where anything that is identified as a guid gets handled in a specific way and that works. What we have to add is some way to have a second handler without having a second type and that means the type information must be maintained out of band by the user.

The SqlParameter input case is mentioned is indicative of this. If you just put a guid into an object typed variable then there isn't enough information. The same problem happens in the other direction. If you have some rowdata which is typed as uniqueidentifier by sql server there is no way to know if that's old or new format. So whatever happens I don't see any other way to proceed than having special methods to both read and write the new format. We'll also need a wrapper type for object typed location like SqlParameter.

vanbukin commented 1 week ago

Depending on what type of data is used in the database column for storage, indexes may or may not be fragmented (uniqueidentifier / binary(16)) https://github.com/dotnet/runtime/issues/103658#issuecomment-2180882270

edwardneal commented 1 week ago

The real problem isn't the way that a TDS library transports GUIDs to the server, it's the way that a client generates GUIDs in the first place.

Guids are just data. SqlClient doesn't generate them at all.

I'm talking about a client-side operation, not M.D.S. Sorry for the confusion here.

We're currently in a situation where anything that is identified as a guid gets handled in a specific way and that works. What we have to add is some way to have a second handler without having a second type and that means the type information must be maintained out of band by the user.

The SqlParameter input case is mentioned is indicative of this. If you just put a guid into an object typed variable then there isn't enough information. The same problem happens in the other direction. If you have some rowdata which is typed as uniqueidentifier by sql server there is no way to know if that's old or new format. So whatever happens I don't see any other way to proceed than having special methods to both read and write the new format. We'll also need a wrapper type for object typed location like SqlParameter.

The methods you're describing make complete sense, and if that's definitely the problem to solve then I agree - it's the right way to solve it. I just don't think that SqlClient's conversion of Guids is the core problem. As I read it, the core problem seems to be that if a .NET application generates and inserts many GUIDs in parallel into a table with an index covering a uniqueidentifier column, that index will be fragmented. We'd be able to patch it within SqlClient with the method you've described, but I think the core problem is that the GUIDs being generated aren't currently "SQL Server-friendly." I've suggested implementing SqlGuid.NewId() to try to fix the problem at the point of generation.

tannergooding commented 1 week ago

SQL Server is using ToByteArray (or TryWriteBytes) on the Guid when storing the data, since the bool bigEndian parameter is new and they aren't passing it in, this means the data is serialized in little endian format.

However, when it later loads it as SqlGuid to do a comparison, it ends up doing so in byte order (effectively comparing assuming its in big endian format): https://source.dot.net/#System.Data.Common/System/Data/SQLTypes/SQLGuid.cs,8332f43230838509,references -- Edit: This might be the wrong code link, I seem to have misplaced where the actual logic that shows the problem was

What this means is that the issue isn't in how the data is serialized nor is the issue in how the data is compared. The root cause of the problem here is that serialization and comparison do not agree with eachother on the data format. If you changed the serializer to write as big endian that would fix the issue (as now it would match what the comparer expects). Identically if you instead changed the comparer to compare using little endian that would also fix the issue (as now it would match what the serialization is storing).

This behavior cannot be "fixed" for back-compat reasons, but some new APIs could be exposed that allow users to get the right thing to happen. Similarly, users can rely on the current behavior and workaround it by swapping the endianness of their produced Guid prior to giving it to the SQL client (which will have the net effective of it being serialized as big endian).

jogibear9988 commented 1 week ago

But a new ColumnType like Guid2 could fix it, or am I wrong?

Also a new Darabase flag, that could change default guid comparison?

Wraith2 commented 1 week ago

Can you show how a database reader would use a new ColumnType enum member to identify that a column should be reversed from the current representation? The tds internal metadata will be the same uniqueidentifier type for both versions of the data.

This sort of thing is why I'd like you to write tests. Throwing ideas around is very nice but it's abstract. Getting you to consider the real usage will prevent a lot of pointless arguing back and forth.

Consider this example. What should the results look like?

using (var connection = new SqlConnection(conStr))
using (var command = new SqlCommand ("select idCol from myTable", connection))
{
    con.Open();
   using (var reader = command.ExecuteReader())
   {
       var guidV7 = reader.GetGuidReversed();
       var guid = reader.GetGuid();
   }
}

Can you tell me what the values for the two guid variables will be?

edwardneal commented 1 week ago

I've been running some tests with big/little endian transformations of Guid.NewGuid and Guid.CreateVersion7. Have we ever been able to demonstrate that writing a v7 Guid in big-endian order will actually help reduce index fragmentation?

Considering the code below:

Guid v7Guid = Guid.CreateVersion7();
Span<byte> bigEndianBytes = stackalloc byte[16];
v7Guid.TryWriteBytes(bigEndianBytes, bigEndian: true, out _);

Guid transportGuid = new Guid(bigEndianBytes, bigEndian: false);
Span<byte> transportedBytes = stackalloc byte[16];
transportGuid.TryWriteBytes(transportedBytes);

transportedBytes and bigEndianBytes are identical - so if we followed the same steps to create transportGuid and passed it to the existing SqlClient as a parameter value, transportedBytes would be sent to SQL Server and the original v7Guid variable would effectively be sent in big-endian order.

I've run that in parallel across 8 threads, inserting the value into a column covered by a non-clustered index. Index fragmentation remains high at 99.24%, so it seems to me that simply sending/receiving Guids in big-endian isn't enough - however this is enabled. I'm in agreement with Wraith2, a code snippet to show how the output of Guid.CreateVersion7 should actually be sent to SQL Server is needed here.

One snippet which was linked earlier (thanks vanbukin) is this:

string ReorderUuid(string uuid)
{
    var src = Convert.FromHexString(uuid);
    var dst = new byte[16];
    // reorder for SQL SERVER Sort order
    dst[0] = src[12];
    dst[1] = src[13];
    dst[2] = src[14];
    dst[3] = src[15];
    dst[4] = src[10];
    dst[5] = src[11];
    dst[6] = src[8];
    dst[7] = src[9];
    dst[8] = src[6];
    dst[9] = src[7];
    dst[10] = src[0];
    dst[11] = src[1];
    dst[12] = src[2];
    dst[13] = src[3];
    dst[14] = src[4];
    dst[15] = src[5];
    // reorder for guid internal layout
    var tmp0 = dst[0];
    var tmp1 = dst[1];
    var tmp2 = dst[2];
    var tmp3 = dst[3];
    dst[0] = tmp3;
    dst[1] = tmp2;
    dst[2] = tmp1;
    dst[3] = tmp0;
    var tmp4 = dst[4];
    var tmp5 = dst[5];
    dst[4] = tmp5;
    dst[5] = tmp4;
    var tmp6 = dst[6];
    var tmp7 = dst[7];
    dst[6] = tmp7;
    dst[7] = tmp6;
    return Convert.ToHexString(dst);
}

This results in around 48% fragmentation in my testing.

Cross-referencing this to the right section of RFC9562, it looks to me like the key part of this is that the unix_ts_ms and rand_b fields are (effectively) being exchanged, so that unix_ts_ms lands in bytes 10-15 - the first bytes used by SQL Server to sort a GUID.

That methodology makes perfect sense, although I don't think it'd be correct to still describe it as a v7 Guid - at that point, we would have taken the v7 Guid components and shuffled them into a layout which aligns with SQL Server's sort order.

tannergooding commented 1 week ago

Have we ever been able to demonstrate that writing a v7 Guid in big-endian order will actually help reduce index fragmentation?

The consideration has nothing to do with how the data is written. Just as an int with the value 2 is the same regardless of whether its stored as big-endian (0x00, 0x00, 0x00, 0x02) or as little-endian (0x02, 0x00, 0x00, 0x00), the same is true for a GUID (aka UUID). Which is to say, 019311df-625f-7fc8-a7b6-91e14f776336 is always exactly that, regardless of how it is stored.

The reason that UUIDv7 provides benefit is because it is generated based on a timestamp and so newer UUIDs will be functionally "greater than" older UUIDs, allowing them to be sorted. That is, the first might be 019311df-a2d8-7adf-976a-9ab9b0b0461b and if you generate another approximately 1ms later, you might get 019311df-a2e2-7400-a910-ad22599a80cb (which when viewed as a 128-bit unsigned integer, is greater than the previous).

The bug in SQL here (which I don't remember exactly where it is, unfortunately) is that the GUID is stored as one format and then is later loaded and compared as the other format. This means, functionally, that the value 019311df-a2d8-7adf-976a-9ab9b0b0461b ends up being compared as df119301-d8a2-df7a-976a-9ab9b0b0461b, likewise the value of 019311df-a2e2-7400-a910-ad22599a80cb ends up being compared as df119301-e2a2-0074-a910-ad22599a80cb

Because of this mismatch, the comparisons "can" end up inverted in some cases, depending on the exact scenarios, which breaks the sorting and can lead to worse throughput.

As long as the data is stored and read consistently, such a bug can't exist. It's only when the data is serialized (stored) one way and deserialized (loaded) the other way that it breaks.

edwardneal commented 1 week ago

Thanks @tannergooding. I think we're on the same page about the benefits of UUIDv7, and the difference we're thinking of is located in the way that SQL Server itself compares the GUIDs. SqlGuid.Compare mirrors that logic here.

To compare two GUIDs, SQL Server and SqlGuid look at bytes 10-15, 8 & 9, 6 & 7, 4 & 5 and 0-3. In UUIDv7, bytes 10-15 contain random data, so enabling SQL Server to sort these in the expected order means that the contents of bytes 10-15 need to be swapped with the contents of bytes 0-5 (which contain the incrementing timestamp we care about.)

As I see it, the mismatch here is that we're sending SQL Server v4 or v7 UUIDs in little-endian format, and SQL Server is loading them in big-endian format and treating them as v1 UUIDs; this is where that index fragmentation/throughput loss comes from. It's also why I'd slightly lean towards providing a way to generate "SQL Server sort compatible" UUIDs (whether that's in SqlClient or in the runtime) in the first place, rather than asking the network layer to compensate for it.

jogibear9988 commented 1 week ago

The main problem is, v7 guids work in most other databases. They don‘t fragment. But in Sqlserver they do…

tannergooding commented 1 week ago

In UUIDv7, bytes 10-15 contain random data

This isn't strictly true. The data in bytes 10-15 "can" be random, but they are not required to be and it is fully allowed for this data (as per the official RFC) to contain extended precision for the timestamp and/or some form of monotonic counter.

For UUIDv7, a value 019311df-a2d8-7adf-976a-9ab9b0b0461b is functionally equivalent to a 128-bit unsigned integer 0x019311df_a2d87adf_976a9ab9_b0b0461b and so for correct sorting purposes (whether the additional data is seeded or not), it should be comparing the bytes in an order that represents this value. -- This general treatment of a UUID string being effectively the same order as you'd declare a 128-bit unsigned integer in hexadecimal is also relevant to other UUID versions, and so its generally considered the "right" way to compare any UUID

Because of this, a value serialized to a big-endian byte array should have its indices compared in the following order: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15. While a value serialized to a little-endian bye array, the order should instead be: 3, 2, 1, 0, 5, 4, 7, 6, 8, 9, 10, 11, 12, 13, 14, 15 -- The order here for little-endian stems from the fact that the definition is functionally uint32, uint16, uint16, uint8[8]. So you consider endianness with respect to the individual fields (and bytes do not themselves have endianness, since they are one unit and are always the same). This is different from a true uint128 where you'd have 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0; since it is a "single field"

edwardneal commented 1 week ago

Thanks for the explanation tannergooding. I agree that the right way to sort UUIDs is as 128-bit uints, and I think it brings us full circle to the core problem - SQL Server sorts them differently. The root cause of the fragmentation is that v7 UUIDs are only effectively sorted in order of unix_ts_ms (and only reduce index fragmentation) when the database server sorts UUIDs in the way we expect, and we don't have any way to force SQL Server to use that algorithm. IMO, the remaining topic is purely a question of where we adapt to that:

When transporting the UUIDv7, sending the bytes in an order such that the timestamp bytes in the UUIDv7 GUID are transparently rearranged to fit the SQL Server sort order;
At the point of generating it, adding a method which generates a "SQL Server sort compatible" UUIDv7 GUID up-front.

We could change the way we read and write a GUID on the transport. There are a few different options for that - if we wanted to be explicit, we could use a Uuidv7 SqlDbType, a SqlDataReader.GetUuidv7 method, etc. If we wanted to make it "just work" then we could detect a GUID with version 7 and handle it differently. Either of these approaches would let us send and receive UUIDv7 GUIDs.

One risk which I can see with that approach is how clients would switch from using normal Guids to UUIDv7 GUIDs when other people have a record of the ID to look up, such as the below:

Guid id = Guid.Parse(userInput);
SqlCommand cmd = CreateCommandWithUuidv7Parameter(sql: "SELECT TOP 1 Name FROM MyTable WHERE Id = @Id", id);

using SqlDataReader reader = cmd.ExecuteReader();

Assert.IsTrue(reader.Read());

In the simple case (table contains only reshuffled UUIDv7 uniqueidentifiers) this'll be fine. In a real-world case, where the table contains a mixture of UUIDv7 and non-UUIDv7 uniqueidentifiers the results will have changed. While there are workarounds, it feels to me like we'd be opening a pit of failure.

Placing the responsibility on the transport layer also opens a slightly weird quirk: a test similar to the below would fail, because the Guid would have its bytes rearranged on the transport but the varchar would not.

Guid id = Guid.CreateVersion7();
SqlCommand cmd = CreateCommandWithUuidv7Parameter(sql: "SELECT @Id, CAST(@Id AS varchar(max))", id);
using SqlDataReader reader = cmd.ExecuteReader();

reader.Read();

Assert.AreEqual(id, reader.GetUuidv7(0)); // True
Assert.AreEqual(id.ToString(), reader.GetString(1)); // False

My opinion's thus slightly in favour of creating a utility method somewhere - whether we create a SqlGuidHelper.NewSequentialId method in SqlClient, or request a SqlGuid.NewSequentialId method in runtime - which generates a SqlGuid which SQL Server would consider sequential, based on transforming Guid.CreateVersion7 as vanbukin suggested. This leaves the transport of GUIDs untouched, and means that switching from using Guid.NewGuid to the new method becomes trivial.

jogibear9988 commented 1 week ago

My opinion's thus slightly in favour of creating a utility method somewhere - whether we create a SqlGuidHelper.NewSequentialId method in SqlClient, or request a SqlGuid.NewSequentialId method in runtime - which generates a SqlGuid which SQL Server would consider sequential, based on transforming Guid.CreateVersion7 as vanbukin suggested. This leaves the transport of GUIDs untouched, and means that switching from using Guid.NewGuid to the new method becomes trivial.

We use on our side different Databases, and also Guids could be generated in the Frontend, so this would not help. We generate V7 Guids, so it works correctly for other databases, but not for SQLServer

jogibear9988 commented 1 week ago

https://devblogs.microsoft.com/oldnewthing/20190426-00/?p=102450

jogibear9988 commented 1 week ago

https://github.com/mareek/UUIDNext/issues/2#issuecomment-1367399283

Don't use SQL Server (just kidding). You should file a bug, though, that UNIQUEIDENTIFIER doesn't work well for your use case and that we need a sortable UUID type in SQL Server.

Wraith2 commented 1 week ago

Ok. So we've been around all the arguments back and forth. Are we back to the point where we can discuss the api instead of taking cheap shots at sql server?

Methods can be provided which will accept a little endian guid and convert it to big endian. These would be SqlClient specific. You could use them directly or EFCore could use them on your behalf and provide some sort of attribute [SqlServerBigEndianGuid] for decorating cto properties. GetGuidBE, GetSqlGuidBE? We'll need a wrapper of some sort for SqlParameter, do we just force users to manually create a big endian SqlGuid?

edwardneal commented 1 week ago

Thanks @Wraith2. I'll be able to write a concise reply at some point.

I'm (tentatively) in favour of something like SqlGuid.NewSequentialId or GuidExtensions.NewSequentialId to generate a new UUIDv7 value from scratch and reshuffle the bytes into a layout which matches SQL Server's sort algorithm; this aligns with the NEWSEQUENTIALID function in SQL Server. I'd then add a couple of Wrap functions to allow running those transformations on existing UUIDv7 values, and add variations for SqlGuid.

public static class GuidExtensions
{
    // Generates a UUIDv7 and reshuffles its bytes to align with SQL Server's sort algorithm.
    public static Guid NewSequentialId() { }
    public static SqlGuid NewSqlSequentialId() { }

    // Wraps user input, performing the same byte reshuffling. Throws if version7Uuid.Version is not 7.
    public static Guid WrapVersion7(Guid version7Uuid) { }
    public static SqlGuid WrapSqlVersion7(SqlGuid version7Uuid) { }
}

Once that Guid has been created by the helper class, I don't think it actually needs any transformation layers on SqlDataReader or SqlParameter - the return value would just be another Guid, all transformation taken care of. EF Core could then update its SequentialGuidValueGenerator to take advantage of it and close efcore#33579.

Wraith2 commented 1 week ago

People can do that already. The problem is that for guids that aren't created in that way they'll capable of being fragmented. You shouldn't need to use the database (or driver) as a source guids in order to make sure it works. We should be able to take a guid create by an application or taken from a postgres database and be able to insert it into the database in a specific way. Pushing it up the stack to creation time leaks implementation details around sql server that people shouldn't need to care about.

tannergooding commented 1 week ago

I'm (tentatively) in favour of something like SqlGuid.NewSequentialId or GuidExtensions.NewSequentialId to generate a new UUIDv7 value from scratch and reshuffle the bytes into a layout which matches SQL Server's sort algorithm; this aligns with the NEWSEQUENTIALID function in SQL Server. I'd then add a couple of Wrap functions to allow running those transformations on existing UUIDv7 values, and add variations for SqlGuid.

I think this is focusing too much on the wrong details as well, notably. Any new API should do the "right thing" with any Guid value. It doesn't really matter if its UUIDv7, UUIDv8 (user-defined/custom format), an arbitrary 128-bit GUID that doesn't follow any existing RFC, etc.

UniqueIdentifier are already stored in a particular way, that doesn't really need to change and is already fine as is. What is really needed is an equivalent to SqlGuid that does the comparison of the bytes in the right order and a way to tell SQL Client to use that it instead.

That is, the needed fix is functionally similar to providing a custom IComparer to a Dictionary, rather than using the built-in default.

jogibear9988 commented 1 week ago

My initial Idea when I opend the Ticket here was that the SQLClient could change the bytes when writeing to SQL server. I didn't think about that then the Guid is wrong when I for example would use it with another connection library to SQLServer or directly in SQL.

So I think now, the only Solution is a fix directly in SQLServer, for example a Database Flae to change default Guid Handling, so it is stored/compared in another way

edwardneal commented 1 week ago

I'm slightly in favour of leaving the transport layer alone because of a scenario like the below.

Guid id = Guid.Parse(userInput);

using var connection = new SqlConnection(connStr);

connection.Open();

using var command = new SqlCommand("SELECT COUNT(1) FROM MyTable WHERE Id = @Id", command);

command.Parameters.Add("@Id", SqlDbType.UniqueIdentifier).Value = id;

var rowCount = (int)command.ExecuteScalar();

Console.WriteLine($"Row count: {rowCount}");

The contents of MyTable might look as such:

Id
4e6c67ef-28bd-4f98-9fb1-3f2309f984b0
00acd809-4313-46fa-a774-a97cc67a3d62
4c5713e9-3a67-4f1c-ad3e-f1c5ff6b1c7d
019311df-a2d8-7adf-976a-9ab9b0b0461b
1f688481-9f18-4ba2-a33a-0ae45c0bb5bc

All told, it's a pretty trivial example: we accept user input, then print the number of records with a matching ID. Most of the IDs here are not V7 UUIDs, so its index fragmentation will eventually grow.

If we now implement some kind of byte-shuffling within SqlClient, our sample code behaviour has changed.

Byte-shuffle all GUIDs

Now, if the user inputs a GUID of 4e6c67ef-28bd-4f98-9fb1-3f2309f984b0 the row count will be zero. SqlClient returns incorrect results.

Byte-shuffle all GUIDs if the parameter has a new SqlDbType

If the user inputs a GUID of 4e6c67ef-28bd-4f98-9fb1-3f2309f984b0 the row count will be one. This result is correct. If the user inputs a GUID of 019311df-a2d8-7adf-976a-9ab9b0b0461b the row count will be one. This result is correct.

This is an improvement. However, now insert a new UUIDv7 value into MyTable. This UUIDv7 value has been generated via Guid.CreateVersion7() and it has been passed as a parameter with the new SqlDbType type (SqlDbType.Uuidv7?) into a simple INSERT INTO SQL statement. The output of CreateVersion7 has been returned to the user, but the byte-shuffle has inserted a different value into the table.

If the user provides the value they've been given to the code snippet, the row count will be zero. SqlClient returns incorrect results.

Same as above, with changes to the code snippet

Perhaps we try to fix this by using a code snippet as so:

Guid id = Guid.Parse(userInput);

using var connection = new SqlConnection(connStr);

connection.Open();

using var command = new SqlCommand("SELECT COUNT(1) FROM MyTable WHERE Id = @Id", command);

// This has changed: the SqlDbType is now the same as the SqlDbType passed to the INSERT statement.
command.Parameters.Add("@Id", SqlDbType.Uuidv7).Value = id;

var rowCount = (int)command.ExecuteScalar();

Console.WriteLine($"Row count: {rowCount}");

The user provides the UUIDv7 value given in the previous scenario to the code snippet, the row count is one. Bug appears to be fixed. The user re-tests with a pre-existing GUID of 4e6c67ef-28bd-4f98-9fb1-3f2309f984b0, and the row count will be zero. SqlClient returns incorrect results - it has the same problem as the previous section, but impacting all non-UUIDv7 values.

I don't see these situations as corner cases: it's the result of using an existing database which has a combination of Uuidv7 and non-Uuidv7 GUIDs in a single table.

Working scenario

For this to work, we'd need to change the snippet as so:

Guid id = Guid.Parse(userInput);

using var connection = new SqlConnection(connStr);

connection.Open();

using var command = new SqlCommand("SELECT COUNT(1) FROM MyTable WHERE Id IN (@Id_1, @Id_2)", command);

// Note that there are now two parameters: one for the old format, one for the new format.
command.Parameters.Add("@Id_1", SqlDbType.UniqueIdentifier).Value = id;
command.Parameters.Add("@Id_2", SqlDbType.Uuidv7).Value = id;

var rowCount = (int)command.ExecuteScalar();

Console.WriteLine($"Row count: {rowCount}");

This works, but it's not obvious - and I'd personally say that it's overkill when all the user wants is to insert a GUID which doesn't fragment the index covering its column. Separately, while this is unlikely, we no longer have the guarantee of "at-most-one-record" we'd get from a direct equality check to a primary key column.

Other variations

We might want to turn this problem into an edge case by deciding that we'll only byte-shuffle UUIDv7 Guids with a new SqlDbType. This solve the problem for people who only use the runtime, but it still means that it'll affect somebody who already generates their own UUIDv7 Guids.

Another way to handle this might be a separate Uuid type, and only byte-shuffle these. I've got no technical objection to this, although it feels a little consistent for the runtime to store a Uuidv7 in a standard Guid but for SqlClient to have a dedicated type.

If I'm missing something obvious and this type of case isn't a problem then I've not got any technical objection to writing Guids differently in the transport layer.

tannergooding commented 1 week ago

The storage is correct and the loading of the guid into SqlGuid is correct, this all round trips today to any GUID the user authors.

It’s really just an issue with the comparer function internally serializing to a stack local area of memory and then comparing bytes in the wrong order

dotnet / SqlClient

Add API to correctly insert GuidV7 to SQLServer (BigEndian), so no index fragmentation occurs #2989

Describe the bug

2946 is a prereq for this

Byte-shuffle all GUIDs

Byte-shuffle all GUIDs if the parameter has a new SqlDbType

Same as above, with changes to the code snippet

Working scenario

Other variations