Closed prezaei closed 2 years ago
I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.
Tagging subscribers to this area: @dotnet/area-system-runtime See info in area-owners.md if you want to be subscribed.
| Author: | prezaei |
|---|---|
| Assignees: | - |
| Labels: | `api-suggestion`, `area-System.Runtime`, `untriaged` |
| Milestone: | - |
Your sample has a comment saying it prints out something like "abcdefgh123". Can you give a concrete example of the type of output you expect? For example, what would be the exact output of `Guid.Parse("7a1e687f-a5a9-47e1-b5ec-fd71abf06303").ToString("U")`?
I think you won't be able to do much better than base64-encoding the bytes of the GUID. If you do that, you'll get a 24 character string (22 if you remove padding). Since you can already easily do that today (e.g. `Convert.ToBase64String(guid.ToByteArray())`), a built-in format wouldn't help that much, and any such encoding would be completely non-standard, so I don't see much reason to add this directly to `Guid`.
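As a rough illustration of the lengths mentioned above, here is a minimal sketch using only `Convert.ToBase64String` and `Guid.ToByteArray`. The class and method names (`GuidBase64Demo`, `ToBase64`, `ToBase64NoPadding`) are made up for this example and are not part of any library.

```csharp
using System;

public static class GuidBase64Demo
{
    // Standard base64 of the 16 GUID bytes: 24 characters, the last two being '=' padding.
    public static string ToBase64(Guid guid) =>
        Convert.ToBase64String(guid.ToByteArray());

    // Same text with the padding trimmed: 22 characters.
    // Note that ToByteArray allocates, which later comments in this thread try to avoid.
    public static string ToBase64NoPadding(Guid guid) =>
        ToBase64(guid).TrimEnd('=');
}
```

Sixteen bytes always encode to 24 base64 characters with two padding characters, so the trimmed form is 22 characters regardless of the GUID's value.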
> I think you won't be able to do much better than base64-encoding the bytes of the GUID.

For the web, Base64Url encoding fits better, and there is a nice helper method for it in ASP.NET Core: https://docs.microsoft.com/en-us/dotnet/api/microsoft.aspnetcore.webutilities.webencoders.base64urlencode?view=aspnetcore-5.0
You got it. Effectively, we want to base64url encode the `Guid`. Doing this outside of `System.Guid` forces a heap allocation if we use `Guid.ToByteArray()`. The only way around the heap allocation that I can think of is something like this:
```csharp
var guid = Guid.NewGuid();
Span<byte> bytes = stackalloc byte[16];
guid.TryWriteBytes(bytes);
// now convert the bytes to a string using a base64URL encoder...
var result = Base64UrlEncode(bytes);
```
This is messy, and given how often we all see `Guid`s in the URLs of the sites that we visit, it seems like a common problem that we should have a solution for.
Thoughts?
I don't see much appetite for adding a domain-specific method (base64url encoding) directly on the `Guid` type. Keep in mind also that GUIDs and other identifiers tend to be used as paths in URLs rather than as query string components, and base64 is a case-sensitive encoding. Most real-world applications stick to all-lowercase identifiers for things that appear in paths and do not expect to see mixed case-sensitive identifiers. This further restricts the range of applications which might get use out of such an API.
@GrabYourPitchforks, totally agree that we might end up with Base45. I would not look at this as a domain specific thing here. The actual problem I am trying to solve right now is to pass shorter correlation id (`x-correlation-id`) headers between some of our Azure products. Today, we use the simple `Guid.ToString("N")`. That wastes bandwidth.
In fact, `Guid.ToString("N")` is used heavily for serializing to JSON, YAML, gRPC and much more. Oh, and don't forget all the logs that go into Geneva with all these long identifiers. If only there were a shorter version of this, we would be helping with climate change! You think I am joking, but I am not. This really is not a niche scenario for service code.
That last response kinda provides evidence for my point that this is domain-specific, no? :) The problem as originally stated is that you wanted something appropriate for placement in URLs; but https://github.com/dotnet/runtime/issues/55290#issuecomment-875939075 shows that you actually want something that's the shortest ASCII computer-readable representation of arbitrary binary data (which doesn't need to be URL-safe); and that making something human-readable and URL-appropriate might require yet another format (like base45). But `Guid.ToString` is really meant to produce something that fulfills both a standard pattern and is human-readable, so it's really not the ideal place for putting this functionality.
I'm sympathetic to the problem, but since your desire is for the shortest possible representation and that you're willing to use a non-standard format to accomplish it, what's wrong with defining your own extension method?
```csharp
public static string ToMinimalRepresentation(this Guid guid)
{
    Span<byte> asBytes = stackalloc byte[16];
    guid.TryWriteBytes(asBytes); // instance method: writes the 16 bytes of this guid
    Span<char> asChars = stackalloc char[22];
    Base64UrlEncode(from: asBytes, to: asChars); // placeholder for your own base64url helper
    return asChars.ToString(); // the one and only allocation
}
```
@GrabYourPitchforks, I can certainly do this and in fact have done so. My point is this pattern is pretty common out there. From websites to HTTP headers, to logs, etc. One of the reasons is that frameworks just don't make it available/easy for all devs to use these. Open any of our logs in Kusto/Cosmos and you will be shocked that no-one has taken the time to use a shorter version for a correlation id. Why? Is it because they can't write the code? No. It is because we don't make it easy for them to use an out of the box formatter and they are busy with so many other things. A good framework is there to simplify these types of work.
Let me ask you this: Why do we have so many other format specifiers but feel hesitant to add one more that has serious and real use cases? For instance, have you ever seen a `Guid` in this format: `{0x00000000,0x0000,0x0000,{0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00}}` (that is format = `X`).
Another question: What is the downside of adding this format? I am totally with you that we need to have a high bar for corlib but I believe I have made a good case here with so many use cases.
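For reference, the existing specifiers being compared here can be listed with a few lines of code. This is only an illustrative sketch; `GuidFormatTour` and `Describe` are names invented for the example, while `"N"`, `"D"`, `"B"`, `"P"` and `"X"` are the specifiers `Guid.ToString(string)` accepts today.

```csharp
using System;
using System.Text;

public static class GuidFormatTour
{
    // The format specifiers currently supported by Guid.ToString(string).
    public static readonly string[] Specifiers = { "N", "D", "B", "P", "X" };

    // Builds one line per specifier: the specifier, the output length, and the output.
    public static string Describe(Guid guid)
    {
        var sb = new StringBuilder();
        foreach (string format in Specifiers)
        {
            string text = guid.ToString(format);
            sb.AppendLine($"{format} ({text.Length} chars): {text}");
        }
        return sb.ToString();
    }
}
```

The lengths are fixed: `N` is 32 characters, `D` is 36, `B` and `P` are 38, and `X` is 68, so `N` is the shortest of the built-in forms.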
@prezaei

> The actual problem I am trying to solve right now is to pass shorter correlation id (`x-correlation-id`) headers between some of our Azure products. Today, we use the simple `Guid.ToString("N")`. That wastes bandwidth.
If you care about saving every byte of bandwidth, why are you using a globally unique identifier? Wouldn't an identifier that's unique just to your application serve you as well, while being much shorter?
On the other hand, I just googled "guid to short string" and it seems to be a relatively common problem (with base64 usually being the suggested solution).
> Why do we have so many other format specifiers but feel hesitant to add one more that has serious and real use cases?

Maybe there was a reason for the other formats when they were first added. Maybe there still is. Or maybe they were a mistake. In any case, I don't think that's really a justification to add one more format.
@svick, still need something globally unique. This is not for a single application. It will potentially be used by all of Azure if I get my way. HTH
Agree with @GrabYourPitchforks that this seems like a very domain-specific API and not something we'd be interested in exposing on `System.Guid` directly.
Given that `Guid` can format to a `Span<char>`, `Utf8Formatter` can be used to format to a `Span<byte>`, and `Base64Encoder` likewise has APIs that can process a `Span`, you can already do this "allocation free", just potentially with an additional loop over what a custom implementation might provide/allow.
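One way to read this suggestion, as a sketch rather than a recommended implementation: write the GUID bytes to the stack, encode them with `System.Buffers.Text.Base64.EncodeToUtf8`, then make the "additional loop" pass that swaps the two URL-unsafe characters and drops the padding. The `GuidUrlEncoding` class and `ToBase64Url` method are hypothetical names, and using the `Base64` class here is an assumption about which span-based encoder is meant.

```csharp
using System;
using System.Buffers.Text;

public static class GuidUrlEncoding
{
    public static string ToBase64Url(Guid guid)
    {
        // 16 raw bytes of the GUID, written to the stack instead of a new array.
        Span<byte> bytes = stackalloc byte[16];
        guid.TryWriteBytes(bytes);

        // Standard base64 as UTF-8: 24 bytes, the last two being '=' padding.
        Span<byte> utf8 = stackalloc byte[24];
        Base64.EncodeToUtf8(bytes, utf8, out _, out _);

        // The extra loop: map '+' to '-' and '/' to '_', and skip the padding.
        Span<char> chars = stackalloc char[22];
        for (int i = 0; i < chars.Length; i++)
        {
            byte b = utf8[i];
            chars[i] = b == (byte)'+' ? '-' : b == (byte)'/' ? '_' : (char)b;
        }

        // The resulting string is the only heap allocation.
        return new string(chars);
    }
}
```

Calling `GuidUrlEncoding.ToBase64Url(Guid.NewGuid())` yields a 22-character string that uses only letters, digits, `-` and `_`.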
Background and Motivation
The shortest string representation of `System.Guid` is 32 characters long (format = `"N"`). Although this is URL friendly, it is not the most concise URL-friendly representation of it. From RFC2396 Section 2.3, the URL safe characters are:

It would be worthwhile to add support for a new format specifier, perhaps `U`, to `System.Guid` that generates a shorter URL-friendly string representation of the guid.
Proposed API
The following changes will be required:
- `System.Guid.Parse(string input)`
- `System.Guid.Parse(ReadOnlySpan<char> input)`
- `System.Guid.ParseExact(string input, string format)` when `format` is `"U"`
- `System.Guid.ParseExact(ReadOnlySpan<char> input, ReadOnlySpan<char> format)` when `format` is `"U"`
- `System.Guid.TryParse([NotNullWhen(true)] string? input, out Guid result)`
- `System.Guid.TryParse(ReadOnlySpan<char> input, out Guid result)`
- `System.Guid.TryParseExact(ReadOnlySpan<char> input, ReadOnlySpan<char> format, out Guid result)` when `format` is `"U"`
- `System.Guid.TryParseExact([NotNullWhen(true)] string? input, [NotNullWhen(true)] string? format, out Guid result)` when `format` is `"U"`
- `System.Guid.ToString(string? format)` when `format` is `U`
- `System.Guid.TryFormat(Span<char> destination, out int charsWritten, ReadOnlySpan<char> format = default)`: format the `Guid` instance into the provided character span in its shorter string form when `format` is `"U"`
Usage Examples
Alternative Designs
We could also add extension methods.
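As a hedged sketch of what the parsing half of such a helper might look like, the following assumes the short form is unpadded base64url of the 16 GUID bytes (22 characters). `ShortGuidParser` and `TryParseShort` are hypothetical names, and the up-front length check is the same kind of test a built-in `TryParse` would need in order to recognize the short form.

```csharp
using System;
using System.Buffers;
using System.Buffers.Text;

public static class ShortGuidParser
{
    // Assumption: the short form is 22 characters of unpadded base64url.
    public static bool TryParseShort(ReadOnlySpan<char> input, out Guid result)
    {
        result = default;
        if (input.Length != 22) return false; // cheap length check first

        // Map back to standard base64 and restore the two padding characters.
        Span<byte> utf8 = stackalloc byte[24];
        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            if (c > 127 || c == '+' || c == '/') return false; // not base64url
            utf8[i] = c == '-' ? (byte)'+' : c == '_' ? (byte)'/' : (byte)c;
        }
        utf8[22] = utf8[23] = (byte)'=';

        // Decode to the 16 raw bytes and rebuild the GUID.
        Span<byte> bytes = stackalloc byte[16];
        if (Base64.DecodeFromUtf8(utf8, bytes, out _, out int written) != OperationStatus.Done
            || written != 16)
        {
            return false;
        }

        result = new Guid(bytes);
        return true;
    }
}
```

A string produced by base64url-encoding `guid.ToByteArray()` without padding should round-trip through `TryParseShort` back to the same `Guid`, while the 32-character `"N"` form is rejected by the length check alone.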
Risks
All I can think of is that `TryParse(...)` now requires an extra check on the length of the string to determine if it should try to parse the string as a short representation of the GUID.
Notes
marks characters (`"-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"`) to keep the URLs even more readable.