dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

Remove or relax the length limitation of Uri #96544

Open hez2010 opened 8 months ago

hez2010 commented 8 months ago

The data scheme is a valid URI scheme that carries inline data instead of a path. For example, we can store an image as base64 data in a Uri:

var uri = new Uri("data:image/jpeg;base64,...");

However, Uri in .NET rejects any string of 0xFFF0 (65,520) characters or more, so it cannot hold large payloads. With the prefix above, base64 data longer than 65496 characters (0xFFF0 - 1 - "data:image/jpeg;base64,".Length) is rejected.

It is common for an HTML document to contain an img tag that uses base64-encoded data for its src attribute:

<img src="data:image/jpeg;base64,..." />

However, we are unable to get the src as a Uri if it is too long, for example for a 1 MB image.

new Uri("data:image/jpeg;base64," + new string('a', 70000))

The above code throws a UriFormatException with the message "Invalid URI: The Uri string is too long."
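The boundary can be probed with a small sketch. It assumes a runtime that still enforces the 0xFFF0 limit, so results may differ on other runtime versions. Uri.TryCreate is used so that a rejected string yields false instead of an exception; the payload lengths are chosen to sit on either side of the limit described above.

```csharp
using System;

// Prefix taken from the issue text; it is 23 characters long.
const string Prefix = "data:image/jpeg;base64,";

// Payload lengths just below, just above, and well above the reported limit.
foreach (int payloadLength in new[] { 65_496, 65_497, 70_000 })
{
    string candidate = Prefix + new string('a', payloadLength);

    // TryCreate returns false instead of throwing UriFormatException.
    bool accepted = Uri.TryCreate(candidate, UriKind.Absolute, out _);

    Console.WriteLine(
        $"total length {candidate.Length}: {(accepted ? "accepted" : "rejected")}");
}
```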

The equivalent code is perfectly valid in JavaScript: new URL("data:image/jpeg;base64," + "a".repeat(70000)). The limitation in .NET also makes interop with JavaScript URL types harder (for example, in Blazor), because we cannot simply project the Uri type to the JavaScript URL type.

In a real-world scenario, I hit this issue while using semantic-kernel with GPT-4 Vision, which uses base64 image data in its image_url field. I have to resize the images to keep them under the length limit.

I suggest removing this limitation or relaxing it, for example from 0xFFF0 to 0x7FFFFFF0.

ghost commented 8 months ago

Tagging subscribers to this area: @dotnet/ncl See info in area-owners.md if you want to be subscribed.

Issue Details
Author: hez2010
Assignees: -
Labels: `area-System.Net`, `untriaged`
Milestone: -

colejohnson66 commented 8 months ago

https://github.com/dotnet/runtime/blob/22068a8f96d6d1c01b26db70fd24a433e1fdd5ef/src/libraries/System.Private.Uri/src/System/Uri.cs#L1911-L1912

What's the purpose of there even being a cap on the length? RFC 3986 doesn't mention anything about a limit needing to exist, and there's no comment as to why.

MihaZupan commented 8 months ago

The same discussion from a few years ago: Uri rejects otherwise valid strings with length >= 65520 (#1857)

Essentially:

Ideally, we would have a dedicated type for dealing with data Uris - https://github.com/dotnet/runtime/issues/85164#issuecomment-1557163136. Also related: #95838

But we don't have a good answer for cases where you're stuck with using Uri by existing APIs, like in your case, so I'm hesitant to close this as wont-fix as we did with #1857.

Moving to Future for now to let others comment or upvote if they run into the same problem.

dersia commented 6 months ago

@MihaZupan This would be really important for the Azure AI SDK and Semantic Kernel, and I would love for it to move forward. Can we do something to make that happen?

I thought about your suggestion to use a new type, as proposed in https://github.com/dotnet/runtime/issues/85164, but I see a problem with it for APIs and SDKs such as the Azure SDK that are generated from OpenAPI specs with tools like AutoRest. The type in the spec just says url, so what would you map it to: System.Uri or System.DataUri?

From that point of view, I think it makes much more sense to move this proposal forward and make System.Uri work with data URIs. In fact, it already works with data URIs; it is only the size limitation that prevents it from working with larger ones.

Adding a new type would also create a weird state where you can use System.Uri as long as you stay within the limit and have to switch to System.DataUri once you exceed it, so there would be no clear recommendation on which one to use. In addition, relaxing the limit is not a breaking change.

//Edit: Having said the above, I would change the API to add a constructor overload that takes a ReadOnlyMemory<byte>, does the base64 encoding, and creates a valid data URI from the bytes.

//Edit2: I looked into the data URI spec again, and as a result I would also suggest adding a factory method to allow correct construction of non-base64 data URIs. In that case I would change the byte-based constructor to be a factory method as well. Both API suggestions follow.

namespace System;

public class Uri
{
    // new apis
    // data:content/type;foo=bar;base64,R0lGODdh
    public Uri(ReadOnlyMemory<byte> data, string contentType, Dictionary<string, string>? additionalProperties = null);
    // data:content/type;foo=bar;utf8,hello world
    public static Uri AsDataUri(string data, string contentType, Dictionary<string, string>? additionalProperties = null);
}

Alternate and preferred design:

namespace System;

public class Uri
{
    // new apis
    // data:content/type;foo=bar;base64,R0lGODdh
    public static Uri AsDataUri(ReadOnlyMemory<byte> data, string contentType, Dictionary<string, string>? additionalProperties = null);
    // data:content/type;foo=bar;utf8,hello world
    public static Uri AsDataUri(string data, string contentType, Dictionary<string, string>? additionalProperties = null);
}
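
To make the byte-based overload concrete, here is a minimal sketch of the string it would have to produce. DataUriSketch and FromBytes are hypothetical names that are not part of .NET, and the sketch returns a string rather than a Uri so that it does not depend on the length limit under discussion. It does not validate or escape the content type or parameter values.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Usage: the ASCII bytes of "test" encode to "dGVzdA==".
byte[] payload = Encoding.ASCII.GetBytes("test");
Console.WriteLine(DataUriSketch.FromBytes(payload, "text/plain"));
// prints: data:text/plain;base64,dGVzdA==

// Hypothetical helper illustrating the proposal; not an existing .NET API.
static class DataUriSketch
{
    public static string FromBytes(
        ReadOnlyMemory<byte> data,
        string contentType,
        IReadOnlyDictionary<string, string>? additionalProperties = null)
    {
        var sb = new StringBuilder("data:").Append(contentType);

        // Optional media-type parameters, emitted as ";key=value".
        if (additionalProperties is not null)
        {
            foreach (var (key, value) in additionalProperties)
            {
                sb.Append(';').Append(key).Append('=').Append(value);
            }
        }

        // The base64 marker, then the encoded payload.
        sb.Append(";base64,").Append(Convert.ToBase64String(data.Span));
        return sb.ToString();
    }
}
```

A factory method of this shape keeps the encoding details in one place, which is the main argument for preferring it over a constructor overload.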

mikepizzo commented 1 month ago

Note: this is not just for data: uris.

We have scenarios in which users create really long queries (for example, return instances that match this set of ids, where the number of ids can be in the thousands).

In OData, we have added a pattern for adding /$query to the resource path and using POST to pass the query string in the body of the request to work around limitations with HTTP stacks. However, internally we still build full URIs for the request (for example, in order to generate a "nextLink") and run into this limitation.

See, for example, OData#1293

hez2010 commented 1 month ago

Yeah, this issue has to be fixed, because a Uri instance is created internally even if you use the string overload of the HttpRequestMessage constructor. @MihaZupan We have no way to work around this issue wherever HttpClient is used.
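
A minimal sketch of that failure mode, assuming a runtime that enforces the limit: the string overload of the HttpRequestMessage constructor is expected to fail while building the request, before any network activity. The example.com address and the ids query parameter are placeholders.

```csharp
using System;
using System.Net.Http;

// A placeholder request URI with a query string far beyond 0xFFF0 characters.
string longUri = "https://example.com/?ids=" + new string('1', 70_000);

try
{
    // The string is converted to a Uri inside the constructor.
    using var request = new HttpRequestMessage(HttpMethod.Get, longUri);
    Console.WriteLine("Request created; this runtime accepted the long URI.");
}
catch (UriFormatException ex)
{
    // Reached on runtimes that enforce the length limit.
    Console.WriteLine($"Rejected before any network call: {ex.Message}");
}
```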