Closed airbreather closed 4 years ago
The System.Uri class seems to reject URI strings greater than a strangely short length (experimentally, 65520 or higher
I'm really not sure you can consider that short.
The System.Uri class seems to reject URI strings greater than a strangely short length (experimentally, 65520 or higher
I'm really not sure you can consider that short.
Relative to the majority of use cases, I suppose not. "Short" was relative to hard limits of the system, given that this is the recommended type to use for data
URIs and generated query strings; i.e., I understand unstable behavior brought about by strings approaching 2 GiB in length, but this is an artificial (edit: undocumented (edit-edit: actually just poorly-documented)) limitation that triggers well before we get to that range.
It's really data
URIs that I'm concerned about here, since base64-encoded data streams hit this limit before the payload reaches 48 KiB.
I believe we have an issue for this already, but I can't find it.
CC @MihaZupan
This is a documented length limit for Uri
One of the conditions listed for throwing UriFormatException:
The length of uriString exceeds 65519 characters.
I agree that we can add a comment about this limit to the main Uri page as rn it is only documented on constructors.
Changing the length limit on Uri is not a trivial change.
If you require more info about specific sections of really long data Uris, I would recommend writing a small parser specific to data uris, as the format seems trivial to parse.
If you require more info about specific sections of really long data Uris, I would recommend writing a small parser specific to data uris, as the format seems trivial to parse.
FWIW, the issue came up when a user of my library was trying to use a "dumb" record type that happens to contain a URI. I'm not directly using the services that System.Uri
adds on top of System.String
, other than validating that the string is a valid URI.
My data model uses System.Uri
primarily because the following FxCop rules document that it is best-practice to strongly favor System.Uri
over System.String
for all API members that represent URIs:
CA1054:UriParametersShouldNotBeStrings
CA1055:UriReturnValuesShouldNotBeStrings
CA1056:UriPropertiesShouldNotBeStrings
Those pages all include explanations along the lines of:
A string representation of a URI is prone to parsing and encoding errors, and can lead to security vulnerabilities. The System.Uri class provides these services in a safe and secure manner.
I can, of course, work around this issue by exposing an alternative way to create an instance of the GpxWebLink
class that supports pass-through for an overlong URI, but that means exposing the user to all of the bad things that the quoted text above warns about. I really like to incorporate concepts of "fail-fast" and "pit of success" into my API designs, and the workaround is the antithesis of both.
Is it the intent of your data model that your System.Uri
field be able to handle arbitrary file uploads? I realize URLs theoretically can be of arbitrary length, but in practice I imagine the majority of Uri
's consumers thinking of it as "what goes into the address bar in the web browser?"
Is it the intent of your data model that your
System.Uri
field be able to handle arbitrary file uploads?
Not really, no.
The library defines a .NET object model that matches the XML schema for GPX files. The type in question is the equivalent of linkType
in the XML schema.
The idea is that the library can be used for either:
It's part of the NetTopologySuite project, so this object model representation isn't necessarily the beginning or the end of the chain (there are converters to go between this direct object model translation and the standard NetTopologySuite geometry types), but some people find enough value in just being able to interact with GPX objects more directly.
From what I gather from IsraelHikingMap/Site#1128 and NetTopologySuite/NetTopologySuite.IO.GPX#39:
data
URIs in other contexts. 50 KiB is a little on the big side, but I think it's not outlandish.I realize URLs theoretically can be of arbitrary length, but in practice I imagine the majority of
Uri
's consumers thinking of it as "what goes into the address bar in the web browser?"
Of course -- I'm sure that's why I can't seem to actually find other widespread complaints about this from the fundamental System.Uri
side of things, despite this type being around for so long (though I have found evidence of people running into issues that flow from this same constant, e.g., #14674).
As explained by @airbreather the site uses NetTopologySuite.IO.GPX to serialize files.
We want to store the content of an image inside this GPX file by using a base64 string to represent the image.
Creating multiple files is not part of the GPX sceme and opening this file offline should present the image.
The solution we decided was to use the URI field to store this base64 data, which as far as I understand is a valid value.
Bottom line: URL="data:image/jpg;base64, dksljfhalskjdfh...
is our use case.
Duplicate of #14674
Fixed in 5.0 in PR https://github.com/dotnet/corefx/pull/41686 Please try 5.0 and let us know if you still face the problem
@karelz The length limit on the Uri constructor remains the same, that change only adresses the static (Un)Escape* methods since those do not allocate a Uri.
Removing the length limit would be possible, tho a breaking change if someone relies on the ctor for length validation. It would also be a performance regression (larger allocation for the Uri object), as we currently store indexes as ushorts in a packed struct (hence the ~2^16 length limit). Seeing as there isn't much demand for Uris that long I don't think it is worth doing it.
Duplicate of #14674 Fixed in 5.0 in PR dotnet/corefx#41686 Please try 5.0 and let us know if you still face the problem
@karelz my comment is probably unnecessary after @MihaZupan's, but just to confirm... the problem is still there. I did a shallow clone of d9c448e, then ran build.cmd
, then re-ran the repro with the obvious tweaks (netcoreapp5.0
, incremented TextLength
in Program.cs
).
Here's the console output from that run (irrelevant path details collapsed to just (...)
):
C:\(...)\dotnet5>dotnet --info
C:\(...)\dotnet5\.dotnet
.NET Core SDK (reflecting any global.json):
Version: 5.0.100-preview.1.20112.7
Commit: db3a77a3f6
Runtime Environment:
OS Name: Windows
OS Version: 10.0.18363
OS Platform: Windows
RID: win10-x64
Base Path: C:\(...)\dotnet5\.dotnet\sdk\5.0.100-preview.1.20112.7\
Host (useful for support):
Version: 5.0.0-alpha.1.20112.1
Commit: 673635654c
.NET Core SDKs installed:
5.0.100-preview.1.20112.7 [C:\(...)\dotnet5\.dotnet\sdk]
.NET Core runtimes installed:
Microsoft.AspNetCore.App 5.0.0-preview.1.20111.6 [C:\(...)\dotnet5\.dotnet\shared\Microsoft.AspNetCore.App]
Microsoft.NETCore.App 5.0.0-alpha.1.20112.1 [C:\(...)\dotnet5\.dotnet\shared\Microsoft.NETCore.App]
Microsoft.WindowsDesktop.App 5.0.0-preview.1.20112.4 [C:\(...)\dotnet5\.dotnet\shared\Microsoft.WindowsDesktop.App]
To install additional .NET Core runtimes or SDKs:
https://aka.ms/dotnet-download
C:\(...)\dotnet5>dotnet run --project ..\ConsoleApp0\ConsoleApp0.csproj
C:\(...)\dotnet5\.dotnet
Unhandled exception. System.UriFormatException: Invalid URI: The Uri string is too long.
at System.Uri.CreateThis(String uri, Boolean dontEscape, UriKind uriKind)
at System.Uri..ctor(String uriString)
at Program.Main() in C:\(...)\ConsoleApp0\Program.cs:line 10
Removing the length limit [...] would also be a performance regression (larger allocation for the Uri object), as we currently store indexes as ushorts in a packed struct (hence the ~2^16 length limit).
Sorry in advance for what's probably a naïve analysis, I only took a brief look at the data layout to get a feel for what's behind this, and I had a thought. Also, I'm a sucker for performance-related puzzles.
MoreInfo
looks like it's implementing a pattern that I've used before when storing property values that aren't needed in performance-sensitive cases; is that right?
If so, perhaps we could continue to store Offset
values directly on the UriInfo
for instances that fit within the current limits, but support longer strings by adding some uint
properties onto MoreInfo
, adding yet another flag to Flags
to indicate when this happens?
A sketch of the idea:
[Flags]
private enum Flags : ulong
{
Zero = 0x00000000,
/* existing unrelated things ... */
VeryLongPath = 0x200000000000,
}
private Flags _flags;
private UriInfo _info = null!; // initialized during ctor via helper
private class UriInfo
{
/* existing unrelated things ... */
public Offset Offset;
/* existing unrelated things ... */
public MoreInfo? MoreInfo; // Multi-threading: This field must be always accessed through a _local_
// stack copy of m_Info.
};
[StructLayout(LayoutKind.Sequential, Pack = 1)]
private struct Offset
{
public ushort Scheme;
public ushort User;
public ushort Host;
public ushort PortValue;
public ushort Path;
public ushort Query;
public ushort Fragment;
public ushort End;
};
private class MoreInfo
{
/* existing unrelated things ... */
public uint Offset_Scheme;
public uint Offset_User;
public uint Offset_Host;
public uint Offset_PortValue;
public uint Offset_Path;
public uint Offset_Query;
public uint Offset_Fragment;
public uint Offset_End;
};
private bool IsVeryLongPath
{
get { return (_flags & Flags.VeryLongPath) != 0; }
}
private uint Offset_Scheme
{
// assumption: we allocate MoreInfo as soon as we set the VeryLongPath flag
get { return IsVeryLongPath ? _info.MoreInfo!.Offset_Scheme : _info.Offset.Scheme; }
set
{
if (IsVeryLongPath)
{
_info.MoreInfo!.Offset_Scheme = value;
}
else
{
_info.Offset.Scheme = checked((ushort)value);
}
}
}
private uint Offset_User
{
// assumption: we allocate MoreInfo as soon as we set the VeryLongPath flag
get { return IsVeryLongPath ? _info.MoreInfo!.Offset_User : _info.Offset.User; }
set
{
if (IsVeryLongPath)
{
_info.MoreInfo!.Offset_User = value;
}
else
{
_info.Offset.User = checked((ushort)value);
}
}
}
/* etc. */
/* and then, of course, change callers that use _info.Offset */
It could even be taken a little bit further, changing Offset
to be LayoutKind.Explicit
and storing half of the double-wide offsets directly, only going to MoreInfo
for the other half of the offset values.
If I'm reading it right, the performance cost becomes:
MoreInfo
is not allocatedMoreInfo
is allocated, depending on whether the values would be ulong
or uint
(even for "short" URI strings).My intuition is that these penalties are not too significant, given how much memory an instance of System.Uri
already takes up.
In case my intuition is wrong, another possible way of doing this without storing extra data per instance would be to continue to store offsets in ushort
values the way we do when the values are short enough, but for offsets >65519, store a sentinel value of 65535 indicating that the "real" value lies somewhere beyond. When we need to access such an offset, we would need to re-scan the original string to figure it out.
O(1)
all of a sudden switches to O(n)
when a string gets long enough...Reopened per https://github.com/dotnet/runtime/issues/1857#issuecomment-590012273 since it was incorrectly resolved as a dupe. The NCL team may still want to resolve it as "won't fix" at their discretion, but it's not a dupe.
Sorry, I assumed the static methods are used in constructor as well. I was wrong. Thanks for reopening @GrabYourPitchforks.
Removing the length limit would be possible, tho a breaking change if someone relies on the ctor for length validation. It would also be a performance regression (larger allocation for the Uri object), as we currently store indexes as ushorts in a packed struct (hence the ~2^16 length limit). Seeing as there isn't much demand for Uris that long I don't think it is worth doing it.
I don't mind the breaking change, but the perf impact is undesired given rare demand. I recommend to close it as "Won't fix" -- recommending workaround to use the static methods without the limit. @MihaZupan if you agree (which it seems you are), please resolve it as such.
MoreInfo looks like it's implementing a pattern that I've used before when storing property values that aren't needed in performance-sensitive cases; is that right?
Correct
In case my intuition is wrong, another possible way of doing this without storing extra data per instance would be to continue to store offsets in ushort values the way we do when the values are short enough, but for offsets >65519, store a sentinel value of 65535 indicating that the "real" value lies somewhere beyond. When we need to access such an offset, we would need to re-scan the original string to figure it out.
65535 is also a valid length that could be stored if the input is shorter than 65520 and has characters that would expand it to 65535.
If we were to implement it, it would most likely be done by changing the minimum amount of fields to int (we could likely keep Scheme, User, Host, PortValue and Path as ushort). That would result in a 4-8 byte size increase of Uri, which might be acceptable considering other inefficiencies in Uri.
Considering that data Uris are the only realistic scenario for Uris longer than 65k, I don't think it is worth making such a change.
I understand that in your case you may not be interacting with the created Uri other than for validation / satisfying the object model. But otherwise the utility of Uri for data uris is practically equivalent to
bool IsValid(string uriString) =>
uriString.StartsWith("data:", StringComparison.OrdinalIgnoreCase);
It provides no utility for accesing the media type/data - those have to be manually parsed out. It also provides no validation for it except for the scheme name.
There are other downsides to removing the length restriction.
So if for some reason we decided to remove the length restriction in the future, a lot of work would have to be done to improve Uri overall first.
Given that this issue is the only ask for relaxing the length restriction that I know of and you already implemented a workaround for your use case, I don't see a real benefit for moving forward here.
Consider as won't-fix until then.
In my original comment, I wrote this:
I totally understand why an artificial restriction might appear as a result of engineering trade-offs, so perhaps a decent first step might be to document it better?
Should open an issue on dotnet/dotnet-api-docs for that "decent first step" part of it?
As of right now, the "Filing Issues" section of this repo doesn't link over there (I only found out about it when scrolling through and noticing the cross-reference from dotnet/dotnet-api-docs#3906).
I suppose we could extend the scope of dotnet/dotnet-api-docs#3906 to cover this as well
I suppose we could extend the scope of dotnet/dotnet-api-docs#3906 to cover this as well
Things like this can go into the 'Remarks' section of the relevant API methods/properties of Uri.
The
System.Uri
class seems to reject URI strings greater than a strangely short length (experimentally, 65520 or higher, so probably this constant). I'm not able to find any documentation of this fact at https://docs.microsoft.com/en-us/dotnet/api/system.uri?view=netcore-3.1, nor does RFC 3986 discuss length restrictions (though I went through those two sources very quickly, so this could be my error).Most URIs I've seen are short enough, but
data
URIs can easily be imagined to exceed this limit.CA1056:UriPropertiesShouldNotBeStrings
and related rules have been guiding me towards using this class to represent URI strings for a while now; their documentation does not list "you want to support URI strings that exceed 65519 characters in length" as good reasons to suppress warnings.In my perfect world, I would appreciate it if the
System.Uri
class could represent all URI values that are legal in RFC 3986 (or at least the subset that can be represented bySystem.String
).I totally understand why an artificial restriction might appear as a result of engineering trade-offs, so perhaps a decent first step might be to document it better?
Repro program
The following C# program writes out the longest URI I could manage.
Incrementing the commented line causes it to throw an exception:
Program.cs
ConsoleApp0.csproj
System Information