Open MattMcL4475 opened 1 year ago
cc: @dotnet/ncl
Yes, according to RFC3986, query
is the value following the ?
separator.
But I would argue that a lot of people (myself included) think of the query as including the leading ?
.
The exact same quirk applies to the Fragment
property. The #
character is considered as only the delimiter by the spec.
By extension, APIs on UriBuilder
follow a similar pattern. Note that UriBuilder.Query/Fragment
accept values without the leading ?
/#
and will add one for you if not already present.
ASP.NET also exposes a QueryString
type that always includes the leading separator.
.NET behavior should continue to match between all of these.
As you point out, Uri.Query
has behaved this way for a very long time, and yet I don't see other issues in this repo questioning the behavior. A rough GitHub code search shows tens of thousands of uses of the property, where ~10% of those are followed by some combination of .Substring(1)
, .TrimStart('?')
, .Replace("?", "")
.
A change in behavior is unacceptable in this case, as it would break tons of existing uses, including those currently using workarounds such as Substring(1)
.
I also do not believe that possible confusion due to the extra ?
warrants obsoleting this API. Misuse in the minority case should be easy to catch by running the application/test.
Augmenting the API documentation (especially IntelliSense) to point out this behavior would be a useful addition though:
-Gets any query information included in the specified URI.
+Gets any query information included in the specified URI, including the leading '?' character if not empty.
-Gets the escaped URI fragment.
+Gets the escaped URI fragment, including the leading '#' character if not empty.
While comparing the behavior of different language standard libraries, JavaScript does include the leading ?
/#
:
new URL("http://example.com:8042/over/there?name=ferret#nose").search
> '?name=ferret'
new URL("http://example.com:8042/over/there?name=ferret#nose").hash
> '#nose'
How impactful are the inconsistencies in Azure Storage's documentation in practice?
My (of course biased) view is that developers working with REST APIs are/should be aware of what a ?
in a url is.
With updates to the documentation to point out that leading delimiters are included by these properties, do you still believe a new API would lead to less confusion?
An additional API represents another thing a developer has to consider, where a workaround such as .TrimStart('?')
is both short and very clear in its intent.
Thanks for your thoughtful reply @MihaZupan
Great point about Uri.Fragment
. This is also incorrect according to RFC 3986
, and I think it should also be marked [Obsolete]
in favor of Uri.FragmentRfc3986
. I've updated the issue above. I do think that this new API would lead to less confusion, since it informs the developer to carefully consider the implications of the .NET Uri
implementation. Visual Studio will also add a green underline if they select the legacy Query
or Fragment
. This will make it less likely that a developer will make a Type II error (false negative) that results in a defect later in the SDLC. Later might be after a unit test runs, or it could be in production. It's cheapest to catch these defects in the IDE the first time the line of code is written.
I think .NET should strive to follow industry standards such as RFC 3986
, especially in this case where all other major languages are compliant with it. I don't find an appeal to tradition persuasive in this case. In fact, I think that .NET's evolution away from its distinct, proprietary, closed-source traditions to its current fully open-source, cross-platform nature, supporting the latest standards, should be celebrated and continued. I also think that .NET should aim to minimize .NET idiosyncrasies.
I agree that a breaking change would be inappropriate. However, I think .NET should be actively discouraging developers from using Uri.Query
and Uri.Fragment
, and thus mark them [Obsolete]
. You said that a lot of people, including yourself, think of the query as including the leading ?
. I think this actually reinforces the underlying point, that the .NET community may have a mental model of query
that is not aligned with either the standard nor the rest of the major programming languages, due to a non-compliant API that many of us have never questioned, and mistakenly assumed is correct. I don't think JavaScript is a valid comparison, since search
and hash
are different terms with different meanings. The main term I am taking issue with is query
, which has a specific meaning as defined in RFC 3986
.
I agree with updating the documentation and summary tag as well.
It's not just the Azure documentation that is inconsistent. For example, here is an Azure API that returns a SAS token
with no question mark (correct). But, if you use the Azure Portal to generate a SAS token
, they do include a question mark (incorrect). I think the Azure example is important and is in Microsoft's interest to resolve - but it's not central to this issue - more a symptom of the inconsistencies this issue may be causing.
@MihaZupan can I edit this issue to make it an API proposal, or should I create a new issue for it and reference this discussion? Even if the team considers it low priority, I'd still like it to be in the backlog, for if/when the decision is made to make .NET Uri
compliant with RFC 3986
. Thank you!
Obsoletion is a big hammer that should be reserved for cases where the API is irreparably broken, or where it's believed that the API is virtually unused and a better alternative exists.
The difference of an extra ?
does not meet that bar in my opinion. The majority of existing uses are also perfectly fine with, or rely on ?
being present.
I would consider the proposed alternative to be even more error-prone. A nice benefit of existing behavior is that in most cases you don't have to deal with separators manually. If there is no query, you'll get back an empty string. Same for fragment.
Consider an existing use like
$"{uri.AbsolutePath}{uri.Query}{uri.Fragment}"
If the user is hit with the obsoletion warning and updates the code to use the "good" properties, they may end up with
$"{uri.AbsolutePath}{uri.QueryRfc3986}{uri.FragmentRfc3986}"
// or
$"{uri.AbsolutePath}?{uri.QueryRfc3986}#{uri.FragmentRfc3986}"
Both of which are functionally wrong.
The "correct" alternative for those users is substantially worse.
$"{uri.AbsolutePath}/{(uri.QueryRfc3986 == "" ? "" : $"?{uri.QueryRfc3986}")}{(uri.FragmentRfc3986 == "" ? "" : $"#{uri.FragmentRfc3986}")}"
I don't see aligning with some other languages to be of sufficient value to force those users into rewriting their code, ending up with a non-trivial amount of extra, or more error-prone code.
Potential misuse of the existing API in practice should be caught quickly on first use/test.
Given how trivial workarounds are, I see little value in adding an additional API here. We could consider it if it turns out that this is a common thing for people to run into though.
can I edit this issue to make it an API proposal
Feel free to update the original post. We can keep the issue open to see if more people are running into this.
Including Rfc3986
in the name is problematic IMO as it does not convey how it's different from the existing property. Both use the same encoding, following the same standard, the only difference is the leading delimiter for non-empty values.
What would happen with APIs on UriBuilder
? It also exposes Query
and Fragment
properties with setters.
What about ASP.NET Core's QueryString
type and all the APIs that work with it?
Making any of these inconsistent with each other is a net negative for the platform.
I don't find an appeal to tradition persuasive in this case.
If you're referring to this part of my comment As you point out, Uri.Query has behaved this way for a very long time, and yet I don't see other issues in this repo questioning the behavior
, the point I was trying to make is that this doesn't appear to be such a widespread issue that is very common for people to run into.
Agreed that making changes to this would have a ripple effect across many .NET classes and libraries and require examining all of them in detail.
Regarding this: $"{uri.AbsolutePath}{uri.Query}{uri.Fragment}"
uri.PathAndQueryAndFragment
could be added to the API, since uri.PathAndQuery
already exists.
Or, perhaps ToAbsolutePathQueryFragment()
would be better.
Either one could encapsulate the concatenation logic within the Uri
class itself.
Either way, if obsoletion is too big a hammer, and addition of a new API too onerous or duplicative - perhaps also updating the documentation to indicate that Query and Fragment are non-compliant with RFC 3986 would be helpful. Regardless of how frequently C# developers discover this quirk, or how trivial it may be to work around - I still think it's important to note that the widely-used C# class Uri
doesn't comply with a fundamental industry standard followed by all other major web programming languages.
perhaps also updating the documentation to indicate that Query and Fragment are non-compliant with RFC 3986 would be helpful
Isn't that what https://github.com/dotnet/dotnet-api-docs/pull/8977 is?
Thanks @MihaZupan for your help on this. I just noticed in that doc, that it explicitly says the query information is escaped according to RFC 3986. It mentions "RFC 3986" 15 times:
curl -s https://raw.githubusercontent.com/dotnet/dotnet-api-docs/main/xml/System/Uri.xml | grep -o -i "RFC 3986" | wc -l
To me, this seems even more paradoxical that query
and fragment
are non-compliant with RFC 3986
. Perhaps it would make more sense with a better understanding of the full commit history of this file.
Also, it omits explicitly stating that Query
or Fragment
themselves are non-compliant with RFC 3986
(seems like an error by omission to me). What do you think about making this explicit in the summary tag? https://github.com/dotnet/dotnet-api-docs/pull/8985/files
Also, it omits explicitly stating that Query or Fragment themselves are non-compliant with RFC 3986 (seems like an error by omission to me). What do you think about making this explicit in the summary tag?
They do work according to the RFC, with the exception of also including the leading separator, which the message now calls out.
What is a user supposed to do with the information that 'Query' is non-compliant with RFC 3986.
?
I think it just makes it a lot more confusing. An average user shouldn't have to be aware of what RFC 3986 is, and even if they are, what does "non-compliant" mean? In this case, it's that it includes the separator, which the sentence right before already says.
I think it just makes it a lot more confusing. An average user shouldn't have to be aware of what RFC 3986 is, and even if they are, what does "non-compliant" mean? In this case, it's that it includes the separator, which the sentence right before already says.
We can agree to disagree :) I don't feel strongly about what the language is, or whether the user is told this information via an obsoletion message or Quick Info message, but the point would be to make the user aware what the true definition of query
or fragment
is, so they don't mistakenly misuse it and cause a bug, or documentation bug, or perpetuate the inaccurate definition of these terms.
If you feel strongly about this, feel free to add a note along the lines of
Note that the property includes the leading delimiter (`?`), whereas the Uri specification (RFC 3986) recognizes the query as the portion of a Uri without the delimiter.
under the Remarks
section in the documentation.
Description
Currently, the
Query
property of aUri
starts with a question mark?
:This is incorrect according to RFC 3986 - Uniform Resource Identifier (URI): Generic Syntax. The question mark is a
delimiter
, andquery
does not include the question mark. .NET is the outlier for how it handlesUri.Query
(see below), and it's been 18 years since theRFC 3986
standard. This inconsistent usage of the question mark is likely a causative factor for the common inconsistent usage of the question mark in Azure SAS tokens.Uri.Fragment
has this same issue too (it also incorrectly starts with a delimiter#
).HTTP/1.1 [RFC 3986]
1.2.3. Hierarchical Identifiers The generic syntax uses the slash ("/"), question mark ("?"), and number sign ("#") characters to delimit components that are significant to the generic parser's hierarchical interpretation of an identifier.
3. Syntax Components
Notice that in Java, Go, Python, and PHP, the canonical URI implementations do NOT include the
?
in the query:Java
Result:
name=ferret
Go
Result:
name=ferret
Python
Result:
name=ferret
PHP
Result:
name=ferret
Reproduction Steps
Expected behavior
It's expected that the question mark
?
is not included in the string returned byuri.Query
, so we should expect:However, a breaking change is not the correct solution to this (please see the Proposed Solution section below).
Actual behavior
Currently, the question mark is included in uri.Query:
Regression?
No,
Uri.Query
in .NET Framework 1.1 (February 2002) pre-dates RFC 3986 (January 2005). Although it's been a considerable amount of time, .NET is the outlier for how it handlesUri.Query
, and it's been 18 years since theRFC 3986
standard. This inconsistent usage of the question mark is likely a causative factor for the common inconsistent usage of the question mark in Azure SAS tokens.Known Workarounds
Configuration
No response
Other information
The root cause is here: https://github.com/dotnet/runtime/blob/9aefa9daa141bb7d9ba3f2b373d4b050c9b243fe/src/libraries/System.Private.Uri/src/System/Uri.cs#L1091
| UriComponents.KeepDelimiter
causes the delimiter to be included in bothUri.Query
andUri.Fragment
.Proposed Solution
Uri.QueryRfc3986
andUri.FragmentRfc3986
that do not start with delimitersUri.Query
with the attribute:[Obsolete(message: "This starts with the question mark delimiter ('?'), which is non-compliant with RFC 3986. Use QueryRfc3986 instead.", error: false)]
Uri.Fragment
with the attribute:[Obsolete(message: "This starts with the hash delimiter ('#'), which is non-compliant with RFC 3986. Use FragmentRfc3986 instead.", error: false)]
The result would then look like this: