dart-lang / sdk

The Dart SDK, including the VM, JS and Wasm compilers, analysis, core libraries, and more.
https://dart.dev
BSD 3-Clause "New" or "Revised" License
10.24k stars 1.57k forks source link

Uri.isAbsolute depends on the scheme and the fragment #14657

Open whesse opened 11 years ago

whesse commented 11 years ago

A Uri returns true for isAbsolute only if it has a non-empty scheme and it has no fragment identifier.

I think this is a misreading of the spec. There is a grammar for "Absolute URI" that has no fragment identifier, but the grammar for "Relative URI" also has no fragment identifier.

I think isAbsolute should just return true if there is a scheme, and false if there isn't.

The spec calls URIs with a scheme and no fragment "Absolute URIs", URIs with a scheme and a fragment "Absolute URI references", URIs with no scheme and no fragment "Relative URIs", and URIs with no scheme and a fragment "Relative URI references". So strictly speaking, if it has a fragment, it isn't a URI. But we include fragments in our URI class, and it really has nothing to do with whether it is relative or absolute. Here is the section, from rfc 2396: (quotes from rfc 3986 are further down)

 URI References

   The term "URI-reference" is used here to denote the common usage of a    resource identifier. A URI reference may be absolute or relative,    and may have additional information attached in the form of a    fragment identifier. However, "the URI" that results from such a    reference includes only the absolute URI after the fragment    identifier (if any) is removed and after any relative URI is resolved    to its absolute form. Although it is possible to limit the    discussion of URI syntax and semantics to that of the absolute    result, most usage of URI is within general URI references, and it is    impossible to obtain the URI from such a reference without also    parsing the fragment and resolving the relative form.

      URI-reference = [ absoluteURI | relativeURI ] [ "#" fragment ]

   The syntax for relative URI is a shortened form of that for absolute    URI, where some prefix of the URI is missing and certain path    components ("." and "..") have a special meaning when, and only when,    interpreting a relative path. The relative URI syntax is defined in    Section 5.

4.1. Fragment Identifier

   When a URI reference is used to perform a retrieval action on the    identified resource, the optional fragment identifier, separated from    the URI by a crosshatch ("#") character, consists of additional    reference information to be interpreted by the user agent after the    retrieval action has been successfully completed. As such, it is not    part of a URI, but is often used in conjunction with a URI.

In rfc 3986, the term URI actually means absolute URI, with a scheme and a possible fragment, and relative URIs are called "relative references" only:

URI-reference is used to denote the most common usage of a resource    identifier.

      URI-reference = URI / relative-ref

   A URI-reference is either a URI or a relative reference. If the    URI-reference's prefix does not match the syntax of a scheme followed    by its colon separator, then the URI-reference is a relative    reference.

The section 4.1 labeled absolute URI actually defines a grammar for the URI without the fragment, and makes the point that schemes should define a grammar matching this, and not include the fragment as part of the scheme-specific grammar for the URI, because the fragment at the end is not-scheme-specific. But example URIs for that scheme should include cases with a fragment.

Another example, where they separated the issue of absolute form and fragment identifiers, is in "If the    base URI is obtained from a URI reference, then that reference must    be converted to absolute form and stripped of any fragment component    prior to its use as a base URI."

So conversion to absolute form does not include the stripping of the fragment - that must be added.

peter-ahe-google commented 11 years ago

I think Bill is right.

However, I'll point out that it is a very bad idea to use RFC 2396 to make this point. RFC 2396 is obsoleted by RFC 3986. In other words, RFC 2396 is obsolete and has be replaced by RFC 3986.

sgjesse commented 11 years ago

We should go ahead and make this change if we are 100% sure that RFC 3986 only mandates a scheme for a absolute URI. The documentation for isAbsolute should describe exactly what is means.

peter-ahe-google commented 11 years ago

Can we remove isAbsolute? I'm not sure it is useful.

whesse commented 11 years ago

Yes, I think we should remove it. It is just scheme == ''.

And isAbsolute is misleading, because it is unclear for what it should return for the relative URI (called a "relative reference" in the RFC) "/dir/file". Currently, it returns false.

I'm curious about what we think it should return for a "relative reference" starting with a slash - which is a legitimate relative reference that has an "absolute" path, but no scheme.

The resolution of it replaces the base path completely, but not the target scheme. The RFC says:       if (R.path starts-with "/") then                   T.path = remove_dot_segments(R.path);                else                   T.path = merge(Base.path, R.path);                   T.path = remove_dot_segments(T.path);

anders-sandholm commented 10 years ago

Added Area-Library label.

sgjesse commented 10 years ago

We will add isRelative and update the documentation to refer to the spec.

The problem is still that the isAbsolute checking for fragment is not necessarily what users expect.


Set owner to @sgjesse. Added this to the 1.5 milestone. Removed Type-Defect label. Added Type-Enhancement label.

kevmoo commented 10 years ago

I dug into this w/ https://code.google.com/p/dart/issues/detail?id=18053

Per RFC, URIs with fragments http://tools.ietf.org/html/rfc3986#section-4.3 are generally not considered absolute.

I believe this IS the case for http, which likely the vast majority use case of Uri.

I'd love to see isRelative added to Uri as an alternative.

We should not change the semantics of isAbsolute before v2.0.

whesse commented 10 years ago

Issue #18053 has been merged into this issue.

lrhn commented 10 years ago

Issue #18053 has been merged into this issue.

whesse commented 10 years ago

I think the best solution is to add isRelativePath and isRelativeRef. In the long run, it would be nice to rename isAbsolute to isAbsoluteUri, which is the name Haskell uses for the same thing. In any case, we should document isAbsolute to only accept "absolute URIs", without a fragment.

Relative path and relative reference are unambiguously defined in section 4.2 of RFC 3986, and are exactly what most people want. Relative reference is something that doesn't begin with a scheme, and relative path is something that doesn't begin with a scheme or a slash. Both of these can have fragment identifiers (and query strings).

We could file a new bug for adding these, or just consider this bug resolved if they are added.

Looking at other languages, they all do different things. Some of their "absolute" queries allow a fragment, others don't. I think Haskell may be the closest to the spec - it uses isAbsoluteURI for the semantics of our current isAbsolute.

I would like to see isAbsolute renamed to isAbsoluteURI in 2.0, but I agree that the best solution now would be to add an isRelative or isRelativeRef, which is

Test if string contains a valid URI (an absolute URI with optional fragment identifier).

isURIReference :: String -> BoolSource

Test if string contains a valid URI reference (an absolute or relative URI with optional fragment identifier).

isRelativeReference :: String -> BoolSource

Test if string contains a valid relative URI (a relative URI with optional fragment identifier).

isAbsoluteURI :: String -> BoolSource

Test if string contains a valid absolute URI (an absolute URI without a fragment identifier).

whesse commented 10 years ago

Issue #18053 has been merged into this issue.

kevmoo commented 10 years ago

So we either retitle this issue or close as WontFix and open up an enhancement request for the changes.

kasperl commented 10 years ago

Removed this from the 1.5 milestone. Added this to the 1.6 milestone.

kasperl commented 10 years ago

Removed this from the 1.6 milestone. Added Oldschool-Milestone-1.6 label.

kasperl commented 10 years ago

Removed Oldschool-Milestone-1.6 label.