Open wouterbeek opened 7 years ago
For query components you are probably right. For path components there is a problem: a relative URI can be mistaken for a fully qualified URI. That is what Samer discovered, and it led to the current behaviour (older versions did not escape `:`). Some git blame and a search of the mailing list will probably turn up the discussion. This seems consistent with JavaScript's `encodeURIComponent()`, which also escapes `:`.
I guess you want a canonical, minimally escaped URI? That is a different task that could be implemented in `uri_normalized/2` (which now escapes `:` as it shares the code). Note that using a `:` in a segment is allowed, but complicates the translation of an absolute URI into a relative one.
I surely wouldn't call this a bug ...
The use of an unescaped colon is actually not ambiguous. RFC 3986 took this into account:
A path segment that contains a colon character (e.g., "this:that") cannot be used as the first segment of a relative-path reference, as it would be mistaken for a scheme name. Such a segment must be preceded by a dot-segment (e.g., "./this:that") to make a relative-path reference.
(I did not know this last year, otherwise I would have given this pointer earlier.)
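The RFC rule quoted above can be seen in a small sketch. It uses Python's `urllib.parse` as a stand-in, not the Prolog URI library discussed in this thread; the helper name `as_relative_reference` and the base URI are made up for illustration.

```python
from urllib.parse import urljoin, urlsplit


def as_relative_reference(path: str) -> str:
    """Hypothetical helper: prefix "./" when the first segment contains a colon,
    so the segment cannot be mistaken for a scheme name (RFC 3986, section 4.2)."""
    first_segment = path.split("/", 1)[0]
    if ":" in first_segment:
        return "./" + path
    return path


base = "http://example.org/a/b"  # assumed base URI, for illustration only

# Without the dot-segment, "this" is parsed as a scheme name.
ambiguous = urlsplit("this:that")

# With the dot-segment, the whole string is a relative path.
safe_ref = as_relative_reference("this:that")
unambiguous = urlsplit(safe_ref)

# Resolving against the base keeps the colon unescaped in the path.
resolved = urljoin(base, safe_ref)

print(ambiguous.scheme, ambiguous.path)
print(safe_ref)
print(resolved)
```

So an unescaped colon only needs special care when a relative reference is produced, which is the extra primitive a minimally escaping API would need.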
Interesting. This probably does require a different set of URI encoding primitives than current practice provides, though. Notably, we need not only something to encode, but also something to create a relative URI. But who is going to call that, and where?
The URI library currently encodes colon in the path and in the query component.
Colons in query components
In Semantic Web services it is very common to include IRIs in the query component, e.g., to indicate a selection or query.
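As a rough illustration of embedding an IRI in a query component, here is a sketch using Python's `urllib.parse`, not the Prolog URI library under discussion. It assumes the RFC 3986 query grammar, which permits `:`, `/`, `?` and `@` unescaped, while `#` must be percent-encoded. The example IRI, the parameter name `subject` and the helper `encode_query_value` are made up.

```python
from urllib.parse import quote

# Characters the RFC 3986 query grammar allows unescaped, beyond the
# unreserved set. "&", "=" and "+" are deliberately left out so that they
# are still escaped inside a key=value pair.
QUERY_SAFE = ":/?@"


def encode_query_value(value: str) -> str:
    """Hypothetical helper: minimally escape a value for use in a query."""
    return quote(value, safe=QUERY_SAFE)


iri = "http://example.org/resource#frag"  # made-up IRI

minimal = encode_query_value(iri)  # keeps ":" and "/", escapes "#"
full = quote(iri, safe="")         # escapes every reserved character

print("?subject=" + minimal)
print("?subject=" + full)
```

Both forms decode to the same value; the minimal one is simply easier to read when the query carries an IRI.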
`uri_query_components/2` encodes colons in the query component, even though this is not necessary. In the following example, `%3A` should simply be `:`. The `#` is legitimately encoded as `%23`, because it would otherwise be confused with the fragment component separator.
Colons in path components
Colons are not very common in IRIs, but some datasets (e.g., DBpedia) do use them.
`iri_normalized/2` unnecessarily encodes colons in paths, e.g., translating [1] to [2].
Reference