elm / url

Build and parse URLs. Useful for HTTP and "routing" in single-page apps (SPAs)
BSD 3-Clause "New" or "Revised" License

When should percent decoding occur by default with paths? #42

Open evancz opened 3 years ago

evancz commented 3 years ago


Some issues on this repo are saying that more decoding should happen on path segments by default:

Issue #20 also says that this decoding should not occur in all cases, referencing this issue that points to implementations in many languages that do not call percentDecode in a naive way.

What should be done?

Current Status

As mentioned in #16, it is currently possible to use Url.Parser.custom to create custom path parsers that have whatever behavior you personally think is best. For example, custom "STRING" Url.percentDecode would do very aggressive percent decoding.

So nobody is blocked on this topic. It is a question of defaults, and any change should be considered a breaking change that triggers a MAJOR version bump.


Make a table of scenarios to try to find the ideal defaults for a broad range of people. The default options for path parsing and building are:

  1. no percent decoding
  2. percent decoding for specific characters
  3. percent decoding for all characters

We currently do (1) but maybe it'd be good to make a table to show the various options. Right now I think it would be ideal for someone interested in this topic to:

  1. Build a table of interesting paths and see how they all work under different defaults.
  2. Check the defaults of implementations in other languages.
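
A minimal sketch of how such a table could be started. It is written in JavaScript rather than Elm so it can be pasted into a browser console, and it does not use elm/url. The helper names (`decodeNone`, `decodeSelected`, `decodeAll`) and the sample segments are hypothetical, and the set of escapes that option (2) leaves encoded is an assumption chosen only for illustration.

```javascript
// Option 1: no percent decoding. The segment is returned exactly as it appears in the URL.
function decodeNone(segment) {
  return segment;
}

// Option 3: decode every escape. Malformed input (e.g. a lone "%") yields null,
// similar to how a Maybe-returning decoder would signal failure.
function decodeAll(segment) {
  try {
    return decodeURIComponent(segment);
  } catch (err) {
    return null;
  }
}

// Option 2: decode everything except a chosen set of escapes.
// ASSUMPTION: the protected set (%2F "/", %3F "?", %23 "#") is illustrative only.
const KEEP_ENCODED = /(%2F|%3F|%23)/i;

function decodeSelected(segment) {
  // split() with a capture group keeps the protected escapes at odd indexes.
  const parts = segment.split(KEEP_ENCODED);
  const decoded = parts.map((part, i) => (i % 2 === 1 ? part : decodeAll(part)));
  return decoded.includes(null) ? null : decoded.join("");
}

// Hypothetical sample segments; extend with whatever paths seem interesting.
const samples = ["hello", "%F0%9F%91%8D", "a%2Fb%20c", "100%"];
console.table(
  samples.map((segment) => ({
    segment,
    none: decodeNone(segment),
    selected: decodeSelected(segment),
    all: decodeAll(segment),
  }))
);
```

Segments such as `a%2Fb%20c` are where the three options diverge: option (3) turns the escaped slash into a real one, while option (2) as sketched here keeps it encoded.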

Hopefully this will reveal more information / some sort of consensus. I think being thorough about this is very important, since any change here could break a lot of people's code in ways that might be very hard to detect. Please share your efforts here or on the Elm Discourse.

choonkeat commented 3 years ago

Since an Elm application produces a [ href ] links, and upon clicking one, the Browser.application is also expected to Url.Parser.parse router url, I think the important bit is: the value needs to survive a "round trip".

The current elm/url does not round-trip.

| Given String | Url.Builder.absolute produces | Url.Parser.string returns String | Round trip |
|---|---|---|---|
| "hello" | "hello" | "hello" | ✔️ |
| "👍" | "👍" | "%F0%9F%91%8D" | |
| "искать" | "искать" | "%D0%B8%D1%81%D0%BA%D0%B0%D1%82%D1%8C" | |

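The percent-encoded values returned for "👍" and "искать" are consistent with how a browser's URL parser serializes paths: non-ASCII characters are percent-encoded as UTF-8 bytes before the application sees the URL. A small sketch using the standard `URL` constructor, assuming `https://example.com` as a placeholder origin:

```javascript
// ASSUMPTION: https://example.com is a placeholder origin used only to resolve the path.
const base = "https://example.com";

// Non-ASCII characters are percent-encoded as UTF-8 bytes when the URL is parsed.
const thumbs = new URL("/👍", base).pathname;
const search = new URL("/искать", base).pathname;

console.log(thumbs); // "/%F0%9F%91%8D"
console.log(search); // "/%D0%B8%D1%81%D0%BA%D0%B0%D1%82%D1%8C"
```

So a parser that does no decoding hands back the encoded form, not the string that was originally given to the builder.
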
If we use the auto-encode and auto-decode versions:

| Given String | Url.Builder.absolute+encode produces | Url.Parser.string+decode returns String | Round trip |
|---|---|---|---|
| "hello" | "hello" | "hello" | ✔️ |
| "👍" | "%F0%9F%91%8D" | "👍" | ✔️ |
| "искать" | "%D0%B8%D1%81%D0%BA%D0%B0%D1%82%D1%8C" | "искать" | ✔️ |

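The encode-on-build, decode-on-parse pairing can be sketched outside Elm as well. The helpers below (`buildPath`, `parsePath`, `roundTrips`) are hypothetical JavaScript analogs, not the elm/url API, and `https://example.com` is a placeholder origin.

```javascript
// Hypothetical helper, not part of elm/url: percent-encode each segment when building.
function buildPath(segments) {
  return "/" + segments.map((s) => encodeURIComponent(s)).join("/");
}

// Hypothetical helper: split the path and percent-decode each segment when parsing.
function parsePath(pathname) {
  return pathname
    .split("/")
    .filter((s) => s !== "")
    .map((s) => decodeURIComponent(s));
}

// Push a value through build -> URL parser -> parse and compare with the input.
function roundTrips(value) {
  const url = new URL(buildPath([value]), "https://example.com");
  const [parsed] = parsePath(url.pathname);
  return parsed === value;
}

for (const value of ["hello", "👍", "искать"]) {
  console.log(value, roundTrips(value));
}
```

Because the builder only ever emits ASCII, the URL parser has nothing left to re-encode, and the decoder recovers the original string.
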
We can try it out in this tiny app https://elm-url-test.netlify.app (source)

  1. Fill in the values
  2. Choose the mode
    • NoPercentEncodeDecode uses Url.Builder.absolute and Url.Parser.string as-is
    • PercentEncodeDecodePath uses Url.Builder.absolute + encode and Url.Parser.string + decode
  3. And most importantly, click on the generated link to see how the round trip turns out. Are we able to parse and obtain our original String values back?

NOTE: if you give Set Path a value of !@#$%^&*() and run the following JS in the browser console,

Array.from(document.getElementsByTagName('a')).forEach(function(a) { console.log(a.href) })

you'll notice that the a[href] isn't escaped. This is a problem, since # is treated literally as the start of the URL hash.


If you do the same but with Set mode: [x] PercentEncodeDecodePath and run the same JS, you'll notice that the a[href] is properly escaped.
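
The effect of the unescaped # can be reproduced without the test app. This is a sketch using the standard `URL` constructor, with `https://example.com` as a placeholder origin; it only models what the browser does with the href, not anything in elm/url.

```javascript
// ASSUMPTION: https://example.com is a placeholder origin.
const base = "https://example.com";
const raw = "!@#$%^&*()";

// Unescaped: everything from "#" onward becomes the fragment, not part of the path.
const unescaped = new URL("/path/" + raw, base);
console.log(unescaped.pathname); // "/path/!@"
console.log(unescaped.hash);

// Escaped: the whole value stays inside the path and there is no fragment.
const escaped = new URL("/path/" + encodeURIComponent(raw), base);
console.log(escaped.pathname); // "/path/!%40%23%24%25%5E%26*()"
console.log(escaped.hash); // ""
```

In the unescaped case a path parser never sees the characters after `!@`, so no decoding strategy on the parsing side can recover them.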


On "since any change here could break a lot of people's code in ways that might be very hard to detect":

If the defaults change, the functions would best be renamed as well, so that affected code stops compiling instead of silently changing behavior.

choonkeat commented 3 years ago

Using strings as-is, Vue.js mostly round-trips properly. It was initially thrown off by #, which was fixed by using a named route for !@#$%^&*().

<router-link to="/path/foo?query=foo">Go to foo</router-link>
<router-link to="/path/👍?query=👍">Go to 👍</router-link>
<router-link :to="{ name: 'pathRoute', params: { id: '!@#$%^&*()' }}">Go to !@#$%^&*()</router-link>
<router-link to="/path/{}[]<>;/?query=/{}[]<>;/">Go to {}[]<>;/</router-link>
<router-link to="/path/日本?query=日本">Go to 日本</router-link>
<router-link to="/path/خحجث?query=خحجث">Go to خحجث</router-link>

the generated href attribute values are escaped


https://vue-url-test.netlify.app/ (source)

choonkeat commented 3 years ago

Using strings as-is, React Router mostly round-trips properly. The # character gave a runtime error, which was solved with a manual encodeURI.

<li><Link to={{ pathname: "/path/foo", search: "?query=foo" }}>foo</Link></li>
<li><Link to={{ pathname: "/path/👍", search: "?query=👍" }}>👍</Link></li>
<li><Link to={{ pathname: "/path/!@#$%^&*()", search: "?query=!@#$%^&*()" }}>!@#$%^&*()</Link></li>
<li><Link to={{ pathname: "/path/" + encodeURI("!@#$%^&*()"), search: "?query=!@#$%^&*()" }}>!@#$%^&*()</Link> fixed</li>
<li><Link to={{ pathname: "/path/{}[]<>;", search: "?query={}[]<>;" }}>{}[]<>;</Link></li>
<li><Link to={{ pathname: "/path/日本", search: "?query=日本" }}>日本</Link></li>
<li><Link to={{ pathname: "/path/خحجث", search: "?query=خحجث" }}>خحجث</Link></li>


Array.from(document.getElementsByTagName('a')).forEach(function(a) { console.log(a.href) })

Running the above JS in CodeSandbox's own console (bottom right), you'll notice most of the a[href] values are escaped, except for the 2 with !@#$%^&*().
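
One detail that may be relevant here: JavaScript's `encodeURI` and `encodeURIComponent` treat this string differently. `encodeURI` leaves reserved characters such as # and @ untouched, so it escapes the stray % but not the #. The sketch below uses only standard built-ins, with `https://example.com` as a placeholder origin.

```javascript
// ASSUMPTION: https://example.com is a placeholder origin.
const base = "https://example.com";
const raw = "!@#$%^&*()";

// encodeURI keeps reserved characters, including "#".
const viaEncodeURI = encodeURI(raw);
console.log(viaEncodeURI); // "!@#$%25%5E&*()"

// encodeURIComponent escapes them, so the value is safe inside one path segment.
const viaComponent = encodeURIComponent(raw);
console.log(viaComponent); // "!%40%23%24%25%5E%26*()"

// With encodeURI the path is still cut at "#".
console.log(new URL("/path/" + viaEncodeURI, base).pathname); // "/path/!@"
```

If the goal is for the full value to survive as a single path segment, `encodeURIComponent` is the closer match to what Url.percentEncode is meant to provide.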

choonkeat commented 3 years ago

So auto-encoding in Url.Builder.absolute and auto-decoding in Url.Parser.string are both required.

[very sorry this came in as multiple comments. I didn't know how much time I had or needed]