curl / trurl

trurl is a command line tool for URL parsing and manipulation.
https://curl.se/trurl/

--json/--get: .[].parts and {component} should be urldecoded not encoded; add --urlencode #217

Closed by emanuele6 1 year ago

emanuele6 commented 1 year ago

I accidentally implemented {:component} backwards in https://github.com/curl/trurl/commit/5bc344a704dad9413b9ff2406c163fb471f8e7be

Oops :)

This also caused URL parts in the JSON output to be urlencoded instead of urldecoded, since the two share the same code.


One of the reasons we split URL parts into a separate object was to allow users to apply transformations to a part of the URL, using a jq command that maps the parts object to a trurl command that constructs the new, modified URL, e.g. like so:

trurl --json --url "$url" |
jq -r '
  .[].parts |
  (.path | values) |= ascii_upcase | # uppercase the path if it is present
  @sh "trurl \([ to_entries[] | "-s", "\(.key)=\(.value)"])"
' |
sh

But unfortunately, since the parts are urldecoded, this can't work.

libcurl, instead of returning a URL-encoded part even when CURLU_URLDECODE is used, as it does for CURLUPART_URL, when you request

  curl_url_get(uh, CURLUPART_QUERY, &query, CURLU_URLDECODE)

returns the URL-decoded version of the entire query string, which is pretty much useless.

?a%26b&a&b becomes ?a&b&a&b, which is completely different (three parameters turn into four), so that jq pipeline will run a command that won't reparse the query correctly. And there is not much the pipeline can do about that.
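To make the loss concrete, here is a minimal sketch in Python. It uses only the standard library, and `urllib.parse.unquote` is an assumed stand-in for the whole-string decoding that CURLU_URLDECODE applies to CURLUPART_QUERY; it is not trurl or libcurl code.

```python
from urllib.parse import parse_qsl, unquote

# The still-encoded query from the example above (without the leading "?").
encoded_query = "a%26b&a&b"

# Splitting on "&" first and decoding each piece afterwards keeps the
# three parameters apart.
pairs_from_encoded = parse_qsl(encoded_query, keep_blank_values=True)

# Decoding the whole string first (assumed stand-in for the libcurl call)
# turns the escaped "&" into a real separator.
decoded_query = unquote(encoded_query)
pairs_from_decoded = parse_qsl(decoded_query, keep_blank_values=True)

print(pairs_from_encoded)  # [('a&b', ''), ('a', ''), ('b', '')]
print(decoded_query)       # a&b&a&b
print(pairs_from_decoded)  # [('a', ''), ('b', ''), ('a', ''), ('b', '')]
```

Once the string has been decoded as a whole, nothing in it distinguishes the "&" that was data from the ones that were separators, so no later step can recover the original three parameters.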

To fix this inconvenience, I suggest adding a --urlencode option that works like --punycode and --default-port, and makes {path} behave like {:path} by default (and consequently causes URL parts in the JSON output to be urlencoded).

Now users should be able to use --urlencode --json instead of just --json, and "-s", "\(.key):=\(.value)"] instead of "-s", "\(.key)=\(.value)"] or "-s", "\(.key):=\(.value | @uri)"], and that pipeline should work fine.
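The idea behind that round trip can be sketched in Python, where `urlsplit` happens to keep every component percent-encoded, much like the parts object would be with --urlencode. The `upcase_path` helper and the example.com URL are hypothetical, chosen only for illustration; this is an analogy, not how trurl implements it.

```python
from urllib.parse import urlsplit, urlunsplit


def upcase_path(url: str) -> str:
    """Uppercase the path and leave every other component as parsed."""
    parts = urlsplit(url)  # components stay percent-encoded
    return urlunsplit(parts._replace(path=parts.path.upper()))


# Hypothetical input URL; the query is the one discussed above.
result = upcase_path("https://example.com/some/path?a%26b&a&b")
print(result)  # https://example.com/SOME/PATH?a%26b&a&b
```

Because the query is never decoded along the way, the escaped "&" survives and the rebuilt URL still carries the original three parameters.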

I also added tests to verify that the pipeline actually works.

And I also mentioned in the manual that the host will be punycoded in the json output if --punycode is used, which was previously not documented.

emanuele6 commented 1 year ago

I forgot to make {query:foo} and {query-all:bar} respect --urlencode last time. I fixed that, and added tests for {:query:foo} and --urlencode -g {query:foo}.