curl / trurl

a command line tool for URL parsing and manipulation.
https://curl.se/trurl/

trurl: restructure JSON, add --default-port, --keep-port, --punycode #204

Closed emanuele6 closed 1 year ago

emanuele6 commented 1 year ago

This patch restructures the JSON output with the following format:

  {
    "url": // same value as `{url}`
    "parts": {
      ..., // urldecoded URL parts except the URL
    },
    "params": [
       ...
    ],
    // other stuff
  }
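
As a rough illustration of how a consumer might read this layout, here is a minimal Python sketch. The sample document is made up for the example: the keys inside "parts" (scheme, host, path) and the key/value shape of each "params" entry are assumptions, not something this description specifies.

```python
import json

# Hypothetical sample following the layout above; the keys inside "parts"
# and the key/value shape of "params" entries are assumptions.
sample = """
{
  "url": "https://example.org/a%20b?x=1",
  "parts": {"scheme": "https", "host": "example.org", "path": "/a b"},
  "params": [{"key": "x", "value": "1"}]
}
"""

doc = json.loads(sample)
full_url = doc["url"]                  # same value as {url}
decoded_path = doc["parts"]["path"]    # parts are URL-decoded
query = {p["key"]: p["value"] for p in doc["params"]}
print(full_url, decoded_path, query)
```

The point of the split is that a script can take the re-encoded URL from "url" and the decoded pieces from "parts" without decoding anything itself.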

I decided to remove raw_port for now, since now I think it is less useful than it used to be.

This patch also reworks default output and -g. {raw:} has been removed in favour of {default:}. {default:} and {puny:} are "get modifiers" like {:component}, and may be specified in any order in front of a component to modify the curl_url_get() flags used by trurl to get that component.

{default:} makes trurl use CURLU_DEFAULT_PORT, i.e. use the libcurl-supported scheme's default port if one is not specified, and {puny:} makes trurl use CURLU_PUNYCODE.

{default:}, {puny:}, and {:part} may be specified at the same time, in which case multiple additional flags are passed to curl_url_get(), e.g.:

  $ trurl -g '{puny:default:url}' imap://pè.example.org
  imap://xn--p-8fa.example.org:143/
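
To make the modifier idea concrete, here is a small Python sketch that emulates the two modifiers for the url component only. It is not trurl's code: parse_spec(), get_url() and the DEFAULT_PORTS table are hypothetical names for this illustration, Python's "idna" codec stands in for CURLU_PUNYCODE, and the sketch ignores userinfo, query and fragment.

```python
from urllib.parse import urlsplit

# Small illustrative table; libcurl knows default ports for more schemes.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21, "imap": 143}

def parse_spec(spec):
    """Split 'puny:default:url' into ({'puny', 'default'}, 'url')."""
    *modifiers, component = spec.split(":")
    return set(modifiers), component

def get_url(spec, url):
    """Emulate -g '{...:url}' for the url component only."""
    modifiers, component = parse_spec(spec)
    if component != "url":
        raise ValueError("this sketch only handles the url component")
    parts = urlsplit(url)
    host = parts.hostname
    port = parts.port
    if "puny" in modifiers:
        # Stand-in for CURLU_PUNYCODE.
        host = host.encode("idna").decode("ascii")
    if "default" in modifiers and port is None:
        # Stand-in for CURLU_DEFAULT_PORT.
        port = DEFAULT_PORTS.get(parts.scheme)
    netloc = host if port is None else f"{host}:{port}"
    return f"{parts.scheme}://{netloc}{parts.path or '/'}"

# The modifiers form a set, so their order does not matter.
print(get_url("puny:default:url", "imap://p\u00e8.example.org"))
print(get_url("default:puny:url", "imap://p\u00e8.example.org"))
```

Both calls print imap://xn--p-8fa.example.org:143/, the same result as the trurl invocation above.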

If --default-port is used, {default:} is implied for every {} in the --get; and, if --puny is used, {puny:} is implied.

Default output now behaves exactly like -g '{url}'. (Note: previously, -g '{url}' behaved like the new {default:url}: it always included the default port.)

This allows more fine-tuning of the -g output and of the default URL output, since it was previously not possible to:

By default, trurl removes redundant explicit ports for libcurl-supported schemes; you can use --keep-port to inhibit that behaviour.
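
A minimal Python sketch of that rule, for illustration only: normalize_port() and the DEFAULT_PORTS table are hypothetical, and the real logic lives in trurl and libcurl.

```python
from urllib.parse import urlsplit, urlunsplit

# Small illustrative table; libcurl knows default ports for more schemes.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21, "imap": 143}

def normalize_port(url, keep_port=False):
    """Drop an explicit port that equals the scheme's default port."""
    parts = urlsplit(url)
    if keep_port or parts.port is None:
        return url
    if DEFAULT_PORTS.get(parts.scheme) != parts.port:
        return url  # a non-default port is never redundant
    # Remove the trailing ":port" from the authority.
    netloc = parts.netloc.rsplit(":", 1)[0]
    return urlunsplit(parts._replace(netloc=netloc))

print(normalize_port("https://example.org:443/"))
print(normalize_port("https://example.org:443/", keep_port=True))
print(normalize_port("https://example.org:8443/"))
```

Only the first call changes its input; the other two return the URL untouched, because the port is either kept on request or not the default.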

I fixed a bug that made -g {raw:port} (or -g {port}) never able to print the default port for a libcurl-supported scheme, even if the port was explicitly specified. (This bug was unique to {raw:port}; raw_port in the JSON worked fine.)

I also fixed some inconsistencies in the code, e.g. get() had an output stream variable that was not being propagated to showqkey(), which was always outputting to stdout, and failed gets were printing warnings using fprintf(stderr, PROGNAME ...) instead of warnf().

This is only an initial draft, so I have not updated test expectations and the manual yet.

Ref: #201

emanuele6 commented 1 year ago

Just need to update the man page now

emanuele6 commented 1 year ago

I updated the man page, rebased on master, and reverted f44a586 since it is not necessary anymore

bagder commented 1 year ago

I'm guessing the test now needs a new required ?

emanuele6 commented 1 year ago

I will remove these two tests:

    {
        "input": {
            "arguments": [
                "--accept-space",
                "--redirect",
                "https://example.org/a b",
                "https://curl.se"
            ]
        },
        "expected": {
            "stdout": "https://example.org/a%20b\n",
            "returncode": 0,
            "stderr": ""
        }
    },
    {
        "input": {
            "arguments": [
                "--verify",
                "--redirect",
                "https://example.org/a b",
                "https://curl.se"
            ]
        },
        "expected": {
            "stdout": "",
            "returncode": 9,
            "stderr": true
        }
    },

As previously mentioned, now trurl always parses URLs with CURLU_URLENCODE, and, before 8.1.0 (curl/curl@4cfa5bcc), the URL parser was not strict enough and always allowed spaces with CURLU_URLENCODE, even when CURLU_ACCEPT_SPACE was not used.
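
The difference can be modelled with a short Python sketch. This is a toy model of the behaviour described above, not libcurl's parser: parse() and its lenient_pre_8_1_0 parameter are hypothetical names.

```python
from urllib.parse import quote

# Characters left untouched so that URL delimiters and existing
# %-escapes survive; everything else (such as a space) is escaped.
SAFE = ":/?#[]@!$&'()*+,;=%"

def parse(url, accept_space=False, lenient_pre_8_1_0=False):
    """Toy model: URL-encode on parse, rejecting spaces unless allowed."""
    if " " in url and not (accept_space or lenient_pre_8_1_0):
        raise ValueError("malformed input: space in URL")
    return quote(url, safe=SAFE)

print(parse("https://example.org/a b", accept_space=True))
try:
    parse("https://example.org/a b")
except ValueError as err:
    print("rejected:", err)
# Under the old lenient behaviour the same input is silently accepted,
# so a test that expects a failure cannot pass there.
print(parse("https://example.org/a b", lenient_pre_8_1_0=True))
```

In this model the second test above can only produce its expected failure when the parser is strict, which is why it depends on the runtime libcurl version.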

I avoided having to delete these tests earlier by using "minruntime": "8.1.0", but now I can't do that any more.
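
For readers unfamiliar with that field, a version gate of this kind could look roughly like the Python sketch below. How trurl's test runner actually evaluates "minruntime" is not shown in this thread, so should_run() and parse_version() are assumptions made for illustration.

```python
def parse_version(text):
    """Turn '8.1.0' into (8, 1, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))

def should_run(test, runtime_version):
    """Run the test unless the runtime libcurl is older than minruntime."""
    minimum = test.get("minruntime")
    if minimum is None:
        return True
    return parse_version(runtime_version) >= parse_version(minimum)

gated = {"minruntime": "8.1.0"}
print(should_run(gated, "8.0.1"))   # older runtime: test is skipped
print(should_run(gated, "8.10.0"))  # numeric compare, so 8.10.0 is newer than 8.1.0
print(should_run({}, "7.88.0"))     # no minruntime: always run
```

Comparing tuples of integers rather than strings avoids treating "8.10.0" as older than "8.9.0".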

bagder commented 1 year ago

:+1: