curl / trurl

trurl is a command line tool for URL parsing and manipulation.
https://curl.se/trurl/

Please support zero-sized fragment and query #226

Closed lu-zero closed 2 weeks ago

lu-zero commented 1 year ago

scheme://host/path/# and scheme://host/path/ are different, and so is scheme://host/path/?. It would be nice if --get let you differentiate them and --set let you produce them.

emanuele6 commented 1 year ago

I think this is a limitation of libcurl's urlapi

lu-zero commented 1 year ago

From the manual for curl_url_get:

CURLUPART_QUERY

The initial question mark that denotes the beginning of the query part is a delimiter only. It is not part of the query contents.

A not-present query will lead part to be set to NULL. A zero-length query will lead part to be set to a zero-length string.

The query part will also get pluses converted to space when asked to URL decode on get with the CURLU_URLDECODE bit. 
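To make the three cases concrete (absent, zero-length, non-empty), here is a minimal sketch that classifies both the query and the fragment with a hand-rolled scan of the URL string. It does not use libcurl; the enum and function names are hypothetical and exist only for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical three-state result; not part of libcurl or trurl. */
enum part_state { PART_ABSENT, PART_EMPTY, PART_PRESENT };

/* Fragment: everything after the first '#'. */
static enum part_state fragment_state(const char *url)
{
    const char *hash = strchr(url, '#');
    if (!hash)
        return PART_ABSENT;
    return hash[1] == '\0' ? PART_EMPTY : PART_PRESENT;
}

/* Query: from the first '?' up to the first '#' (or the end of the string).
 * A '?' that only appears after the '#' belongs to the fragment. */
static enum part_state query_state(const char *url)
{
    size_t end = strcspn(url, "#");            /* index of '#' or of the NUL */
    const char *qmark = memchr(url, '?', end); /* search before the fragment only */
    if (!qmark)
        return PART_ABSENT;
    return (size_t)(qmark + 1 - url) == end ? PART_EMPTY : PART_PRESENT;
}
```

With this sketch, scheme://host/path/? classifies as an empty query and scheme://host/path/# as an empty fragment, while scheme://host/path/ has neither part.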
emanuele6 commented 1 year ago

I was mainly referring to it normalising scheme://host/path/? as scheme://host/path/ even though it can distinguish no query and empty query. But yeah, also note that it can only do that for queries, not fragments.

$ ./foo 'scheme://host/path/'
in:     scheme://host/path/
out:    scheme://host/path/
query:  NULL
frag:   NULL
$ ./foo 'scheme://host/path/?'
in:     scheme://host/path/?
out:    scheme://host/path/
query:
frag:   NULL
$ ./foo 'scheme://host/path/#'
in:     scheme://host/path/#
out:    scheme://host/path/
query:  NULL
frag:   NULL
$ ./foo 'scheme://host/path/?#'
in:     scheme://host/path/?#
out:    scheme://host/path/
query:
frag:   NULL
$ ./foo 'scheme://host/path/?#hello'
in:     scheme://host/path/?#hello
out:    scheme://host/path/#hello
query:
frag:   hello
$ ./foo 'scheme://host/path/?hello#'
in:     scheme://host/path/?hello#
out:    scheme://host/path/?hello
query:  hello
frag:   NULL 

libcurl always normalises empty query/fragment as no query/fragment; and it does not provide a way to distinguish empty fragment from no fragment.


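#include <stdio.h>
#include <curl/curl.h>

int main(int const argc, char const *const argv[])
{
    if (argc != 2)
        return 1;

    CURLU *const uh = curl_url();
    if (!uh)
        return 1;

    /* Parse the input, accepting unknown schemes and scheme-less URLs. */
    if (curl_url_set(uh, CURLUPART_URL, argv[1],
                     CURLU_NON_SUPPORT_SCHEME|
                     CURLU_GUESS_SCHEME|
                     CURLU_URLENCODE) != CURLUE_OK) {
        curl_url_cleanup(uh);
        return 1;
    }

    /* Each pointer stays NULL when libcurl reports the part as missing. */
    char *url = NULL;
    curl_url_get(uh, CURLUPART_URL, &url, CURLU_DEFAULT_PORT);
    char *query = NULL;
    curl_url_get(uh, CURLUPART_QUERY, &query, CURLU_DEFAULT_PORT);
    char *frag = NULL;
    curl_url_get(uh, CURLUPART_FRAGMENT, &frag, CURLU_DEFAULT_PORT);

    /* Print "NULL" for missing parts; an empty string prints as nothing. */
    printf("in:\t%s\n"
           "out:\t%s\n"
           "query:\t%s\n"
           "frag:\t%s\n",
           argv[1], url ? url : "NULL",
           query ? query : "NULL", frag ? frag : "NULL");

    /* curl_free() accepts NULL, so no checks are needed here. */
    curl_free(url);
    curl_free(query);
    curl_free(frag);
    curl_url_cleanup(uh);
    return 0;
}
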
lu-zero commented 1 year ago

Yes, I was hoping that it could be reflected in trurl as well. (and curl itself is in the good bucket for that :))

Thank you for providing also the full demo code :)

bagder commented 4 months ago

See https://github.com/curl/curl/pull/13396

bagder commented 4 months ago

With libcurl supporting empty queries and fragments now, how do you think we should enable this in trurl?

lu-zero commented 4 months ago

It would probably be useful to have --unset to clean up query and fragment, and to make it so that trurl scheme://host/path --set query="" returns scheme://host/path?.

And have --get {component} return nothing if the component is not present and an empty line if it is present.

But those would be breaking changes.

bagder commented 4 months ago

> But those would be breaking changes.

I think we are free to do breaking changes if we want, at least before an official version one. I think the bigger problem is that they would work differently depending on what the underlying libcurl in use supports...

jacobmealey commented 4 months ago

Another (clunky) solution may be a new flag, --allow-empty, for --get and --set?

Or something clever with the modifiers in the --get brackets, something like --get "{empty:query} {empty:fragment}"? I'm not sure how this would work with setting empty fields though.

bagder commented 1 month ago

> I'm not sure how this would work with setting empty fields though.

I figure we might need to do some other syntax extension/change for that. Maybe

Of course, this would only work for query and fragment. Maybe path?

bagder commented 2 weeks ago

There is also the shell problem: how would a script differentiate between a blank query and a non-existing one?

trurl example.com/? -g a{query}a

vs

trurl example.com/ -g a{query}a

What is the expected output for a zero length query vs a non-existing one?

lu-zero commented 2 weeks ago

I guess the latter has to report an error somehow, maybe adding a {fail:component} modifier so both behaviors are supported?
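The ambiguity and the suggested fix can be sketched in a few lines. The function below is a hypothetical stand-in for template expansion, not trurl's implementation: `value` is NULL for a missing component, and the `strict` parameter is an assumed switch that mimics the proposed "fail" behaviour.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical renderer for a template such as a{query}a.
 * `value` is NULL when the component is missing and "" when it is empty.
 * In lenient mode both cases render identically; in strict mode a missing
 * component is an error. Returns 0 on success, -1 on error or truncation. */
static int render_part(char *buf, size_t size, const char *before,
                       const char *value, const char *after, int strict)
{
    if (!value) {
        if (strict)
            return -1;   /* missing component reported to the caller */
        value = "";      /* lenient: missing is rendered like empty */
    }
    int n = snprintf(buf, size, "%s%s%s", before, value, after);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

In lenient mode a script sees aa for both example.com/? and example.com/, which is the shell problem described above. In strict mode only the missing query fails, so the exit status carries the distinction that the output cannot.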

bagder commented 2 weeks ago

#336 at least partly satisfies this.

bagder commented 2 weeks ago

@lu-zero does this satisfy your use case or is there anything more you want/need to differentiate empty/missing components for?

lu-zero commented 2 weeks ago

I think it is enough, thank you :)