curl / trurl

trurl is a command line tool for URL parsing and manipulation.
https://curl.se/trurl/

Components with control characters don't appear in `--json` output, and non-urlencoded `--get` fails #262

Open emanuele6 opened 8 months ago

emanuele6 commented 8 months ago
$ ./trurl 'http://example.org/%18' --json | jq -c .
[{"url":"http://example.org/%18","parts":{"scheme":"http","host":"example.org"}}]
$ ./trurl 'http://example.org/%18' --urlencode --json | jq -c .
[{"url":"http://example.org/%18","parts":{"scheme":"http","host":"example.org","path":"/%18"}}]
$ ./trurl 'http://example.org/%18' -g {path}
trurl note: URL decode error, most likely because of rubbish in the input (path)

$ ./trurl 'http://example.org/%18' -g {:path}
/%18

jacobmealey commented 8 months ago

Something interesting I noticed is that it works for queries. I wonder if we're missing a memdupdec somewhere?

I'd bet I broke this in this PR https://github.com/curl/trurl/pull/214 but maybe it's been broken the whole time.

jacobmealey commented 7 months ago

This looks like it's behavior from libcurl. I was able to get the same result with the following code. Should we open a ticket over there, or are we just overlooking something simple?

#include <curl/curl.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    CURL *curl;
    CURLU *url;
    CURLUcode uc;
    // do it how trurl does it
    // (curl_url_get allocates the result itself, so start from NULL)
    char *array = NULL;
    const char *url_string = "http://example.org/%18";
    curl = curl_easy_init();
    url = curl_url();
    uc = curl_url_set(url, CURLUPART_URL, url_string, 0);

    uc = curl_url_get(url, CURLUPART_PATH, &array, CURLU_URLDECODE);
    if(uc) {
        printf("%s\n", curl_url_strerror(uc));
    } else {
        printf("%s\n", array);
    }
    // try with curl easy unescape
    int decode_len = 0;
    char *decoded = curl_easy_unescape(curl, url_string, (int)strlen(url_string), &decode_len);
    if(decoded) {
        printf("%s\n", decoded);
    }
    printf("length: %zu, amount decoded: %d\n", strlen(url_string), decode_len);
    // strings returned by libcurl are released with curl_free, not free
    curl_free(decoded);
    curl_free(array);
    curl_url_cleanup(url);
    curl_easy_cleanup(curl);
    return 0;
}

jacobmealey commented 7 months ago

Ahh, it could also be that %18 maps to the ASCII control character CAN (cancel). I'd bet curl doesn't play nice with decoding most control characters in the path. If you do it with %21 (either trurl or the example above), you get the following:

$ trurl http://example.org/%21 --get "{path}"    
/! 
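
To make the guess above concrete, here is a minimal sketch of a percent-decoder that refuses control characters. It assumes the rule is "reject any decoded byte below 0x20"; that threshold is an assumption based on the behavior seen in this thread, and `decode_reject_ctrl` and `hexval` are hypothetical names, not libcurl or trurl functions.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: value of one hex digit, or -1 if not a hex digit. */
static int hexval(char c)
{
    if(c >= '0' && c <= '9') return c - '0';
    if(c >= 'a' && c <= 'f') return c - 'a' + 10;
    if(c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical helper, not part of libcurl or trurl: percent-decode `in`
 * and refuse any resulting byte below 0x20 (assumed rejection rule).
 * Returns a malloc'd string the caller must free(), or NULL when a
 * control byte is found or allocation fails. */
char *decode_reject_ctrl(const char *in)
{
    size_t len = strlen(in);
    char *out = malloc(len + 1);  /* decoding never grows the string */
    size_t o = 0;

    if(!out)
        return NULL;

    for(size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)in[i];

        /* The NUL terminator is not a hex digit, so the short-circuit
         * keeps these reads inside the string. */
        if(c == '%' && hexval(in[i + 1]) >= 0 && hexval(in[i + 2]) >= 0) {
            c = (unsigned char)(hexval(in[i + 1]) * 16 + hexval(in[i + 2]));
            i += 2;
        }

        if(c < 0x20) {  /* control character: give up on the whole part */
            free(out);
            return NULL;
        }
        out[o++] = (char)c;
    }
    out[o] = '\0';
    return out;
}
```

Under that assumption, `decode_reject_ctrl("/%21")` yields `/!`, while `decode_reject_ctrl("/%18")` returns NULL, which matches the difference between the two trurl runs in this comment.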

After some more testing I think you are just supposed to pass --urlencode for this scenario. We could do something to try and hint at this to the user?

$ trurl http://example.org/%18 --get "{path}"     
trurl note: URL decode error, most likely because of rubbish in the input (path)
                  try again with --urlencode
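
One way the hint could be produced is sketched below. This is only an illustration of the proposal: `format_decode_note` and its `urlencode_given` parameter are hypothetical and do not reflect how trurl actually builds its messages. The hint is appended only when the user has not already passed `--urlencode`.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: write the decode-error note for `part` into `buf`.
 * When urlencode_given is 0, a second line suggesting --urlencode is added.
 * Returns 0 on success, -1 if the message does not fit in `buf`. */
int format_decode_note(char *buf, size_t buflen, const char *part,
                       int urlencode_given)
{
    int n = snprintf(buf, buflen,
                     "trurl note: URL decode error, most likely because of "
                     "rubbish in the input (%s)", part);
    if(n < 0 || (size_t)n >= buflen)
        return -1;

    if(!urlencode_given) {
        size_t left = buflen - (size_t)n;
        int m = snprintf(buf + n, left, "\n    try again with --urlencode");
        if(m < 0 || (size_t)m >= left)
            return -1;
    }
    return 0;
}
```

Calling `format_decode_note(buf, sizeof(buf), "path", 0)` fills `buf` with the existing note followed by the suggested hint line; passing 1 as the last argument leaves the note unchanged.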