dhasenan / urld

URL handling for the D programming language
MIT License

failed to parse URL https://www.blogfree.net/?l=4&wiki=Allgemeine_Gesch%E4ftsbedingungen #17

Open FraMecca opened 5 years ago

FraMecca commented 5 years ago

The URL in question is: https://www.blogfree.net/?l=4&wiki=Allgemeine_Gesch%E4ftsbedingungen

import url;

void main()
{
    auto url = "https://www.blogfree.net/?l=4&wiki=Allgemeine_Gesch%E4ftsbedingungen"
            .parseURL;
}

(2) user (7) /t/ur> dub
Performing "debug" build using /usr/bin/dmd for x86_64.
urld 2.1.1: target for configuration "library" is up to date.
ur ~master: building configuration "application"...
Linking...
To force a rebuild of up-to-date targets, run again with --force.
Running ./ur 
url.URLException@/home/user/.dub/packages/urld-2.1.1/urld/source/url.d(31): failed to parse URL https://www.blogfree.net/?l=4&wiki=Allgemeine_Gesch%E4ftsbedingungen
----------------
/home/user/.dub/packages/urld-2.1.1/urld/source/url.d:1146 pure @safe url.URL url.parseURL(immutable(char)[]) [0xd4f267d0]
/tmp/ur/source/app.d:7 _Dmain [0xd4f26517]
Program exited with code 1

dhasenan commented 5 years ago

Thanks for the report! This is a somewhat awkward problem to work around.

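%E4 decodes to the single byte 0xE4, and 0xE4 followed by 'f' is not a valid UTF-8 sequence, so percent-decoding it would result in an invalid string that would blow up later. (0xE4 is 'ä' in Latin-1, which is probably the encoding the site used.)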
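
To make the failure concrete, here is a minimal sketch that percent-decodes the value by hand and runs the result through `std.utf.validate`. The helpers `hexVal` and `percentDecodeBytes` are hypothetical names written for this illustration; they are not part of urld, and urld's own decoder may work differently.

```d
import std.exception : assertThrown;
import std.utf : UTFException, validate;

// Value of one hexadecimal digit, or -1 if `c` is not a hex digit.
int hexVal(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical helper, not urld's API: turn %XX escapes into raw bytes.
ubyte[] percentDecodeBytes(string s)
{
    ubyte[] bytes;
    for (size_t i = 0; i < s.length; i++)
    {
        if (s[i] == '%' && i + 2 < s.length
            && hexVal(s[i + 1]) >= 0 && hexVal(s[i + 2]) >= 0)
        {
            bytes ~= cast(ubyte)(hexVal(s[i + 1]) * 16 + hexVal(s[i + 2]));
            i += 2; // skip the two hex digits just consumed
        }
        else
            bytes ~= cast(ubyte) s[i];
    }
    return bytes;
}

void main()
{
    ubyte[] bytes = percentDecodeBytes("Gesch%E4ft");

    // 0xE4 is a UTF-8 lead byte that announces a three-byte sequence,
    // but the next byte is 'f' (0x66), which is not a continuation byte.
    assert(bytes[5] == 0xE4 && bytes[6] == 'f');

    // Treating the decoded bytes as a D string therefore fails validation.
    assertThrown!UTFException(validate(cast(const(char)[]) bytes));
}
```

The same bytes would be fine if interpreted as Latin-1, which is why the URL works in browsers even though it cannot be decoded into a valid D `string`.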

Right now, urld auto-encodes query parameters, so leaving that sequence undecoded would fail to round-trip correctly: the '%' would itself be encoded on output, turning '%E4' into '%25E4'.
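
The double-encoding effect can be reproduced with Phobos alone. This sketch uses `std.uri.encodeComponent` as a stand-in for urld's automatic encoding; that substitution is an assumption made for illustration, not a description of urld's internals.

```d
import std.stdio : writeln;
import std.uri : encodeComponent;

void main()
{
    // The raw query value from the report, kept undecoded.
    string raw = "Allgemeine_Gesch%E4ftsbedingungen";

    // Encoding it again treats '%' as ordinary data and escapes it as %25.
    string reencoded = encodeComponent(raw);
    writeln(reencoded);

    assert(reencoded == "Allgemeine_Gesch%25E4ftsbedingungen");
}
```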

What urld needs to do is keep both the encoded and the decoded version of each string.
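
One way this could look, as a rough sketch only: a value type that stores the text exactly as it appeared in the URL and, when possible, a decoded form next to it. `QueryValue`, `parse` and `hexVal` are hypothetical names for this example and do not reflect how urld is or will be implemented.

```d
import std.typecons : Nullable;
import std.utf : UTFException, validate;

// Value of one hexadecimal digit, or -1 if `c` is not a hex digit.
int hexVal(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical type for illustration; not part of urld.
struct QueryValue
{
    string encoded;          // exactly as written in the URL, reused on output
    Nullable!string decoded; // null when the decoded bytes are not valid UTF-8

    static QueryValue parse(string raw)
    {
        QueryValue v;
        v.encoded = raw;

        // Percent-decode into a scratch buffer.
        char[] buf;
        for (size_t i = 0; i < raw.length; i++)
        {
            if (raw[i] == '%' && i + 2 < raw.length
                && hexVal(raw[i + 1]) >= 0 && hexVal(raw[i + 2]) >= 0)
            {
                buf ~= cast(char)(hexVal(raw[i + 1]) * 16 + hexVal(raw[i + 2]));
                i += 2;
            }
            else
                buf ~= raw[i];
        }

        // Only expose the decoded form if it is a valid D string.
        try
        {
            validate(buf);
            v.decoded = buf.idup;
        }
        catch (UTFException) {}
        return v;
    }
}

void main()
{
    auto bad = QueryValue.parse("Allgemeine_Gesch%E4ftsbedingungen");
    assert(bad.decoded.isNull);                                  // cannot be decoded
    assert(bad.encoded == "Allgemeine_Gesch%E4ftsbedingungen");  // still round-trips

    auto good = QueryValue.parse("a%20b");
    assert(good.decoded.get == "a b");
    assert(good.encoded == "a%20b");
}
```

Because output always uses the stored `encoded` field, a value that cannot be decoded no longer prevents parsing, and nothing is encoded a second time.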

FraMecca commented 5 years ago

Why do you need to store query parameters as an encoded sequence?