glenscott / url-normalizer

Syntax-based normalization of URIs
MIT License

Punycode support #23

Open glenscott opened 8 years ago

glenscott commented 8 years ago

I have had some problems with URLs that have been copied from browser address bars. When the real URL's host is in Punycode, the address bar shows the Unicode (UTF-8) representation, and that is the form that gets copied, which causes issues.

I have a suggested change to the function mbParseUrl().

After $encodedParts = parse_url($encodedURL); insert:

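    // Fix for IDN-based URLs copied from browser bars in UTF-8 (TRS): convert the host back to Punycode
    if (isset($encodedParts['host'])) {
        $temp_host = idn_to_ascii($encodedParts['host']);
        // idn_to_ascii() returns false on failure, so keep the original host in that case
        if (!empty($temp_host)) {
            $encodedParts['host'] = $temp_host;
        }
    }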
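
To illustrate the conversion in isolation, here is a minimal standalone sketch. The helper name hostToAscii() is hypothetical and not part of url-normalizer. It assumes the host has already been extracted from the URL and is still in its raw Unicode form. idn_to_ascii() is provided by PHP's intl extension, so the sketch falls back to returning the host unchanged when that extension is not loaded.

```php
<?php
// Hypothetical helper, not part of url-normalizer: convert a Unicode host
// name to its ASCII (Punycode) form, leaving it untouched if that fails.
function hostToAscii(string $host): string
{
    // idn_to_ascii() comes from the intl extension; without it, do nothing.
    if (!function_exists('idn_to_ascii')) {
        return $host;
    }

    // Request UTS #46 processing explicitly.
    $ascii = idn_to_ascii($host, IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46);

    // idn_to_ascii() returns false on failure; keep the original host then.
    return ($ascii === false || $ascii === '') ? $host : $ascii;
}

// With intl loaded, a Unicode host becomes its xn-- form.
echo hostToAscii('münchen.de'), "\n";

// A plain ASCII host passes through unchanged.
echo hostToAscii('example.com'), "\n";
```

With the intl extension available, the first call should print xn--mnchen-3ya.de and the second example.com. Falling back to the original host on failure mirrors the !empty() check in the suggested change above, so a host that cannot be converted is never replaced with false.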