JuliaWeb / URIParser.jl

Uniform Resource Identifier (URI) parser in Julia

Support punycode #9

Open porterjamesj opened 10 years ago

porterjamesj commented 10 years ago

I'm not sure about the details, but it might be nice to allow URIs of non-ASCII text (i.e. use String rather than ASCIIString in the type definition). The RFC is a bit vague on this. Perhaps it isn't technically allowed but it seems possible to encounter in the wild (e.g. ☃.net will resolve in a browser). On a practical level it's somewhat annoying to not be able to pass UTF8Strings or SubStrings thereof to methods in Requests.jl.

Keno commented 10 years ago

The browser does the appropriate mangling of the URI which we probably would have to implement.

porterjamesj commented 10 years ago

For now, could we just add a URI constructor that converts String inputs to ASCIIString? It won't actually handle Unicode, but it will at least let one pass values that are typed as UTF8String but contain only ASCII characters to Requests.get, etc.
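A minimal sketch of that stopgap, written for current Julia, where ASCIIString no longer exists and String is always UTF-8. `SimpleURI` and `ascii_uri` are hypothetical names used only for illustration; they are not part of URIParser.jl.

```julia
# Hypothetical stand-in for the package's URI type (an assumption, not the real definition).
struct SimpleURI
    raw::String
end

# Accept any AbstractString (String, SubString, ...) as long as it is pure ASCII.
# Non-ASCII input is rejected rather than silently mangled.
function ascii_uri(s::AbstractString)
    isascii(s) || throw(ArgumentError("non-ASCII URI: $s"))
    return SimpleURI(String(s))
end

# A SubString of a larger string is accepted and copied into a String.
u = ascii_uri(SubString("see http://example.com now", 5, 22))
println(u.raw)
```

This only relaxes the accepted input type; a URI such as http://☃.net is still rejected.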

Keno commented 10 years ago

That sounds reasonable.

porterjamesj commented 10 years ago

I'll make a PR.

tanmaykm commented 10 years ago

I think the hostname part needs to be encoded in Punycode (RFC 3492, http://www.faqs.org/rfcs/rfc3492.html) and the path percent-escaped as UTF-8.
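As a rough illustration of those two steps, here is a sketch of the RFC 3492 encoding procedure plus a simple percent-escaper. All function names (`punycode_encode`, `encode_host`, `percent_encode_path`) are hypothetical and not part of URIParser.jl, and the sketch assumes the input is already normalized: real IDNA processing also involves case folding and Unicode normalization, which are omitted here.

```julia
# Punycode parameters from RFC 3492, section 5.
const BASE = 36
const TMIN = 1
const TMAX = 26
const SKEW = 38
const DAMP = 700
const INITIAL_BIAS = 72
const INITIAL_N = 128

# Map a digit value 0..35 to 'a'..'z', '0'..'9'.
digit_char(d) = d < 26 ? 'a' + d : '0' + (d - 26)

# Bias adaptation (RFC 3492, section 6.1).
function adapt(delta, numpoints, firsttime)
    delta = firsttime ? delta ÷ DAMP : delta ÷ 2
    delta += delta ÷ numpoints
    k = 0
    while delta > ((BASE - TMIN) * TMAX) ÷ 2
        delta ÷= (BASE - TMIN)
        k += BASE
    end
    return k + ((BASE - TMIN + 1) * delta) ÷ (delta + SKEW)
end

# Encode one label (without the "xn--" prefix), following RFC 3492, section 6.3.
function punycode_encode(label::AbstractString)
    cps = [Int(UInt32(c)) for c in label]
    basic = [cp for cp in cps if cp < 0x80]
    out = IOBuffer()
    foreach(cp -> print(out, Char(cp)), basic)   # copy ASCII code points first
    b = length(basic)
    h = b
    b > 0 && print(out, '-')
    n, delta, bias = INITIAL_N, 0, INITIAL_BIAS
    while h < length(cps)
        m = minimum(cp for cp in cps if cp >= n) # next code point to insert
        delta += (m - n) * (h + 1)
        n = m
        for cp in cps
            if cp < n
                delta += 1
            elseif cp == n
                q, k = delta, BASE
                while true                       # emit delta as a variable-length integer
                    t = k <= bias ? TMIN : (k >= bias + TMAX ? TMAX : k - bias)
                    q < t && break
                    print(out, digit_char(t + (q - t) % (BASE - t)))
                    q = (q - t) ÷ (BASE - t)
                    k += BASE
                end
                print(out, digit_char(q))
                bias = adapt(delta, h + 1, h == b)
                delta = 0
                h += 1
            end
        end
        delta += 1
        n += 1
    end
    return String(take!(out))
end

# Prefix non-ASCII labels with "xn--"; ASCII labels pass through unchanged.
encode_host(host::AbstractString) =
    join((isascii(l) ? l : "xn--" * punycode_encode(l) for l in split(host, '.')), '.')

# Percent-escape every UTF-8 byte that is not an unreserved character or '/'.
function percent_encode_path(path::AbstractString)
    io = IOBuffer()
    for byte in codeunits(String(path))
        c = Char(byte)
        if isascii(c) && (isletter(c) || isdigit(c) || c in "-._~/")
            print(io, c)
        else
            print(io, '%', uppercase(string(byte, base=16, pad=2)))
        end
    end
    return String(take!(io))
end

println(encode_host("☃.net"))        # xn--n3h.net
println(percent_encode_path("/☃"))   # /%E2%98%83
```

The escaper keeps only unreserved characters and '/' literal, which is stricter than RFC 3986 requires for paths; that choice is made here purely for simplicity.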

malmaud commented 8 years ago

punycode is fairly complex - see an example implementation. Is anyone up for having a go at it?

randyzwitch commented 8 years ago

Roughly two years old at this point...is this still a desirable feature/anyone going to claim this?

samoconnor commented 6 years ago

another ~2 years later...

The URI-parsing part of this currently seems not to fail in HTTP.jl:

julia> x = HTTP.URI("http://☃.net")
HTTP.URI("http://☃.net")

julia> HTTP.URIs.showparts(x)
HTTP.URI("http://☃.net"
    scheme = "http",
    userinfo = "" (absent),
    host = "☃.net",
    port = "" (absent),
    path = "",
    query = "" (absent),
    fragment = "" (absent))

But getaddrinfo still fails:

julia> HTTP.get("http://☃.net")
ERROR: non-ASCII hostname: ☃.net
Stacktrace:
 [1] getaddrinfo(::Function, ::String) at ./socket.jl:619

And there is this: https://github.com/apricis/Punycoder.jl/blob/master/punycoder.jl