Closed by mna 7 years ago
My bad, it seems that rfc-5891 ("Internationalized Domain Names in Applications (IDNA): Protocol") obsoletes the "nameprep" rfc-3491 ("Nameprep: A Stringprep Profile for IDN") and states in "Appendix A. Summary of Major Changes from IDNA2003":
Remove the mapping and normalization steps from the protocol and have them, instead, done by the applications themselves, possibly in a local fashion, before invoking the protocol.
So I guess x/net/idna does the right thing and it is up to the caller to normalize or not. That means the caller has to know whether a domain in non-normalized form is equivalent to one in normalized form, and I have no idea whether it is (maybe it is inconsistent in the wild, and registrations for www.\u00e9tat.com and www.e\u0301tat.com may or may not be separate domains?).
If anyone knows about that last part, I'd love to know (it would be very helpful for the purell normalization package that I maintain), but otherwise this is not an issue for the idna package, so I'll close it.
Re-nevermind that last part, rfc-5891 states that:
By the time a string enters the IDNA registration process as described in this specification, it MUST be in Unicode and in Normalization Form C (NFC)
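To make the NFC requirement concrete, here is a minimal standard-library sketch showing that the two spellings discussed above really are different code point sequences, even though they render identically. The helper name `describe` is made up for this illustration. Actual NFC normalization is not in the standard library; it would typically come from a package such as golang.org/x/text/unicode/norm.

```go
package main

import "fmt"

// describe is a hypothetical helper that lists the code points of s
// in U+XXXX notation.
func describe(s string) []string {
	var out []string
	for _, r := range s {
		out = append(out, fmt.Sprintf("U+%04X", r))
	}
	return out
}

func main() {
	precomposed := "\u00e9tat" // é as a single code point (the NFC form)
	decomposed := "e\u0301tat" // e followed by U+0301 COMBINING ACUTE ACCENT

	fmt.Println(precomposed == decomposed) // false
	fmt.Println(describe(precomposed))     // [U+00E9 U+0074 U+0061 U+0074]
	fmt.Println(describe(decomposed))      // [U+0065 U+0301 U+0074 U+0061 U+0074]
}
```

Since the strings differ before any encoding happens, an encoder that does not normalize first will necessarily produce two different ASCII labels.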
Playground example: https://play.golang.org/p/zS-UR4WhIx
When running idna.ToASCII, it should normalize the Unicode input before encoding to Punycode (https://en.wikipedia.org/wiki/Internationalized_domain_name, section "ToASCII and ToUnicode": "ToASCII will apply the Nameprep algorithm, which converts the label to lowercase and performs other normalization, and will then translate the result to ASCII using Punycode"). The golang.org/x/net/idna package does not seem to perform that normalization step, while e.g. the userspace github.com/DanielOaks/go-idn package does.
So running idna.ToASCII on www.état.com and on www.e\u0301tat.com should (if I understand IDNA correctly) return the same Punycode form: www.xn--tat-9la.com. The userspace package correctly returns www.xn--tat-9la.com for both inputs, but x/net/idna returns "www.xn--tat-9la.com" and "www.xn--etat-vvc.com".
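For anyone who wants to reproduce the two results without external packages, the following is a rough sketch of the Punycode encoding procedure from RFC 3492, applied label by label with no mapping or normalization. It is not the x/net/idna implementation; the names `punycode` and `toASCIIRaw` are made up for this illustration, and overflow checks, case mapping, and label validation are deliberately omitted.

```go
package main

import (
	"fmt"
	"strings"
)

// Bootstring parameters for Punycode (RFC 3492, section 5).
const (
	base        = 36
	tMin        = 1
	tMax        = 26
	skew        = 38
	damp        = 700
	initialBias = 72
	initialN    = 128
)

// adapt is the bias adaptation function from RFC 3492, section 6.1.
func adapt(delta, numPoints int, firstTime bool) int {
	if firstTime {
		delta /= damp
	} else {
		delta /= 2
	}
	delta += delta / numPoints
	k := 0
	for delta > ((base-tMin)*tMax)/2 {
		delta /= base - tMin
		k += base
	}
	return k + (base-tMin+1)*delta/(delta+skew)
}

// digit maps 0..25 to 'a'..'z' and 26..35 to '0'..'9'.
func digit(d int) byte {
	if d < 26 {
		return byte('a' + d)
	}
	return byte('0' + d - 26)
}

// punycode (hypothetical name) encodes one label, without the "xn--" prefix.
// Sketch only: no overflow detection.
func punycode(label string) string {
	input := []rune(label)
	var out []byte
	for _, r := range input { // copy the basic (ASCII) code points first
		if r < initialN {
			out = append(out, byte(r))
		}
	}
	b := len(out)
	h := b
	if b > 0 {
		out = append(out, '-')
	}
	n, delta, bias := initialN, 0, initialBias
	for h < len(input) {
		m := 0x110000 // smallest code point >= n still to be encoded
		for _, r := range input {
			if c := int(r); c >= n && c < m {
				m = c
			}
		}
		delta += (m - n) * (h + 1)
		n = m
		for _, r := range input {
			c := int(r)
			if c < n {
				delta++
			}
			if c == n { // emit delta as a variable-length integer
				q := delta
				for k := base; ; k += base {
					t := k - bias
					if t < tMin {
						t = tMin
					} else if t > tMax {
						t = tMax
					}
					if q < t {
						break
					}
					out = append(out, digit(t+(q-t)%(base-t)))
					q = (q - t) / (base - t)
				}
				out = append(out, digit(q))
				bias = adapt(delta, h+1, h == b)
				delta = 0
				h++
			}
		}
		delta++
		n++
	}
	return string(out)
}

// toASCIIRaw (hypothetical name) encodes each non-ASCII label of a domain
// without any normalization step.
func toASCIIRaw(domain string) string {
	labels := strings.Split(domain, ".")
	for i, l := range labels {
		if strings.IndexFunc(l, func(r rune) bool { return r >= 0x80 }) >= 0 {
			labels[i] = "xn--" + punycode(l)
		}
	}
	return strings.Join(labels, ".")
}

func main() {
	fmt.Println(toASCIIRaw("www.\u00e9tat.com"))  // www.xn--tat-9la.com
	fmt.Println(toASCIIRaw("www.e\u0301tat.com")) // www.xn--etat-vvc.com
}
```

The two outputs match the ones reported for x/net/idna above, which is consistent with the difference coming from the missing normalization step rather than from the Punycode encoding itself.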