Open bradfitz opened 10 years ago
The PublicSuffix interface of net/http is used by net/http/cookiejar to determine the public suffix of a domain name. RFC 6265 governs cookie handling and describes how domain names should be canonicalized (section 5.1.2) and how such names are matched (section 5.1.3). It is thus just natural to require http.PublicSuffix to work properly with this format of domain names. CL 47560044 requires that any type implementing http.PublicSuffix has to work properly with this format. I assume this is consensus? Restricting cookie setting is tricky and a naive compare if cookieDomain == PublicSuffix(requestDomain) { reject } is doomed to fail. Admittedly some of these failures can be avoided if PublicSuffix behaves clever like: PublicSuffix("example.co.uk") == "co.uk" PublicSuffix("example.co.uk.") == "co.uk." // FQDN PublicSuffix("eXaMple.Co.uK") == "Co.uK" PublicSuffix("www.shishi.xn--fiqs8s") == "shishi.xn--fiqs8s" // 中国 punycoded PublicSuffix("www.shishi.中国") == "shishi.中国" I would argue that our implementation of PublicSuffix in go.net/publicsuffix _should_ behave in the above clever way. Maybe for security reasons and for sure as a convenience to the users. But I do not see why this fancy behaviour should be contracted in the standard library where http.PublicSuffix lives. Especially as this cleverness/convenience does not safeguard against attacks on a really broken jar or a too naive HTTP client implementation. Nigel's main objection was "...how to approach domain canonicalization and cookies so that it's always right, not just almost always right." Domain canonicalization and cookies is done in the interplay of http, normally a cookie jar and a public suffix list. All three must agree on how to interact. Currently package http exposes the raw (un-canonicalized) domain names in URLs and in cookies. Package cookiejar handles canonicalization (but doesn't expose these canonical domain names). Cookiejar relies on http.PublicSuffix accepting lowercase, punycoded, dotless domain names and return values in the same format. Beside strange name resolution on Linux cookiejar works properly. I believe this to be a sensible approach which is "always right". And I admit I do not see which approach to domain canonicalization would give this "always right", other than just doing it properly: No behaviour of http.PublicSuffix alone will totally prevent mishandling of cookies. I do not see how a consensus on domain canonicalization will provide us with a solution that "always works", even in conjunction with a naive (broken or deliberately wrong) HTTP client implementation. The public suffix (or more often the eTLD+1 based on this suffix) is used for stuff like sorting domains, search of domains or highlight relevant parts of a domain. Should the contract for http.PublicSuffix cover such uses? Sorting becomes easier if the returned public suffix is canonicalized (any form) but searching and highlighting seem easier if PublicSuffix exhibits the "clever" behavior. And if there is no obvious right format for the returned public suffix I don't think that any additional behaviour of http.PublicSuffix should be contracted. My understanding of an interface contract was that each type implementing that interface must work like described on the documented input but may allow additional input values. As PublicSuffix is an interface in the standard library I think it should be pretty minimal so that it can be implemented easily. Minimal not only in the number of methods but also in the range of allowed input values. Maybe I got this totally wrong, in which case please correct me. I thus think that it is sensible to do the following: - Require that http.PublicSuffix works on lowercase, punycoded, trailing-dot-less domain names and... - ...returns the public suffix in the same format. - Document that it is the callers responsibility to provide the domain in the required format. - Augment the implementation in go.net/publicsuffix to allow additional domain forms and have the return value in the format of the input. The first two points are CL 47560044, the third point was missing. The fourth point would be a different CL. With this it would be: - Simple for a user to provide his own conforming implementation of http.PublicSuffix (as the range of required input values is limited) which can be used properly with net/http/cookiejar. - Users of go.net/publicsuffix benefit from its clever behaviour: If you want to sort on lowercase IDN just call publicsuffix.List's PublicSuffix with with lowercased IDNs and if you want to highlight the eTLD+1 as entered in the address bar just pass in the domain to publicsuffix.EffectiveTLDPlusOne. - And a "naive HTTP client implementation" which does non-RFC-6265-style cookie domain handling will benefit from our clever go.net/publicsuffix list which mitigates some security issues. If someone insists on providing his own jar and his own public suffix list and his jar and his public suffix list do not work properly together he might get issues with cookies being sent to wrong domains. But as explained above: I do not see how to help him in this scenario.
As the https://golang.org/cl/47560044/ discussion indicates, there isn't a simple answer, and it's not a matter of programming, it's a matter of design. It's been like this since Go 1.1, so I've punted this to 1.4.
Labels changed: added release-go1.4, removed release-go1.3maybe.