golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
124.02k stars 17.68k forks source link

net/http/cookiejar: figure out canonicalization resposibility #7125

Open bradfitz opened 10 years ago

bradfitz commented 10 years ago
Nigel's comments on https://golang.org/cl/47560044/ :

"""
OK, but this means that e.g., in the
code.google.com/p/go.net/publicsuffix implementation, it's perfectly
valid thatPublicSuffix("example.com.au") is "com.au" but
PublicSuffix("EXAMPLE.COM.AU") is "AU".

What happens if, somewhere out there in the wild internet, a anchor's
href contains a domain in ALL.CAPS? A naive HTTP client implementation
might end up associating any returned cookies with the wrong public
suffix, unless somewhere along the line the domains are canonicalized to
lower case. Is this a cookie security problem, and a particularly
insidious one since it 'works' 99.999% of the time?

This particular CL says that the PublicSuffixList is not responsible for
canonicalization (lower-casing, punycode), but the bigger question of
who *is* responsible is left unanswered. Having different parts of a
program each do their own ad hoc canonicalization can be precarious,
e.g. if some separate part of the program then lower-cases the "AU" to
"au". It's not exactly related to cookie jars, but I am reminded of
http://labs.spotify.com/2013/06/18/creative-usernames/

Personally, I would not have removed the TODO without first building
consensus (presumably as a new discussion on golang-dev) about how to
approach domain canonicalization and cookies so that it's always right,
not just almost always right.
"""
vdobler commented 10 years ago

Comment 1:

The PublicSuffix interface of net/http is used by
net/http/cookiejar to determine the public suffix of a
domain name. RFC 6265 governs cookie handling and
describes how domain names should be canonicalized
(section 5.1.2) and how such names are matched (section
5.1.3). It is thus just natural to require http.PublicSuffix
to work properly with this format of domain names.
CL 47560044 requires that any type implementing
http.PublicSuffix has to work properly with this format.
I assume this is consensus? 
Restricting cookie setting is tricky and a naive compare
if cookieDomain == PublicSuffix(requestDomain) { reject }
is doomed to fail. Admittedly some of these failures
can be avoided if PublicSuffix behaves clever like:
  PublicSuffix("example.co.uk") == "co.uk"
  PublicSuffix("example.co.uk.") == "co.uk."  // FQDN
  PublicSuffix("eXaMple.Co.uK") == "Co.uK"
  PublicSuffix("www.shishi.xn--fiqs8s") == "shishi.xn--fiqs8s"  // 中国 punycoded 
  PublicSuffix("www.shishi.中国") == "shishi.中国"
I would argue that our implementation of PublicSuffix in
go.net/publicsuffix _should_ behave in the above clever way.
Maybe for security reasons and for sure as a convenience
to the users. But I do not see why this fancy behaviour should
be contracted in the standard library where http.PublicSuffix
lives. Especially as this cleverness/convenience does not
safeguard against attacks on a really broken jar or a too naive
HTTP client implementation. 
Nigel's main objection was "...how to approach domain
canonicalization and cookies so that it's always right, not
just almost always right."
Domain canonicalization and cookies is done in the interplay
of http, normally a cookie jar and a public suffix list.
All three must agree on how to interact. Currently package
http exposes the raw (un-canonicalized) domain names in URLs
and in cookies. Package cookiejar handles canonicalization
(but doesn't expose these canonical domain names). Cookiejar
relies on http.PublicSuffix accepting lowercase, punycoded,
dotless domain names and return values in the same format.
Beside strange name resolution on Linux cookiejar works
properly. I believe this to be a sensible approach which is
"always right". And I admit I do not see which approach to
domain canonicalization would give this "always right", other
than just doing it properly: No behaviour of http.PublicSuffix
alone will totally prevent mishandling of cookies.
I do not see how a consensus on domain canonicalization
will provide us with a solution that "always works", even
in conjunction with a naive (broken or deliberately wrong)
HTTP client implementation.
The public suffix (or more often the eTLD+1 based
on this suffix) is used for stuff like sorting domains,
search of domains or highlight relevant parts of a domain.
Should the contract for http.PublicSuffix cover such uses?
Sorting becomes easier if the returned public suffix is
canonicalized (any form) but searching and highlighting
seem easier if PublicSuffix exhibits the "clever" behavior.
And if there is no obvious right format for the returned
public suffix I don't think that any additional behaviour
of http.PublicSuffix should be contracted.
My understanding of an interface contract was that each
type implementing that interface must work like described
on the documented input but may allow additional input values.
As PublicSuffix is an interface in the standard library I
think it should be pretty minimal so that it can be implemented
easily. Minimal not only in the number of methods but also
in the range of allowed input values.
Maybe I got this totally wrong, in which case please correct me.
I thus think that it is sensible to do the following:
- Require that http.PublicSuffix works on lowercase,
  punycoded, trailing-dot-less domain names and...
- ...returns the public suffix in the same format.
- Document that it is the callers responsibility to
  provide the domain in the required format.
- Augment the implementation in go.net/publicsuffix
  to allow additional domain forms and have the
  return value in the format of the input.
The first two points are CL 47560044, the third point
was missing. The fourth point would be a different CL.
With this it would be:
- Simple for a user to provide his own conforming implementation
  of http.PublicSuffix (as the range of required input values is
  limited) which can be used properly with net/http/cookiejar.
- Users of go.net/publicsuffix benefit from its clever behaviour:
  If you want to sort on lowercase IDN just call publicsuffix.List's
  PublicSuffix with with lowercased IDNs and if you want
  to highlight the eTLD+1 as entered in the address bar just
  pass in the domain to publicsuffix.EffectiveTLDPlusOne.
- And a "naive HTTP client implementation" which does
  non-RFC-6265-style cookie domain handling will benefit
  from our clever go.net/publicsuffix list which mitigates
  some security issues.
If someone insists on providing his own jar and his own
public suffix list and his jar and his public suffix list do
not work properly together he might get issues with cookies
being sent to wrong domains. But as explained above: I do
not see how to help him in this scenario.
rsc commented 10 years ago

Comment 2:

Nigel?
nigeltao commented 10 years ago

Comment 3:

As the https://golang.org/cl/47560044/ discussion indicates, there isn't a
simple answer, and it's not a matter of programming, it's a matter of design.
It's been like this since Go 1.1, so I've punted this to 1.4.

Labels changed: added release-go1.4, removed release-go1.3maybe.

rsc commented 10 years ago

Comment 4:

Giving up on a release target.

Labels changed: added release-none, removed release-go1.4.