golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
124.03k stars 17.67k forks source link

x/net/publicsuffix: PublicSuffix() is case sensitive #25254

Open OrBaruk opened 6 years ago

OrBaruk commented 6 years ago

What version of Go are you using (go version)?

go version go1.10.1 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOOS="linux"

What did you do?

PublicSuffix() returns incorrect values when the domain contains letters with uppercase

According to RFC4343 I believe PublicSuffix() should be case insensitive.

package main

import (
    "fmt"
    "golang.org/x/net/publicsuffix"
)

func main() {
    suffix, icann := publicsuffix.PublicSuffix("abc.CoM.Br")
    fmt.Printf("suffix: %s\nicann: %v\n", suffix, icann)
}

What did you expect to see?

suffix: com.br
icann: true

What did you see instead?

suffix: Br
icann: false
meirf commented 6 years ago

Test cases from the official PSL repo:

// Mixed case.
checkPublicSuffix('COM', null);
checkPublicSuffix('example.COM', 'example.com');
checkPublicSuffix('WwW.example.COM', 'example.com');

This indicates that PublicSuffix should be case insensitive.

Note these test cases are missing from x/net/publicsuffix. I wonder if they were omitted intentionally. @vdobler @nigeltao

vdobler commented 6 years ago

Yes this is deliberate because the tests would not pass.

EffectiveTLDPlusOne returns the eTLD+1 of its argument in the original input capitalisation, it doesn't lowercase the input.

EffectiveTLDPlus1("b.DOmaiN.BiZ") == "DoMaiN.BiZ"
OrBaruk commented 6 years ago

I think the EffectiveTLDPlusOne should preserve the original capitalisation, but it should not give different results depending on how the input was capitalised for example:

Current:

publicsuffix.EffectiveTLDPlusOne("SuBdOmAiN.aBc.BlOgSpOt.cOm") == "BlOgSpOt.cOm"
publicsuffix.EffectiveTLDPlusOne("subdomain.abc.blogspot.com") == "abc.blogspot.com"

Expected:

publicsuffix.EffectiveTLDPlusOne("SuBdOmAiN.aBc.BlOgSpOt.cOm") == "aBc.BlOgSpOt.cOm"
publicsuffix.EffectiveTLDPlusOne("subdomain.abc.blogspot.com") == "abc.blogspot.com"

These different results was how I originally found the issue.

vdobler commented 6 years ago

@OrBaruk

The unexpected result you see with mixed case inputs is addressed in two TODOs

The current implementation of package publicsuffix requires arguments to PublicSuffix and EffectiveTLDPlus1 to be: Lowercases, free from leading and trailing dots and punycoded. As the main use of package publicsuffix is in package cookiejar and cookiejar calls PublicSuffix only with lowercased, dot-free, punycoded arguments and ensuring this once more in package publicsuffix is wasteful.

But of course this is unintuitive and not documented.

You demonstrated un-intuitive behavior with uppercase domain names. How should EffectiveTLDPlusOne behave on IDNs? How on full qualified domain names (trailing dot)? Should it strip leading dots (still common in the domain attribute of a cookie)=

publicsuffix.EffectiveTLDPlusOne("世界.食狮.中国") = ??
publicsuffix.EffectiveTLDPlusOne("www.foo.com.") = ??
publicsuffix.EffectiveTLDPlusOne(".example.com") = ??

@nigeltao I think a natural expectation would be that package publicsuffix

What do you think?

nigeltao commented 6 years ago

I understand the "works well" expectation, but I think that there's a bigger question of how does Go generally handle IDNs (and upper/mixed case domains)? I think any answer needs to be consistent across net, net/http, net/http/cookiejar, net/url, x/net/publicsuffix, etc, and not just a tactical modification only to x/net/publicsuffix.