Open eminano opened 6 years ago
CC @nigeltao
Hmm...
CC @vdobler @weppos
The current behavior of x/net/publicsuffix
can be considered either correct or incorrect depending on the interpretation (and the library implementation).
@eminano is correct, the suffix doesn't exists in the list.
// za : http://www.zadna.org.za/content/page/domain-information
ac.za
agric.za
alt.za
co.za
However, the recommended PSL algorithm also encourages to use *
as default rule if no match is found. I guess the x/net/publicsuffix
implements this recommendation. As a side effect, when the rule for za
is not found, since gli.za
doesn't match any sub-rule, then *
is used which means gli.za
becomes a domain where za
is the suffix.
Until this point, the behavior is technically correct, according to the current PSL definition. Which is why it is not necessarily a bug.
From here my assumption is that at this point *
is considered an IANA rule by x/net/publicsuffix
and hence the ICANN flag is set to true
. This is nor correct nor incorrect, as the PSL definition doesn't explicitly define how this case should be considered. It's debatable which case is correct.
What I agree with is that this behavior can be surprisingly, and it's a subtle side effect. In my Go implementation of the PSL I leave it possible to disable the default *
parsing:
package main
import (
"fmt"
"github.com/weppos/publicsuffix-go/publicsuffix"
)
func main() {
r := publicsuffix.DefaultList.Find("gli.za", &publicsuffix.FindOptions{DefaultRule: nil})
fmt.Println(r)
}
This code returns a nil
Rule:
➜ gotest go run pls.go
<nil>
which makes it easier to understand the rule doesn't match. At that point, the consumer has full power to decide what to do.
Likewise, by default my library behaves exactly like the x/net/publicsuffix
lib:
package main
import (
"fmt"
wpsl "github.com/weppos/publicsuffix-go/net/publicsuffix"
xpsl "golang.org/x/net/publicsuffix"
)
func main() {
ws, we := wpsl.EffectiveTLDPlusOne("foo.gli.za")
xs, xe := xpsl.EffectiveTLDPlusOne("foo.gli.za")
fmt.Println(ws, we)
fmt.Println(xs, xe)
}
➜ gotest go run pls.go
gli.za <nil>
gli.za <nil>
The main difference is that weppos/publicsuffix-go
doesn't return any information about whether it was an Icann suffix. And the reason is that if you leave the default options, at that point you let the algorithm decide. If you don't want to use non-ICANN suffixes, you should probably not use them in first instance (I believe golang.org/x/net/publicsuffix
also has a way to ignore on-demand non-ICANN suffixes during parsing).
@eminano I would be interested to learn a little bit more about what you are using the list for, and how you came to this issue. Specifically, what you use the ICANN for.
I opened a specific issue to clarify the intent also in the list itself: https://github.com/publicsuffix/list/issues/570
Your feedback would help to have a better understanding and perhaps provide you an alternative.
@weppos Thanks for your reply! I am working on a custom implementation of the publicsuffix library, so I was aiming to mirror the same behaviour. I found these cases when running fuzzing comparing the output from both.
You are right, there's nothing specified about the ICANN flag, so I will restrict my comparison/mirroring to the public suffix returned.
I am working on a custom implementation of the publicsuffix library, so I was aiming to mirror the same behaviour.
@eminano out of curiosity, any reason to not use x/net/publicsuffix
or https://github.com/weppos/publicsuffix-go? Are you missing any specific feature?
Please answer these questions before submitting your issue. Thanks!
What version of Go are you using (
go version
)?go version go1.9.2
Does this issue reproduce with the latest release?
Yes
What operating system and processor architecture are you using (
go env
)?goos: darwin goarch: amd64
What did you do?
I tested domains such as "gli.za" not present in the public suffix list. https://play.golang.org/p/DaJICtp00J
What did you expect to see?
I expected to have icann flag set to false, since the domain is not in the public suffix list.
// za : http://www.zadna.org.za/content/page/domain-information ac.za agric.za alt.za co.za edu.za gov.za grondar.za law.za mil.za net.za ngo.za nis.za nom.za org.za school.za tm.za web.za
What did you see instead?
icann flag set to true
I added debugging lines locally to check why the flag was being set and to see if there was a rule found. I saw that the code was going through the following steps (https://github.com/golang/net/blob/master/publicsuffix/list.go#L47):
Is this the expected behaviour? It seems like a corner case, given "za" is one of the rare domains in the list that does not include the base level ("za"), but only 2 level domains.