DaveAKing / guava-libraries

Automatically exported from code.google.com/p/guava-libraries
Apache License 2.0

Public suffix list is used as though it were designed to be exhaustive, but it's not #1618

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
"""
The issue appears to be in the API of InternetDomainName.findPublicSuffix():
https://code.google.com/p/guava-libraries/source/browse/guava/src/com/google/common/net/InternetDomainName.java?spec=svn8f062a880a3102d6d8f909c0754d3d2cd468c9dd&r=ab29b173055a1ff647516848b176265fc6792ba0#167

Specifically, this class disregards Step 2 of "The Algorithm", described at 
http://publicsuffix.org/list/ - that is, "If no rules match, the prevailing 
rule is *".

In this model, any domain *not* on the list is assumed to be registerable at 
the second level. For example, "au" is not included in the PSL. This should 
cause "foo.au" to fail to match any rules, and thus fall into the default 
wildcard rule. Under the default wildcard rule, the public suffix is ".au", and 
"foo" is treated as a registerable name.

This is especially important with the many new registries that ICANN is 
approving; a decision has not been made to automatically add them to the PSL, 
and so I fear this may cause issues for Java applications in validating these 
domains.

If the goal is to ensure a name is "valid" (that is, assigned/approved by 
ICANN), then IANA has a data file that is updated twice daily at 
http://data.iana.org/TLD/tlds-alpha-by-domain.txt that contains all 
IANA-assigned gTLDs. It may make sense to incorporate this data into the PSL 
trie to have a proper "fail open" behaviour.

...

For plausibility checks, the IANA list is a much better resource, for sure.
For security checks, the PSL is the best source of data.

...

The point of the PSL is not to replace the IANA list but to further reduce 
the scope of registerable labels.

There would be no benefit to the PSL's including the full IANA list, and real 
performance harm, since step 2 of the algorithm implicitly covers these domains.
"""
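
To make the quoted "prevailing rule is *" behaviour concrete, here is a minimal sketch of a lookup that falls back to the rightmost label when no rule matches. It is not Guava's implementation: the class name, the method names and the three sample rules are hypothetical, and real PSL wildcard and exception rules are left out.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Simplified sketch: exact-match rules only, no "*." or "!" rules.
final class DefaultRuleSketch {
    // Hypothetical sample of rules, not the real list.
    private static final Set<String> RULES =
            new HashSet<>(Arrays.asList("com", "uk", "co.uk"));

    /** Returns the public suffix of a lowercase, dot-separated domain. */
    static String publicSuffix(String domain) {
        String[] labels = domain.split("\\.");
        // Try the longest candidate first so the longest matching rule wins.
        for (int i = 0; i < labels.length; i++) {
            String candidate =
                    String.join(".", Arrays.copyOfRange(labels, i, labels.length));
            if (RULES.contains(candidate)) {
                return candidate;
            }
        }
        // Step 2: no rule matched, so the prevailing rule is "*",
        // which makes the rightmost label the public suffix.
        return labels[labels.length - 1];
    }

    /** Returns the suffix plus one more label, or null if there is none. */
    static String registrableDomain(String domain) {
        String suffix = publicSuffix(domain);
        if (domain.equals(suffix)) {
            return null;
        }
        String rest = domain.substring(0, domain.length() - suffix.length() - 1);
        return rest.substring(rest.lastIndexOf('.') + 1) + "." + suffix;
    }
}
```

With this fallback, a name under a TLD that is absent from the rule set still gets a public suffix (its last label) instead of being reported as having none.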

What would change in InternetDomainName? I would want to talk more to the 
original bug reporter and to others, but here are some guesses:

- topPrivateDomain() would remain
- isTopPrivateDomain() would remain (though we might take the opportunity to 
look at users and see whether it's worth having when it's so easy to roll your 
own)
- hasPublicSuffix() would be replaced by hasTld()
- publicSuffix() could remain, but from a quick survey of users, I get the 
impression that most either want tld() or could get by with topPrivateDomain() 
just as easily
- isPublicSuffix() would be removed... or replaced by isTld(), but I've always 
been fuzzy on how the original method was to be used, and it's easy to roll 
your own
- isUnderPublicSuffix() would be removed... or replaced by isUnderTld(), but 
that seems to have all the concerns of isPublicSuffix() and more, since a 
domain can be under a TLD but not a public suffix
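
For illustration only, here is a rough sketch of what the proposed tld(), hasTld() and isTld() could look like if backed by the IANA file mentioned above. None of this exists in Guava; the class name and constructor are hypothetical, and the sketch assumes the file holds one TLD per line with "#" comment lines.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Guava.
final class TldSketch {
    private final Set<String> tlds = new HashSet<>();

    /** Builds the TLD set from lines in the assumed IANA file format. */
    TldSketch(List<String> ianaLines) {
        for (String line : ianaLines) {
            String trimmed = line.trim();
            // Skip blank lines and "#" comment lines.
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            tlds.add(trimmed.toLowerCase(Locale.ROOT));
        }
    }

    /** Returns the rightmost label if it is a known TLD, else null. */
    String tld(String domain) {
        String lower = domain.toLowerCase(Locale.ROOT);
        String last = lower.substring(lower.lastIndexOf('.') + 1);
        return tlds.contains(last) ? last : null;
    }

    /** True if the domain ends in, or is itself, a known TLD. */
    boolean hasTld(String domain) {
        return tld(domain) != null;
    }

    /** True only if the whole domain is a known TLD. */
    boolean isTld(String domain) {
        return tlds.contains(domain.toLowerCase(Locale.ROOT));
    }
}
```

A caller would construct it from the downloaded lines, for example `new TldSketch(Arrays.asList("# comment", "COM", "AU"))`, and then ask `hasTld("foo.au")`.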

Original issue reported on code.google.com by cpov...@google.com on 18 Dec 2013 at 5:30

GoogleCodeExporter commented 9 years ago

Original comment by kevinb@google.com on 18 Dec 2013 at 6:12

GoogleCodeExporter commented 9 years ago
This issue has been migrated to GitHub.

It can be found at https://github.com/google/guava/issues/<issue id>

Original comment by cgdecker@google.com on 1 Nov 2014 at 4:10

GoogleCodeExporter commented 9 years ago

Original comment by cgdecker@google.com on 1 Nov 2014 at 4:17

GoogleCodeExporter commented 9 years ago

Original comment by cgdecker@google.com on 3 Nov 2014 at 9:08