StevenBlack / hosts

🔒 Consolidating and extending hosts files from several well-curated sources. Optionally pick extensions for porn, social media, and other categories.
MIT License

Properly handle domains ending with "`.`" #2456

Open StevenBlack opened 11 months ago

StevenBlack commented 11 months ago

This arises from...

We should

iam-py-test commented 11 months ago

Could we strip the `.` off the end? `example.com.` and `example.com` are the same, right? Thanks

StevenBlack commented 11 months ago

@iam-py-test my feeling is, if a domain ends in `.`, like any of the following, we should reject the domain.

Otherwise we risk Type I errors with sentence text and other potential garbage in source hosts files.

# reject the domain
example.com.

# reject the domain
127.0.0.1 example.com.

# reject the domain, allowing the other 3
127.0.0.1 example.com. a.com b.com c.com
127.0.0.1 a.com b.com example.com. c.com
127.0.0.1 a.com b.com c.com example.com.
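
The rejection rule in the examples above could be sketched roughly as follows. This is only an illustration, not the project's actual code: the function name `filter_trailing_dot` and the helper `_is_ip` are hypothetical, and the sketch assumes a hosts line is an optional leading IP address followed by whitespace-separated names.

```python
import ipaddress


def _is_ip(token: str) -> bool:
    """Return True if the token parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(token)
        return True
    except ValueError:
        return False


def filter_trailing_dot(line: str):
    """Split one hosts line into (kept, rejected) domain lists.

    Hypothetical helper: a name ending in "." is rejected,
    and every other name on the same line is kept.
    """
    # Drop any comment, then split on whitespace.
    tokens = line.split("#", 1)[0].split()
    # Skip a leading IP address such as 127.0.0.1 or 0.0.0.0.
    if tokens and _is_ip(tokens[0]):
        tokens = tokens[1:]
    kept = [t for t in tokens if not t.endswith(".")]
    rejected = [t for t in tokens if t.endswith(".")]
    return kept, rejected
```

For the line `127.0.0.1 a.com b.com example.com. c.com`, this keeps `a.com`, `b.com` and `c.com` and rejects `example.com.`.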

iam-py-test commented 11 months ago

Ah, I didn't think of that. Thanks

krystian3w commented 11 months ago

I used to wonder about this myself: https://github.com/FiltersHeroes/KADhosts/issues/28#issuecomment-607294633

It is best to check the RFCs, because such a trailing dot can leave the page unblocked. For example, AdBlock and Adblock Plus have for years not known how to handle this: you have to write double entries (cosmetic rules that hide sections of the page) and separate regexes, as e.g. the French list does to block ads at the network level.

Handling multiple domains on a line should not be difficult: you would look for a dot followed by a whitespace character and then validate the public suffix (to minimize the chance that something has the wrong "TLD").
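
One possible reading of this suggestion, as a rough sketch: find tokens that end in a dot followed by whitespace (or the end of the line), then check whether the name without the dot ends in a known public suffix. The names `SAMPLE_SUFFIXES`, `has_known_suffix` and `trailing_dot_candidates` are hypothetical, and the tiny suffix set is only a stand-in for the real Public Suffix List.

```python
import re

# Stand-in for the real Public Suffix List; these entries are illustrative only.
SAMPLE_SUFFIXES = {"com", "org", "net", "co.uk"}

# A run of non-whitespace, then a dot that is followed by whitespace or end of line.
TRAILING_DOT = re.compile(r"(\S+)\.(?=\s|$)")


def has_known_suffix(domain: str) -> bool:
    """Return True if some proper suffix of the name is in SAMPLE_SUFFIXES."""
    labels = domain.lower().split(".")
    # Start at 1 so that a bare word such as "here" never counts as a domain.
    return any(".".join(labels[i:]) in SAMPLE_SUFFIXES for i in range(1, len(labels)))


def trailing_dot_candidates(line: str):
    """List (name_without_dot, suffix_is_known) for each trailing-dot token."""
    return [(m.group(1), has_known_suffix(m.group(1))) for m in TRAILING_DOT.finditer(line)]
```

With this sketch, ordinary sentence text such as `here.` is flagged as not having a known suffix, while `example.com.` passes the suffix check. A sentence that happens to end with a real domain would still pass, so the check reduces the risk of false positives rather than removing it.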