Closed: navidada closed this issue 9 years ago
The simplest solution I can see is to build a complete (or somewhat complete) list of such special domains.
Wikipedia has comprehensive (if not complete) lists in the corresponding articles for .uk, .mx and .au. But I couldn't find a list of country-level domains with such a strange naming scheme other than these three.
There are also other countries that share this naming scheme: .jp, .il, .no, .ru, etc. I can try to make a list from Wikipedia, but I don't know how to code the rule or how to prevent it from interfering with its original aim.
Also, one needs to think about the algorithmic complexity so it won't slow down the add-on too much if it tries to check all possibilities. (That's why I thought about the option that each user enters the country domains that are relevant to him [.jp], and then the rule only looks for nine possibilities: .co.jp, .ac.jp, ...)
Anyway, will it help if I make this list even though I don't know how to code?
@futpib "(...)I thought about the option that each user will enter the country domains that are relevant to him(...)"
I'd prefer if it just works. Normally co.uk isn't relevant to me, but who knows where I am redirected to? I won't be able to say in advance that I never have to visit a co.jp address.
Can this be solved by a different pattern matching? Probably it can; I'm just not good at regular expressions. Something like catching a TLD that is preceded by a two-letter label, which is in turn preceded by a label of more than two characters.
Edit: Supposing there are no domains with just two characters.
Update: Too bad, there are some: https://en.wikipedia.org/wiki/Single-letter_second-level_domain
I know at least one German one.
@bastik-tor The problem is, there are sites with two-letter domains (ya.ru, vk.com), and there probably exist two-letter sites that host other sites (ya.ru could host whatever.ya.ru, for example), so this can't be solved by clever matching on the domain string alone.
Again, I think we'll have to maintain a list of second-level domains that are not sites on their own. Or leave it as it is, heh; after all, "Same second-level domain" does what it says.
@navidada It will help. Performance-wise, there are hashmaps with constant-time lookup on average. This is what "Persistent" and "Temporary" rulesets are built upon.
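For example, the suffix list could live in a JavaScript Set, which gives the constant-time average lookup mentioned above. A minimal sketch (the entries and the fallback rule here are illustrative, not Policeman's actual code):

```javascript
// Known public suffixes kept in a Set for O(1) average lookup.
// This is a tiny illustrative sample, not the full list.
const publicSuffixes = new Set(["uk", "co.uk", "au", "com.au", "jp", "co.jp"]);

// Return the "site" a host belongs to: one label more than the longest
// matching public suffix, or the last two labels as a fallback.
function baseDomain(host) {
  const labels = host.split(".");
  // walking from the full host down tries the longest suffix first
  for (let i = 0; i < labels.length; i++) {
    const candidate = labels.slice(i).join(".");
    if (publicSuffixes.has(candidate)) {
      return i > 0 ? labels.slice(i - 1).join(".") : candidate;
    }
  }
  return labels.slice(-2).join(".");
}

console.log(baseDomain("www.news.com.au"));  // "news.com.au"
console.log(baseDomain("foxsports.com.au")); // "foxsports.com.au"
console.log(baseDomain("whatever.ya.ru"));   // "ya.ru" (no listed suffix, fallback)
```

With this, news.com.au and foxsports.com.au map to different sites, which is the behaviour the rule should have.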
@navidada No need for making a list anymore.
I've just been emailed a link to this wiki page, which points to what seems to be a complete list of the domains we need.
@futpib Great find: a nice list, but long, very long... I wonder whether it wouldn't be wiser for the user of Policeman to simply disable the "Same second-level domain" rule set. A bit of extra configuration occasionally, but it would clear out this problem, one of the worst kinds in coding I guess, as are all exceptions that escape any logical process. A real pain.
@futpib I also encountered this list, but it seems it's not fully complete. For example, check .il for Israel (and not Illinois, US): the old Mozilla page (https://wiki.mozilla.org/TLD_List) used to mention ac.il, co.il, org.il, net.il, k12.il, gov.il, muni.il and idf.il. Those are still relevant (https://en.wikipedia.org/wiki/.il); I don't know why the new site is not updated. I wanted to submit amendments but didn't know how to create what they refer to as a 'unified diff'.
It seems that Firefox itself uses this list to handle cookies and other things. So if Firefox "understands" how to parse each web address, maybe Policeman can use that data directly from Firefox (and thus save all the computational hassle)? (Firefox exposes this as the effective-TLD service, nsIEffectiveTLDService, whose getBaseDomain method returns the registrable domain for a URI.)
@kafene Huge thanks.
When using the rule "Same second-level domain", Policeman will treat all domains under the same country-code suffix as the same domain. For example: http://www.news.com.au/
Policeman will assume that the domain is 'com.au' and thus allow requests to other Australian sites such as foxsports.com.au and newsapi.com.au.
The country code is irrelevant; it also happens with .co.uk. For example, http://www.dailymail.co.uk will allow requests to 'we.and.co.uk', although it doesn't put it in the same category as dailymail.co.uk, like it did with news.com.au. And I assume it will also happen with other suffixes such as .gov.uk or .org.mx and so on.
Is there a way to fix that rule? There are a lot of permutations, so maybe the solution is that each user enters the specific country codes that are relevant to them [.uk, .es, .fr], and Policeman only makes exceptions for these country codes, i.e. checks whether the address 'y.z.fr' ends with one of these specific suffixes and then allows it to request 'x.y.z.fr' but not 'a.z.fr'.
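A minimal sketch of this proposal in JavaScript, assuming the user has entered a handful of suffixes (the list and the function names here are illustrative, not Policeman's actual code):

```javascript
// User-entered country suffixes that should be special-cased.
// Illustrative sample only.
const userSuffixes = [".co.uk", ".ac.uk", ".com.au", ".co.jp"];

// True when two hosts should be grouped by the "same second-level domain" rule.
function sameSite(hostA, hostB) {
  const site = host => {
    const suffix = userSuffixes.find(s => host.endsWith(s));
    // keep one label more than the matched suffix; otherwise the last two labels
    const keep = suffix ? suffix.split(".").length : 2;
    return host.split(".").slice(-keep).join(".");
  };
  return site(hostA) === site(hostB);
}

console.log(sameSite("www.dailymail.co.uk", "i.dailymail.co.uk")); // true
console.log(sameSite("www.dailymail.co.uk", "we.and.co.uk"));      // false
console.log(sameSite("www.news.com.au", "foxsports.com.au"));      // false
```

Since only the user's few suffixes are checked, the per-request cost stays small, which addresses the performance concern raised earlier in the thread.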