bongochong / CombinedPrivacyBlockLists

Ad-blocking hosts files, IP block lists, PAC filters, ABP / uBO / ADG subscriptions, and a whole lot more. All merged from multiple reputable sources, combined with my own research. Also, script-based utilities to help you create such things yourself. Updated at least once every two weeks, often more frequently.

[RESOLVED] Invalid domain #4

whoot closed 4 years ago

whoot commented 4 years ago

Hey, there seems to be an invalid domain in the newhosts-final list:

  [i] Target: https://raw.githubusercontent.com/bongochong/CombinedPrivacyBlockLists/master/newhosts-final.hosts
  [✓] Status: Retrieval successful
  [i] Received 90908 domains, 1 domains invalid!
      Sample of invalid domains:
      - 6×66.com

You may want to change or delete it.

Regards

bongochong commented 4 years ago

@whoot Sorry for the initial response (now deleted). Was away for a few days and read your comment the wrong way while on my phone, which led me to believe it was an unconventional variety of spam. My mistake.

In regard to this issue, I have gone back and forth between leaving entries that contain Unicode characters in my hosts lists, converting those entries to Punycode, or filtering them out altogether.

The presence of domains with Unicode characters in them (like some IDNs) should not prevent your hosts file from working correctly on most operating systems. Furthermore, the sources from which my lists are compiled usually don't contain more than 2 or 3 entries with Unicode characters (if any at all).

All of that being said, I recognize that having a system in place for converting such entries to Punycode, or filtering them out altogether, would be more consistent, and is probably best practice. I will carefully examine both approaches and choose one in the very near future. Consider this a 'will-fix', and again, apologies for making an incorrect assumption.

whoot commented 4 years ago

Don't worry, I understand. It seems like this is more of an issue with Pi-hole (which I am using your list with) than with your list, since using Unicode characters seems to be a common way of making a URL look legit.

bongochong commented 4 years ago

Just updated my lists again and there are no Unicode-specific characters in them. From now on, my backend scripts will alert me to the presence of such characters in my hosts lists, and on the very rare occasion that an entry contains one, that domain will be converted to Punycode via the idn CLI utility. Thank you for the feedback, and rest assured that you can use my lists without issue on your Pi-hole setup.
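
For anyone who wants to do something similar in their own list tooling, here is a minimal sketch of the idea: flag entries with non-ASCII characters and convert them to Punycode. It is not the script used in this repo (that one relies on the idn CLI utility). It assumes entries look like `0.0.0.0 domain`, and the function names `to_punycode` and `convert_hosts_lines` are made up for illustration. Python's built-in `idna` codec implements IDNA 2003, so its output may differ from the idn CLI for some inputs.

```python
def to_punycode(domain: str) -> str:
    """Return the ASCII (Punycode) form of a domain; ASCII input is returned unchanged."""
    if domain.isascii():
        return domain
    # The built-in "idna" codec encodes each dot-separated label (IDNA 2003).
    return domain.encode("idna").decode("ascii")


def convert_hosts_lines(lines):
    """Convert non-ASCII domains in hosts-style lines and report which ones were found.

    Assumes each entry has the form "<ip> <domain>". Comments, blank lines,
    and entries that are already ASCII are passed through untouched.
    """
    converted, flagged = [], []
    for line in lines:
        parts = line.split()
        is_entry = len(parts) >= 2 and not parts[0].startswith("#")
        if not is_entry or parts[1].isascii():
            converted.append(line)
            continue
        ip, domain = parts[0], parts[1]
        flagged.append(domain)  # lets the caller raise an alert
        try:
            converted.append(f"{ip} {to_punycode(domain)}")
        except UnicodeError:
            # Entries that cannot be encoded are filtered out instead.
            continue
    return converted, flagged


if __name__ == "__main__":
    # Hypothetical sample entries, not taken from the actual lists.
    sample = ["# comment", "0.0.0.0 example.com", "0.0.0.0 münchen.de"]
    new_lines, found = convert_hosts_lines(sample)
    print("\n".join(new_lines))
    print("non-ASCII entries found:", found)
```

The returned `flagged` list covers the alerting part, and the `except` branch covers the "filter it out" option for anything that cannot be encoded.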

bongochong commented 4 years ago

Update: You'll be happy to know that the routine I added to my backend scripts for converting Unicode IDN domains to ASCII Punycode entries is now included in the hosts updater scripts shared in this repo. See this commit for reference.

whoot commented 4 years ago

Thanks!