StevenBlack / hosts

🔒 Consolidating and extending hosts files from several well-curated sources. Optionally pick extensions for porn, social media, and other categories.
MIT License

Windows script issues #1166

Closed XhmikosR closed 4 years ago

XhmikosR commented 4 years ago

After #1157 lands, one of the issues I face on Windows will be fixed. This leaves us with two more issues I've noticed so far:

  1. Using results in broken Unicode characters. Example: Christian Martínez becomes Christian Martínez
  2. The script results in backslashes used in Readme URLs:
    -Unified hosts **+ fakenews** | [Readme]( | [link]( | 52,621 | [link](
    +Unified hosts **+ fakenews** | [Readme](\ | [link](\hosts) | 52,628 | [link](\hosts)

welcome[bot] commented 4 years ago

Hello! Thank you for opening your first issue in this repo. It’s people like you who make these host files better!

funilrys commented 4 years ago

2 ==> #1165

1 ==> It comes from the readme_template. It's probably because you're not reading in UTF-8?
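The encoding hypothesis above can be illustrated with a minimal Python sketch. It is not the repository's actual code: `render_template`, the `@AUTHOR@` placeholder, and the file names are hypothetical. It assumes the template is stored as UTF-8 and shows that passing `encoding="utf-8"` explicitly avoids depending on the platform default, which on Windows is commonly cp1252 unless Python's UTF-8 mode is enabled.

```python
import os
import tempfile


def render_template(template_path, output_path, replacements):
    """Copy a template to output_path, substituting placeholder strings."""
    # Explicit encoding: without it, open() falls back to the locale's
    # preferred encoding, which on Windows is commonly cp1252.
    with open(template_path, encoding="utf-8") as src:
        text = src.read()
    for placeholder, value in replacements.items():
        text = text.replace(placeholder, value)
    # newline="\n" keeps LF line endings on every platform.
    with open(output_path, "w", encoding="utf-8", newline="\n") as dst:
        dst.write(text)
    return text


# How the corruption arises: UTF-8 bytes decoded as cp1252.
mojibake = "Martínez".encode("utf-8").decode("cp1252")

# Round trip through a temporary directory (file names are illustrative).
with tempfile.TemporaryDirectory() as tmp:
    template = os.path.join(tmp, "template.md")
    output = os.path.join(tmp, "out.md")
    with open(template, "w", encoding="utf-8") as f:
        f.write("Start → Run, by @AUTHOR@\n")
    rendered = render_template(template, output, {"@AUTHOR@": "Christian Martínez"})
```

If the script reads or writes the readme without an explicit encoding anywhere, that call would be the first place to look.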

ScriptTiger commented 4 years ago

I got frustrated with this and just started my own repo, but it's good to see some work being done on this again.

XhmikosR commented 4 years ago

@funilrys about the second point, there's still at least one place where we end up with backslashes, see

About the UTF issue, I didn't do anything myself, I just ran the scripts.

XhmikosR commented 4 years ago

@StevenBlack there's still an issue with backslashes, see my comment above

XhmikosR commented 4 years ago

BTW what are the exact scripts/commands you run, @StevenBlack, to generate the files? I'm asking because whenever I try it on Windows I get too many changes in each file.

StevenBlack commented 4 years ago

@XhmikosR I haven't used Windows in at least 10 years. I'm on macOS and Ubuntu.

I run which generates all the variants, in turn.

What do you mean, "I get too many changes in each file"?

XhmikosR commented 4 years ago

@StevenBlack understandable, but please re-open the issue until we manage to fix everything; we are so close :)

C:\Users\xmr\Desktop\hosts>git status
On branch master
Your branch is up to date with 'origin/master'.

nothing to commit, working tree clean

Updating source data\ from
Updating source data\add.2o7Net from
Updating source data\add.Dead from
Updating source data\add.Risk from
Updating source data\add.Spam from
Updating source data\Badd-Boyz-Hosts from
Updating source data\hostsVN from
Updating source data\KADhosts from
Updating source data\ from
Updating source data\ from
Updating source data\ from
Updating source data\StevenBlack from
Updating source data\tiuxo from
Updating source data\UncheckyAds from
Updating source data\ from
Updating source extensions\fakenews from
Updating source extensions\gambling from
Updating source extensions\porn\clefspeare13 from
Updating source extensions\porn\sinfonietta from
Updating source extensions\porn\sinfonietta-snuff from
Updating source extensions\porn\tiuxo from
Updating source extensions\social\sinfonietta from
Updating source extensions\social\tiuxo from
==>fe00::0 ip6-localnet<==
==>ff00::0 ip6-mcastprefix<==
==>ff02::2 ip6-allrouters<==
==>ff02::3 ip6-allhosts<==
Success! The hosts file has been saved in folder alternates/gambling
It contains 54,006 unique entries.
==>fe00::0 ip6-localnet<==
==>ff00::0 ip6-mcastprefix<==
==>ff02::2 ip6-allrouters<==
==>ff02::3 ip6-allhosts<==
Success! The hosts file has been saved in folder alternates/porn
It contains 67,654 unique entries.
==>fe00::0 ip6-localnet<==
==>ff00::0 ip6-mcastprefix<==
==>ff02::2 ip6-allrouters<==
==>ff02::3 ip6-allhosts<==
Success! The hosts file has been saved in folder alternates/social
It contains 54,153 unique entries.
==>fe00::0 ip6-localnet<==
==>ff00::0 ip6-mcastprefix<==
==>ff02::2 ip6-allrouters<==
==>ff02::3 ip6-allhosts<==
Success! The hosts file has been saved in folder alternates/fakenews
It contains 52,627 unique entries.
==>fe00::0 ip6-localnet<==
==>ff00::0 ip6-mcastprefix<==
==>ff02::2 ip6-allrouters<==
==>ff02::3 ip6-allhosts<==
Success! The hosts file has been saved in folder alternates/fakenews-gambling
It contains 54,948 unique entries.
==>fe00::0 ip6-localnet<==
==>ff00::0 ip6-mcastprefix<==
==>ff02::2 ip6-allrouters<==
==>ff02::3 ip6-allhosts<==
Success! The hosts file has been saved in folder alternates/fakenews-porn
It contains 68,596 unique entries.
==>fe00::0 ip6-localnet<==
==>ff00::0 ip6-mcastprefix<==
==>ff02::2 ip6-allrouters<==
==>ff02::3 ip6-allhosts<==
Success! The hosts file has been saved in folder alternates/fakenews-social
It contains 55,095 unique entries.
==>fe00::0 ip6-localnet<==
==>ff00::0 ip6-mcastprefix<==
==>ff02::2 ip6-allrouters<==
==>ff02::3 ip6-allhosts<==
Success! The hosts file has been saved in folder alternates/gambling-porn
It contains 69,975 unique entries.
==>fe00::0 ip6-localnet<==
==>ff00::0 ip6-mcastprefix<==
==>ff02::2 ip6-allrouters<==
==>ff02::3 ip6-allhosts<==
Success! The hosts file has been saved in folder alternates/gambling-social
It contains 56,474 unique entries.
==>fe00::0 ip6-localnet<==
==>ff00::0 ip6-mcastprefix<==
==>ff02::2 ip6-allrouters<==
==>ff02::3 ip6-allhosts<==
Success! The hosts file has been saved in folder alternates/porn-social
It contains 70,121 unique entries.
==>fe00::0 ip6-localnet<==
==>ff00::0 ip6-mcastprefix<==
==>ff02::2 ip6-allrouters<==
==>ff02::3 ip6-allhosts<==
Success! The hosts file has been saved in folder alternates/fakenews-gambling-porn
It contains 70,917 unique entries.
==>fe00::0 ip6-localnet<==
==>ff00::0 ip6-mcastprefix<==
==>ff02::2 ip6-allrouters<==
==>ff02::3 ip6-allhosts<==
Success! The hosts file has been saved in folder alternates/fakenews-gambling-social
It contains 57,416 unique entries.
==>fe00::0 ip6-localnet<==
==>ff00::0 ip6-mcastprefix<==
==>ff02::2 ip6-allrouters<==
==>ff02::3 ip6-allhosts<==
Success! The hosts file has been saved in folder alternates/fakenews-porn-social
It contains 71,063 unique entries.
==>fe00::0 ip6-localnet<==
==>ff00::0 ip6-mcastprefix<==
==>ff02::2 ip6-allrouters<==
==>ff02::3 ip6-allhosts<==
Success! The hosts file has been saved in folder alternates/gambling-porn-social
It contains 72,442 unique entries.
==>fe00::0 ip6-localnet<==
==>ff00::0 ip6-mcastprefix<==
==>ff02::2 ip6-allrouters<==
==>ff02::3 ip6-allhosts<==
Success! The hosts file has been saved in folder alternates/fakenews-gambling-porn-social
It contains 73,384 unique entries.
==>fe00::0 ip6-localnet<==
==>ff00::0 ip6-mcastprefix<==
==>ff02::2 ip6-allrouters<==
==>ff02::3 ip6-allhosts<==
Success! The hosts file has been saved in folder
It contains 51,685 unique entries.

which results in a diff like this:

 alternates/fakenews-gambling-porn-social/hosts     | 80242 +++++++++---------
 alternates/fakenews-gambling-porn-social/ |   184 +-
 alternates/fakenews-gambling-porn/hosts            | 80242 +++++++++---------
 alternates/fakenews-gambling-porn/        |   180 +-
 alternates/fakenews-gambling-social/hosts          | 80216 +++++++++---------
 alternates/fakenews-gambling-social/      |   176 +-
 alternates/fakenews-gambling/hosts                 | 80216 +++++++++---------
 alternates/fakenews-gambling/             |   172 +-
 alternates/fakenews-porn-social/hosts              | 80244 ++++++++++---------
 alternates/fakenews-porn-social/          |   182 +-
 alternates/fakenews-porn/hosts                     | 80244 ++++++++++---------
 alternates/fakenews-porn/                 |   178 +-
 alternates/fakenews-social/hosts                   | 80216 +++++++++---------
 alternates/fakenews-social/               |   174 +-
 alternates/fakenews/hosts                          | 80216 +++++++++---------
 alternates/fakenews/                      |   170 +-
 alternates/gambling-porn-social/hosts              | 80242 +++++++++---------
 alternates/gambling-porn-social/          |   182 +-
 alternates/gambling-porn/hosts                     | 80242 +++++++++---------
 alternates/gambling-porn/                 |   178 +-
 alternates/gambling-social/hosts                   | 80216 +++++++++---------
 alternates/gambling-social/               |   174 +-
 alternates/gambling/hosts                          | 80216 +++++++++---------
 alternates/gambling/                      |   170 +-
 alternates/porn-social/hosts                       | 80240 +++++++++---------
 alternates/porn-social/                   |   180 +-
 alternates/porn/hosts                              | 80240 +++++++++---------
 alternates/porn/                          |   176 +-
 alternates/social/hosts                            | 80216 +++++++++---------
 alternates/social/                        |   172 +-
 data/Badd-Boyz-Hosts/hosts                         |     2 +-
 data/KADhosts/hosts                                |     6 +-
 data/                              |     4 +-
 data/hostsVN/hosts                                 |     2 +-
 data/                     |     5 +-
 data/                                |     2 +-
 hosts                                              | 80216 +++++++++---------
                                                    |   168 +-
 readmeData.json                                    |     2 +-
 39 files changed, 643446 insertions(+), 643057 deletions(-)

The changes are not in the line endings; they are in the way the entries are sorted, which I can see in the Readme too:

Host file source | Description | Home page | Raw hosts | Update frequency | License | Issues
Steven Black's ad-hoc list | Additional sketch domains as I come across them. |[link]( | [raw]( | occasionally | MIT  | [issues]( 
Malware Domain List | Malware Domain List is a non-commercial community project. |[link]( | [raw]( | weekly | 'can be used for free by anyone'  | [issues]( 
add.Dead | Dead sites based on []( content. |[link]( | [raw]( | occasionally | GPLv3+  | [issues]( 
hostsVN | Hosts block ads of Vietnamese |[link]( | [raw]( | occasionally | MIT  | [issues]( 
add.Spam | Spam sites based on []( content. |[link]( | [raw]( | occasionally | GPLv3+  | [issues]( 
Dan Pollock – [someonewhocares]( | How to make the internet not suck (as much). |[link]( | [raw]( | frequently | non-commercial with attribution  | [issues]( 
MVPS hosts file | The purpose of this site is to provide the user with a high quality custom HOSTS file. |[link]( | [raw]( | monthly | CC BY-NC-SA 4.0  | [issues](
 | Blocking with ad server and tracking server hostnames. |[link]( | [raw]( | frequently |   | [issues]( 
Mitchell Krog's - Badd Boyz Hosts | Sketchy domains and Bad Referrers from my Nginx and Apache Bad Bot and Spam Referrer Blockers |[link]( | [raw]( | weekly | MIT  | [issues]( 
UncheckyAds | Windows installers ads sources sites based on content. |[link]( | [raw]( | occasionally |   | [issues]( 
add.2o7Net | 2o7Net tracking sites based on []( content. |[link]( | [raw]( | occasionally | GPLv3+  | [issues]( 
KADhosts | Fraud/adware/scam websites. |[link]( | [raw]( | frequently | CC BY-SA 4.0  | [issues]( 
AdAway | AdAway is an open source ad blocker for Android using the hosts file. |[link]( | [raw]( | occasionally | CC BY 3.0  | [issues]( 
add.Risk | Risk content sites based on []( content. |[link]( | [raw]( | occasionally | GPLv3+  | [issues]( 
Tiuxo hostlist - ads | Categorized hosts files for DNS based content blocking |[link]( | [raw]( | occasional | CC BY 4.0  | [issues]( 


Host file source | Description | Home page | Raw hosts | Update frequency | License | Issues
AdAway | AdAway is an open source ad blocker for Android using the hosts file. |[link]( | [raw]( | occasionally | CC BY 3.0 | [issues](
add.2o7Net | 2o7Net tracking sites based on []( content. |[link]( | [raw]( | occasionally | GPLv3+ | [issues](
add.Dead | Dead sites based on []( content. |[link]( | [raw]( | occasionally | GPLv3+ | [issues](
add.Risk | Risk content sites based on []( content. |[link]( | [raw]( | occasionally | GPLv3+ | [issues](
add.Spam | Spam sites based on []( content. |[link]( | [raw]( | occasionally | GPLv3+ | [issues](
Mitchell Krog's - Badd Boyz Hosts | Sketchy domains and Bad Referrers from my Nginx and Apache Bad Bot and Spam Referrer Blockers |[link]( | [raw]( | weekly | MIT | [issues](
hostsVN | Hosts block ads of Vietnamese |[link]( | [raw]( | occasionally | MIT | [issues](
KADhosts | Fraud/adware/scam websites. |[link]( | [raw]( | frequently | CC BY-SA 4.0 | [issues](
Malware Domain List | Malware Domain List is a non-commercial community project. |[link]( | [raw]( | weekly | 'can be used for free by anyone' | [issues](
MVPS hosts file | The purpose of this site is to provide the user with a high quality custom HOSTS file. |[link]( | [raw]( | monthly | CC BY-NC-SA 4.0 | [issues](
Dan Pollock – [someonewhocares]( | How to make the internet not suck (as much). |[link]( | [raw]( | frequently | non-commercial with attribution | [issues](
Steven Black's ad-hoc list | Additional sketch domains as I come across them. |[link]( | [raw]( | occasionally | MIT | [issues](
Tiuxo hostlist - ads | Categorized hosts files for DNS based content blocking |[link]( | [raw]( | occasional | CC BY 4.0 | [issues](
UncheckyAds | Windows installers ads sources sites based on content. |[link]( | [raw]( | occasionally |  | [issues](
 | Blocking with ad server and tracking server hostnames. |[link]( | [raw]( | frequently |  | [issues](

StevenBlack commented 4 years ago

@XhmikosR please explain the problem here. I don't understand. That diff is perfectly normal. What do you expect? All hosts files, all readme files, get re-generated. This is 100% by design.

StevenBlack commented 4 years ago

The order of sources listed is not determinate; it never was. WTF cares? I certainly don't. readmeData.json is just a JSON structure and everything comes from that.

XhmikosR commented 4 years ago

When you push a patch which updates the data, not every line changes. On Windows all lines change, but not because of line endings, but because of the order the folders are traversed and thus the data are processed/output. You can see this in the Readme part I pasted above.

I'm not saying it matters, it just doesn't make any sense, though.
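One way to make the traversal order independent of the operating system is to sort directory and file names while walking. The sketch below is only an illustration under that assumption: `find_hosts_files` is a hypothetical helper, not a function from the repository, and the case-insensitive sort key is one possible choice.

```python
import os
import tempfile


def find_hosts_files(root):
    """Return paths of files named 'hosts' under root, in a stable order."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Sorting dirnames in place controls the order os.walk descends in.
        dirnames.sort(key=str.lower)
        for name in sorted(filenames):
            if name == "hosts":
                found.append(os.path.join(dirpath, name))
    return found


# Demo with a few of the source folder names seen in the log above.
with tempfile.TemporaryDirectory() as tmp:
    for folder in ["tiuxo", "KADhosts", "add.Dead", "Badd-Boyz-Hosts"]:
        os.makedirs(os.path.join(tmp, folder))
        with open(os.path.join(tmp, folder, "hosts"), "w", encoding="utf-8") as f:
            f.write("0.0.0.0 example.invalid\n")
    order = [os.path.basename(os.path.dirname(p)) for p in find_hosts_files(tmp)]
```

With an explicit sort, the same input tree yields the same ordering on every platform, so a regenerated readme would differ only where the data changed.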

StevenBlack commented 4 years ago

Look, about Windows... I don't mean to be unkind in any way, but people who use Windows have 99 other problems.

This repo is meant to be a sysadmin thing. It makes hosts files. Honestly, I don't care what diffs Windows users get as long as the hosts files generate properly.

You know what curation is, in practice? Curation means, saying "no".

I don't care about this.

XhmikosR commented 4 years ago

@funilrys it seems the README Unicode issue is back (or was never fixed completely) 🙁

For example:

-**Windows XP**: Start → Run → `cmd`
+**Windows XP**: Start → Run → `cmd`

-* [ViHoMa]( is a Visual Hosts file Manager, written in Java, by Christian Martínez.  Check it out!
+* [ViHoMa]( is a Visual Hosts file Manager, written in Java, by Christian Martínez.  Check it out!

-* [Blocking ads and malwares with unbound]( "Blocking ads and malwares with unbound") – [Unbound]( "Unbound is a validating, recursive, and caching DNS resolver.") is a validating, recursive, and caching DNS resolver.
+* [Blocking ads and malwares with unbound]( "Blocking ads and malwares with unbound") – [Unbound]( "Unbound is a validating, recursive, and caching DNS resolver.") is a validating, recursive, and caching DNS resolver.

StevenBlack commented 4 years ago

Honestly, Windows is such a shitshow. I know this doesn't help this issue; sometimes I just need to vent. @XhmikosR @funilrys

XhmikosR commented 4 years ago

There's only one last issue on Windows after #1296 is merged.

readmeData.json still contains 2 backslashes at the end of the location strings, for example:

  "fakenews-gambling-porn": {
    "location": "alternates/fakenews-gambling-porn\\",

I tried to fix it without success so far. Maybe @funilrys you have some idea.

That being said, finally everything is the same on Windows after #1296. 🙂
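A note on the remaining readmeData.json problem: the two backslashes in the file are how JSON escapes a single trailing backslash in the path string. A minimal sketch of one possible fix is to convert separators before serializing. `normalize_location` is a hypothetical helper, not part of the repository, and whether the trailing separator should be kept at all is not stated in this thread, so the sketch only swaps the separator.

```python
import json


def normalize_location(path):
    """Return path with every backslash replaced by a forward slash."""
    return path.replace("\\", "/")


# One trailing backslash, as os.path.join(..., "") would produce on Windows.
windows_style = "alternates/fakenews-gambling-porn" + "\\"

# JSON escapes the single backslash, so the serialized text shows two.
as_json = json.dumps({"location": windows_style})

# After normalization the serialized value uses a forward slash instead.
fixed = json.dumps({"location": normalize_location(windows_style)})
```

Applying the conversion at the point where the location string is built would keep the generated JSON identical across platforms.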