StevenBlack / hosts

🔒 Consolidating and extending hosts files from several well-curated sources. Optionally pick extensions for porn, social media, and other categories.
MIT License
26.87k stars 2.23k forks

Whitelist not working for minecraft-hosts and AdGuard CNAME #1687

Closed alagos closed 3 years ago

alagos commented 3 years ago

After a long time, I pulled the latest repo changes and realized click.redditmail.com is part of AdGuard CNAME, so I added it to my personal whitelist, but it's not working. After taking a look, I realized that whitelisting fails not only for this list but also for minecraft-hosts: adding one of their domains to the whitelist has no effect. Both lists are in the format:

thedomain.com

whereas the rest of the lists come as:

0.0.0.0 thedomain.com

or

127.0.0.1 thedomain.com

so I'm thinking it may be a script bug. Both lists are fairly new in the project, so this is probably a format that whitelisting didn't account for before?

welcome[bot] commented 3 years ago

Hello! Thank you for opening your first issue in this repo. It's people like you who make these host files better!

StevenBlack commented 3 years ago

Hi Alter @alagos, all lines in our hosts files either start with # or 0.0.0.0, or are blank. Raw lists of domains are processed into hosts format before amalgamation.

Closing.
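
For readers following along, the conversion described above could look roughly like the sketch below. This is only an illustration, not the code in updateHostsFile.py: the `normalize_line` helper and the `TARGET_IP` constant are hypothetical names, and the sketch assumes each source line is either a comment, blank, a bare domain, or an "IP domain" pair.

```python
TARGET_IP = "0.0.0.0"  # assumed target IP, matching the examples in this thread


def normalize_line(line):
    """Return one source line in hosts format (hypothetical helper)."""
    stripped = line.strip()
    # Blank lines and full-line comments pass through unchanged.
    if not stripped or stripped.startswith("#"):
        return stripped
    # Drop any trailing inline comment, then split into whitespace tokens.
    parts = stripped.split("#", 1)[0].split()
    # One token means a bare domain; otherwise the domain is the second token.
    domain = parts[0] if len(parts) == 1 else parts[1]
    return f"{TARGET_IP} {domain.lower()}"


for raw in ["thedomain.com", "127.0.0.1 thedomain.com", "# a comment"]:
    print(normalize_line(raw))
```

If whitelisting runs before this kind of normalization and matches only "IP domain" lines, bare-domain sources would slip through, which is one possible explanation for the behaviour reported here.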

alagos commented 3 years ago

@StevenBlack I only mentioned that format as a possible cause, but yeah, I have no idea how the script works internally. Anyway, that's not the point. The thing is: the whitelist is not working for the lists I mentioned. Here is a full test case using the project from scratch:

$ git clone https://github.com/StevenBlack/hosts.git
Cloning into 'hosts'...
remote: Enumerating objects: 42174, done.
remote: Counting objects: 100% (657/657), done.
remote: Compressing objects: 100% (288/288), done.
remote: Total 42174 (delta 387), reused 630 (delta 369), pack-reused 41517
Receiving objects: 100% (42174/42174), 226.62 MiB | 12.74 MiB/s, done.
Resolving deltas: 100% (25751/25751), done.
$ cd hosts
$ git log -1
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
commit 2512291d86af28d4ff479316e311ddb696972779 (HEAD -> master, origin/master, origin/HEAD) ┃
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┻
Author: Steven Black <steveb@stevenblack.com>
Date:   Thu Jun 17 16:23:34 2021 -0400

    Add google analytics domains from https://github.com/lightswitch05/hosts/issues/287
$ echo "# These will be whitelisted
# Domain only found in adaway.org
api.247-inc.net
# Domain only found in Badd-Boyz-Hosts
www.01apple.com
# These won't be whitelisted
# Domain only found in AdGuard CNAME
a8.01cloud.jp
# Domain only found in minecraft-hosts
auxilium.ftb.team" > whitelist
$ python3 updateHostsFile.py --auto --replace
Updating source data/StevenBlack from https://raw.githubusercontent.com/StevenBlack/hosts/master/data/StevenBlack/hosts
Updating source data/adaway.org from https://raw.githubusercontent.com/AdAway/adaway.github.io/master/hosts.txt
Updating source data/add.2o7Net from https://raw.githubusercontent.com/FadeMind/hosts.extras/master/add.2o7Net/hosts
Updating source data/add.Dead from https://raw.githubusercontent.com/FadeMind/hosts.extras/master/add.Dead/hosts
Updating source data/add.Risk from https://raw.githubusercontent.com/FadeMind/hosts.extras/master/add.Risk/hosts
Updating source data/add.Spam from https://raw.githubusercontent.com/FadeMind/hosts.extras/master/add.Spam/hosts
Updating source data/Adguard-cname from https://raw.githubusercontent.com/AdguardTeam/cname-trackers/master/combined_disguised_trackers_justdomains.txt
Updating source data/Badd-Boyz-Hosts from https://raw.githubusercontent.com/mitchellkrogza/Badd-Boyz-Hosts/master/hosts
Updating source data/hostsVN from https://raw.githubusercontent.com/bigdargon/hostsVN/master/option/hosts-VN
Updating source data/KADhosts from https://raw.githubusercontent.com/PolishFiltersTeam/KADhosts/master/KADhosts.txt
Updating source data/MetaMask from https://raw.githubusercontent.com/MetaMask/eth-phishing-detect/master/src/hosts.txt
Updating source data/minecraft-hosts from https://raw.githubusercontent.com/jamiemansfield/minecraft-hosts/master/lists/tracking.txt
Updating source data/mvps.org from https://winhelp2002.mvps.org/hosts.txt
Updating source data/orca.pet from https://orca.pet/notonmyshift/hosts.txt
Updating source data/osint.digitalside.it from https://raw.githubusercontent.com/davidonzo/Threat-Intel/master/lists/latestdomains.piHole.txt
Updating source data/shady-hosts from https://raw.githubusercontent.com/shreyasminocha/shady-hosts/main/hosts
Updating source data/someonewhocares.org from https://someonewhocares.org/hosts/zero/hosts
Updating source data/tiuxo from https://raw.githubusercontent.com/tiuxo/hosts/master/ads
Updating source data/UncheckyAds from https://raw.githubusercontent.com/FadeMind/hosts.extras/master/UncheckyAds/hosts
Updating source data/URLHaus from https://urlhaus.abuse.ch/downloads/hostfile/
Updating source data/yoyo.org from https://pgl.yoyo.org/adservers/serverlist.php?hostformat=hosts&mimetype=plaintext&useip=0.0.0.0
Updating source extensions/fakenews from https://raw.githubusercontent.com/marktron/fakenews/master/fakenews
Updating source extensions/gambling from https://raw.githubusercontent.com/Sinfonietta/hostfiles/master/gambling-hosts
Updating source extensions/porn/clefspeare13 from https://raw.githubusercontent.com/Clefspeare13/pornhosts/master/0.0.0.0/hosts
Updating source extensions/porn/sinfonietta from https://raw.githubusercontent.com/Sinfonietta/hostfiles/master/pornography-hosts
Updating source extensions/porn/sinfonietta-snuff from https://raw.githubusercontent.com/Sinfonietta/hostfiles/master/snuff-hosts
Updating source extensions/porn/tiuxo from https://raw.githubusercontent.com/tiuxo/hosts/master/porn
Updating source extensions/social/sinfonietta from https://raw.githubusercontent.com/Sinfonietta/hostfiles/master/social-hosts
Updating source extensions/social/tiuxo from https://raw.githubusercontent.com/tiuxo/hosts/master/social
Success! The hosts file has been saved in folder
It contains 80,645 unique entries.
Moving the file requires administrative privileges. You might need to enter your password.
Password:
$ ack "api.247-inc.net" /etc/hosts # won't be found, which is good
$ ack "www.01apple.com" /etc/hosts # won't be found, which is good
$ ack "a8.01cloud.jp" /etc/hosts # will be found, which is not good
0.0.0.0 a8.01cloud.jp
$ ack "auxilium.ftb.team" /etc/hosts # will be found, which is not good
0.0.0.0 auxilium.ftb.team

Worth mentioning: I'm using macOS Big Sur.
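
The manual checks above can also be sketched as a small Python helper that reports every whitelisted domain still present in a generated hosts file. The function names `blocked_domains` and `still_blocked` are made up for this illustration, and the sketch assumes the whitelist holds one domain per line with `#` comments, as in the test case.

```python
def blocked_domains(hosts_text):
    """Collect every hostname that appears after an IP in hosts-format text."""
    domains = set()
    for line in hosts_text.splitlines():
        # Ignore comments, then split "IP name [name ...]" into tokens.
        parts = line.split("#", 1)[0].split()
        if len(parts) >= 2:
            domains.update(name.lower() for name in parts[1:])
    return domains


def still_blocked(whitelist_text, hosts_text):
    """Return whitelisted domains that are still listed in the hosts text."""
    blocked = blocked_domains(hosts_text)
    wanted = [
        line.strip().lower()
        for line in whitelist_text.splitlines()
        if line.strip() and not line.strip().startswith("#")
    ]
    return [domain for domain in wanted if domain in blocked]
```

Calling `still_blocked(Path("whitelist").read_text(), Path("/etc/hosts").read_text())` with `pathlib.Path` should return an empty list when whitelisting works; in the run above it would be expected to list the AdGuard CNAME and minecraft-hosts domains.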

dnmTX commented 3 years ago

Steve @StevenBlack might be some issue with the script. Let's call the service technician to assess 😉 PING @funilrys

alagos commented 3 years ago

No idea why nobody is paying attention to this; it looks like a legit issue to me. Anyway, for anyone having this issue, after cloning try version 3.5.3:

git checkout 3.5.3

and it will whitelist correctly, since it was in 3.6.0 that this buggy non-whitelistable list was introduced.

StevenBlack commented 3 years ago

Thank you Alter @alagos, let's revisit this.

Reopening.

The thing that hurts this issue is, I can't find a neat expression of the actual problem. There are words here. What I need is a clear statement that says, here's the problem.

When things are written in a form where readers have to "figure it out", or decipher some meaning, often that doesn't happen.

StevenBlack commented 3 years ago

Alter @alagos I just re-read this thread and I still have no idea what you're talking about. I'm normally really, really good at inferring meaning. But in this case, I'm stumped.

Please start over. Clearly state the problem below, avoiding all references that aren't necessary to understanding the problem.

alagos commented 3 years ago

Hey, thanks for reopening this. The issue is: if you add any of the domains included in the minecraft-hosts or AdGuard CNAME lists to your whitelist file, they won't be whitelisted. To reproduce it, add any domain from the adguard-cname list to your whitelist file (in my case, I found the issue because click.redditmail.com wasn't being whitelisted), then run python3 updateHostsFile.py --auto --replace. The domain will be added to the hosts file anyway, even though it is in the whitelist file. And yes, this is not happening with domains from other lists. In this comment I tried to explain the steps to reproduce it, but I suppose it was too verbose to be understandable. Anyway, a simplified version to reproduce the issue is:

git clone https://github.com/StevenBlack/hosts.git
cd hosts
echo "# These domains will be whitelisted
# Domain only found in adaway.org
api.247-inc.net
# Domain only found in Badd-Boyz-Hosts
www.01apple.com
# These won't be whitelisted
# Domain only found in AdGuard CNAME
a8.01cloud.jp
# Domain only found in minecraft-hosts
auxilium.ftb.team" > whitelist
python3 updateHostsFile.py --auto --replace
grep -nwF -e "api.247-inc.net" /etc/hosts # won't return anything, which is good
grep -nwF -e "www.01apple.com" /etc/hosts # won't return anything, which is good
grep -nwF -e "a8.01cloud.jp" /etc/hosts # domain will be included in the hosts file, that's the issue
grep -nwF -e "auxilium.ftb.team" /etc/hosts # this will also be included

avatartw commented 3 years ago

The whitelist works for lists in hosts format, but not for lists in domain format (e.g. AdguardTeam cname trackers, minecraft-hosts).
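
A format-agnostic filter is one way this could be handled. The sketch below is a guess at the shape of such a fix and not the project's actual implementation: `extract_domain` and `apply_whitelist` are hypothetical names, and the sketch assumes every non-comment line is either a bare domain or an "IP domain" pair.

```python
def extract_domain(line):
    """Return the domain on a source line, or None for blanks and comments."""
    parts = line.split("#", 1)[0].split()
    if not parts:
        return None
    # One token: domain-format list. Two or more: hosts format, domain is second.
    return (parts[0] if len(parts) == 1 else parts[1]).lower()


def apply_whitelist(lines, whitelist):
    """Drop every line whose domain is whitelisted, whatever its format."""
    allowed = {domain.lower() for domain in whitelist}
    kept = []
    for line in lines:
        domain = extract_domain(line)
        if domain is not None and domain in allowed:
            continue  # whitelisted: leave it out of the merged output
        kept.append(line)
    return kept


source = ["0.0.0.0 api.247-inc.net", "a8.01cloud.jp", "auxilium.ftb.team"]
print(apply_whitelist(source, {"api.247-inc.net", "a8.01cloud.jp"}))
```

Because the comparison is made on the extracted domain rather than on the raw line, the same whitelist entry matches both `0.0.0.0 thedomain.com` and a bare `thedomain.com`.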

StevenBlack commented 3 years ago

Thanks Alter @alagos, that's clear now.