StevenBlack / hosts

🔒 Consolidating and extending hosts files from several well-curated sources. Optionally pick extensions for porn, social media, and other categories.
MIT License
26.88k stars 2.23k forks

Many 127.0.0.1 entries in some hosts files #2463

Closed pcl closed 1 year ago

pcl commented 1 year ago

I'm creating a separate issue to avoid too much off-topic chat in the other PR.

Some of your hosts files have thousands of 127.0.0.1 entries. Here's a quick analysis, looking at hosts files containing 10 or more 127.0.0.1 entries:

$ grep -c '127\.0\.0\.1' $(find * -name hosts -type f) | grep -Ev ':[0-9]$' | sort -t : -k 2 -n -r
data/adaway.org/hosts:7346
extensions/social/sinfonietta/hosts:2840
data/URLHaus/hosts:376
extensions/porn/sinfonietta-snuff/hosts:23

All four of those files contain no 0.0.0.0 entries.
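The same tally can be sketched in Python for readers without a POSIX shell. `count_targets` is a hypothetical helper, not part of the repository. It is slightly stricter than `grep -c`: it skips comments and blank lines and counts only active entries, grouped by their target address. To scan the tree, you would call it on the contents of each file named `hosts`.

```python
from collections import Counter


def count_targets(hosts_text: str) -> Counter:
    """Count active hosts-file entries by their target IP address.

    Hypothetical helper: text after '#' and blank lines are ignored,
    unlike `grep -c`, which also counts commented lines.
    """
    counts: Counter = Counter()
    for line in hosts_text.splitlines():
        entry = line.split("#", 1)[0].strip()  # drop full-line and inline comments
        if not entry:
            continue
        counts[entry.split()[0]] += 1  # first field is the target address
    return counts


# Made-up sample data, not taken from any real source file.
sample = """\
# made-up sample data
127.0.0.1 localhost
127.0.0.1 ads.example.com
127.0.0.1 tracker.example.net  # inline comment
0.0.0.0 metrics.example.org
"""

counts = count_targets(sample)
print(counts["127.0.0.1"], counts["0.0.0.0"])  # 3 1
```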

Perhaps this doesn't matter; it just seemed a bit fishy when I was looking at the results of my changes in that other PR.

welcome[bot] commented 1 year ago

Hello! Thank you for opening your first issue in this repo. It’s people like you who make these host files better!

StevenBlack commented 1 year ago

Hello Patrick (@pcl), thank you for this observation.

This is all covered in the readme; maybe you could consult that.

In short, we are an aggregator of hosts files from active and reputable curators of hosts files, and we package these amalgamated hosts files in various ways.

In order to do that, we

  1. download the latest version of the hosts files from curators, then
  2. store them locally, then
  3. merge and de-duplicate them, and package them in various ways.
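The merge-and-de-duplicate step can be sketched as follows. This is a minimal illustration, not the project's actual build script. `parse_domains`, `merge_hosts`, and `LOCAL_NAMES` are hypothetical names. Rewriting every entry to a single target address (0.0.0.0 by default here) is an assumption about how one might normalize sources that use 127.0.0.1. Each domain appears once, in first-seen order.

```python
# Hypothetical: hostnames that refer to the local machine rather than to blocked sites.
LOCAL_NAMES = {"localhost", "localhost.localdomain", "local", "broadcasthost"}


def parse_domains(hosts_text: str) -> list[str]:
    """Extract hostnames from hosts-file text, ignoring comments and target addresses."""
    domains = []
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()
        # fields[0] is the target address; any remaining fields are hostnames
        domains.extend(name.lower() for name in fields[1:])
    return domains


def merge_hosts(sources: list[str], target_ip: str = "0.0.0.0") -> str:
    """Merge several hosts files into one de-duplicated list with a single target."""
    seen = set()
    lines = []
    for text in sources:
        for domain in parse_domains(text):
            if domain in LOCAL_NAMES or domain in seen:
                continue
            seen.add(domain)
            lines.append(f"{target_ip} {domain}\n")
    return "".join(lines)


a = "127.0.0.1 localhost\n127.0.0.1 ads.example.com\n"
b = "0.0.0.0 ads.example.com\n0.0.0.0 tracker.example.net\n"
print(merge_hosts([a, b]), end="")
# prints:
# 0.0.0.0 ads.example.com
# 0.0.0.0 tracker.example.net
```

Under this assumption, the target address in a cached source does not survive the merge. A file that uses 127.0.0.1 throughout (like `data/adaway.org/hosts` in the analysis above) would contribute only `0.0.0.0` entries to the merged output.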

So the folders you're looking at are intermediate data structures that we cache because we use them multiple times for multiple derived works.
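One way to picture reusing the cached sources for several derived works is to parse each cached file once, then assemble each variant from a different combination of sources. The source keys, variant names, and `build_variants` function below are hypothetical illustrations, not the project's real configuration.

```python
def build_variants(cached: dict[str, list[str]],
                   recipes: dict[str, list[str]]) -> dict[str, list[str]]:
    """Assemble each variant from cached, already-parsed per-source domain lists.

    cached:  source key -> domains parsed once from that source's cached hosts file
    recipes: variant name -> source keys the variant combines
    """
    return {
        # dict.fromkeys de-duplicates while keeping first-seen order
        name: list(dict.fromkeys(d for key in keys for d in cached[key]))
        for name, keys in recipes.items()
    }


cached = {
    "data/source-a": ["ads.example.com", "tracker.example.net"],
    "data/source-b": ["ads.example.com", "metrics.example.org"],
    "extensions/social/source-c": ["social.example.com"],
}
recipes = {
    "base": ["data/source-a", "data/source-b"],
    "base+social": ["data/source-a", "data/source-b", "extensions/social/source-c"],
}
variants = build_variants(cached, recipes)
print(len(variants["base"]), len(variants["base+social"]))  # 3 4
```

Each cached source is read once but can appear in many recipes. This is the reuse that makes keeping the intermediate copies worthwhile.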

The cache is also useful when there is a problem with the latest version from a curator (typically errors or controversial new additions). In that case we use the cached version, not the latest one, until the issue is cleaned up.
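The fallback described here might look roughly like the sketch below. `load_source`, the `fetch` callable, and the `looks_like_hosts` check are hypothetical stand-ins. In practice, the decision to hold back a curator's latest version may be a manual judgment rather than an automated test. The cache is refreshed only when the latest version is accepted. Download failures are caught as `OSError`, which also covers `urllib.error.URLError`.

```python
import tempfile
from pathlib import Path
from typing import Callable


def looks_like_hosts(text: str) -> bool:
    """Hypothetical sanity check: every active line starts with a blocking address."""
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if fields and fields[0] not in {"0.0.0.0", "127.0.0.1"}:
            return False
    return True


def load_source(fetch: Callable[[], str], cache_file: Path,
                is_acceptable: Callable[[str], bool]) -> str:
    """Return the curator's latest hosts text, or the cached copy if the latest is unusable.

    Raises FileNotFoundError if the latest version is rejected and no cache exists yet.
    """
    try:
        latest = fetch()
    except OSError:
        latest = None  # download failed; fall back to the cache
    if latest is not None and is_acceptable(latest):
        cache_file.write_text(latest, encoding="utf-8")  # refresh the cache
        return latest
    return cache_file.read_text(encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "hosts"
    cache.write_text("0.0.0.0 ads.example.com\n", encoding="utf-8")
    # A "latest" download that fails the check leaves the cached copy in use.
    text = load_source(lambda: "<html>error page</html>\n", cache, looks_like_hosts)
    print(text, end="")  # 0.0.0.0 ads.example.com
```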

I hope this clarifies it.

Closing.