StevenBlack / hosts

🔒 Consolidating and extending hosts files from several well-curated sources. Optionally pick extensions for porn, social media, and other categories.
MIT License
26.64k stars 2.21k forks

do you plan to include these hosts in future? #445

Closed udit-001 closed 6 years ago

udit-001 commented 6 years ago

All the subcategories available under this: hpHosts

welcome[bot] commented 6 years ago

Hello! Thank you for opening your first issue in this repo. It’s people like you who make these host files better!

ScriptTiger commented 6 years ago

We only include hosts which are actively curated to keep our list as streamlined and efficient as possible, keeping in mind this list has also seen use in some integrated systems. So if those hosts should happen upon our actively curated lists, we will include them. Otherwise, we don't import bulk lists that do not maintain a high degree of curation, as this is counter to the mission statement of the repository.

udit-001 commented 6 years ago

The categorized files over there are refreshed far more often than the monthly hosts.txt files they upload, so you could incorporate those smaller files. And yes, you're right about excluding the bulk ones that are not frequently updated.

ScriptTiger commented 6 years ago

A simple test we could perform is diff the lists from month to month to get a better feel for the curation happening here. Most of the time they are simply adding and not checking or pruning older invalid entries, which is the key to our list.
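The month-to-month diff described above could be sketched roughly as follows. This is only an illustration, not a tool used by this project: the function names `parse_hosts` and `diff_snapshots` are hypothetical, and the sketch assumes each snapshot is plain hosts-file text with lines of the form `<ip> <hostname>`.

```python
def parse_hosts(text):
    """Return the set of hostnames found in hosts-file text."""
    hosts = set()
    for line in text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        # Assumed layout: "<ip> <hostname> [aliases...]"; skip the IP field.
        hosts.update(name.lower() for name in fields[1:])
    return hosts


def diff_snapshots(old_text, new_text):
    """Compare two snapshots of a list and report what changed."""
    old, new = parse_hosts(old_text), parse_hosts(new_text)
    return {
        "added": sorted(new - old),
        "removed": sorted(old - new),
        "kept": len(old & new),
    }
```

A list whose `removed` count stays at or near zero across many months while `added` keeps growing would suggest entries are being appended without pruning.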

There would also be a terms of use issue we would have to work out, as any redistribution of their data requires their express permission.

udit-001 commented 6 years ago

Also, could you share some insight into how you check for invalid and older entries? Do you do it manually, or do you use some kind of program or product?

ScriptTiger commented 6 years ago

And just to pose a possible solution to the above problems, after contacting Malwarebytes about the redistribution of their data for an open-source project and procuring their permission, I am sure a diff study over a reasonable period of time would be welcomed. A "reasonable period of time" being defined by Steven. After the said study is conducted and the results posted here, we can then have more to work with and make a definitive decision backed up by the facts. So if anyone is willing to take this on, I am sure it would be a welcomed project.

Entries are checked via a variety of means: randomly, automatically, via funilrys' funceble script (https://github.com/funilrys/funceble), combinations of checks, socially/crowd-sourcing, and more. Steven personally vets the sources and some of those threads are still here if you want to dig into the closed issues and see what the conversations have gone like for a source to get approved and how they have described their curation process to the community. I am sure when Steven becomes available he can provide a better explanation of his exact process from his viewpoint. From my experience, having talked with many of the sources personally, many of them own and operate large-scale networks and are constantly gathering live data from live traffic and this is a large part of how they curate their lists to keep them up to date with how their traffic is trending.
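As one illustration of what an automated check might look like, here is a minimal sketch that flags hostnames that no longer resolve in DNS. It is not how funceble or any of the curated sources actually work; the names `resolves` and `find_dead_entries` are hypothetical, and a failed lookup is only a hint, since a domain can be temporarily unreachable or blocked by the local resolver.

```python
import socket


def resolves(hostname, resolver=socket.getaddrinfo):
    """Return True if the hostname currently resolves to at least one address."""
    try:
        return len(resolver(hostname, None)) > 0
    except (OSError, UnicodeError):
        # socket.gaierror (an OSError) means the lookup failed;
        # UnicodeError covers names that cannot be encoded for DNS.
        return False


def find_dead_entries(hostnames, resolver=socket.getaddrinfo):
    """Return the hostnames that did not resolve, in their original order."""
    return [name for name in hostnames if not resolves(name, resolver)]
```

The `resolver` parameter is there so the check can be exercised without network access; in normal use the default `socket.getaddrinfo` is enough.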

udit-001 commented 6 years ago

I looked through a few other issues, and Steven doesn't seem keen to incorporate hpHosts because it is huge and doesn't have the same level of curation. Still, I was wondering whether this specific category (https://hosts-file.net/pup.txt) could be incorporated into the Unified hosts file? That is, specifically the Potentially Unwanted Programs category.

ScriptTiger commented 6 years ago

Just as a note, you can also always add custom entries to the myhosts file in the root directory of the repository and they will be automatically added to your build of the list. If you're using my Unified Hosts AutoUpdate for Windows, you can just add them anywhere you want in your hosts file as long as they are outside of the "#### BEGIN UNIFIED HOSTS ####" and "#### END UNIFIED HOSTS ####" markings and they won't be touched during a hosts update. This might be helpful in the future for any situation in which you wish to include entries not supported by this project.
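To show why entries outside the markers survive an update, here is a rough sketch of a marker-based replacement. It is an assumption about the general technique, not the actual code of Unified Hosts AutoUpdate; the function name `replace_managed_block` is hypothetical, and the sketch assumes each marker sits alone on its own line exactly as quoted above.

```python
BEGIN = "#### BEGIN UNIFIED HOSTS ####"
END = "#### END UNIFIED HOSTS ####"


def replace_managed_block(hosts_text, new_block):
    """Swap the text between the markers, leaving everything else untouched."""
    lines = hosts_text.splitlines()
    new_lines = new_block.splitlines()
    try:
        start = lines.index(BEGIN)
        end = lines.index(END, start + 1)
    except ValueError:
        # No complete managed block yet: append one after the existing lines.
        return "\n".join(lines + [BEGIN, *new_lines, END]) + "\n"
    # Keep everything up to and including BEGIN, and everything from END on.
    updated = lines[: start + 1] + new_lines + lines[end:]
    return "\n".join(updated) + "\n"
```

Custom entries placed before the BEGIN marker or after the END marker are never inside the replaced slice, so they carry over unchanged.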

StevenBlack commented 6 years ago

Hi @udit-001! The main problem with HP hosts is its outsized collection. Hundreds of thousands of hosts.

Some operating systems, especially the most popular one, degrade notably once the hosts file reaches a certain size. Mobile devices, with their limited processing power, also seem to have issues dealing with very large host files.

HP Hosts was once included here. Not for very long; it led to an avalanche of complaints ⛄️

udit-001 commented 6 years ago

Oh, I get it! It possibly has a large number of false positives too. Also, by the way, do you plan to include domains that use our PC resources to mine cryptocurrency?

Most likely some lists like these: ZeroDot1 - CoinBlocker Lists, Adblock No-Coin List

StevenBlack commented 6 years ago

@udit-001 I suspect that, in due course, those hosts will end up in one or more of our curated sources.