4skinSkywalker / Anti-Porn-HOSTS-File

Hosts file for "blocking" porn sites
MIT License
149 stars 63 forks

Potential improvements #77

Closed AndhikaWB closed 1 year ago

AndhikaWB commented 1 year ago

While this is a great project and I really appreciate it, it is not really maintainable in its current form. As a user, you can currently only add new sites by editing the compiled (final) hosts file. I think it can be improved in several ways, such as:

  1. Add a sources list. Where do all the filters come from? How can they be told apart from manually added filters (from PRs)? There should be a clear distinction (e.g. separate filter files/sources) so that people can suggest new filter sources instead of adding sites manually (when those sites are already covered by another source that this project does not use yet), which could keep the list always up to date.
  2. Whitelist based on category. The whitelist should not be forced on everyone. For example, some people do not use torrent sites to look at porn, but others do. The same applies to image sharing services and social media. There should be both separate and unified hosts files based on these categories (e.g. torrent sites, image sharing services, video sharing services, social media).
  3. Automation script. Once (1) and (2) are achieved, this project can be automated so it's easier to maintain and customize. I think most of the porn/ad filters out there are already automated (e.g. StevenBlack). With an automation script, it's also possible to continue other projects that are currently dead (e.g. Energized Porn) and automate them on our own without relying on their maintainers.
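
To make the three points above concrete, here is a minimal sketch of what such a build step could look like. It is only an illustration under assumptions: the function names (`parse_hosts`, `build_hosts`), the source names, and the category names are all hypothetical and not part of this repository, and the sources are passed in as plain text rather than downloaded.

```python
from typing import Dict, Iterable, List, Set


def parse_hosts(text: str) -> Set[str]:
    """Extract domain names from hosts-style text, skipping comments and blanks."""
    domains: Set[str] = set()
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        parts = line.split()
        # Accept "0.0.0.0 example.com" as well as a bare "example.com".
        domain = parts[1] if len(parts) >= 2 else parts[0]
        domains.add(domain.lower())
    return domains


def build_hosts(sources: Dict[str, str],
                whitelists: Dict[str, Set[str]],
                allowed_categories: Iterable[str]) -> List[str]:
    """Merge every source, then remove domains in the categories the user allows."""
    blocked: Set[str] = set()
    for text in sources.values():
        blocked |= parse_hosts(text)
    for category in allowed_categories:
        blocked -= whitelists.get(category, set())
    return [f"0.0.0.0 {domain}" for domain in sorted(blocked)]


# Hypothetical inputs: one manually curated list and one upstream list.
sources = {
    "manual": "0.0.0.0 a.example\n# comment\n0.0.0.0 tracker.example\n",
    "upstream": "b.example\n0.0.0.0 a.example  # duplicate entry\n",
}
whitelists = {"torrent": {"tracker.example"}, "social": {"b.example"}}

result = build_hosts(sources, whitelists, allowed_categories=["torrent"])
print("\n".join(result))
```

Keeping each source as a separate input means a new filter list can be suggested by adding one entry, and each user picks which categories to allow instead of getting one enforced whitelist.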

I can help with some of these but I'm currently focusing on my final year of college (maybe I can in the next 4-5 months). Just sharing my thoughts early.

4skinSkywalker commented 1 year ago
  1. They are pretty much all from PRs or my own algorithmic ideas (e.g. starting from one porn index, run a breadth-first search and jump from link to link);
  2. I hate both porn and ads, so I mostly work on both. I think you are right about that, and I should separate the two with an automated machine learning script that's capable of distinguishing between them (I cannot afford the time to do it by hand);
  3. I don't rely on any of the sources you have mentioned. What I can do is classify a really big dataset of porn vs non-porn sites with similar layouts, train a machine learning model to distinguish between the two, and run an automatic mega-search bot that uses breadth-first search from various points on the Internet of Porns, recursively analyzes the links of sites, and puts new matches into the file.
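
The breadth-first search idea mentioned above could be sketched roughly as follows. This is not the actual crawler used for this project, just an illustration: `bfs_crawl`, `get_links`, `is_match`, and `max_sites` are hypothetical names, the link fetcher and the classifier are passed in as plain functions, and a small in-memory graph stands in for real pages and a trained model.

```python
from collections import deque
from typing import Callable, Iterable, List, Set


def bfs_crawl(seeds: Iterable[str],
              get_links: Callable[[str], Iterable[str]],
              is_match: Callable[[str], bool],
              max_sites: int = 1000) -> List[str]:
    """Visit sites breadth-first from the seeds and collect those the classifier accepts."""
    queue: deque = deque()
    seen: Set[str] = set()
    for seed in seeds:
        if seed not in seen:
            seen.add(seed)
            queue.append(seed)

    matches: List[str] = []
    visited = 0
    while queue and visited < max_sites:
        site = queue.popleft()  # FIFO order is what makes this breadth-first
        visited += 1
        if not is_match(site):
            continue  # do not follow links from sites the classifier rejects
        matches.append(site)
        for link in get_links(site):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return matches


# Hypothetical link graph standing in for real pages.
graph = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["d.example", "a.example"],
    "c.example": ["d.example"],
    "d.example": [],
}

found = bfs_crawl(
    seeds=["a.example"],
    get_links=lambda site: graph.get(site, []),
    is_match=lambda site: site != "c.example",  # stand-in for the ML classifier
)
print(found)
```

The `seen` set keeps the crawl from looping when sites link back to each other, and `max_sites` bounds how many sites one run visits.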