chadmayfield / my-pihole-blocklists

Create custom pi-hole blocklists
GNU General Public License v3.0
334 stars 72 forks

porn subdomains are not blocked #3

Closed Mamak2000 closed 5 years ago

Mamak2000 commented 6 years ago

Hi Chad,

Since the pi-hole team told me that wildcard blocking does not work with lists, I am opening an issue in case you find time to fix your huge list... (https://discourse.pi-hole.net/t/subdomains-not-being-blocked-v3-1-4/4787/3)

On my pi-hole (V3.1.4), a URL like "porn-domain.com" is blocked, but if I enter "www.porn-domain.com" the page loads. The pi-hole log confirms this:

Sep 8 23:29:25 dnsmasq[4743]: query[A] myfreecams.com from 192.168.50.131
Sep 8 23:29:25 dnsmasq[4743]: /etc/pihole/gravity.list myfreecams.com is 192.168.50.55
Sep 8 23:29:25 dnsmasq[4743]: query[A] myfreecams.com from 192.168.50.131
Sep 8 23:29:25 dnsmasq[4743]: /etc/pihole/gravity.list myfreecams.com is 192.168.50.55
Sep 8 23:29:25 dnsmasq[4743]: query[AAAA] myfreecams.com from 192.168.50.131
Sep 8 23:29:25 dnsmasq[4743]: forwarded myfreecams.com to 192.168.50.1
Sep 8 23:29:25 dnsmasq[4743]: query[AAAA] myfreecams.com from 192.168.50.131
Sep 8 23:29:25 dnsmasq[4743]: forwarded myfreecams.com to 192.168.50.1
Sep 8 23:29:25 dnsmasq[4743]: reply myfreecams.com is NODATA-IPv6
Sep 8 23:29:30 dnsmasq[4743]: query[A] www.myfreecams.com from 192.168.50.131
Sep 8 23:29:30 dnsmasq[4743]: forwarded www.myfreecams.com to 192.168.50.1
Sep 8 23:29:30 dnsmasq[4743]: reply www.myfreecams.com is 207.229.73.118
Sep 8 23:29:30 dnsmasq[4743]: reply www.myfreecams.com is 207.229.73.117
Sep 8 23:29:30 dnsmasq[4743]: query[A] www.myfreecams.com from 192.168.50.131
Sep 8 23:29:30 dnsmasq[4743]: cached www.myfreecams.com is 207.229.73.117
Sep 8 23:29:30 dnsmasq[4743]: cached www.myfreecams.com is 207.229.73.118
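A rough way to read a log like the one above is to note, per domain, whether dnsmasq answered from gravity.list (blocked) or forwarded the query upstream. The sketch below assumes the log format is exactly as shown in this comment; the function name `classify` and the regular expression are illustrative and not part of Pi-hole.

```python
import re

# Matches lines in the dnsmasq format shown above, e.g.
# "Sep 8 23:29:30 dnsmasq[4743]: forwarded www.example.com to 192.168.50.1"
LINE_RE = re.compile(r"dnsmasq\[\d+\]: (\S+) (\S+) (?:is|to|from) (\S+)$")

def classify(log_lines):
    """Map each domain to 'blocked' or 'forwarded' based on how dnsmasq answered."""
    status = {}
    for line in log_lines:
        match = LINE_RE.search(line.strip())
        if not match:
            continue
        action, domain, _target = match.groups()
        if action.endswith("gravity.list"):
            # Answered from the blocklist with the Pi-hole's own address.
            status[domain] = "blocked"
        elif action == "forwarded" and status.get(domain) != "blocked":
            # Sent upstream; a blocked A record is not downgraded by a forwarded AAAA query.
            status[domain] = "forwarded"
    return status
```

Run against the log above, this should report the bare domain as blocked and the www. name as forwarded, which matches the behavior described.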

So perhaps doubling the number of entries by adding a www. entry for every domain would do the trick. But this would lead to a tremendously huge list...

chadmayfield commented 6 years ago

Thanks for the report. As I commented on my blog, I am seeing the same behavior. I picked a dozen domains and tried them as the base domain and with the www subdomain. About half of them were blocked and the other half were not, so there are definitely issues with the list. I'll do some research and try to get this fixed when I have some time (I'm really busy right now so it might take some time).

Mamak2000 commented 6 years ago

I will also have to file a feature request with pi-hole asking for wildcard support when using lists.

mettix commented 6 years ago

I have run a script to add the www. in front of the domains. I have only done this with the light version. : )

chadmayfield commented 6 years ago

The problem with doing this is that you will miss the base domain too. So you'll need two entries for each domain listed: one for the www. domain and one for the base domain. This doubles the size of the list, but that is where it gets tricky: it doesn't happen with all domains. It is on my to-do list to enhance the list generation so that it actually tests the blocking of each domain and adds the www subdomain if it detects that it is not blocked.
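The "two entries per domain" idea can be sketched as follows. This is the blunt version that adds www. for every domain without testing anything first; the function name `with_www` is hypothetical and is not the script used in this repository.

```python
def with_www(domains):
    """Return a list containing both the base and the www. form of every domain."""
    seen = set()
    result = []
    for domain in domains:
        domain = domain.strip().lower()
        if not domain:
            continue
        # Normalise entries that already carry the www. prefix back to the base domain.
        base = domain[4:] if domain.startswith("www.") else domain
        for entry in (base, "www." + base):
            if entry not in seen:
                seen.add(entry)
                result.append(entry)
    return result
```

Deduplicating through `seen` keeps the output at no more than twice the number of distinct base domains, even if the input already mixes base and www. entries.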

mettix commented 6 years ago

I know, I have the original list as well. That's why I only did it with the small list. The list is double the size, but still not as big as the heavy one. That will do for now! Thanks for your great work on this btw!

chadmayfield commented 6 years ago

Perfect! Sorry I haven't been able to fix it yet, it will be the first thing I have to fix when I get back to this in the near future. I have been slammed at work and home, and just got back into the country after being gone for two weeks. So it is definitely on my radar. Thanks for using it, I welcome any suggestions to the lists.

chadmayfield commented 5 years ago

@Mamak2000, sorry it has taken me so long to close this issue. I have created a branch where I will be working to fix this the correct way. For now, I have put a temp hack in place to add the www to any domain in the Alexa top1m that matches the upstream domain list. I have not done it for the Heavy, 1.8m domain list, since it will make the list very large (approx 3.4m domains) and I don't want to push my Pi that hard, yet.

OLD: Light blocklist created: pi_blocklist_porn_top1m.list (14107 lines)
NEW: Light blocklist created: pi_blocklist_porn_top1m.list (28214 lines)
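The line counts above are consistent with one extra www. entry per domain (14107 × 2 = 28214). A minimal sketch of that kind of temp hack, assuming the light list is the set of upstream domains that also appear in the top-1m list; `build_light_list` and its parameters are hypothetical names, not the repository's actual script:

```python
def build_light_list(upstream, top1m):
    """Keep upstream domains found in the top-1m list, adding a www. entry for each."""
    popular = {d.strip().lower() for d in top1m}
    light = []
    for domain in upstream:
        domain = domain.strip().lower()
        if domain in popular:
            # One line for the base domain and one for its www. form.
            light.extend([domain, "www." + domain])
    return light
```

For example, three upstream domains of which two appear in the top-1m list would yield four lines.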

I have not merged into master because I want to test some things first, which could take a couple of weeks on my current schedule. Right now I am creating a test to actually hit each domain and check that it is blocked. If it isn't, the test will check whether the site uses the www 'subdomain' by grabbing server headers and HTTP codes. So in the end I'll have a curated list, both light and heavy, with each domain tested to make sure it blocks on the Pi-Hole.
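The blocking check could look roughly like the sketch below. It assumes the machine running it uses the Pi-hole as its DNS resolver and that blocked names resolve to the Pi-hole's own address (as in the gravity.list lines of the log earlier in this thread). The names `is_blocked` and `unblocked_variants` are hypothetical, and the header and HTTP code checks mentioned above are not covered.

```python
import socket

def is_blocked(domain, pihole_ip, resolve=socket.gethostbyname):
    """Return True if the domain resolves to the Pi-hole's own address."""
    try:
        return resolve(domain) == pihole_ip
    except OSError:
        # Resolution failed (socket.gaierror is an OSError); report it as not blocked.
        return False

def unblocked_variants(domain, pihole_ip, resolve=socket.gethostbyname):
    """List which of the base and www. names are not blocked."""
    names = (domain, "www." + domain)
    return [name for name in names if not is_blocked(name, pihole_ip, resolve)]
```

The `resolve` parameter exists only so the check can be exercised with a stand-in lookup function instead of live DNS.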

Without something like subdomain enumeration it is possible that a user could have a direct link (to say a video or image sitting on a CDN subdomain) that could slip through the Pi-Hole. That is out of scope of what I wanted to do with this list. Using the plain domain and the www 'subdomain' will be good enough for most users.
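To illustrate the gap, here is a small comparison of exact-name matching (how list entries behave according to this thread) with a hypothetical suffix match of the kind a wildcard rule would give. Both functions are illustrative only and are not Pi-hole code.

```python
def exact_match(name, blocklist):
    """Blocked only if this exact name is listed."""
    return name in blocklist

def suffix_match(name, blocklist):
    """Blocked if the name or any parent domain is listed (wildcard-style)."""
    labels = name.split(".")
    return any(".".join(labels[i:]) in blocklist for i in range(len(labels)))
```

With a blocklist holding only example.com and www.example.com, a name such as cdn7.example.com passes the exact check but would be caught by the suffix check.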