Ultimate-Hosts-Blacklist / dev-center

The place to talk about our infrastructure or everything related to the Ultimate Hosts Blacklist project.
MIT License

Regex-support for dnsmasq #46

Open ghost opened 4 years ago

ghost commented 4 years ago

@Somebodyisnobody commented on Mar 27, 2020, 2:19 PM UTC:

Pi-hole FTL and dnsmasq do support regex. We could shrink the file size and save ourselves the www. entries. Is there any discussion about this at the moment?

This issue was moved by funilrys from mitchellkrogza/Ultimate.Hosts.Blacklist#564.

funilrys commented 4 years ago

Hi @Somebodyisnobody, interesting. Actually not, but we could implement it in the near future!


Hey @dnmTX @smed79 @mitchellkrogza @Ultimate-Hosts-Blacklist/contributors what do you think of this feature request?

Stay safe and healthy!

spirillen commented 3 years ago

Just to give a little feedback on what you are asking for.

What you'll need to achieve this is at minimum one list which holds all the domains that you are able to blacklist by a wildcard. I did start this once, also intended to work for this list (I forgot where it is), but that will be quite some work (days of work). On the other hand, you could use any existing wildcard list such as mine, https://github.com/mypdns/matrix/tree/master/source. As I also know you are not "prepared" to use all of them as is, I would suggest you check them against a whitelist before running the process; this could save you a lot of time. At the same time you could help by contributing to the entries added to my project, so we would all benefit from each other's time and knowledge.

Somebodyisnobody commented 3 years ago

What you'll need to achieve this is at minimum one list which holds all the domains that you are able to blacklist by a wildcard

So you mean that wildcard blocking would not make it possible to exempt some subdomains? And that we have no index of active subdomains if we want to block only 99 of 100 subdomains of example.com? You fear that the last of the 100 would also be blocked? I don't want to block in wildcard style, I want to use regex. And for harmless sites like newspaperexample.com, only ads.newspaperexample.com could be blocked without using regex.

I think for something like this: (ads|advertising|tracking|mining).newspaperexample.com instead of:

ads.newspaperexample.com
advertising.newspaperexample.com
tracking.newspaperexample.com
mining.newspaperexample.com

newspaperexample.com itself would still be reachable.
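A quick way to check that claim, as a minimal Python sketch. It uses Python's `re` module as a stand-in for whatever regex engine the blocker ships with, and it assumes the dots are escaped, since an unescaped `.` matches any character. The function name `is_blocked` is made up for illustration.

```python
import re

# One pattern instead of four host entries. The dots are escaped so they
# only match a literal "." between labels.
PATTERN = re.compile(r"^(ads|advertising|tracking|mining)\.newspaperexample\.com$")


def is_blocked(hostname: str) -> bool:
    """Return True if the hostname matches the blocking pattern."""
    return PATTERN.match(hostname.lower()) is not None


if __name__ == "__main__":
    for host in (
        "ads.newspaperexample.com",
        "mining.newspaperexample.com",
        "newspaperexample.com",
        "www.newspaperexample.com",
    ):
        print(f"{host}: {'blocked' if is_blocked(host) else 'allowed'}")
```

Only the four listed subdomains match; the bare domain and any other subdomain such as www. pass through.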

If I understand it right, @funilrys plans to redo the backend with a shadow database: a central database where lists are indexed and sorted, where blacklisters and whitelisters can add or remove entries, and from which a new list is generated in the end. In the last step, the generation could create such a regex list.

database:

id  domain                            source  cur_revision                          ro_locked_by_admin
1   ads.newspaperexample.com          1       a518b8fb-83fc-4c17-86a2-6791c82e691a
2   advertising.newspaperexample.com  1       d1964bad-b252-4410-8416-402174341b8d
3   tracking.newspaperexample.com     3       d1964bad-b252-4410-8416-402174341b8d
4   mining.newspaperexample.com       2       3d631393-cccc-47cb-9c5e-2cf0e23566a3
5   www.badsite.com                   17      a6250a84-3eac-42e1-835a-32fc52a24bc5  1
6   badsite.com                       17      fb298064-28d8-43cb-8a98-dcdad0489f91  1

generation result:

^(ads|advertising|tracking|mining)\.newspaperexample\.com$
^(www\.)?badsite\.com$

or *.badsite.com, depending on how we interpret the target of blocking www.badsite.com and badsite.com

So if I understand you right, I don't see a logical problem with automatically generating such a list.
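The generation step described above could be sketched roughly like this in Python. This is only an illustration under stated assumptions: the function names are made up, and the parent domain is naively taken as the last two labels, which is wrong for suffixes like co.uk (a real generator would consult the Public Suffix List).

```python
import re
from collections import defaultdict


def parent_domain(host: str) -> str:
    """Naive parent: the last two labels (assumption, see the note above)."""
    return ".".join(host.split(".")[-2:])


def build_patterns(hosts):
    """Collapse hosts that share a parent domain into one anchored regex each."""
    groups = defaultdict(set)
    for host in hosts:
        host = host.lower()
        parent = parent_domain(host)
        # Keep only the subdomain part; "" stands for the bare parent itself.
        groups[parent].add(host[: -len(parent)].rstrip("."))

    patterns = []
    for parent in sorted(groups):
        prefixes = groups[parent]
        subs = sorted(re.escape(p) for p in prefixes if p)
        tail = re.escape(parent)
        if not subs:
            patterns.append(f"^{tail}$")
            continue
        group = subs[0] if len(subs) == 1 else "(" + "|".join(subs) + ")"
        if "" in prefixes:
            # The bare domain is listed too, so the subdomain part is optional.
            patterns.append(f"^({group}\\.)?{tail}$")
        else:
            patterns.append(f"^{group}\\.{tail}$")
    return patterns


if __name__ == "__main__":
    rows = [
        "ads.newspaperexample.com",
        "advertising.newspaperexample.com",
        "tracking.newspaperexample.com",
        "mining.newspaperexample.com",
        "www.badsite.com",
        "badsite.com",
    ]
    for pattern in build_patterns(rows):
        print(pattern)
```

For the six rows from the table this prints `^(www\.)?badsite\.com$` followed by `^(ads|advertising|mining|tracking)\.newspaperexample\.com$`. Whether www.badsite.com plus badsite.com should instead become a full wildcard is a policy decision the sketch does not make.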

spirillen commented 3 years ago

I think for something like this: (ads|advertising|tracking|mining).newspaperexample.com

That one is new to me in conjunction with dnsmasq :smiley: but clearly an improvement

If I understand it right, @funilrys plans to redo the backend with a shadow database: a central database where lists are indexed

Yes, you can read a bit more about that here: https://www.mypdns.org/project/view/15/, and please do feel free to come up with any ideas, as you usually have some good ones :smiley:

generation result:

^(ads|advertising|tracking|mining)\.newspaperexample\.com$
^(www\.)?badsite\.com$

But what about, for example, google-analytics.com, 207.net or 2o7.net? Would you append all subdomains to them as well, or would you rather just wildcard them out?

Either way, you would need a list (DB) which holds all exceptions (whitelist) and which records what should just use the wildcard feature, e.g. *.google-analytics.com

rusty-snake commented 3 years ago

I think for something like this: (ads|advertising|tracking|mining).newspaperexample.com

That one is new to me in conjunction with dnsmasq :smiley: but clearly an improvement

Because dnsmasq (upstream) has no regex support. Only patched versions like this.

Somebodyisnobody commented 3 years ago

Because dnsmasq (upstream) has no regex support. Only patched versions like this.

Oh really? You're blowing up my dreams... But at least Pihole supports regex: https://docs.pi-hole.net/ftldns/regex/tutorial/

But what about, for example, google-analytics.com, 207.net or 2o7.net? Would you append all subdomains to them as well, or would you rather just wildcard them out?

It depends on how we define, for example, google-analytics.com. If there's no subdomain worth to be whitelisted I would say *.google-analytics.com or in regex-form ^\w*\.google-analytics\.com$. When in doubt, we always have a list of bad subdomains and can do ^(...|...|...)\.google-analytics\.com$
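One caveat worth checking: `^\w*\.google-analytics\.com$` is narrower than the wildcard `*.google-analytics.com`, because `\w` matches neither `-` nor `.`. The Python sketch below compares it with a label-anchored alternative. The alternative pattern and the sample hostnames are assumptions for illustration, and Python's `re` stands in for the blocker's regex engine.

```python
import re

# Pattern from the discussion: one run of word characters, then the domain.
NARROW = re.compile(r"^\w*\.google-analytics\.com$")

# Assumed alternative: the domain itself or subdomains at any depth,
# anchored on a label boundary so "notgoogle-analytics.com" does not match.
BROAD = re.compile(r"(^|\.)google-analytics\.com$")


def compare(hosts):
    """Map each host to (matched by NARROW, matched by BROAD)."""
    return {h: (bool(NARROW.search(h)), bool(BROAD.search(h))) for h in hosts}


if __name__ == "__main__":
    samples = [
        "www.google-analytics.com",
        "ssl.www.google-analytics.com",   # two labels deep
        "my-cdn.google-analytics.com",    # hyphen in the label
        "google-analytics.com",           # bare domain
        "notgoogle-analytics.com",        # different domain
    ]
    for host, (narrow, broad) in compare(samples).items():
        print(f"{host:32} narrow={narrow!s:5} broad={broad}")
```

The narrow pattern only catches the first sample, while the broad one catches the first four and still leaves notgoogle-analytics.com alone.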

spirillen commented 3 years ago

Hey @Somebodyisnobody

I'm sorry to be the sour berry in your basket; however, if you make a regex list, it should hold more sophisticated rules, like the ones I have in my (private, at home) DNSDist

addAction(RegexRule("(^|[.])(android|google|connectivitycheck[.]gstatic|cloudconfig[.]googleapis|play[.]googleapis|2ctcysy2xi[.]execute-api[.]us-west-1[.]amazonaws)[.][a-z]{2,5}(([.][a-z]{0,2})?)$"), SpoofAction('192.168.1.1'))
addAction(RegexRule("(^|\\.)(207|2o7|admob|cookiebot|cxense|doubleclick|firebaseapp|google(-)?analytics|googleapis|googletagmanager|gstatic)\\.[a-z]{2,5}((\\.[a-z]{0,2})?)$"), RCodeAction(DNSRCode.NXDOMAIN))
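To see what the second rule would catch, the pattern can be tried out in Python. This is a rough sketch, not a statement about dnsdist internals: it assumes the rule is applied as a search against the lowercased query name without the trailing dot, and it uses Python's `re` in place of the regex engine dnsdist uses. The pattern itself uses no syntax that differs between the two, as far as I can tell. The function name and sample names are made up.

```python
import re

# Second rule from the dnsdist snippet, with the Lua string escaping ("\\.")
# reduced to a plain regex escape ("\.").
TRACKER_RULE = re.compile(
    r"(^|\.)(207|2o7|admob|cookiebot|cxense|doubleclick|firebaseapp|"
    r"google(-)?analytics|googleapis|googletagmanager|gstatic)"
    r"\.[a-z]{2,5}((\.[a-z]{0,2})?)$"
)


def would_be_refused(qname: str) -> bool:
    """True if the rule matches this query name (search, not full match)."""
    return TRACKER_RULE.search(qname.lower().rstrip(".")) is not None


if __name__ == "__main__":
    for name in (
        "2o7.net",
        "stats.g.doubleclick.net",
        "google-analytics.com",
        "gstatic.co.uk",
        "doubleclick.example.com",
        "example.com",
    ):
        print(f"{name:28} {would_be_refused(name)}")
```

The brand label has to sit directly in front of the TLD part, so doubleclick.example.com is not matched, while any subdomain of doubleclick.net is. Note that `[a-z]{2,5}` also means TLDs longer than five letters fall outside the rule.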

And what you are requesting, besides the list itself, is actually already invented... It's called RPZ (Response Policy Zones) and it supports a personal whitelist, which to me is the way to go, as a lot of whitelisted shit never should have been whitelisted. Therefore I'm against the Globally Whitelisting idea as such.
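For readers who have not seen RPZ: in a policy zone, `CNAME .` answers NXDOMAIN for the owner name and `CNAME rpz-passthru.` exempts it. The Python sketch below generates such records from a blocklist plus a personal whitelist. The function name and all domain names in the usage example are hypothetical, the SOA/NS header a real zone needs is left out, and how a given resolver prioritises several policy zones is not covered.

```python
def rpz_records(blocked, wildcards, personal_whitelist):
    """Build RPZ record lines (owner names relative to the policy zone).

    blocked            -- exact host names to answer with NXDOMAIN
    wildcards          -- domains to block together with all their subdomains
    personal_whitelist -- exact names the user wants to let through
    """
    allowed = {name.lower() for name in personal_whitelist}
    lines = [f"{name} CNAME rpz-passthru." for name in sorted(allowed)]

    for name in sorted({n.lower() for n in blocked}):
        if name not in allowed:  # one owner name cannot carry two CNAMEs
            lines.append(f"{name} CNAME .")

    for name in sorted({n.lower() for n in wildcards}):
        if name not in allowed:
            lines.append(f"{name} CNAME .")
        lines.append(f"*.{name} CNAME .")
    return lines


if __name__ == "__main__":
    for line in rpz_records(
        blocked=["ads.newspaperexample.com"],
        wildcards=["google-analytics.com"],
        personal_whitelist=["allowed.google-analytics.com"],  # hypothetical
    ):
        print(line)
```

Within one zone, the exact passthru entry is more specific than the wildcard, so the whitelisted name stays resolvable while every other subdomain is blocked.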

And I'll recommend you use these days to play with RPZ on PowerDNS Recursor behind a DNSDist (Regex)

I have made a simple install configuration starter kit here https://www.mypdns.org/source/dns-rpz-integration/

Try it, and let's see what you think.... I'll bet a :banana: that you'll never go back to anything else :smiling_imp:

Somebodyisnobody commented 3 years ago

What do you mean with global whitelisting?

spirillen commented 3 years ago
.. glossary:
    Globally Whitelisting
        A global whitelist is a desisting system which serves the purpose of making unwanted ware available to the public, against the declared purpose of a blacklist.

In other words, if you have a list that removes (Spy/Track)Ware such as Google or Facebook from a list which should protect against exactly that, just because blocking it breaks things, well then you have a global whitelist which removes these from any sources to ensure they can continue running their spyware on any unknowing human.

And here it turns really ugly...

Let's say I was a 15 yo who got my first computer and would like to protect myself from all the creeps out there.

Now I find a number of lists which claim to help me stay protected against such things, for example SB's list. I install these in good faith and believe everything is good and that I'm better protected against these wolves, while in fact the opposite is true, as many list holders are scared of being seen as "unserious" if they actually blacklist the suckers for what they are.

WHY: Because all of us together are too scared to actually help educate our next of kin and teach them how creepy sites like google, fakebook and twitter are.

This makes me think of a scene from Men In Black where Tommy Lee Jones and Will Smith are sitting on a bench when Will is about to be recruited.

The person is smart, people are dumb...

This is where we find the need for:

  1. Education
  2. The personal whitelist
  3. Ditching "global" whitelists that unblock shit, leaving users with a false hope of protection.
  4. We need to get things changed to work in the right direction.
    1. You have now been told this x.y domain is considered bad
    2. Thanks for the warning
    3. but I chose to whitelist it, because my redneck uncle told me to

Sorry for the rather long reply, but you did hit a little red button :smiley: I hope you find the answer in my reply :wave:

Somebodyisnobody commented 3 years ago

WHY: Because all of us together are too scared to actually help educate our next of kin and teach them how creepy sites like google, facebook and twitter are.

Yes, I explained to my mom why she cannot click on: [image] She didn't give a shit; she wanted to click on it. The "dumb users" are simply too comfortable when buying a vacuum cleaner. They do not want to inform themselves independently but consciously accept the non-neutral offer of advertising. She said "I don't want your blocking thing [ps: pihole] anymore". As a result I whitelisted googleadservices.com for her device in my pihole 😒


But back to the roots:

In other words, if you have a list that removes (Spy/Track)Ware such as Google or Facebook from a list which should protect against exactly that, just because blocking it breaks things, well then you have a global whitelist which removes these from any sources to ensure they can continue running their spyware on any unknowing human.

But that's exactly the current concept with Ultimate-Hosts-Blacklist/whitelist...

I said:

If there's no subdomain worth to be whitelisted I would say *.google-analytics.com or in regex-form ^\w*\.google-analytics\.com$.

With "worth to be whitelisted" I didn't propose a whitelist where you can exempt bad-tracking.domain; I meant a whitelist for false positives (but didn't specify that above). google-analytics was just the first blacklisted domain that came to mind. So you say that a false positive such as "1drv.ms" (the OneDrive redirecting service for shares) is a bad tracking domain. That's right, I think Microsoft will use this shortener for counting the clicks, but if we kept it blocked, OneDrive shares would not be reachable for the "dumb user". And here comes your proposal, which results in a new idea:

So indirectly you propose an interface where the user can add and remove some predefined and community-managed whitelists. The interface then generates an RPZ rule (or a Pi-hole-compatible regex list as described above, or format foo), applying only the selected whitelists to the blacklist. Here the user can decide which whitelist(s) they want to use. With a DB in the backend we could link additional comments to whitelist revisions/lines to inform the user why a specific domain was whitelisted and who whitelisted it.

In conclusion: we need a system that allows the user to decide which whitelists they want to apply. The system could then export the list in format foo and provide it to the user, of course with API integration for automated refreshing in the deployment. How do you feel about that?
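The core of such a system could look roughly like the Python sketch below. Everything in it is an assumption for illustration: the data model, the whitelist names, the reasons and the maintainer names are made up, and a real backend would sit on a database with revisions instead of in-memory dictionaries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WhitelistEntry:
    domain: str
    reason: str    # why the domain was whitelisted
    added_by: str  # who whitelisted it


def apply_whitelists(blacklist, whitelists, selected):
    """Apply only the user-selected whitelists to the blacklist.

    Returns (domains that stay blocked, exemptions that were applied).
    """
    exempt = {}
    for name in selected:
        for entry in whitelists[name]:
            # First selected whitelist wins if a domain appears in several.
            exempt.setdefault(entry.domain, entry)
    remaining = sorted(d for d in blacklist if d not in exempt)
    applied = [exempt[d] for d in sorted(blacklist) if d in exempt]
    return remaining, applied


if __name__ == "__main__":
    blacklist = {"1drv.ms", "googleadservices.com", "google-analytics.com"}
    whitelists = {  # hypothetical community-managed whitelists
        "false-positives": [
            WhitelistEntry("1drv.ms", "OneDrive share links break", "maintainer-a"),
        ],
        "shopping-ads": [
            WhitelistEntry("googleadservices.com", "sponsored results", "maintainer-b"),
        ],
    }
    remaining, applied = apply_whitelists(blacklist, whitelists, ["false-positives"])
    print("still blocked:", remaining)
    for entry in applied:
        print(f"exempted {entry.domain}: {entry.reason} (by {entry.added_by})")
```

The `remaining` list is what an exporter would then turn into hosts, regex or RPZ output, and the `applied` list carries the comments that tell the user why and by whom each exemption was made.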

spirillen commented 3 years ago

Yes, I explained to my mom why she cannot click on:

WHAAAAAAAAAAAAAAAAAAAAAAAAAT, send her to me and I'll teach her a bit of S/M :rofl: :laughing: :older_woman:

Ultimate-Hosts-Blacklist/whitelist

Yep, and it contains fakebook.com, so it was on my mind :smirk:

With "worth to be whitelisted" I didn't propose a whitelist

It was purely in ref to your Q about Global whitelist

"1drv.ms" (the OneDrive redirecting service for shares) is a bad tracking domain. That's right, I think Microsoft will use this shortener for counting the clicks, but if we kept it blocked, OneDrive shares would not be reachable

Here RPZ is your friend again... you can bypass the middle lookup. I would have loved to post the example here, but couldn't "just" find it; it is somewhere on https://mypdns.org/

I use it, for example, to redirect Windows Update to the EU servers, bypassing the tracking domain. This gives all users the GDPR "protection", and they can demand that all collected data be deleted :wink: Nothing is ever routed to any destination outside the EU :+1:

So indirectly you propose an interface where the user can add and remove...

Yep, that is part of the pyramid top (end goal) for the matrix.rocks project

additional comments to whitelist revisions/lines to inform the user why a specific domain was whitelisted and who whitelisted it

Part of https://mypdns.org/my-privacy-dns/issues/-/issues/2686

PS: You should contribute all your ideas; we agree on more than we disagree :smiley: @funilrys is working his pants off to get us to @pyfunceble v4 so we can get started on the next part of that project, so please add your suggestions and ideas. :de: :denmark: