Subdomain matching / regex support

0xERR0R / blocky

Fast and lightweight DNS proxy as ad-blocker for local network with many features

https://0xERR0R.github.io/blocky/

Apache License 2.0


Subdomain matching / regex support #12

Closed dtjm closed 3 years ago

dtjm commented 4 years ago

Hello,

It looks like the current implementation does exact matching on the domain name. I'm wondering if you would be open to a change that adds suffix matching, such that a blacklist entry of foo.com would also block *.foo.com, *.*.foo.com, etc.

I would probably implement it using a trie data structure to minimize the memory cost and lookups would probably be O(n) where n is the length of the search string.

This behavior could be configurable so that the default behavior remains the same unless this feature is enabled.
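
For illustration, here is a minimal sketch of the suffix-matching idea in Go. It is not code from blocky: `SuffixTrie`, `Add` and `Blocked` are hypothetical names, and the trie is keyed on DNS labels in reverse order (com, then foo) rather than on single characters, which is one possible variant of the trie described above. A lookup still touches each character of the name only a constant number of times.

```go
package main

import (
	"fmt"
	"strings"
)

// node is one DNS label in a reversed-label trie ("com" -> "foo" -> ...).
type node struct {
	children map[string]*node
	terminal bool // true if a blacklist entry ends at this label
}

// SuffixTrie is a hypothetical type for this sketch, not part of blocky.
type SuffixTrie struct {
	root *node
}

func NewSuffixTrie() *SuffixTrie {
	return &SuffixTrie{root: &node{children: map[string]*node{}}}
}

// labels lower-cases the name, drops a trailing dot and splits it on ".".
func labels(domain string) []string {
	domain = strings.ToLower(strings.TrimSuffix(domain, "."))
	if domain == "" {
		return nil
	}
	return strings.Split(domain, ".")
}

// Add inserts a blacklist entry such as "foo.com".
func (t *SuffixTrie) Add(domain string) {
	ls := labels(domain)
	if len(ls) == 0 {
		return
	}
	cur := t.root
	for i := len(ls) - 1; i >= 0; i-- {
		next, ok := cur.children[ls[i]]
		if !ok {
			next = &node{children: map[string]*node{}}
			cur.children[ls[i]] = next
		}
		cur = next
	}
	cur.terminal = true
}

// Blocked reports whether domain equals an entry or is a subdomain of one.
func (t *SuffixTrie) Blocked(domain string) bool {
	ls := labels(domain)
	cur := t.root
	for i := len(ls) - 1; i >= 0; i-- {
		next, ok := cur.children[ls[i]]
		if !ok {
			return false
		}
		if next.terminal {
			return true
		}
		cur = next
	}
	return false
}

func main() {
	trie := NewSuffixTrie()
	trie.Add("foo.com")
	for _, d := range []string{"foo.com", "a.foo.com", "a.b.foo.com", "notfoo.com", "com"} {
		fmt.Printf("%-12s blocked=%v\n", d, trie.Blocked(d))
	}
}
```

Matching on whole labels means an entry of foo.com does not accidentally block notfoo.com, which a plain string-suffix check would.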

0xERR0R commented 4 years ago

Blocky currently supports blacklists and whitelists in hosts format. This format does not allow wildcards and the like, only whole domain names. There are a lot of ad-domain lists in this format. The disadvantage is that one record must exist for each subdomain. Another common blacklist format in the wild is the adblock format. Lists in this format are more flexible: you can define subdomains and use wildcards. Another option is a regex format, like the one pihole uses. I think it would be nice to have an option to block a domain with a wildcard. A regex may be more powerful than adblock rules or a plain wildcard.

anaschaudhary33 commented 3 years ago

I also want to see this feature in blocky.

0xERR0R commented 3 years ago

I see the need for this feature, but I'm not sure what the best approach is to implement it:

Simple wildcard for subdomains: the user can define a custom list entry with a wildcard for the subdomain, e.g. *.youtube.com. This would block only subdomains such as "www.youtube.com" and "m.youtube.com". The implementation is simple and the performance impact is very low. But it is not very flexible: for example, you can't define `*youtube*` to block both "m.youtube.com" and "youtube-otherdomain.de".
Regex approach: the user can define a regex entry like ^.*youtube.*$. This is very flexible, but it needs more CPU resources for each request. Some "magic" character is needed to distinguish a "normal" host name from a regex (e.g. ^, or / as in AdguardHome).
Some fancy lists (adblock format)

Any ideas?

anaschaudhary33 commented 3 years ago

I think the adblock format would be best, as it is very popular nowadays and gives more flexibility.

0xERR0R commented 3 years ago

This is correct, but the adblock format was designed for client-side blocking (where you have the whole URL). Here we have only the domain name, so only a subset of the format applies and you can't use all predefined lists.

anaschaudhary33 commented 3 years ago

Yes. But we can use manually defined lists, and for predefined lists we can rely on the hosts format.

I was previously using adguard home, and it also works with adblock rules.

shahbazkhan777 commented 3 years ago

I will vote in favour of regex format if you want manually defined blocklists. It would be more powerful than simple wildcard or adblock plus format.

0xERR0R commented 3 years ago

I lean towards the regex solution too, and I hope it will not have much performance impact (every request must be checked against all defined regexes, but the number of regexes should be small). I also prefer the AdguardHome approach: each regex entry must be enclosed in "/", for example /^banners?[_.-]/
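
A rough sketch of how such a list could be loaded and checked, in Go. This is only an illustration under assumptions: the `Matcher` type and its functions are hypothetical and do not reflect what blocky ships. Entries enclosed in "/" are compiled as regexes and everything else is treated as an exact host name.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Matcher holds exact host names plus compiled regex entries.
// Hypothetical type for this sketch.
type Matcher struct {
	exact   map[string]struct{}
	regexes []*regexp.Regexp
}

// NewMatcher splits list entries into exact names and /regex/ entries.
func NewMatcher(entries []string) (*Matcher, error) {
	m := &Matcher{exact: map[string]struct{}{}}
	for _, e := range entries {
		e = strings.TrimSpace(e)
		if e == "" {
			continue
		}
		// "/.../" with a non-empty body marks a regex entry.
		if len(e) > 2 && strings.HasPrefix(e, "/") && strings.HasSuffix(e, "/") {
			re, err := regexp.Compile(e[1 : len(e)-1])
			if err != nil {
				return nil, fmt.Errorf("invalid regex entry %q: %w", e, err)
			}
			m.regexes = append(m.regexes, re)
			continue
		}
		m.exact[strings.ToLower(e)] = struct{}{}
	}
	return m, nil
}

// Blocked tries the cheap map lookup first, then every regex in turn.
func (m *Matcher) Blocked(host string) bool {
	host = strings.ToLower(host)
	if _, ok := m.exact[host]; ok {
		return true
	}
	for _, re := range m.regexes {
		if re.MatchString(host) {
			return true
		}
	}
	return false
}

func main() {
	m, err := NewMatcher([]string{"ads.example.com", "/^banners?[_.-]/"})
	if err != nil {
		panic(err)
	}
	for _, h := range []string{"ads.example.com", "banner.example.com", "banners-cdn.example.com", "mybanner.example.com"} {
		fmt.Printf("%-26s blocked=%v\n", h, m.Blocked(h))
	}
}
```

Compiling each regex once at load time keeps the per-request cost to one map lookup plus one match attempt per regex entry, which stays cheap as long as the number of regex entries is small.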

LexterS999 commented 3 years ago

So it's pihole regex style or adguard regex style?

0xERR0R commented 3 years ago

Hey, not sure about pihole, but adguard format works. For example: https://github.com/mmotti/adguard-home-filters/blob/master/regex.txt