Closed dtjm closed 3 years ago
Blocky currently supports blacklists and whitelists in hosts format. This format does not allow wildcards or similar patterns, only whole domain names. There are a lot of lists of ad domains in this format. The disadvantage is that one record must exist for each subdomain. Another common blacklist format in the wild is the adblock format. Lists in this format are more flexible: you can define subdomains and use wildcards. Another option is a regex format, like the one pihole uses. I think it would be nice to have an option to block a domain with a wildcard. Maybe regex is more powerful than adblock or a plain wildcard.
I also want to see this feature in blocky.
I see the need for this feature, but I'm not sure what the best approach to implement it is:
- Wildcard, e.g. `*.youtube.com`. This would block only the subdomains, like "www.youtube.com" and "m.youtube.com". The implementation is simple and the performance impact is very low. But this is not very flexible: for example, you can't define `*youtube*` to block both "m.youtube.com" and "youtube-otherdomain.de".
- Regex, e.g. `^.*youtube.*$`. A very flexible approach, but it needs more CPU resources for each request. It is also necessary to introduce some "magic" character to distinguish between a "normal" host name and a regex (e.g. `^`, or `/` like AdguardHome).
Any ideas?
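To make the trade-off concrete, here is a minimal Go sketch comparing the two options. The function name `matchWildcard` and the choice to support only a leading `*.` are assumptions for illustration, not blocky's actual code.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// matchWildcard is a hypothetical helper: it reports whether domain is a
// subdomain covered by a "*.example.com" style pattern. The bare domain
// ("example.com") does not match, as described in the comment above.
func matchWildcard(pattern, domain string) bool {
	if !strings.HasPrefix(pattern, "*.") {
		return false // only a leading "*." is supported in this sketch
	}
	suffix := strings.ToLower(pattern[1:]) // "*.youtube.com" -> ".youtube.com"
	domain = strings.ToLower(domain)
	return len(domain) > len(suffix) && strings.HasSuffix(domain, suffix)
}

func main() {
	// The regex is compiled once and reused for every lookup.
	re := regexp.MustCompile(`^.*youtube.*$`)

	domains := []string{
		"www.youtube.com",
		"m.youtube.com",
		"youtube.com",
		"youtube-otherdomain.de",
	}
	for _, d := range domains {
		fmt.Printf("%-24s wildcard=%-5t regex=%t\n",
			d, matchWildcard("*.youtube.com", d), re.MatchString(d))
	}
}
```

In this sketch the wildcard only matches "www.youtube.com" and "m.youtube.com", while the regex matches all four names, including "youtube-otherdomain.de".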
I think the adblock format would be best, as it is much more popular nowadays and gives more flexibility.
This is correct, but the adblock format was designed for client-side blocking (where you have the whole URL). Here we have only the domain name, so only a subset of it applies and you can't use all predefined lists.
Yes. But we can use it for manually defined lists. And for predefined lists we can rely on the hosts format.
I was previously using adguard home, and it also works with adblock rules.
I will vote in favour of regex format if you want manually defined blocklists. It would be more powerful than simple wildcard or adblock plus format.
I tend toward the regex solution too, and I hope this will not have much performance impact (every request must be checked against all defined regexes, but the number of regexes should be negligible). I also prefer the AdguardHome approach: each regex entry must be enclosed in "/", for example `/^banners?[_.-]/`
So it's pihole regex style or adguard regex style?
Hey, not sure about pihole, but adguard format works. For example: https://github.com/mmotti/adguard-home-filters/blob/master/regex.txt
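For reference, the "/"-enclosed approach discussed above could look roughly like the following Go sketch. The `blocklist` type and its methods are hypothetical names, and treating every other entry as a plain host name is an assumption. Plain names stay in a set for exact lookups, so only the (hopefully few) regex entries are evaluated per request.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// blocklist is a hypothetical container: plain host names go into a set,
// entries enclosed in "/" are compiled once and kept in a slice.
type blocklist struct {
	hosts   map[string]struct{}
	regexes []*regexp.Regexp
}

// newBlocklist parses list entries. An entry of the form "/.../" is treated
// as a regex; everything else is treated as an exact host name.
func newBlocklist(entries []string) (*blocklist, error) {
	bl := &blocklist{hosts: make(map[string]struct{})}
	for _, e := range entries {
		e = strings.TrimSpace(e)
		if e == "" {
			continue
		}
		if len(e) > 2 && strings.HasPrefix(e, "/") && strings.HasSuffix(e, "/") {
			re, err := regexp.Compile(e[1 : len(e)-1])
			if err != nil {
				return nil, fmt.Errorf("invalid regex entry %q: %w", e, err)
			}
			bl.regexes = append(bl.regexes, re)
			continue
		}
		bl.hosts[strings.ToLower(e)] = struct{}{}
	}
	return bl, nil
}

// blocked checks the cheap exact lookup first and only then walks the regexes.
func (bl *blocklist) blocked(domain string) bool {
	domain = strings.ToLower(domain)
	if _, ok := bl.hosts[domain]; ok {
		return true
	}
	for _, re := range bl.regexes {
		if re.MatchString(domain) {
			return true
		}
	}
	return false
}

func main() {
	bl, err := newBlocklist([]string{"ads.example.com", `/^banners?[_.-]/`})
	if err != nil {
		panic(err)
	}
	for _, d := range []string{"ads.example.com", "banner.example.com", "mybanner.example.com"} {
		fmt.Println(d, bl.blocked(d))
	}
}
```

Note that Go's `regexp` package uses RE2 syntax, so patterns from existing lists that rely on lookaheads or backreferences would fail to compile and be reported as errors here.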
Hello,
It looks like the current implementation does exact matching on the domain name. I'm wondering if you would be open to a change that adds suffix matching, such that a blacklist entry of `foo.com` would also block `*.foo.com`, `*.*.foo.com`, etc.
I would probably implement it using a trie data structure to minimize the memory cost, and lookups would probably be `O(n)` where `n` is the length of the search string.
This behavior could be configurable so that the default behavior remains the same unless this feature is enabled.