[Feature] Use Redis as primary database

shizunge commented 1 year ago

Currently Blocky supports "blocking" and "fast" start strategy. (And there is a bug of "fast" strategy in 0.20 fixed in #804 )

Both result in a downtime if blocky re-starts and there is only a single blocky instance.

For blocking, no DNS is available.
For fast, DNS is available, but not blocking.

Does it make sense that we save the processed blocking list to Redis and load them at blocky starting (when Redis is configured). I assume loading the processed blocking list from Redis is much faster than download and process them from internet. This helps reduce the down time between blocky restarts.

kwitsch commented 1 year ago

Thought about it as well. 👍 Would also speedup processing lists in a multi instance environment.

0xERR0R commented 1 year ago

Using Redis as storage would definitively solve some problems.

I have the following concerns/thoughts/ideas:

Currently all blocky instances are independent. Each instance loads the lists from the network and keeps them in an optimised cache. This cache supports TTL. This cache is also used for the search (lookup). If we use Redis to store the lists, then we need the memory twice: Blocky holds the list in memory (each instance) and Redis.

Just fantasising further: we could remove Blocky's cache altogether and just use Redis as the list cache. Redis already supports TTL and search. I.e. when a request comes in, blocky makes a Redis request and checks if the name is on the list. Similarly, we can also do the response cache and prefetching completely in Redis. Of course, there would be some latency with each request, but in a gigabit network this should be negligible. For this, we would not need a cache implementation in Blocky, no more sync between the instances (Redis pub/sub), no more duplicate data storage. We could also keep more things in Redis later, e.g. statistics or configuration.

To roughly sketch it out: Download part periodically downloads the list and stores it directly in Redis. For each request it checks (in Redis) if the result is already cached. If not, check against blacklist (in Redis), lookup via upstream, store result in Redis.

I think Redis is a solid and mature system and it offers many features that are well suited for blocky and we should think about whether it might be a viable way forward. Of course, we also have features that can't be mapped with Redis (e.g. regex search), so that needs to be looked at in more detail.

What do you think about it?

kwitsch commented 1 year ago

@0xERR0R just to be sure: You propose to remove the internal blocky caches in favor of external Redis ones?

I would be in for it since it would solve some problems(as you already said).

Things to consider:

backwards compatibility for setups without Redis
Redis cluster support
speedtest to evaluate response times of Redis vs blocky internal cache
way to migrate regex mappings to Redis query syntax

0xERR0R commented 1 year ago

@0xERR0R just to be sure: You propose to remove the internal blocky caches in favor of external Redis ones?

Yes, it is just an idea. Blocky uses currently some 3rd party systems (postgres, prometheus, redis, ...), why not to use the power of redis and remove "duplicate" caching code from blocky? I think, blocky should do one "thing" really good and for all other things we can use other tools.

I would be in for it since it would solve some problems(as you already said).

Things to consider:

backwards compatibility for setups without Redis

Redis cluster support

speedtest to evaluate response times of Redis vs blocky internal cache

way to migrate regex mappings to Redis query syntax

Of course, this is a complex topic and it is something for blocky v1.0. We should make POC (proof of concept) to check the feasibility and evaluate the performance. And if it is ok, we should check, how can we migrate all features to Redis and maybe use new features (for example we can put the config in redis, this will allow the check the config on-the-fly).

If we make redis mandatory, this will also lead in a more complex installation (for non dockerized platforms, like edge devices), but I think it is ok if we can provide more features.

kwitsch commented 1 year ago

Agreed 👍

I'll try to make a POC for speed testing the query cache. If the results won't be satisfying we should reconsider this idea as everything else is less critical regarding speed.

0xERR0R commented 1 year ago

I'll try to make a POC for speed testing the query cache. If the results won't be satisfying we should reconsider this idea as everything else is less critical regarding speed.

Do you mean the query result cache (cachingResolver)? I can try to test the speed for blocking resolver (put some blacklists into redis and change the lookup logic)

kwitsch commented 1 year ago

I'll try to make a POC for speed testing the query cache. If the results won't be satisfying we should reconsider this idea as everything else is less critical regarding speed.

Do you mean the query result cache (cachingResolver)? I can try to test the speed for blocking resolver (put some blacklists into redis and change the lookup logic)

Yes, in my opinion the query result cache would be most time critical regarding the overall speed.

shizunge commented 1 year ago

We are signficant extending the scope of the bug I original want to solve.

Making blocky instances are independent sounds attractive, even though I am not currently use it on an edge device. I don't mind setup the extra components, yet someone else may do so.

On the other hand, if blokcy must depend on some external services, why it must be redis? Why not sqlite 3 (no need external service) or postgresql?

kwitsch commented 1 year ago

We need a fast object storage and not an SQL database. There is a significant difference between them.

Besides SQLite is a really bad idea if used in a multi threaded environment.

ThinkChaos commented 1 year ago

I don't have Redis at all in my setup and would like to avoid making it mandatory if possible. I do like the idea of removing some caching logic from blocky.

This module seems like it would be a good fit so we can outsource the cache to Redis, with an in memory fallback already implemented: https://pkg.go.dev/github.com/go-redis/cache/v8

EDIT: For regex rules, that module doesn't have anything, but neither does Redis.
Maybe only supporting wildcards ( *.domain.tld) would be enough to cover most use cases. Redis supports that, and we could make a PR to the module.

0xERR0R commented 1 year ago

We are signficant extending the scope of the bug I original want to solve.

Making blocky instances are independent sounds attractive, even though I am not currently use it on an edge device. I don't mind setup the extra components, yet someone else may do so.

On the other hand, if blokcy must depend on some external services, why it must be redis? Why not sqlite 3 (no need external service) or postgresql?

I would say your issue was just a trigger for a new discussion. I have been thinking about this for a long time and we have a lot of features where redis would fit well: We can use redis as cache, as list storage with fast search and as event bus (we do it already partially but only as fallback in background). And we could implement new features (e.g. adding new blacklist entry on-the-fly).

0xERR0R commented 1 year ago

I don't have Redis at all in my setup and would like to avoid making it mandatory if possible. I do like the idea of removing some caching logic from blocky.

This module seems like it would be a good fit so we can outsource the cache to Redis, with an in memory fallback already implemented: https://pkg.go.dev/github.com/go-redis/cache/v8

EDIT: For regex rules, that module doesn't have anything, but neither does Redis. Maybe only supporting wildcards ( *.domain.tld) would be enough to cover most use cases. Redis supports that, and we could make a PR to the module.

I think, the first step should be the evaluation. Performance, overall speed etc. We should also think about, how we can implement existing features with redis (e.g. client lookup, config etc). We should also think about usage without redis (if it can be possible or maybe it is possible but without some features). redis-cache looks interesting. I found also https://github.com/ledisdb/ledisdb, if it is 100% compatible, maybe we can use it as embedded database for users who don't have redis.

0xERR0R commented 1 year ago

We can discuss redis topic here: https://github.com/0xERR0R/blocky/discussions/905

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] commented 1 year ago

This issue was closed because it has been stalled for 5 days with no activity.

0xERR0R / blocky

[Feature] Use Redis as primary database #898