aleksandarmomic opened this issue 2 years ago
Hi, thanks for the suggestion.
Yes, caching would definitely be a nice feature, plus the data are quite suitable to be cached.
I'm more concerned about the cache duration, since the bouncer does not own the data and cannot be notified when it changes. I guess using the ban duration is a first step? However, a scanner would then be allowed through for far too long. Maybe also evict cache entries after a number of calls?
About the cache location, I'm thinking of memory first: nothing is faster, and it is easier to clean up in case of a bug. Second would be Redis, well known and battle-tested. Lastly, disk I/O is a pain that I don't want to launch myself into...
What do you think?
Hello! Here's what I think:
1. CacheTime: configurable environment variable
2. Redis is the best for that.
@fbonalair From a quick look at the CrowdSec API, the values returned by the decisions endpoint include the decision's duration and until fields for each decision. I believe those can be used to control the cache lifetime. This looks pretty promising for Redis as a cache, since Redis can set a lifetime per IP, which is not the case for in-memory or file-based caching, where you would have to handle the deletion of expired decisions yourself.
That is what I was thinking of using for the cache lifetime. However, I'm still worried about first offenders getting unrestricted access for the whole cache duration. While looking for a solution, I will probably make the caching system opt-in (not enabled by default) with a warning.
Regarding a configurable CacheTime: depending on the caching solution, some of its parameters will be available through environment variables. Thanks for the suggestion.
Any updates to this issue?
I had to shut down my traefik-crowdsec-bouncer. My server would randomly become unresponsive, even over SSH, because the bouncer container became unstable and jammed up the CPU. My initial guess was also that this was caused by some sort of overload: too many requests, and therefore too many calls to the CrowdSec LAPI via the bouncer middleware.
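I read some of the documentation over at CrowdSec and found that the official nginx bouncer has two operation modes:
- live mode: the LAPI is queried for each request and the decision is cached for a configurable time
- stream mode: the bouncer pulls all decisions at startup and then periodically fetches the changes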
That sounds like a solid solution in my estimation. Wouldn't that also be beneficial for the Traefik bouncer, especially in a more demanding environment or with limited resources?
Hello,
I've been following this project for a while and wanted to contribute somehow.
I've implemented a local cache using the go-cache library.
It is configurable through 2 environment variables:
- CROWDSEC_BOUNCER_ENABLE_LOCAL_CACHE: enables a local in-memory cache. Defaults to false.
- CROWDSEC_DEFAULT_CACHE_DURATION: sets the default duration of cached data. Defaults to "4h00m00s".

When the cache is enabled, each IP that has to be checked is first looked up in the local cache. This can produce 2 outcomes:
- the IP was found (whether it was considered malicious or not): we can continue without asking CrowdSec
- the IP was not found: we have to ask CrowdSec and cache the result for subsequent requests
Cache invalidation is provided by the library: a background job removes every expired entry from the cache. This job runs every 5 minutes (this could be made configurable), and the default cache validity of 4h can be overridden with CROWDSEC_DEFAULT_CACHE_DURATION.
I've also got some ideas on how to implement a configurable Redis version, and on how to combine the cache with a streaming mode, which could greatly improve performance.
What do you think about this? @el-joseppe @fbonalair
I've just finished working on the streaming mode, it works pretty well.
At startup it fetches all known banned IPs and caches them in the local cache; then, every minute, the local cache is updated with only the new information. I used the robfig/cron library for the recurring job.
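A sketch of applying one poll's changes to the local cache. It assumes the stream response carries new and deleted lists of decisions whose value field is the IP; streamResponse, decision and applyStream are hypothetical, and the scheduling (robfig/cron in the PR, or a time.Ticker) is left out.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// streamResponse is an assumed shape for one stream-mode poll: decisions
// added and removed since the previous call. At startup, all active
// decisions are assumed to arrive as "new".
type streamResponse struct {
	New     []decision `json:"new"`
	Deleted []decision `json:"deleted"`
}

type decision struct {
	Value string `json:"value"` // assumed: the banned IP
}

// applyStream updates a local set of banned IPs with one poll's delta.
func applyStream(banned map[string]bool, body []byte) error {
	var resp streamResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return err
	}
	for _, d := range resp.New {
		banned[d.Value] = true
	}
	for _, d := range resp.Deleted {
		delete(banned, d.Value)
	}
	return nil
}

func main() {
	banned := map[string]bool{}
	// Startup poll: everything currently banned.
	_ = applyStream(banned, []byte(`{"new":[{"value":"192.0.2.1"},{"value":"192.0.2.2"}],"deleted":null}`))
	// Later poll: only the changes.
	_ = applyStream(banned, []byte(`{"new":[{"value":"192.0.2.3"}],"deleted":[{"value":"192.0.2.1"}]}`))
	fmt.Println(len(banned), banned["192.0.2.1"], banned["192.0.2.3"]) // 2 false true
}
```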
Streaming mode can also be configured with environment variables:
- CROWDSEC_LAPI_ENABLE_STREAM_MODE: enables streaming mode to pull decisions from the LAPI. It overrides CROWDSEC_BOUNCER_ENABLE_LOCAL_CACHE and enables it. Defaults to "true".
- CROWDSEC_LAPI_STREAM_MODE_INTERVAL: defines the interval between two calls to the LAPI. Defaults to "1m".

I've taken the liberty of enabling it by default. Any feedback appreciated @fbonalair
I took the liberty of reviewing only PR #33, since it is based on #32. Anyway, many thanks for the work! I have left some comments in the review.
About the default mode, a couple of thoughts:
To prepare for a Redis cache or other caches, it would be nice to move the cache logic into its own file / service / folder. Depending on the user configuration, the right one would then be initialized in bouncer.go. That was my rough start in the feat/cache branch.
That said, it's not mandatory for a first cache implementation.
Currently every request gets forwarded to CrowdSec one by one, which is slow and resource intensive. In my setup I have additionally set up MariaDB, so calling CrowdSec on every request results in a database call; this also produces pretty big MariaDB binary logs. All this could be avoided with a single JSON file of cached IP addresses on the bouncer's side, similar to how the Cloudflare bouncer caches them. A simple cache mechanism would save space and increase performance by having less impact on the system. File-based caching (like JSON) would be enough, but Redis would be awesome.