crowdsecurity / cs-firewall-bouncer

Crowdsec bouncer written in golang for firewalls
MIT License

Reactivity is below fail2ban's #18

Open ririsoft opened 3 years ago

ririsoft commented 3 years ago

Hello,

My understanding is that cs-firewall-bouncer polls the local API for changes every 10s by default. This basically means that the worst-case response time to an attack is 10s, which is too late for me.

I am running fail2ban and crowdsec in parallel. Crowdsec generally discovers an attack sooner than fail2ban (by a couple of milliseconds, sometimes a second or two), but fail2ban always bans immediately while cs-firewall-bouncer has to wait until the next poll.

Polling is a no-go for me compared to what I have with fail2ban. Are you considering using a "publisher-subscriber" architecture in the near future? Or am I missing something?

Thanks in advance for your help.

buixor commented 3 years ago

Hello,

> Polling is a no-go for me compared to what I have with fail2ban.

I understand that the default 10s delay might be blocking for you. However, did you try decreasing the delay to a second or so? (The LAPI access log might grow a lot in size, though; we should figure that out.)

> Are you considering using a "publisher-subscriber" architecture in the near future?

We want to stick to HTTP for bouncer -> LAPI communication (both for ease of integration into existing environments and to keep the bouncers' implementation complexity low), but yes, the idea of the stream mode is to mimic a pub/sub approach. If you have suggestions about this, don't hesitate! (Long-lived HTTP GET requests, for example.)

> Or am I missing something?

No, I don't think so. However, I'd be curious to have your feedback on a lowered polling delay. It won't achieve fail2ban's reaction time, but I think it's the price to pay for the stream mode (for now).

ririsoft commented 3 years ago

> I'd be curious to have your feedback on a lowered polling delay. It won't achieve fail2ban's reaction time, but I think it's the price to pay for the stream mode (for now).

When you want to block a botnet brute-forcing user credentials, every millisecond counts. I am not going to trade fail2ban for something that is less reactive, sorry. Crowdsec is faster at detecting but slower at reacting, losing its advantage over fail2ban.

Also, I believe that polling LAPI does not scale, and the logging volume issue you mentioned is an early sign of it. What if I have a farm of tens of thousands of servers (my case at work) polling a single LAPI instance every second? Will LAPI scale? And what about the wasted resources (CPU, network, logs...)? Cloud is expensive; every single euro saved matters.

Lastly, I believe it does not scale down to very small deployments either, like a Raspberry Pi home server or Internet of Things devices. On such small deployments I want the Raspberry Pi to sleep as much as possible, consume as little power as possible, and generate as few logs as possible. On such deployments I expect all my services to be asynchronous and event-based.

> We want to stick to HTTP for bouncer -> LAPI communication (both for ease of integration into existing environments and to keep the bouncers' implementation complexity low), but yes, the idea of the stream mode is to mimic a pub/sub approach. If you have suggestions about this, don't hesitate!

What about having bouncers expose an HTTP API to LAPI? Bouncers would register their URL with LAPI. LAPI would call the URL (HTTP POST) to ban/unban IPs each time an event occurs.

This is how Prometheus does it, and it works very well. It is still polling as far as Prometheus is concerned, but the other way round, from server to clients. In the case of Crowdsec this would no longer be polling but an event-based (publisher/subscriber) architecture, which leads to a drastic saving of resources and the best possible reactivity. This also scales very well. You can trigger tens of thousands of HTTP POSTs from one server asynchronously; Go really shines at this.

buixor commented 3 years ago

Hello,

> Also, I believe that polling LAPI does not scale, and the logging volume issue you mentioned is an early sign of it. What if I have a farm of tens of thousands of servers (my case at work) polling a single LAPI instance every second? Will LAPI scale?

Somehow (but you don't have much more than my word for it :p) it should scale. On a t2.medium we benchmarked LAPI at a bit more than 1.5K EP/s. However, I do see your point and agree with it.

> What about having bouncers expose an HTTP API to LAPI? Bouncers would register their URL with LAPI. LAPI would call the URL (HTTP POST) to ban/unban IPs each time an event occurs.

This is something that we have discussed in the past and pushed further down the roadmap. However, with the new upcoming bouncers, having a pub/sub (webhook) mechanism makes more and more sense.

We are going to have a workshop about it this week, and I'll let you know soon how/when we put it on the roadmap.

Thanks for your valuable feedback as always!

ririsoft commented 3 years ago

> This is something that we have discussed in the past and pushed further down the roadmap. However, with the new upcoming bouncers, having a pub/sub (webhook) mechanism makes more and more sense.
>
> We are going to have a workshop about it this week, and I'll let you know soon how/when we put it on the roadmap.

I understand this kind of change requires time and long discussions. Have a nice workshop... and take time to rest and take care of yourself and your family!

A merry Christmas to you and all the team.

g00g1 commented 12 months ago

Any updates on this issue? I am interested in this too.

adam-ah commented 10 months ago

I agree with @ririsoft that the delay is quite unexpected. It appears that setting a low bucket capacity may not serve much purpose, given the slow banning process. In my current log, I can see about a page and a half of attacks with a bucket size of 4, and by the time a ban actually takes effect, the bucket could have been overfilled multiple times.

Although there's a significant amount of effort and innovative thinking behind CrowdSec, it seems like the creators may not have fully considered the strengths of fail2ban before building a different product. It's a bit disappointing considering the hard work that has gone into developing CrowdSec.

W1zzardTPU commented 1 month ago

+1 on this .. this makes Crowdsec useless for large deployments ..

Implementing HTTP long polling for the API's streaming mode should be pretty easy? gRPC would be the modern way.

Maybe as a workaround until they implement this (if ever), the HTTP notification plugin could be adapted to notify a list of servers to refresh their lists.