REQUEST-912-DOS-PROTECTION paranoia level 2 sets ip.dos_block=1 outside of dos_burst_time_slice window

jelsdon commented 3 years ago

Describe the bug

I've experienced issues whereby dos-protection is kicking in, without the request count exceeding a threshold within the given time slice window. This appears when the client is slowly sending traffic regularly over a period of time.

Premise is that paranoid level 2 should take into account dos_burst_time_slice before blocking an address, and it is not.

I've tried to follow the logic through https://github.com/coreruleset/coreruleset/blob/v3.3/master/rules/REQUEST-912-DOS-PROTECTION.conf

and looking for validation or correction of my understanding.

With paranoia level set to 2, IP:DOS_COUNTER is able to exceed TX:DOS_COUNTER_THRESHOLD and set ip.dos_block=1 without consideration of account dos_burst_time_slice

id:912160 or id:912161 can reset ip:dos_counter.
id:912160 sets ip.dos_burst_counter=1 irrespective of time taken for ip.dos_burst_counter to reach tx.dos_counter_threshold
id:912171 with ip.dos_burst_counter set to 1, ip.dos_block is set to 1
id:912120 bocks request

With paranoia of 1 this is not an issue as id:912170, both id:912160 and id: 912161 and this flow takes into account tx.dos_burst_time_slice.

https://github.com/coreruleset/coreruleset/blob/v3.3/master/rules/REQUEST-912-DOS-PROTECTION.conf

Steps to reproduce

Set paranoia to 2 and send a slow rate of requests that reach tx.dos_counter_threshold within modsecurity's overall capture time, but less than dos_counter_threshold

Expected behaviour

I expect paranoia 2 to set ip.dos_block=1 only when ip:dos_counter has exceeded tx.dos_counter_threshold within a given tx.dos_burst_time_slice

Actual behaviour

ip.dos_block is set to 1 within id:912171 and traffic is blocked, without ip:dos_counter reaching tx.dos_counter_threshold within tx.dos_burst_time_slice

Additional context

Your Environment

CRS version: 3.3.0
Paranoia level setting: 2

dune73 commented 3 years ago

The plugin pull request is about to be merged. This will allow to take the DOS rules and transform them into an official CRS plugin as has been our plan for quite some time. I will also address this issue here when making that transformation.

Thank you for reporting and sorry for not responding earlier.

jacen05 commented 3 years ago

I think this affects Paranoia level 1 also. The issue was already described here.

I've tried to raise the dos_counter_threshold, but it's just delaying the problem : it seems that dos_burst_time_slice does not work as intended. I'm also using CRS v3.3.0

For now I've disabled the rule, but I'll be happy to test a correction.

dune73 commented 3 years ago

Thank you for this complementary remark, @jacen05.

dune73 commented 2 years ago

Status: This is clearly on my turf and I am postponing this constantly while life interferes. Sorry. Yet I am none the wise and can not give a clear date, I'm behind on many fronts.

diablodale commented 2 years ago

related bug moved to https://github.com/coreruleset/coreruleset/issues/2413 as requested.

azurit commented 2 years ago

@diablodale Thanks for digging into this! Can you be so kind and open a new issue? Thank you.

theseion commented 2 years ago

I think there is a misconception here w.r.t. how DOS blocking works (admittedly, it's not intuitive). TX:DOS_BURST_TIME_SLICE is only relevant for paranoia level 1, because it is applied to IP:DOS_BURST_COUNTER, not to IP:DOS_COUNTER. Hence, at paranoia level 2 the rules block immediately whenever a burst is detected.

It is possible that this was never the intention. To fix it I see two options:

Make PL 2 depend on TX:DOS_BURST_TIME_SLICE:
1. PL 1 should block at IP:DOS_BURST_COUNTER >= 3
2. PL 2 should block at IP:DOS_BURST_COUNTER >= 2
Expire IP:DOS_COUNTER after TX:DOS_BURST_TIME_SLICE, not IP:DOS_BURST_COUNTER (or both, for that matter).

Option two is probably the right way to go.

dune73 commented 2 years ago

Digging in my memory and reporting from family holidays in Calanca, GR, Switzerland, where I used the early morning hours to document and clean up the DOS rules, I can report the following:

The idea was indeed to see PL2 as strict sibling of the burst counter rule in PL1. I think I remember that it was intended to block regardless of burst counter.

I am not convinced this is correct or smart, but this is what I thought was the intention of the rules and one way to add a stricter sibling. A new concept back in the day.

Can you elaborate some more what option 2 would mean and how the behaviour would change.

If we would opt for options 1, then I think there is nothing wrong with this approach. Maybe add a PL3 rule retains the current PL2 behaviour: blocking on the first burst.

theseion commented 2 years ago

I think option 2 is semantically correct. It is not the burst that becomes invalid after a certain time. We actually want some way to define what we consider a burst to look like. I think that we should use two different timeout variables, one for counting requests toward a burst, and one to expire bursts.

What I'm struggling with is how to properly define a sliding window. Static windows would be easy to do but aren't optimal. In theory, an attacker could send requests to determine the window size and then bombard the service such that it never registers enough requests to go into blocking mode. With a sliding window, the attacker would at least have to wait for an entire window before resuming the attack.

However, as pointed out elsewhere, CRS isn't the proper tool to defend against serious DoS attacks, so maybe static windows are good enough.

With the current approach, almost every IP will trigger the first burst, since the request counter is only reset once the threshold has been reached. To retain the current behaviour, we would actually need block at PL1 after the first burst, however, I think it would be better to wait until we've identified a second burst. So, for both PL1 and PL2, adding a timeout to the request counter means that the rules will block later than they would now (one timeout window later) but I think that is actually a good thing and will reduce the number of FP's (I had a couple of issues with FP's and had to increase the timeout / disable the rules for certain IP's).

Technically speaking, we would introduce one new variable TX:DOS_COUNTER_TIME_SLICE which would be initialized to the same value as TX:DOS_BURST_TIME_SLICE by default. Then we'd add a new rule that would expire IP:DOS_COUNTER with the value of TX:DOS_COUNTER_TIME_SLICE if it is unset. When TX:DOS_COUNTER_THRESHOLD has been reached, we unset IP:DOS_COUNTER (instead of using "unset" as the initialization state we could also use the value 0 of the counter; this could be safer to use with the other rules).

To summarise, I think option 2 would be cleaner and semantically correct, however, it also introduces additional complexity into a system that is very hard to understand in its current state. Option 1 would certainly be the "simple" fix.

theseion commented 2 years ago

@dune73 I've created a proof of concept PR https://github.com/coreruleset/dos-protection-plugin-modsecurity-v2/pull/3.

github-actions[bot] commented 1 year ago

This issue has been open 120 days with no activity. Remove the stale label or comment, or this will be closed in 14 days

diablodale commented 1 year ago

ping alive

github-actions[bot] commented 1 year ago

This issue has been open 120 days with no activity. Remove the stale label or comment, or this will be closed in 14 days

diablodale commented 1 year ago

ping alive

fzipi commented 1 year ago

@diablodale Did you tested what @theseion created?

diablodale commented 1 year ago

hi @fzipi, I have not tested the proof of concept. I doubt I can provide solid testing feedback as I do not have deep firewall/attack experience to test it. I can read the code, yet that's not the same. :-/

dune73 commented 1 year ago

I sat down and looked at the proof of concept PR by @theseion linked above.

There are several issues and I eventually gave up on this.

Here is a list:

rule ID 9514110 used twice
dos-protection-before.conf, line 221, single quote not closed
dos-protection-config.conf, line 36, single quote not closed

When fixing these, apache starts.

But then comes the next problem:

The central rule 9514120 runs in phase 1 and thus before REQUEST-901 that initialized the IP collection. 9514120 references the IP collection that is only initialized afterwards. So this can not work and rules ought to be rearranged.

I think we would fare best if IP collection initialization would be removed from main CRS alltogether and the plugin should come with the init that can also be switched off via a config item. Explain in the README that you have to pay attention that not more than one plugin initializes the collection etc. Then it should work.

theseion commented 10 months ago

I've updated the PR with fixes to the format issues you found. Do we have a resolution in collection initialisation by now? I vaguely recall that we had discussed something in a meeting.

RedXanadu commented 10 months ago

@theseion Found it:

CRS no longer needs collections, but plugins may want to work with collections -> Decisions: CRS initializes them for the plugins, off by default

crs-setup.conf has (commented out) rule 900130 which contains the action setvar:tx.enable_default_collections=1 to enable collection initialisation.

theseion commented 10 months ago

I just added the other issue to the list of discussions for tonight. initcol actions must run before plugin setup.

fzipi commented 1 month ago

@theseion What's next here?

theseion commented 1 month ago

We had decided that:

🔵 Decision: @theseion with help of @dune73 and @airween will try to check for initcol. If the collection is initialized by the plugin -> remove the 901 rule via ctl statement and do initcol yourself.

fzipi commented 2 weeks ago

Should we move this one to https://github.com/coreruleset/dos-protection-plugin-modsecurity ? Or the ctl will be for the core instead?

theseion commented 2 weeks ago

Yes

coreruleset / dos-protection-plugin-modsecurity