fix: prevent invalid commands matches on 5 characters or less (932230 PL-1, 932235 PL-1, 932236 PL-2, 932237 PL-3, 932239 PL-2, 932250 PL-1, 932260 PL-1)

EsadCetiner commented 3 months ago

There have been many reports of false positives with the 932260 family of rules, many of these stem from invalid commands being matched (such as idendity matching id). This pr solves most of these issues by preventing matches of invalid commands on 5 characters or less, I didn't prevent invalid matches for all commands to avoid the regex size exploding in size. I've modified the definition of @ from [\s<>&|)] to (?:[\s<>&|)]|$) and adding @ to all commands with 5 characters or less. This results in either matching a command without arguments, or an command with an argument without matching permutations of a commands (For example, sudo matching sudoaaaa).

Since this is a big change, I likely may have missed some commands that were intended to be matched (For example, ip matching iptables and ip6tables) but were shortened for better performance.

Some new attacks are detected (See: 932230-38, 932230-39, 932232-4, 932237-2, 932237-6, 932237-7, 932237-16, 932250-4 tests)

Fixes false positives with english words: idendity unique time express

This should fix false positives with UUID, session cookies, and tokens.

closes: https://github.com/coreruleset/coreruleset/issues/3711 https://github.com/coreruleset/coreruleset/issues/3631 https://github.com/coreruleset/coreruleset/issues/3725

Alternative to https://github.com/coreruleset/coreruleset/pull/3727

This PR may introduce new false positives if for example there's a paramater named mail or a paramater only contains mail. Although these false positives are mostly at PL-2 and are less common than the false positives this PR fixes.

theseion commented 3 months ago

I'll need a couple of days before I can look at this PR in detail.

fzipi commented 3 months ago

Since this is a big change, I likely may have missed some commands that were intended to be matched (For example, ip matching iptables and ip6tables) but were shortened for better performance.

Just as a note: ip is a command, not a shortened version.

fzipi commented 3 months ago

(?:[\s<>&|)]|$)

Maybe using a word boundary instead? Like (?:[\s<>&|)])?\b.

EsadCetiner commented 3 months ago

@fzipi

Maybe using a word boundary instead? Like (?:[\s<>&|)])?\b

I've further simplified this by just using a word boundry and getting rid of the optional non-capturing group. The effect is pretty much the same, detection might be slightly better, but the generated regexes have reduced in size dramatically with this change (especially 932236).

For some reason only using a word boundry for rule 932237 crashes Apache, I couldn't figure out the cause so I just added a workaround and comment.

theseion commented 3 months ago

Thank you very much @EsadCetiner for this PR. I think the idea of looking for word boundaries is great. What I don't like is that we now need to add @ everywhere. It was bad enough already.

I have an idea for a different approach that I think makes more sense in the long run. I think we should simplify the command line processor in crs-toolchain and remove the logic for @ and ~. Instead, a block processed with the command line processor should end with a \b. Consider the following block:

##!> cmdline unix
  gcc
  sudo
##!<

The result would look as follows (simplified):

(?:gcc|sudo)\b

This still leaves ~. I think we can take care of that by using \w+ instead. Consider the following:

##!> cmdline unix
  python\w+
##!<

This would result in the following (simplified):

(?:python(?:\w+))\b

This would match python2 and python3, for example, but not python. I would make these replacements in the list of commands directly, as there are only a couple of them anyway.

In conclusion:

we would get rid of @ and ~
we could remove the replacements for @ and ~ in the assembly files for rules that don't use the command line processor
we would fix most of the false positives that have been reported over recent weeks where a command is matched as a prefix
we would reduce the sizes of many regular expressions significantly

@EsadCetiner Please open a separate PR with the additions (I saw a couple svn related commands).

theseion commented 3 months ago

I realise your PR solves an issue we have now and will prevent more issues from being filed. I propose to go ahead with your PR and implement my proposal later on.

To complete your PR, @EsadCetiner, you'll have to also adapt toolchain.yaml.

EsadCetiner commented 2 months ago

@theseion

I have an idea for a different approach that I think makes more sense in the long run. I think we should simplify the command line processor in crs-toolchain and remove the logic for @ and ~. Instead, a block processed with the command line processor should end with a \b.

Agreed, I had a general idea of something like this but I wasn't sure on how to approach this.

I realise your PR solves an issue we have now and will prevent more issues from being filed. I propose to go ahead with your PR and implement my proposal later on.

4.5.0 is coming out in 3 weeks or so, I'll try and address all of your feedback before then. I'd rather get it out of the way now so it's not forgotten.

Please open a separate PR with the additions (I saw a couple svn related commands).

These commands are already detected in the current release, but they stopped being detected once I prevented the invalid match. Shouldn't those additions be part of this PR so those commands are still covered, therefore avoiding some regressions?

To complete your PR, @EsadCetiner, you'll have to also adapt toolchain.yaml.

I'm not entirely clear on what this does, I see what looks to be@ here, but I don't see ~.

  anti_evasion_suffix:
    # - <>: redirection, e.g., `cat<foo`
    # - ,: brace expansion, e.g., `""{nc,-p,777}`
    ## - &|: logical operators in headers, e.g., `a=nc&&$a -nlvp 555`
    unix: |
      [\s<>,&|)].*

I assume you want me to modify the unix evasion suffix to this, then remove all references to @.

    unix: |
-      [\s<>,&|)].*
+      \b

I think your saying here:

Instead, a block processed with the command line processor should end with a \b

This still leaves ~. I think we can take care of that by using \w+ instead.

To modify the affected regex assembly files to something like this, while keeping ~:

##!> assemble
  ##!> cmdline unix
    ##!> include-except unix-shell-upto3 unix-shell-fps-pl1 -- ~ \w+
##!<

I hope this wasn't too convoluted, is this the general idea of what you want me to do?

theseion commented 2 months ago

These commands are already detected in the current release, but they stopped being detected once I prevented the invalid match. Shouldn't those additions be part of this PR so those commands are still covered, therefore avoiding some regressions?

Yes.

I'm not entirely clear on what this does, I see what looks to be@ here, but I don't see ~.

~ is anti_evasion_no_space_suffix. Don't replace the wildcard match .*.

I assume you want me to modify the unix evasion suffix to this, then remove all references to @.

Modify, yes. We still need @, @ will tell the toolchain to add that suffix.

I think your saying here:

Instead, a block processed with the command line processor should end with a \b

This still leaves ~. I think we can take care of that by using \w+ instead.

To modify the affected regex assembly files to something like this, while keeping ~:

!> assemble

!> cmdline unix
##!> include-except unix-shell-upto3 unix-shell-fps-pl1 -- ~ \w+
!<

You can ignore that part. That's my proposal for the change of the processor. It has nothing to do with your PR.

EsadCetiner commented 2 months ago

@theseion done, ready for review

EsadCetiner commented 2 months ago

I've gone back to my original solution of resolving this false positive without a word boundry (?:[\s<>&|)]|$), the word boundry was causing a fair few false positives. Added tests to make sure those false positives don't come up again.

EsadCetiner commented 2 months ago

I've tested this PR in production for a few days, the PR does fix most false positives with invalid command matches, session tokens, UUID and such but this PR introduced new false positives.

There's a whole bunch of new false positives with paramater names or with values such as file for 932236. A few examples I encountered:

ARGS:query[orderby]=date
ARGS:avatar_rating=PG
ARGS:_method=POST
ARGS:itemType=file
ARGS:_task=mail
ARGS:configValue=yes

ARGS_NAMES:ss
ARGS_NAMES:screen
ARGS_NAMES:dir
ARGS_NAMES:sort
ARGS_NAMES:source

Existing users will need to retune rule exclusions for 932236, and nothing's really solved except the false positives have been moved around.

I do have a few ideas but I'm not sure on what's the best path, I'm open to other ideas.

Introduce a stricter sibling for 932236 at PL-3 that will block both commands with and without arguments, and have 932236 require an argument for a command before blocking it.
Remove detection for low risk/harmless commands, this probably won't have much of an impact on reducing false positives but I do see some commands like yes which don't do anything.
Require arguments for select commands like file and mail that are too false positive prone to block without arguments, I don't think this is realisticly possible but maybe somebody has some ideas.

This PR currently does perform better against natural english language, so maybe that's more important than the new false positives this introduces.

theseion commented 2 months ago

Thanks @EsadCetiner. I'll try to get my head back into this soon.

EsadCetiner commented 2 months ago

After some thinking I've come up with adding a chained rule that resolves all of the false positives I've reported in the example above:

SecRule MATCHED_VAR "!@rx (?i)^(?:cron|date|dir|file|get|head|mail|null|pg|php|post|screen|sort|ss|source|view|yes)$" \

This pretty much resolves all of the new false positives I've encountered, except one or two. It won't make detection any worse than before this PR.

I'm going to do a little bit more testing on false positives to see if I've missed anything else, then push this change to the PR.

EsadCetiner commented 1 month ago

Ready for review

theseion commented 1 month ago

Alright, I'm tackling this one...

theseion commented 1 month ago

I don't think we need your chain rule. I've updated the word lists as they were supposed to be updated and added your false positives to the concerned lists.

Note that I adjusted the list update scripts to automatically suffix all commands of less than 5 characters length.

I'm not certain that your changes to the tests are correct, but I'll have to wait for the bug bounty issues, since we don't have enough information about the tests at the moment.

In general, I think your proposal is reasonable and it will probably take care of many issues we've had. I'm hopeful that we can create an LTS release with this and address the other improvements later.

Could you explain why the word boundaries didn't work for you?

coreruleset / coreruleset

fix: prevent invalid commands matches on 5 characters or less (932230 PL-1, 932235 PL-1, 932236 PL-2, 932237 PL-3, 932239 PL-2, 932250 PL-1, 932260 PL-1) #3735

!> assemble

!> cmdline unix

!<