Open EsadCetiner opened 3 months ago
I'll need a couple of days before I can look at this PR in detail.
Since this is a big change, I likely may have missed some commands that were intended to be matched (For example, ip matching iptables and ip6tables) but were shortened for better performance.
Just as a note: ip
is a command, not a shortened version.
(?:[\s<>&|)]|$)
Maybe using a word boundary instead? Like (?:[\s<>&|)])?\b
.
@fzipi
Maybe using a word boundary instead? Like (?:[\s<>&|)])?\b
I've further simplified this by just using a word boundry and getting rid of the optional non-capturing group. The effect is pretty much the same, detection might be slightly better, but the generated regexes have reduced in size dramatically with this change (especially 932236).
For some reason only using a word boundry for rule 932237 crashes Apache, I couldn't figure out the cause so I just added a workaround and comment.
Thank you very much @EsadCetiner for this PR. I think the idea of looking for word boundaries is great. What I don't like is that we now need to add @
everywhere. It was bad enough already.
I have an idea for a different approach that I think makes more sense in the long run. I think we should simplify the command line processor in crs-toolchain
and remove the logic for @
and ~
. Instead, a block processed with the command line processor should end with a \b
. Consider the following block:
##!> cmdline unix
gcc
sudo
##!<
The result would look as follows (simplified):
(?:gcc|sudo)\b
This still leaves ~
. I think we can take care of that by using \w+
instead. Consider the following:
##!> cmdline unix
python\w+
##!<
This would result in the following (simplified):
(?:python(?:\w+))\b
This would match python2
and python3
, for example, but not python
. I would make these replacements in the list of commands directly, as there are only a couple of them anyway.
In conclusion:
@
and ~
@
and ~
in the assembly files for rules that don't use the command line processor@EsadCetiner Please open a separate PR with the additions (I saw a couple svn
related commands).
I realise your PR solves an issue we have now and will prevent more issues from being filed. I propose to go ahead with your PR and implement my proposal later on.
To complete your PR, @EsadCetiner, you'll have to also adapt toolchain.yaml
.
@theseion
I have an idea for a different approach that I think makes more sense in the long run. I think we should simplify the command line processor in crs-toolchain and remove the logic for @ and ~. Instead, a block processed with the command line processor should end with a \b.
Agreed, I had a general idea of something like this but I wasn't sure on how to approach this.
I realise your PR solves an issue we have now and will prevent more issues from being filed. I propose to go ahead with your PR and implement my proposal later on.
4.5.0 is coming out in 3 weeks or so, I'll try and address all of your feedback before then. I'd rather get it out of the way now so it's not forgotten.
Please open a separate PR with the additions (I saw a couple svn related commands).
These commands are already detected in the current release, but they stopped being detected once I prevented the invalid match. Shouldn't those additions be part of this PR so those commands are still covered, therefore avoiding some regressions?
To complete your PR, @EsadCetiner, you'll have to also adapt toolchain.yaml.
I'm not entirely clear on what this does, I see what looks to be@
here, but I don't see ~
.
anti_evasion_suffix:
# - <>: redirection, e.g., `cat<foo`
# - ,: brace expansion, e.g., `""{nc,-p,777}`
## - &|: logical operators in headers, e.g., `a=nc&&$a -nlvp 555`
unix: |
[\s<>,&|)].*
I assume you want me to modify the unix evasion suffix to this, then remove all references to @
.
unix: |
- [\s<>,&|)].*
+ \b
I think your saying here:
Instead, a block processed with the command line processor should end with a \b
This still leaves ~. I think we can take care of that by using \w+ instead.
To modify the affected regex assembly files to something like this, while keeping ~
:
##!> assemble
##!> cmdline unix
##!> include-except unix-shell-upto3 unix-shell-fps-pl1 -- ~ \w+
##!<
I hope this wasn't too convoluted, is this the general idea of what you want me to do?
These commands are already detected in the current release, but they stopped being detected once I prevented the invalid match. Shouldn't those additions be part of this PR so those commands are still covered, therefore avoiding some regressions?
Yes.
I'm not entirely clear on what this does, I see what looks to be@ here, but I don't see ~.
~
is anti_evasion_no_space_suffix
. Don't replace the wildcard match .*
.
I assume you want me to modify the unix evasion suffix to this, then remove all references to @.
Modify, yes. We still need @
, @
will tell the toolchain to add that suffix.
I think your saying here:
Instead, a block processed with the command line processor should end with a \b
This still leaves ~. I think we can take care of that by using \w+ instead.
To modify the affected regex assembly files to something like this, while keeping ~:
!> assemble
!> cmdline unix
##!> include-except unix-shell-upto3 unix-shell-fps-pl1 -- ~ \w+
!<
You can ignore that part. That's my proposal for the change of the processor. It has nothing to do with your PR.
@theseion done, ready for review
I've gone back to my original solution of resolving this false positive without a word boundry (?:[\s<>&|)]|$)
, the word boundry was causing a fair few false positives. Added tests to make sure those false positives don't come up again.
I've tested this PR in production for a few days, the PR does fix most false positives with invalid command matches, session tokens, UUID and such but this PR introduced new false positives.
There's a whole bunch of new false positives with paramater names or with values such as file
for 932236. A few examples I encountered:
ARGS:query[orderby]=date
ARGS:avatar_rating=PG
ARGS:_method=POST
ARGS:itemType=file
ARGS:_task=mail
ARGS:configValue=yes
ARGS_NAMES:ss
ARGS_NAMES:screen
ARGS_NAMES:dir
ARGS_NAMES:sort
ARGS_NAMES:source
Existing users will need to retune rule exclusions for 932236, and nothing's really solved except the false positives have been moved around.
I do have a few ideas but I'm not sure on what's the best path, I'm open to other ideas.
yes
which don't do anything.file
and mail
that are too false positive prone to block without arguments, I don't think this is realisticly possible but maybe somebody has some ideas.This PR currently does perform better against natural english language, so maybe that's more important than the new false positives this introduces.
Thanks @EsadCetiner. I'll try to get my head back into this soon.
After some thinking I've come up with adding a chained rule that resolves all of the false positives I've reported in the example above:
SecRule MATCHED_VAR "!@rx (?i)^(?:cron|date|dir|file|get|head|mail|null|pg|php|post|screen|sort|ss|source|view|yes)$" \
This pretty much resolves all of the new false positives I've encountered, except one or two. It won't make detection any worse than before this PR.
I'm going to do a little bit more testing on false positives to see if I've missed anything else, then push this change to the PR.
Ready for review
Alright, I'm tackling this one...
I don't think we need your chain rule. I've updated the word lists as they were supposed to be updated and added your false positives to the concerned lists.
Note that I adjusted the list update scripts to automatically suffix all commands of less than 5 characters length.
I'm not certain that your changes to the tests are correct, but I'll have to wait for the bug bounty issues, since we don't have enough information about the tests at the moment.
In general, I think your proposal is reasonable and it will probably take care of many issues we've had. I'm hopeful that we can create an LTS release with this and address the other improvements later.
Could you explain why the word boundaries didn't work for you?
There have been many reports of false positives with the 932260 family of rules, many of these stem from invalid commands being matched (such as idendity matching id). This pr solves most of these issues by preventing matches of invalid commands on 5 characters or less, I didn't prevent invalid matches for all commands to avoid the regex size exploding in size. I've modified the definition of
@
from[\s<>&|)]
to(?:[\s<>&|)]|$)
and adding@
to all commands with 5 characters or less. This results in either matching a command without arguments, or an command with an argument without matching permutations of a commands (For example, sudo matching sudoaaaa).Since this is a big change, I likely may have missed some commands that were intended to be matched (For example, ip matching iptables and ip6tables) but were shortened for better performance.
Some new attacks are detected (See: 932230-38, 932230-39, 932232-4, 932237-2, 932237-6, 932237-7, 932237-16, 932250-4 tests)
Fixes false positives with english words: idendity unique time express
This should fix false positives with UUID, session cookies, and tokens.
closes: https://github.com/coreruleset/coreruleset/issues/3711 https://github.com/coreruleset/coreruleset/issues/3631 https://github.com/coreruleset/coreruleset/issues/3725
Alternative to https://github.com/coreruleset/coreruleset/pull/3727
This PR may introduce new false positives if for example there's a paramater named
mail
or a paramater only containsmail
. Although these false positives are mostly at PL-2 and are less common than the false positives this PR fixes.