SigmaHQ / sigma-specification

Sigma rule specification

Regular Expression matching #41

Open maederm opened 3 years ago

maederm commented 3 years ago

Hi

How does sigma expect regex to be applied to fields? Does the regex need to apply to the whole field? I couldn't find a definition in the spec.

Take for example rules/windows/process_creation/win_regini.yml

        CommandLine|re: ':[^ \\]' # to avoid intersection with ADS rule

If I translate that with sigmac, I get a query string that requires a full match on the field.

$ tools/sigmac  rules/windows/process_creation/win_regini_ads.yml -c winlogbeat-modules-enabled -t es-qs
(process.executable.keyword:*\\regini.exe AND process.command_line.keyword:/:[^ \\]/)
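To make the difference concrete, here is a minimal Python sketch comparing a partial match with a full-field match for the regex from the rule. It is not sigmac code, and the sample command line is made up for illustration.

```python
import re

# Regex from the rule: a colon followed by any character
# other than a space or a backslash.
pattern = re.compile(r":[^ \\]")

# Hypothetical field value, invented for this example.
command_line = r"regini.exe C:\temp\file.txt:hidden"

# Partial match: the pattern may occur anywhere in the field.
partial = pattern.search(command_line) is not None

# Full match: the pattern has to cover the entire field value.
full = pattern.fullmatch(command_line) is not None

print(f"partial={partial} full={full}")
```

With partial semantics the `:h` in `file.txt:hidden` is enough for a hit; with full-field semantics the same regex cannot match, because it only describes two characters.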

I propose to define that behavior in the sigma specification and thought of these two possibilities:

Solution A: Sigma Spec defines partial match

If only a partial match is required I can try to make a pull request that would translate it to (process.executable.keyword:*\\regini.exe AND process.command_line.keyword:/.*:[^ \\].*/)
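A rough sketch of what such a translation step could look like, assuming the target engine only supports full-field regex matching. The helper name `to_full_match` is hypothetical and not part of sigmac. The extra parentheses are my addition so that a pattern with a top-level alternation such as `a|b` is still wrapped correctly.

```python
import re


def to_full_match(pattern: str) -> str:
    """Wrap a partial-match regex for an engine that matches the whole field.

    Hypothetical helper for illustration only. The group keeps a top-level
    alternation (a|b) from being split by the added wildcards.
    """
    return f".*({pattern}).*"


wrapped = to_full_match(r":[^ \\]")

# Python's re.fullmatch stands in for a full-match engine here.
hit = re.fullmatch(wrapped, r"regini.exe C:\temp\file.txt:hidden") is not None
miss = re.fullmatch(wrapped, r"regini.exe C:\temp\file.txt") is not None
```

Note that `.` does not match newlines in Python by default, so this stand-in only behaves like a partial match for single-line field values.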

Solution B: Sigma Spec defines full match

If a full field match is required I could make a pull request to rewrite the rule to

        CommandLine|re: '.*:[^ \\].*' # to avoid intersection with ADS rule

Best Regards, maederm

frack113 commented 3 years ago

Hi, the re modifier checks that the value is a valid regex and passes it to the backend. Not every backend can handle regex, and some have their own way of dealing with it:

A way can be to have 2 modifiers:

maederm commented 3 years ago

A way can be to have 2 modifiers:

* |re       (no change)

* |re_in   (the backend adds `.*` or whatever is needed for it to work)

@frack113: If I understand you correctly this means the spec should be updated to say that |re must match the whole field, right?

frack113 commented 3 years ago

Currently it is the backend that manages the regex. The es-qs backend produces a full match because Elastic regex matching is full-match. I tested this in Kibana.

My proposal is to clarify this point: the rule author specifies whether the regex is a full or a partial match, but the backend still has to handle it. In my mind re_in is like contains, so perhaps re_contains is the better name.
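A minimal sketch of how a backend might handle the two proposed modifiers. Everything here is an assumption for illustration: `render_regex` and its `engine_is_full_match` flag are invented names, and `re_contains` is only the modifier name floated in this thread, not something defined in the specification.

```python
def render_regex(pattern: str, modifier: str, engine_is_full_match: bool) -> str:
    """Return the regex a backend would emit for the given modifier.

    Hypothetical sketch: "re" passes the pattern through unchanged,
    "re_contains" asks the backend to guarantee partial-match behavior.
    """
    if modifier == "re":
        # No change: the author's regex is handed to the engine as written.
        return pattern
    if modifier == "re_contains":
        # A full-match engine needs wildcards on both sides; an engine that
        # already searches inside the field can take the pattern as is.
        return f".*({pattern}).*" if engine_is_full_match else pattern
    raise ValueError(f"unknown modifier: {modifier}")


as_written = render_regex(r":[^ \\]", "re", engine_is_full_match=True)
wrapped = render_regex(r":[^ \\]", "re_contains", engine_is_full_match=True)
untouched = render_regex(r":[^ \\]", "re_contains", engine_is_full_match=False)
```

The point of the split is that the rule author states the intent once, and each backend decides whether any wrapping is required for its engine.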

thomaspatzke commented 1 year ago

As field matches are always full matches on the whole value, this should be the same for regular expressions to maintain consistency.

Res260 commented 1 year ago

@thomaspatzke I'm not sure it's officially in the specification, but I disagree with your comment. Full-matching regexes can have important performance implications for SIEMs:

It is discussed here: https://www.loggly.com/blog/five-invaluable-techniques-to-improve-regex-performance/

At my org, leading and trailing .*s are used in use cases only when absolutely necessary, as a bad regex that is run on 20k events per second can have a very negative performance impact!