SigmaHQ / pySigma-backend-splunk

pySigma Splunk backend
GNU Lesser General Public License v2.1

adding Oring regexes #37

Closed arblade closed 6 months ago

arblade commented 6 months ago

PR Introduction

Currently, when an OR operator joins conditions that use regexes, the backend raises an exception saying that ORing regexes is not supported.

Issue description

Until now, this has been a limitation of the Splunk backend because, in Splunk, regexes must be handled by a command preceded by a pipe, such as | regex fieldX=value. Such a pipe applies an implicit AND to the whole preceding query and "ends" it. To continue the query we would need to add a | search, and parentheses cannot span a pipe. When a regex has only AND operators among its parents, this is not an issue: it can be appended at the end of the query with its implicit AND. When a regex sits under an OR operator, however, it still applies its implicit AND to the whole preceding query and "ends" it, so this method cannot work. Something like (fieldX=test OR (index=* | regex fieldY=test)) returns an error (parentheses enclosing a pipe), and so does something like index=* | regex fieldX=test OR fieldY=test (a query continuing after the | regex).

Solution presentation

This PR proposes to lift this limitation by computing all regexes at the beginning of the query, then appending a search that checks value comparison conditions instead of regex conditions. With this PR, a rule like this:

sel1:
    fieldA|re: foo.*bar
sel2:
    fieldB|re: foo.*bar
condition: sel1 or sel2

is handled with the following query:

| rex field=fieldA "(?<fieldAMatch>foo.*bar)"
| eval fieldACondition=if(isnotnull(fieldAMatch), "true", "false") 
| rex field=fieldB "(?<fieldBMatch>foo.*bar)" 
| eval fieldBCondition=if(isnotnull(fieldBMatch), "true", "false") 
| search fieldACondition="true" OR fieldBCondition="true"
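
The shape of that query can be reproduced with a few lines of Python. This is only a minimal sketch of the string layout shown above, not the code of this PR: the function name build_or_regex_query is hypothetical, and the sketch assumes every field appears once and that patterns contain no double quotes needing escaping.

```python
def build_or_regex_query(regexes):
    """Render (field, pattern) pairs joined by OR as a rex/eval/search chain.

    Hypothetical helper for illustration; it does not escape quotes in patterns.
    """
    stages = []
    conditions = []
    for field, pattern in regexes:
        match_name = f"{field}Match"
        condition_name = f"{field}Condition"
        # Capture the regex match into a named group on the target field.
        stages.append(f'| rex field={field} "(?<{match_name}>{pattern})"')
        # Turn "the group matched" into a plain string value that search can compare.
        stages.append(
            f'| eval {condition_name}=if(isnotnull({match_name}), "true", "false")'
        )
        conditions.append(f'{condition_name}="true"')
    # The OR logic is applied on the computed condition fields only.
    stages.append("| search " + " OR ".join(conditions))
    return "\n".join(stages)


query = build_or_regex_query([("fieldA", "foo.*bar"), ("fieldB", "foo.*bar")])
print(query)
```

Because the OR is expressed between plain field comparisons in the final | search, no parenthesis ever has to enclose a pipe.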

Implementation

The implementation goes through a new deferred class, SplunkDeferredORRegularExpression, and a redefinition of finalize_query, which simply checks whether there is an ORed regex case and, if so, calls super().finalize_query with the query preceded by the whole train of | rex ... | eval ....
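
The control flow of that override can be sketched as follows. Everything here is a stand-in: the class names ending in Sketch are hypothetical, the two-argument finalize_query signature is a simplification and not pySigma's real one, and joining the prefix to the query with | search is an assumption based on the example query in this description.

```python
from dataclasses import dataclass


@dataclass
class DeferredOrRegexSketch:
    """Hypothetical stand-in for a deferred OR-regex part."""

    stages: str  # pre-rendered '| rex ...' and '| eval ...' lines


class BaseBackendSketch:
    """Hypothetical stand-in for the parent backend."""

    def finalize_query(self, query, deferred):
        # The real parent does more work; this stand-in returns the query as-is.
        return query


class OrRegexBackendSketch(BaseBackendSketch):
    def finalize_query(self, query, deferred):
        # Detect the ORed regex case by looking for the dedicated deferred parts.
        or_regexes = [d for d in deferred if isinstance(d, DeferredOrRegexSketch)]
        if or_regexes:
            # Put the whole rex/eval train in front of the value comparisons.
            prefix = "\n".join(d.stages for d in or_regexes)
            query = f"{prefix}\n| search {query}"
        return super().finalize_query(query, deferred)


backend = OrRegexBackendSketch()
plain = backend.finalize_query('fieldX="test"', [])
combined = backend.finalize_query(
    'fieldACondition="true"',
    [
        DeferredOrRegexSketch(
            '| rex field=fieldA "(?<fieldAMatch>foo.*bar)"\n'
            '| eval fieldACondition=if(isnotnull(fieldAMatch), "true", "false")'
        )
    ],
)
print(combined)
```

Queries without any ORed regex pass through untouched, so the existing AND-only behaviour is not affected in this sketch.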

Cases of multiple regexes on the same field

When multiple regexes apply to the same field with an OR operator between them, I implemented the ability to add a number at the end of fieldXMatch and fieldXCondition to differentiate between them.

sel1:
    fieldA|re: foo.*bar
sel2:
    fieldA|re: foo.*foo
condition: sel1 or sel2

is handled with:

| rex field=fieldA "(?<fieldAMatch>foo.*bar)"
| eval fieldACondition=if(isnotnull(fieldAMatch), "true", "false") 
| rex field=fieldA "(?<fieldAMatch2>foo.*foo)"
| eval fieldACondition2=if(isnotnull(fieldAMatch2), "true", "false") 
| search fieldACondition="true" OR fieldACondition2="true"
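
The numbering scheme can be sketched with a per-field counter. This is an illustrative assumption about how the suffix could be derived, not the PR's actual code; the function name numbered_names is hypothetical. The first regex on a field gets no suffix, and later ones get 2, 3, and so on.

```python
from collections import defaultdict


def numbered_names(fields):
    """Return (match_name, condition_name) pairs, numbering repeated fields.

    Hypothetical helper: the first occurrence of a field has no suffix,
    the second gets "2", the third "3", and so on.
    """
    seen = defaultdict(int)
    names = []
    for field in fields:
        seen[field] += 1
        suffix = "" if seen[field] == 1 else str(seen[field])
        names.append((f"{field}Match{suffix}", f"{field}Condition{suffix}"))
    return names


names = numbered_names(["fieldA", "fieldA", "fieldB"])
for match_name, condition_name in names:
    print(match_name, condition_name)
```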

Notes on performance

This implementation handles rules that use regexes with OR operators, but it has a major drawback on the performance side: each regex is processed on all logs before the query logic is applied, which can take some time.