falcosecurity / libs

libsinsp, libscap, the kernel module driver, and the eBPF driver sources
https://falcosecurity.github.io/libs/
Apache License 2.0

[Feature] Supporting field transformers in filtering language #1789

Closed jasondellaluce closed 4 months ago

jasondellaluce commented 5 months ago

Action plan

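We plan to execute the changes as follows:

- Preparing ground work on the libsinsp "filter checks" data structure, which evaluates filter comparisons at runtime
- Updating the filter grammar and AST (Abstract Syntax Tree) definitions
- Supporting the new feature in the "sinsp filter compiler", which compiles filter ASTs into the filtercheck-based executable form evaluated at runtime
- Supporting the new feature in the sinsp output formatters, which are used to format Falco rules output and print out information about event payloads and data fields
- Documenting all the features on falco.org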

All steps will require in-depth tests.

Motivation

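Over time, we collected plenty of requests in the context of the filtering language of libsinsp. This little DSL (domain specific language) is the basis on which Falco rules are developed and executed, and also serves other use cases across the different adopters of the Falco libs. Feedback from adopters always indicated that the language is simple and expressive, but we acknowledge that it also suffers from some limitations. To list some:

- Small modifications to existing fields mandate adding new fields
- Minor changes in the semantics of a comparison operator mandate adding new operators (e.g. `istartswith` and `iglob`, which are just case-insensitive versions of already-existing operators)
- Field-to-field comparisons are not possible
- Interpolation, composition, or small runtime transformations of existing types are not possible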

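Here's a non-comprehensive collection of issues from our repositories related to the topic:

- https://github.com/falcosecurity/libs/pull/1547
- https://github.com/falcosecurity/libs/issues/1627
- https://github.com/falcosecurity/falco/issues/2612
- https://github.com/falcosecurity/falco/issues/2496
- https://github.com/falcosecurity/falco/issues/2484
- https://github.com/falcosecurity/falco/issues/2403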

The general feeling is that changing the nature of the language, or making it extra complex, would defeat the simplicity principles that made the rules language widely adopted and easy to learn. Moreover, the grammar of the filtering language is quite fragile and leaves little room for edits without the risk of introducing breaking changes of vast magnitude.

However, we also argue that there are minor, feasible changes that could make the language far more expressive and powerful.

Feature

I want to share an R&D project that @Andreagit97 and I have spent some time on over the past weeks.

Our proposal is to update the filtering language with the notion of Field transformers. Transformers are declarative transformations that can be applied to filter fields (e.g. proc.name) with the purpose of supporting new detection scenarios and filtering capabilities.
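To make the idea concrete, a transformer can be thought of as a pure function applied to a field's extracted value before the comparison runs. The Python sketch below models the three transformer names that appear in the grammar (`tolower`, `toupper`, `b64`). The registry, the `apply_transformers` helper, and the sample event are assumptions made for illustration only and do not reflect how libsinsp implements the feature.

```python
import base64
from typing import Callable, Dict, List

# Hypothetical registry: the names mirror the transformers in the proposal,
# but the implementations are illustrative and are not libsinsp's.
TRANSFORMERS: Dict[str, Callable[[bytes], bytes]] = {
    "tolower": lambda value: value.lower(),
    "toupper": lambda value: value.upper(),
    "b64": lambda value: base64.b64decode(value),
}


def apply_transformers(names: List[str], value: bytes) -> bytes:
    """Apply transformers innermost-first; names are listed outermost-first."""
    for name in reversed(names):
        value = TRANSFORMERS[name](value)
    return value


# Models `toupper(b64(fd.name)) = TESTFILE`, assuming fd.name holds the
# base64 encoding of "testfile" (a made-up event for this sketch).
event = {"fd.name": base64.b64encode(b"testfile")}
result = apply_transformers(["toupper", "b64"], event["fd.name"])
matches = result == b"TESTFILE"
```

Keeping each transformer a single-argument function is what makes composition trivial in this sketch: nesting in the filter expression becomes plain function chaining.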

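The proposed syntax is as follows (all fields and scenarios are random simple examples):

- `fd.name startswith "/etc"`: Traditional use case, which will be supported as usual
- `tolower(fd.name) startswith "/etc"`: Lower case conversion for string field types
- `toupper(fd.name) startswith "/ETC"`: Upper case conversion for string field types
- `b64(evt.buffer) bcontains deadbeef`: base64 decoding for string and bytebuf field types
- `proc.name != val(proc.pname)`: field-to-field comparisons
- `tolower(proc.name) != tolower(proc.pname)`: field-to-field comparisons, with transformers
- `toupper(b64(fd.name)) = TESTFILE`: composed transformers (base64 decoding, then upper case conversion)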

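Here are some properties of field transformers:

- Implemented as an additional feature of the language, thus not introducing any breaking change from the current state of things
- Have strong typing, thus non-ambiguous
- Easy to implement new ones for future use cases, making them future proof
- Are composable (e.g. `toupper(b64(fd.name))`)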

The grammar of the filtering language (current state: https://github.com/falcosecurity/libs/blob/eadccc563aa42baea827b7782a2159033a52d765/userspace/libsinsp/filter/parser.h#L27) will evolve in the following backward-compatible and non-ambiguous way:

Productions (EBNF Syntax):
    Expr                    ::= OrExpr
    OrExpr                  ::= AndExpr ('or' OrExprTail)*
    OrExprTail              ::= ' ' AndExpr
                                | '(' Expr ')'
    AndExpr                 ::= NotExpr ('and' AndExprTail)*
    AndExprTail             ::= ' ' NotExpr
                                | '(' Expr ')'
    NotExpr                 ::= ('not ')* NotExprTail
    NotExprTail             ::= 'not(' Expr ')'
                                | Check
    Check                   ::= Field Condition
                                | FieldTransformer Condition 
                                | Identifier
                                | '(' Expr ')'
    FieldTransformer        ::= FieldTransformerType FieldTransformerTail
    FieldTransformerTail    ::= FieldTransformerArg ')'
    FieldTransformerArg     ::= FieldTransformer
                                | Field
    FieldTransformerOrVal   ::= FieldTransformer
                                | FieldTransformerVal Field ')'
    Condition               ::= UnaryOperator
                                | NumOperator (NumValue | FieldTransformerOrVal)
                                | StrOperator (StrValue | FieldTransformerOrVal)
                                | ListOperator (ListValue | FieldTransformerOrVal)
    ListValue               ::= '(' (StrValue (',' StrValue)*)* ')'
                                | Identifier
    Field                   ::= FieldName('[' FieldArg ']')?
    FieldArg                ::= QuotedStr | FieldArgBareStr 
    NumValue                ::= HexNumber | Number
    StrValue                ::= QuotedStr | BareStr

Supported Check Operators (EBNF Syntax):
    UnaryOperator           ::= 'exists'
    NumOperator             ::= '<=' | '<' | '>=' | '>' 
    StrOperator             ::= '==' | '=' | '!='
                                | 'glob ' | 'iglob '
                                | 'contains ' | 'icontains ' | 'bcontains '
                                | 'startswith ' | 'bstartswith ' | 'endswith '
    ListOperator            ::= 'intersects' | 'in' | 'pmatch' 
    FieldTransformerVal     ::= 'val('
    FieldTransformerType    ::= 'tolower(' | 'toupper(' | 'b64('

Tokens (Regular Expressions):
    Identifier              ::= [a-zA-Z]+[a-zA-Z0-9_]*
    FieldName               ::= [a-zA-Z]+[a-zA-Z0-9_]*(\.[a-zA-Z]+[a-zA-Z0-9_]*)+
    FieldArgBareStr         ::= [^ \b\t\n\r\[\]"']+
    HexNumber               ::= 0[xX][0-9a-zA-Z]+
    Number                  ::= [+\-]?[0-9]+[\.]?[0-9]*([eE][+\-][0-9]+)?
    QuotedStr               ::= "(?:\\"|.)*?"|'(?:\\'|.)*?'
    BareStr                 ::= [^ \b\t\n\r\(\),="']+

Additional context

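The val(<field>) transformer is a special no-op transformer that's needed at the language parser level in order to disambiguate field references from raw string values. For clarity:

- `proc.name = proc.pname`: Evaluates to true for processes whose comm is the literal string `proc.pname`, and is equivalent to `proc.name = "proc.pname"`
- `proc.name = val(proc.pname)`: Evaluates to true for processes whose comm is the same as their parent's comm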
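The difference between a raw string and a `val()` reference on the right-hand side can be sketched in Python as follows. The `resolve_rhs` and `equals` helpers and the sample event are hypothetical and exist only to illustrate the semantics described here; they are not part of libsinsp.

```python
import re
from typing import Dict

# Matches a right-hand side of the form val(<field>); illustrative only.
VAL_RE = re.compile(r"val\(([^()]+)\)")


def resolve_rhs(rhs: str, event: Dict[str, str]) -> str:
    """Return the comparison value: a field lookup only when wrapped in val()."""
    match = VAL_RE.fullmatch(rhs)
    if match:
        return event[match.group(1)]
    # Quoted and bare strings are both treated as literals.
    if len(rhs) >= 2 and rhs[0] == rhs[-1] and rhs[0] in "\"'":
        return rhs[1:-1]
    return rhs


def equals(field: str, rhs: str, event: Dict[str, str]) -> bool:
    """Evaluate `<field> = <rhs>` against a single event."""
    return event[field] == resolve_rhs(rhs, event)


# Made-up event: a bash process spawned by another bash process.
event = {"proc.name": "bash", "proc.pname": "bash"}

# Compares proc.name against the literal string "proc.pname".
literal_match = equals("proc.name", "proc.pname", event)
# Compares proc.name against the value of the proc.pname field.
field_match = equals("proc.name", "val(proc.pname)", event)
```

With this event, only the `val()` form matches, because the bare `proc.pname` is never looked up as a field.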

/milestone 0.17.0

poiana commented 5 months ago

@jasondellaluce: The provided milestone is not valid for this repository. Milestones in this repository: [0.16.0, 0.17.0, TBD, next-driver]

Use /milestone clear to clear the milestone.

In response to [this](https://github.com/falcosecurity/libs/issues/1789):

> **Motivation**
>
> Over time, we collected plenty of requests in the context of the filtering language of libsinsp. This little DSL (domain specific language) is the basis on which Falco rules are developed and executed, and also serves other use cases across the different adopters of the Falco libs. Feedback from adopters always indicated that the language is simple and expressive, but we acknowledge that it also suffers from some limitations. To list some:
> - Small modifications to existing fields mandate adding new fields
> - Minor changes in the semantics of a comparison operator mandate adding new operators (e.g. `istartswith` and `iglob`, which are just case insensitive versions of already-existing operators)
> - Field-to-field comparisons are not possible
> - Interpolation, composition, or small runtime transformations of existing types are not possible
>
> Here's a non-comprehensive collection of issues from our repositories related to the topic:
> - https://github.com/falcosecurity/libs/pull/1547
> - https://github.com/falcosecurity/libs/issues/1627
> - https://github.com/falcosecurity/falco/issues/2612
> - https://github.com/falcosecurity/falco/issues/2496
> - https://github.com/falcosecurity/falco/issues/2484
> - https://github.com/falcosecurity/falco/issues/2403
>
> The general feeling is that changing the nature of the language, or making it extra complex, would defeat the simplicity principles that made the rules language widely adopted and easy to learn. Moreover, the grammar of the filtering language is quite fragile and does not leave much space to edits without the risk of introducing breaking changes of vast magnitude.
>
> However, we also argue that there are minor feasible changes that could make the language far more expressive and powerful.
>
> **Feature**
>
> I want to share an R&D project that me and @Andreagit97 spent some time on over the past weeks.
>
> Our proposal is to update the filtering language with the notion of **Field transformers**. Transformers are declarative transformations that can be applied to filter fields (e.g. `proc.name`, etc...) with the purposes of supporting new detection scenarios and filtering capabilities.
>
> The proposed syntax is as follows (all fields and scenarios are random simple examples):
>
> * `fd.name startswith "/etc"`: Traditional use case, which will be supported as usual
> * `tolower(fd.name) startswith "/etc"`: Lower case conversion for string field types
> * `toupper(fd.name) startswith "/ETC"`: Upper case conversion for string field types
> * `b64(evt.buffer) bcontains deadbeef`: base64 decoding for string and bytebuf field types
> * `proc.name != val(proc.pname)`: field-to-field comparisons
> * `tolower(proc.name) != tolower(proc.pname)`: field-to-field comparisons, with transformers
> * `toupper(b64(fd.name)) = TESTFILE`: base64 decoding for string and bytebuf field types
>
> Here are some properties of field transformers:
> - Implemented as an additional feature of the language, thus **not introducing any breaking change** from the current state of things
> - Have strong typing, thus non-ambiguous
> - Easy to implement new ones for future use cases, making them future proof
> - Are composable (e.g. `toupper(b64(fd.name))`)
>
> The grammar of the filtering language (current state: https://github.com/falcosecurity/libs/blob/eadccc563aa42baea827b7782a2159033a52d765/userspace/libsinsp/filter/parser.h#L27) will evolve in the following backward-compatible and non-ambiguous way:
>
> ```
> Productions (EBNF Syntax):
>     Expr                    ::= OrExpr
>     OrExpr                  ::= AndExpr ('or' OrExprTail)*
>     OrExprTail              ::= ' ' AndExpr
>                                 | '(' Expr ')'
>     AndExpr                 ::= NotExpr ('and' AndExprTail)*
>     AndExprTail             ::= ' ' NotExpr
>                                 | '(' Expr ')'
>     NotExpr                 ::= ('not ')* NotExprTail
>     NotExprTail             ::= 'not(' Expr ')'
>                                 | Check
>     Check                   ::= Field Condition
>                                 | FieldTransformer Condition
>                                 | Identifier
>                                 | '(' Expr ')'
>     FieldTransformer        ::= FieldTransformerType FieldTransformerTail
>     FieldTransformerTail    ::= FieldTransformerArg ')'
>     FieldTransformerArg     ::= FieldTransformer
>                                 | Field
>     FieldTransformerOrVal   ::= FieldTransformer
>                                 | FieldTransformerVal Field ')'
>     Condition               ::= UnaryOperator
>                                 | NumOperator (NumValue | FieldTransformerOrVal)
>                                 | StrOperator (StrValue | FieldTransformerOrVal)
>                                 | ListOperator (ListValue | FieldTransformerOrVal)
>     ListValue               ::= '(' (StrValue (',' StrValue)*)* ')'
>                                 | Identifier
>     Field                   ::= FieldName('[' FieldArg ']')?
>     FieldArg                ::= QuotedStr | FieldArgBareStr
>     NumValue                ::= HexNumber | Number
>     StrValue                ::= QuotedStr | BareStr
>
> Supported Check Operators (EBNF Syntax):
>     UnaryOperator           ::= 'exists'
>     NumOperator             ::= '<=' | '<' | '>=' | '>'
>     StrOperator             ::= '==' | '=' | '!='
>                                 | 'glob ' | 'iglob '
>                                 | 'contains ' | 'icontains ' | 'bcontains '
>                                 | 'startswith ' | 'bstartswith ' | 'endswith '
>     ListOperator            ::= 'intersects' | 'in' | 'pmatch'
>     FieldTransformerVal     ::= 'val('
>     FieldTransformerType    ::= 'tolower(' | 'toupper(' | 'b64('
>
> Tokens (Regular Expressions):
>     Identifier              ::= [a-zA-Z]+[a-zA-Z0-9_]*
>     FieldName               ::= [a-zA-Z]+[a-zA-Z0-9_]*(\.[a-zA-Z]+[a-zA-Z0-9_]*)+
>     FieldArgBareStr         ::= [^ \b\t\n\r\[\]"']+
>     HexNumber               ::= 0[xX][0-9a-zA-Z]+
>     Number                  ::= [+\-]?[0-9]+[\.]?[0-9]*([eE][+\-][0-9]+)?
>     QuotedStr               ::= "(?:\\"|.)*?"|'(?:\\'|.)*?'
>     BareStr                 ::= [^ \b\t\n\r\(\),="']+
> ```
>
> **Additional context**
>
> The `val()` transformer is a special no-op transformer that's needed at the language parser level in order to disambiguate field references from raw string values. For clarity:
> - `proc.name = proc.pname`: Evaluates true for process of which comm is the `proc.pname` string, and is equivalent to `proc.name = "proc.pname"`
> - `proc.name = val(proc.pname)`: Evaluates true for process of which comm is the same as its parent's comm
>
> **Action plan**
>
> We plan to execute the changes as follows:
> - [ ] Preparing ground work on libsinsp "filter checks" data structure, that evaluates filter comparisons at runtime
> - [ ] Updating the filter grammar and AST (Abstract Syntax Tree) definitions
> - [ ] Supporting the new feature in the "sinsp filter compiler", which compiles filter ASTs in the filtercheck-based executable form evaluated at runtime
> - [ ] Supporting the new feature in the sinsp output formatters, which are used to format Falco rules output and print-out information about event payloads and data fields
> - [ ] Document all the features on falco.org
>
> All steps will require in-depth tests.
>
> /milestone 0.38.0

leogr commented 5 months ago

> /milestone 0.38.0

I guess you wanted to select the last libs milestone before Falco 0.38. If so: /milestone 0.17.0

leogr commented 5 months ago

Additional proposal (can be implemented later):

incertum commented 5 months ago

> The val() transformer is a special no-op transformer that's needed at the language parser level in order to disambiguate field references from raw string values. For clarity:
> - `proc.name = proc.pname`: Evaluates true for process of which comm is the proc.pname string, and is equivalent to `proc.name = "proc.pname"`
> - `proc.name = val(proc.pname)`: Evaluates true for process of which comm is the same as its parent's comm

Understood. It will likely cause a bit of confusion, so we need to document it very clearly. If we can think of alternatives that do not require val(), we should consider them as well.

jasondellaluce commented 5 months ago

> Understood, it will likely cause a bit of a confusion and we need to document it very clearly. If we can think of alternatives that do not require val() we should consider them as well.

I agree with this. As part of this work, the plan is also to make the sinsp compiler emit warnings for potential mistakes in this regard. Unfortunately, we explored many options and found no better grammar construct that would not lead to potential breaking changes in the filtering language and in the Falco rulesets out there. Although ugly-ish, this should guarantee complete backward compatibility with the status quo.
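One way such a warning could work is to flag an unquoted right-hand side that looks like a known field name. The Python sketch below is only a guess at the idea: the `warn_if_field_like` helper, the set of known fields, and the message text are all hypothetical, and the actual sinsp compiler may detect these cases differently.

```python
import re
from typing import Optional, Set

# FieldName token regex from the proposed grammar.
FIELD_NAME = re.compile(r"[a-zA-Z]+[a-zA-Z0-9_]*(\.[a-zA-Z]+[a-zA-Z0-9_]*)+")


def warn_if_field_like(rhs: str, known_fields: Set[str]) -> Optional[str]:
    """Return a warning when a bare string value is also a known field name.

    Quoted strings and val(...) references do not match the regex in full,
    so they are treated as intentional and produce no warning.
    """
    if FIELD_NAME.fullmatch(rhs) and rhs in known_fields:
        return (
            f"'{rhs}' is compared as a raw string; "
            f"use val({rhs}) to compare against the field value"
        )
    return None


# Made-up field list for the sketch.
known = {"proc.name", "proc.pname", "fd.name"}
warning = warn_if_field_like("proc.pname", known)
no_warning = warn_if_field_like('"proc.pname"', known)
```

A check of this kind never changes how a rule evaluates, so it would stay compatible with existing rulesets while still pointing out the likely mistake.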

LucaGuerra commented 4 months ago

Corresponding documentation PR: https://github.com/falcosecurity/falco-website/pull/1319

FedeDP commented 4 months ago

Considering that the docs PR is open and that the 0.17.0 libs tag is out, I think we can close this one. /close

poiana commented 4 months ago

@FedeDP: Closing this issue.
