Motivation
One major feature of libsinsp is that it can parse filters and be configured with one. This is the foundation on which the Falco rule engine is built, extending the feature to support sets of filters (the rulesets, as we know them). However, the filter parser inside libsinsp, implemented in the sinsp_filter_compiler class, has several significant issues:
The implementation is old and not actively maintained, and it has not kept up with newer changes such as the filtercheck factory constructs
The textual parsing AST and the boolean evaluation tree (a.k.a. filterchecks) are tightly coupled. There is no separation between the two concepts, and filterchecks bundle some of the text-parsing logic. Downsides of this include:
The code is confusing and the flow is hard to follow, especially for new contributors
It's hard to apply new changes
Manipulating the filtercheck data structures from outside is complex or impossible. For instance, Falco is not capable of performing its macro/list expansion logic at this level
The parser is broken in some edge cases, which leads to unpredictable behavior at runtime. For example, the filter not not <true-check> never evaluates to true at runtime.
Associativity of boolean operators is not properly implemented, so the parser doesn't allow mixing and and or in the same expression. For example, filters like the following are rejected: <check1> or <check2> and <check3>
String escaping is ambiguous, implicit, and sometimes broken. It is unclear, and definitely not documented, how strings are escaped. This causes lots of issues where users are unsure how to write string values so that their filter matches correctly. We have a notable example of this in the Falco default ruleset: in proc.args contains "\ " we try to match the backslash character followed by a space, but that's not consistent with some other rules and with the expected semantics
There is no explicit formal grammar for the filtering language. Although the language is simple, this causes ambiguity both in the parser implementation and when writing filters.
We have no unit tests for the parser, so each new change can cause unexpected regressions
We have multiple parser implementations across the Falcosecurity projects. Due to all the points above, Falco had to implement a second parser, as it is impossible to load the rulesets by using the parser built into libsinsp. Moreover, that parser is written in Lua, and it is the core reason why we still have Lua code in Falco, which has often been pointed out as inconvenient to have and maintain. There is also the risk that other libsinsp consumers have re-implemented the parser without us being aware of it.
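To make the points about AST separation, nested not, and mixed and/or more concrete, here is a minimal sketch of a parser that only builds an AST and leaves evaluation to a separate function. It is written in Python for brevity and is not the libsinsp implementation. The grammar in the comment, the node classes (Check, Not, And, Or), and the choice that and binds tighter than or are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical grammar (an assumption, not the libsinsp grammar):
#   or_expr  := and_expr ("or" and_expr)*
#   and_expr := not_expr ("and" not_expr)*
#   not_expr := "not" not_expr | "(" or_expr ")" | CHECK

@dataclass(frozen=True)
class Check:
    name: str

@dataclass(frozen=True)
class Not:
    child: object

@dataclass(frozen=True)
class And:
    left: object
    right: object

@dataclass(frozen=True)
class Or:
    left: object
    right: object

TOKEN = re.compile(r"\s*(\(|\)|[A-Za-z0-9_.<>-]+)")

def tokenize(text):
    """Split a filter into parentheses and word-like tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        if tok is None:
            raise ValueError("unexpected end of filter")
        self.pos += 1
        return tok

    def parse_or(self):
        node = self.parse_and()
        while self.peek() == "or":
            self.take()
            node = Or(node, self.parse_and())
        return node

    def parse_and(self):
        node = self.parse_not()
        while self.peek() == "and":
            self.take()
            node = And(node, self.parse_not())
        return node

    def parse_not(self):
        tok = self.take()
        if tok == "not":
            # Recursing here is what makes "not not x" well defined.
            return Not(self.parse_not())
        if tok == "(":
            node = self.parse_or()
            if self.take() != ")":
                raise ValueError("expected ')'")
            return node
        if tok in (")", "and", "or"):
            raise ValueError(f"unexpected token {tok!r}")
        return Check(tok)

def parse(text):
    """Turn filter text into an AST; no evaluation logic lives here."""
    parser = Parser(tokenize(text))
    node = parser.parse_or()
    if parser.peek() is not None:
        raise ValueError(f"unexpected token {parser.peek()!r}")
    return node

def evaluate(node, facts):
    """Evaluate an AST against a dict mapping check names to booleans."""
    if isinstance(node, Check):
        return facts[node.name]
    if isinstance(node, Not):
        return not evaluate(node.child, facts)
    if isinstance(node, And):
        return evaluate(node.left, facts) and evaluate(node.right, facts)
    return evaluate(node.left, facts) or evaluate(node.right, facts)
```

Because the AST is plain data, a consumer such as Falco could in principle rewrite it (for example, to expand macros and lists) before any evaluation structure is built, and each grammar rule can be covered by a small unit test.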
Feature
We need to renovate, and perhaps re-implement, the filter parser to address all the points above. Eventually, we may want to make Falco use the parser in libsinsp and remove the duplicate implementation.
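As one illustration of what explicit string escaping could look like, the sketch below decodes a double-quoted string value under a small, fixed set of escape sequences and rejects everything else. The function name unescape, the ESCAPES table, and the rules themselves are hypothetical and only meant to show the idea, not the behavior of libsinsp or Falco.

```python
# Hypothetical escape table: the only sequences accepted after a backslash.
ESCAPES = {"\\": "\\", '"': '"', "'": "'", "n": "\n", "t": "\t"}

def unescape(literal):
    """Decode a double-quoted filter string under explicit (assumed) rules."""
    if len(literal) < 2 or literal[0] != '"' or literal[-1] != '"':
        raise ValueError("string must be enclosed in double quotes")
    body, out, i = literal[1:-1], [], 0
    while i < len(body):
        ch = body[i]
        if ch == '"':
            raise ValueError(f"unescaped quote at offset {i}")
        if ch == "\\":
            # A backslash must be followed by a known escape character.
            if i + 1 >= len(body) or body[i + 1] not in ESCAPES:
                raise ValueError(f"invalid escape sequence at offset {i}")
            out.append(ESCAPES[body[i + 1]])
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

Under these assumed rules, the value "\ " is rejected as an invalid escape, and matching a backslash followed by a space has to be written as "\\ ", which removes the ambiguity at the cost of being stricter than the current behavior.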
Alternatives
I don't personally see the "leave things as they are" solution as viable. I see this as a big limiter for the project.
Additional context