Closed: incertum closed this issue 3 months ago
This is very interesting. There are many ways to achieve this, and I think it will require deeper discussion (it is hard to summarize everything in a single message). Overall, I think we could tackle the simplest changes by either:
"Extension 3" is what scares me the most because it introduces some sort of meta-level language constructs. I don't see this happening without first cleaning up the part of libsinsp responsible for filters.
Thank you @jasondellaluce. The first proposal seems the most realistic to start with, end users may appreciate it the most, and it could be accomplished with syntactic sugar. We'll chat more. In general, could this be something for release 0.36?
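To make the "syntactic sugar" idea concrete, here is a minimal Python sketch of how a multi-value comparison could be expanded into the repeated single-value comparisons that rules spell out by hand today. The multi-value form and the helper name `desugar_multi_value` are assumptions for illustration only; this is not how Falco or libsinsp parse conditions.

```python
from typing import Sequence


def desugar_multi_value(field: str, op: str, values: Sequence[str]) -> str:
    """Expand a hypothetical `field op (v1, v2, ...)` into or-joined comparisons.

    The multi-value input form is an assumption for illustration; the output
    only uses single-value comparisons joined with `or`.
    """
    if not values:
        raise ValueError("at least one value is required")
    clauses = [f"{field} {op} {value}" for value in values]
    return "(" + " or ".join(clauses) + ")"


print(desugar_multi_value("proc.cmdline", "contains", ["curl", "wget"]))
# prints: (proc.cmdline contains curl or proc.cmdline contains wget)
```

Because the expansion happens before evaluation, a change of this kind would not need a new runtime operator, which is presumably why it looks like the cheapest option to start with.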
/milestone 0.36.0
Include all eligible operators for the new lists feature, especially glob, see https://github.com/falcosecurity/libs/issues/947. Thanks @mikescholl-sysdig!
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Provide feedback via https://github.com/falcosecurity/community.
/lifecycle stale
/remove-lifecycle stale
I plan to dig into this (possibly with @jasondellaluce's help since he is very knowledgeable about rule syntax) immediately after we close the release.
/assign
Amazing thanks!
See also falcosecurity/libs#1627
@lclin56 please be aware that full SQL language support (e.g. groupBy aggregations etc) is not planned at the moment, because Falco currently only supports stream processing without (micro) batching. There was a similar discussion between us in this ticket https://github.com/falcosecurity/rules/issues/196. If I may ask, I would be curious to know if you had a chance to explore existing Big Data or Stream technologies to consume Falco alerts and perform further processing? Thanks in advance.
I'd also say that even talking of "SQL" is a bit misleading in the Falco context. Falco is a streaming engine, so it must compute filters as quickly as possible. Otherwise, it would risk dropping events (we can't just put the kernel on hold and wait for processing). Introducing data query capabilities, I think, is out of scope for Falco. So, I can't really imagine a future where Falco would allow complex filtering or aggregation options on the fly.
On the other hand, we can extend the current Falco capabilities to allow more expressiveness in the rules condition syntax, which would allow the implementation of more powerful detections. That's the scope of the initiative described in this issue. Processing Falco alerts in a downstream tool is likely the best way to achieve full filtering (and querying) capabilities.
@leogr Thank you for your response. My current use case involves collecting event data from unknown samples and performing threat analysis downstream. My initial idea was to support batch processing rules within Falco, similar to supporting SQL-like batch queries and processing for certain events.
> If I may ask, I would be curious to know if you had a chance to explore existing Big Data or Stream technologies to consume Falco alerts and perform further processing? Thanks in advance.
I'm also currently exploring the feasibility of using big data or stream processing technologies to consume these events, such as Apache Metron, among others. However, I haven't found a clear direction yet, and I'm hoping to share some more viable solutions with you.
> I'd also say that even talking of "SQL" is a bit misleading in the Falco context. Falco is a streaming engine, so it must compute filters as quickly as possible. Otherwise, it would risk dropping events (we can't just put the kernel on hold and wait for processing). Introducing data query capabilities, I think, is out of scope for Falco. So, I can't really imagine a future where Falco would allow complex filtering or aggregation options on the fly.
@leogr I agree with your point. Based on my testing, Falco does consume a significant amount of CPU and memory resources even just for recording events of an unknown program, especially when dealing with samples exhibiting a high volume of malicious behavior. However, I'd like to mention that my consideration for introducing data query capabilities was driven by Falco's support for analyzing event dump files. Perhaps these features, which may have noticeable performance impacts, could be utilized in scenarios involving the analysis of dump files. It's likely that implementing batch processing using big data and stream processing technologies downstream is indeed the optimal approach. I will continue to explore in that direction.
> However, I'd like to mention that my consideration for introducing data query capabilities was driven by Falco's support for analyzing event dump files. Perhaps these features, which may have noticeable performance impacts, could be utilized in scenarios involving the analysis of dump files.
This is a good point. Still, I guess that piping Falco output to a downstream tool for offline analysis is a better solution, since you would use specialized software for data analysis. Btw, falcosidekick can forward Falco alerts to many third-party destinations (including some databases). I recommend you take a look at it!
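As a rough illustration of the downstream approach, the sketch below does a "group by rule" count over alerts outside of Falco. It assumes alerts arrive as newline-delimited JSON objects with a `rule` key (as when JSON output is enabled); the function name and sample data are hypothetical, and a real pipeline would more likely use a database or stream processor fed by falcosidekick.

```python
import json
from collections import Counter
from typing import Iterable


def count_alerts_by_rule(lines: Iterable[str]) -> Counter:
    """Count alerts per rule name from newline-delimited JSON alerts."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        try:
            alert = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore lines that are not valid JSON
        if not isinstance(alert, dict):
            continue  # ignore JSON values that are not objects
        # Assumption: the rule name is stored under the "rule" key.
        counts[alert.get("rule", "<unknown>")] += 1
    return counts


# Hypothetical sample input; real alerts carry more keys than shown here.
sample = [
    '{"rule": "Read sensitive file untrusted", "priority": "Warning"}',
    '{"rule": "Read sensitive file untrusted", "priority": "Warning"}',
    '{"rule": "Terminal shell in container", "priority": "Notice"}',
]
for rule, count in count_alerts_by_rule(sample).most_common():
    print(f"{count}\t{rule}")
```

The aggregation state lives entirely in the consumer, so the event-processing path in Falco stays a pure streaming filter.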
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Provide feedback via https://github.com/falcosecurity/community.
/lifecycle rotten
/remove-lifecycle rotten
Working with :point_down: on this
/assign @Andreagit97
/assign @jasondellaluce
Still in the design phase. We will come up with a proposal.
/remove-lifecycle rotten /remove-lifecycle stale
Sorry for being late on this.
Quick update: I'll soon open some proposals for the following transformers: join(), basename(), getopt(), and also for the introduction of something like anyof / oneof / allof to combine comparison operators (i.e. startswith) with lists.
Once we have the dedicated issues, we can probably close this one?
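For readers following along, here is a small Python sketch of what such transformers and list combinators could mean semantically. Everything here is an assumption for illustration: the names mirror the ones mentioned above, but the actual syntax and behavior will be defined in the dedicated proposals. `getopt()` and `oneof` are left out because their intended semantics are not spelled out in this thread.

```python
import posixpath
from typing import Callable, Iterable


def basename(path: str) -> str:
    """Return the final component of a POSIX-style path."""
    return posixpath.basename(path)


def join(separator: str, parts: Iterable[str]) -> str:
    """Concatenate string values with a separator."""
    return separator.join(parts)


def anyof(op: Callable[[str, str], bool], value: str, candidates: Iterable[str]) -> bool:
    """True if the comparison holds for at least one list item."""
    return any(op(value, candidate) for candidate in candidates)


def allof(op: Callable[[str, str], bool], value: str, candidates: Iterable[str]) -> bool:
    """True if the comparison holds for every list item."""
    return all(op(value, candidate) for candidate in candidates)


# Hypothetical list, standing in for a list defined in a rules file.
sensitive_dirs = ["/etc/", "/root/"]

print(basename("/usr/bin/curl"))                              # curl
print(join(" ", ["ls", "-la"]))                               # ls -la
print(anyof(str.startswith, "/etc/passwd", sensitive_dirs))   # True
print(allof(str.startswith, "/etc/passwd", sensitive_dirs))   # False
```

Passing the comparison as a parameter is one way to let a single combinator work with startswith, contains, and other operators, instead of adding one list-aware variant per operator.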
> Once we have the dedicated issues, we can probably close this one?
Yep. I'll close this once I open all the GH issues I have in mind :)
Motivation
Proposing to add subtle extensions to Falco's rules expression language in order to make event filtering even more powerful and more convenient to use while not sacrificing performance.
Listing a few options in no particular order. Drawing inspiration from Spark Scala SQL-style filtering statements, but most Big Data frameworks support these types of expressions, all adopted from SQL.
Possible Extension 1:
When a direct comparison of one field to the values in a list is not possible, statements such as contains currently need to be repeated. In the case of baseline detections that do a lot of substring matching or rules with convoluted exclusion filters, the rules can quickly become less readable. Here are a few examples from Falco's default rules:
Proposing a generalization of the current list comparisons (fd.type in (file, directory)) to substring matching. For example, Spark Scala offers the following notations. Would love to see such a feature extension; at the same time, I'm pretty open to what notation we choose, and we could put all options on the table.
Possible Extension 2:
Option to customize IP sub-range matching in a more compact way than is currently possible with multiple startswith or fd.net statements, e.g. consider the following example to generate a list -> then leverage startswith filters
Possible Extension 3:
Operations on more than just one field from the same event followed by filtering operations.
Arithmetic operations (+, -, *, /) between two numeric fields -> use result for subsequent filtering operations
Feature
Extend Falco's rules expression language with additional "SQL"-like filtering options common in Big Data frameworks.
The intended outcome of this issue is to put multiple options on the table and then see what is in the realm of possibilities for the near-term and also longer term.
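To help the discussion, the IP sub-range and two-field arithmetic ideas above can be sketched in Python as follows. This only illustrates the intended semantics under my own assumptions: the helper names, the CIDR notation for sub-ranges, and the event keys `bytes_out` / `bytes_in` are hypothetical and are not existing Falco fields or operators.

```python
import ipaddress
from typing import Iterable, Mapping


def ip_in_any_subnet(ip: str, subnets: Iterable[str]) -> bool:
    """True if `ip` falls inside at least one CIDR range (Extension 2 idea)."""
    address = ipaddress.ip_address(ip)
    return any(address in ipaddress.ip_network(subnet) for subnet in subnets)


def field_ratio_exceeds(
    event: Mapping[str, float], numerator: str, denominator: str, threshold: float
) -> bool:
    """Divide one numeric field by another, then filter on the result (Extension 3 idea)."""
    divisor = event[denominator]
    if divisor == 0:
        return False  # design choice for this sketch: no match on division by zero
    return event[numerator] / divisor > threshold


internal_ranges = ["10.0.0.0/8", "192.168.0.0/16"]
print(ip_in_any_subnet("192.168.1.20", internal_ranges))  # True
print(ip_in_any_subnet("8.8.8.8", internal_ranges))       # False

# Hypothetical event with two numeric fields from the same event.
event = {"bytes_out": 5000, "bytes_in": 100}
print(field_ratio_exceeds(event, "bytes_out", "bytes_in", 10.0))  # True
```

A single list of CIDR ranges replaces a chain of startswith comparisons on the textual address, and the arithmetic case shows why Extension 3 needs expression-level constructs rather than a new comparison operator.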
CC @jasondellaluce @leogr
See also https://github.com/falcosecurity/libs/issues/1627