falcosecurity / falco

Cloud Native Runtime Security
https://falco.org
Apache License 2.0
7.23k stars 893 forks source link

[Tracking] Opening multiple event sources in the same Falco instance #2074

Closed jasondellaluce closed 1 year ago

jasondellaluce commented 2 years ago

Motivation

The plugin system allows Falco to open new kinds of event sources that go beyond the historical syscall use case. Recently, this has been leveraged to port the k8s audit log event source to a plugin (see: https://github.com/falcosecurity/plugins/tree/master/plugins/k8saudit, and https://github.com/falcosecurity/falco/pull/1952). One of the core limitation that comes from the plugin system implementation of the libraries, is that a given Falco instance is capable of opening only one event source. In the example above, this implies that Falco instances are not able to ingest both syscalls and k8s audit logs together. This can instead be accomplished by deploying two distinct Falco instances, one for each event source.

Feature Requirements

Proposed Solution

Release Goals

~To be defined. This is out of reach for Falco 0.32.1.~ Falco 0.33.0

Terminology

Design

Technical Limitations of the Design

Technical Blockers

This is the list of things we mandatorily need to work on to see this initiative happen.

Nice to Have

Linked Discussions

jasondellaluce commented 2 years ago

I think this is something we should aim for over the next few months. So, I propose to turn this issue into a tracking issue where I'll share the though process of my proposed solution and all the steps and PRs involved. I expect this to be the place in which we can have discussions about this solution, and perhaps converge to something we all agree on.

/milestone 0.33.0

jasondellaluce commented 2 years ago

I'm gonna share the thought process that led me to the proposed solution documented in the issue description. @leogr has been involved in the early discussion of this topic.

Since the end goal is to run multiple sources at the same time, many execution models have been hypothesized, but each of them has been discarded in favor of the proposed solution. I'm gonna proceed and document briefly their design and cost/benefits tradeoffs.

Reasonably, going up in the stack the next synchronization point would be Falco itself, which is exactly the solution we proposed. This is the only design we came up with that ensure total and non-blocking parallelization of each event source processing routine. The only sequential bottleneck becomes the Falco Output Engine, which is natively designed to be thread-safe(~ish), and has the benefits that the number of alerts is orders of magnitude lower than the number of events produced by each event source (syscalls in particular), which ensure acceptable performance and negligible synchronization overhead.

araujof commented 2 years ago

Nice feature and write-up, @jasondellaluce!

For D6, have you thought about issues of fairness, etc. when the rate of system calls far exceeds the rate of other log sources? I think that some queue management will likely be necessary to avoid contention/blocking. A thread pool in the rules engine could also help with throughput, at the expense of allowing events to be exported out of order.

jasondellaluce commented 2 years ago

@araujof thanks for reviewing, those are all good points!

For D6, we can achieve good thread safety guarantees from the Rule Engine with the assumption that every thread gets assigned different event source. This stands before the Rule Engine is entirely partitioned by event source. See this for more details: https://github.com/falcosecurity/falco/pull/2082. Assuming that the Rule Engine level can provide wait-free concurrent access, I'd say fairness should be guaranteed at that level.

Then, once alerts are triggered they go through the Output Engine, where fairness and ordering are a big topic instead. Luckily, thanks to @leogr we already have a concurrent ordered queue there. There may be a fairness point if the queue gets filled-up with alerts coming from one event source, but that's an unlikely scenario (If Falco fills up the queue, it means that either the outputs are all blocked, or that some rule is really noisy). A simple workaround would be to increase the queue size. We only have one worker on the other side of the queue though, and we might consider using a thread pool maybe in the future.

What do you think? Does this address your concerns?

araujof commented 2 years ago

Hi @jasondellaluce,

What do you think? Does this address your concerns?

Sounds good to me! Thanks for the pointer to #2082.

mstemm commented 2 years ago

Thanks for the very thorough write up! I completely agree that this is the best/least-bad approach to take. Here are some more specific comments:

(D1): you're right that only falco will be opening multiple inspectors. But (B9) is going to result in a lot of changes to make opening multiple inspectors possible.

(D4): outside of test cases/ad-hoc testing, I don't think anyone would run falco on a trace file, right? Might want to treat this as a special case and not try too hard to support multiple inspectors.

(B3): I don't recall the implementation, but why can't you partition around handles or inspectors to avoid making the async readers concurrent?

(B6): Any concurrency improvements will be limited to partitioning by event source, right?

(B8): Instead of removing the token bucket, you could move it to the other side of the outputs queue, right?

(B9): You're right that these might be safe because they are read-only, but I think this would be a great time to move them into the inspector, simply as a cleanup step. I think there are some other potential globals like the dns manager singleton that need to be cleaned up as well.

jasondellaluce commented 2 years ago

Hey Mark, responding point by point:

terylt commented 2 years ago

Hi Jason, sorry, I'm late to the party, so take these comments with a grain of salt. Since the falco rules engine is pretty stateless, would it be easier to just build one engine per source? One could mitigate some of the hot loading issues by compiling the rules once, and cloning new engines in the background and only switching over when the new ones are ready. One engine per source may also help avoid contention issues, lessen drops, and make it easier to debug. Do you want to be able to overlay events from multiple sources on top of one another?

jasondellaluce commented 2 years ago

Thanks for reviewing @terylt! This is actually a very valid suggestion, and It's something we should consider. The clear benefit of this is that multiple event sources with the same name could be loaded (with the exception of syscall, that carries other technical constraints), and that falco_engine would not need to be thread-safe anymore.

The downside of this is that loads would need to be loaded multiple times (slower initialization time). Cloning the engine would not work, because each engine would be configured with a given event source so they would all be different and rely on distinct sets of available rule fields. Also, it would make hot-reload features like the one described in https://github.com/falcosecurity/falco/issues/1692#issuecomment-1110951413 harder to implement.

At the same time, this is the one change we can consider making even later into the process. So since we don't have data races in our current Engine implementation, my call would be to have a first feature release that uses only one engine, and eventually switch to multiple ones if we find limitations while experimenting. What do you think?

terylt commented 2 years ago

@jasondellaluce What takes the longest for initialization of the engine? Is it compiling the rules into a datastructure? If so, is that data structure read-only? I.e., could it be shared across multiple engines? That way you'd only have to compile it once, and share it. If it's not being modified by any of the engines, you could get away with one. If we have a list of major initialization costs, we could figure out ways to reduce each one.

In terms of multi engine vs. single engine. I don't have a strong opinion and am happy with whatever route you want to take. Just sharing from my past experience on this topic. I find pushing all events into a single pipe can be harder to get right coding-wise and can introduce overheads in context-switching and lock contention, so I tend to gravitate to avoid locking and contention issues when I can. I'm happy to help or support in anyway I can during the building of this feature, whichever approach you decide.

jasondellaluce commented 2 years ago

@terylt the initialization cost of the rule engine is related to loading the rule files themselves. If we decide to have one engine for each event source, each engine should load all the configured rule files, because the way rule files are loaded is influenced by the event sources known by the rule engine. For example, the engine throws errors if a rule uses a field that is not defined for the rule's event source, and skips rules of unknown event source. There would be nothing to share between engines, if not lists and rules abstract definitions. Plus, currently rule files reading and compilation into an evaluable data structure currently happens in one step, so this approach would require relevant refactoring at that level.

The reason why I didn't yet do deeper research on this route is that having a 1 shared rule engine still does not require any locking mechanism in our case. The rule engine is currently totally partitioned by event source, so as long as we assign 1 event source to 1 thread, everything will work as-is. The only synchronization point is the rule stats manager inside the rule engine, which can just be turned into an atomic counter.

EDIT: another point for having 1 shared engine is that we have aggregated stats by design. If we do otherwise, we'd need some way to merge the results of stats_managers coming from different engines.

dwindsor commented 2 years ago

(D1) The feature is implemented in Falco only, and mostly only affects the codebase of falcosecurity/falco. Both libsinsp and libscap will keep working in single-source mode

Also late to the party, but we have a use case for wanting support for this at the libs level (using the libs as the basis for a collector/sensor, but none of the other bits). Might not be a common use case, but figured I'd mention it 😀.

Looking at the discussion around (B9) my vote would be to go with the @mstemm's suggestion and try to find a way to cleanly handle the globals, and (perhaps?) be able to go the route of adding libs-level support?

(B9): You're right that these might be safe because they are read-only, but I think this would be a great time to move them into the inspector, simply as a cleanup step. I think there are some other potential globals like the dns manager singleton that need to be cleaned up as well.

jasondellaluce commented 2 years ago

@dwindsor thanks for reviewing!

Also late to the party, but we have a use case for wanting support for this at the libs level (using the libs as the basis for a collector/sensor, but none of the other bits). Might not be a common use case, but figured I'd mention it 😀.

No worries, we're still in the process of designing and developing this. As for the analysis of https://github.com/falcosecurity/falco/issues/2074#issuecomment-1162922363, I couldn't find an optimal way to support event source parallelism in neither the libscap nor the libsinsp levels.

I think your point is very valid, supporting this in libs could be valuable for all the libs stakeholder. Do you have something specific in mind? Given the analysis linked above, here's my proposal. We could maintain the assumption that each sinsp inspector is single-source and single-thread, and then created another library inside libs that implements the inspector parallelization logic. Such a library would sit inside something like userspace/libsinp-multi-source, and would become optionally usable for consumers that wish to run more than one inspector, with different event sources, and eventually with multi-threading. This would make the Falco code easier to write, would have no effect on current libsinsp/libscap implementations, and would also enable libs consumer to benefit from this new feature. Let me know if this would work for you, so that I can proceed with a POC.

Looking at the discussion around (B9) my vote would be to go with the @mstemm's suggestion and try to find a way to cleanly handle the globals, and (perhaps?) be able to go the route of adding libs-level support?

I agree with both you and @mstemm. My opinion here is that these (Falco multi-evt source and libs globals cleanup) should follow two separate workstreams and efforts. The former could be properly implemented even without the latter, so I'd propose to work on both, even in parallel if there is enough developer capacity. My fear is that removing sinsp's globals will be a bit tedious.

jasondellaluce commented 2 years ago

FYI, I moved the discussion of point (B3) about the Plugin SDK Go into another dedicated issue. There are some interesting challenges involved, and I felt like it was worth documenting the thought process so that everyone could jump in and share some feedback.

👉🏼 https://github.com/falcosecurity/plugin-sdk-go/issues/62

FedeDP commented 2 years ago

Just my 2c here! I agree: we can provide an abstraction library on top of libsinsp; at the very same time, i don't really think it matters: since "opening a sinsp inspector" won't require any synchronization, clients of the libs can just open multiple inspectors binding each of them to a new thread. Therefore, i can't really see the practicality of yet another library just to abstract multi-libsinsp support. In any case, supporting something like that should not be a big deal in the end.

My opinion here is that these (Falco multi-evt source and libs globals cleanup) should follow two separate workstreams and efforts. The former could be properly implemented even without the latter, so I'd propose to work on both, even in parallel if there is enough developer capacity. My fear is that removing sinsp's globals will be a bit tedious.

Agree with this also! We would be thrilled to see contributions in this sense too :)

Btw this is a huge effort and will be the feature of Falco 0.33 IMHO, and this discussion seems to confirm that! So glad @jasondellaluce started this thread :)

dwindsor commented 2 years ago

Do you have something specific in mind?

@jasondellaluce thanks for the detailed analysis 🙏. Thinking about this more, I'm not sure our use case should be addressed by adding libs-level support for ingesting plugin data.

We currently are collecting syscall telemetry using libsinsp, serializing the telemetry and dispatching it to an analysis service, which then generates alerts, etc based upon state collected.

The use case I was thinking of was to write a plugin e.g. a uprobe that attaches to libssl and returns connection-related data, and having that plugin's data routed through libsinsp so we can receive that telemetry without installing full Falco.

But, I feel that libsinsp might not be the appropriate place for handling plugin logic. sinsp, to me, is for low-level system inspection (syscalls, etc) - highly granular data that comes in at a very high rate. SSL connection data doesn't feel like it belongs in a system inspect library.

We could maintain the assumption that each sinsp inspector is single-source and single-thread, and then created another library inside libs that implements the inspector parallelization logic.

Yeah, we'd need another level of indirection here, and also account for the hugely different speeds of events coming in... I agree with @FedeDP that it wouldn't be worth it.

We're already able to ingest other forms of telemetry alongside libsinsp syscall telemetry, so no big deal! 😀

jasondellaluce commented 2 years ago

The use case I was thinking of was to write a plugin e.g. a uprobe that attaches to libssl and returns connection-related data, and having that plugin's data routed through libsinsp so we can receive that telemetry without installing full Falco.

@dwindsor I think your use case would be well suited to be implemented as a plugin. Falco plugins are actually "libs" plugins: the plugin framework is implemented at the libscap and libsinsp level. In fact, the goal of the plugin system was to allow collecting events and extracting fields from them from any kind of event source, just like we traditionally did for syscalls.

Yeah, we'd need another level of indirection here, and also account for the hugely different speeds of events coming in... I agree with @FedeDP that it wouldn't be worth it.

Agreed on avoiding another level of indirection. The "different speeds of events coming in" sort of confirms the decision of having single-source-single-thread sinsp inspectors. With this assumption, the different event rates is not relevant as you can have multiple inspectors running in parallel each with dedicated thread and event source, and then collect events from each of them whenever available. In this model, event sources do not influence each other and the only "sequential bottleneck" is the final event collection point, which will likely happen in a shared place. This is what I plan to do with Falco, with the extra step of evaluating event rules in parallel too and then collecting only matching rule alerts, which have a fairly lower rate.

In this perspective, I think the use case you described could be achieved by:

In this way, you'd collect both types of events in the same way. I think this could be valuable for the community too, I would be happy to help if you think this route makes sense to you.

jasondellaluce commented 2 years ago

This are moving forward! Just opened https://github.com/falcosecurity/falco/pull/2182, which should be the final piece of glue code that uses all the preliminary work to implement the multi-source feature.

jasondellaluce commented 1 year ago

https://github.com/falcosecurity/falco/pull/2182 just got merged! 🥳

Will keep this open until the testing phase is over, just in case some extra changes are needed!