gardener / gardener-extension-shoot-falco-service

Gardener extension controller to deploy Falco into shoot clusters.
Apache License 2.0

Add audit safe handling and processing of Falco events #110

Open marwinski opened 1 week ago

marwinski commented 1 week ago

What would you like to be added:

Report Falco events to OCM gear for audit-safe documentation and triage purposes.

Why is this needed:

Apart from attacks, we expect Falco events to fall into two further categories: false positives and events raised by debugging activities. Events in these two categories must be processed in an audit-safe way. Events raised by attacks must be processed differently, but they will also be available in OCM gear.

Implementation proposal:

We can currently group events into roughly two categories:

(1) All events that supposedly belong to a single debug session, a debug session event group
(2) Event groups based on similar events (see issue #99)

In a debug session event group (1), all events are relevant for reporting, while in an event group based on similar events (2), only one event is relevant for reporting as a placeholder, possibly enriched with the number of events and the affected clusters and nodes.
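To make the placeholder idea for case (2) concrete, here is a minimal sketch of collapsing a group of similar events into one representative event enriched with counts. The `FalcoEvent` and `PlaceholderEvent` types and the `summarize_group` helper are hypothetical; real Falco output carries many more fields, and picking the first event as the representative is just one possible choice.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class FalcoEvent:
    # Hypothetical, minimal event shape; real Falco output carries many more fields.
    rule: str
    output: str
    cluster: str
    node: str


@dataclass
class PlaceholderEvent:
    # One representative event standing in for a whole group of similar events.
    representative: FalcoEvent
    event_count: int
    clusters: set[str] = field(default_factory=set)
    # Node names are only unique within a cluster, so nodes are stored as (cluster, node).
    nodes: set[tuple[str, str]] = field(default_factory=set)


def summarize_group(events: list[FalcoEvent]) -> PlaceholderEvent:
    """Collapse a group of similar events into one enriched placeholder."""
    if not events:
        raise ValueError("an event group must contain at least one event")
    placeholder = PlaceholderEvent(representative=events[0], event_count=len(events))
    for ev in events:
        placeholder.clusters.add(ev.cluster)
        placeholder.nodes.add((ev.cluster, ev.node))
    return placeholder
```

A debug session event group (1) would skip this step and report every event as-is.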

For case (1), we would generate a unique random ID and submit it to the OCM gear together with all events. As a result, we would expect a single ticket that can be processed, for example by attaching a ticket number and closing it. Future debug event groups would trigger the creation of a new ticket because their ID will be different.
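A minimal sketch of the case (1) tagging, assuming a random UUID is an acceptable "unique random ID" and that events are submitted as plain dictionaries. The `tag_debug_session` helper and the `group_id`/`kind` payload fields are hypothetical and not part of any existing OCM gear API.

```python
import uuid


def tag_debug_session(events: list[dict]) -> dict:
    """Attach one fresh random ID to all events of a debug session event group."""
    # uuid4 is random, so every new debug session gets a new ID (and thus a new ticket).
    group_id = str(uuid.uuid4())
    return {
        "group_id": group_id,
        "kind": "debug-session",  # hypothetical field names for the submission payload
        "events": [{**ev, "group_id": group_id} for ev in events],
    }
```

Because the ID is random, the same events seen in a later debug session produce a different `group_id`, which yields the "new ticket per debug session" behaviour.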

For case (2), we would generate a unique ID based on the similarities in the event group. This means that future event groups relating to the same events would have the same ID and could be identified by the OCM gear. A human operator would evaluate the event and classify it either as a false positive or as malicious. If it is classified as a false positive, the OCM gear would not create new tickets for these events, even if they are raised again. Moreover, the component that aggregates and reports Falco events could retrieve the “false positive” classification from the OCM gear and propose an exception to the Falco rule that raised the event.
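One way to obtain such a similarity-based ID is to hash a canonical serialization of only those fields that define "similar". The sketch below assumes a hypothetical choice of fields (`SIMILARITY_FIELDS`) and uses SHA-256 over sorted-key JSON; volatile values such as timestamps or pod names are deliberately excluded so that the ID stays the same across recurrences. The `needs_new_ticket` check against a set of operator-confirmed false-positive IDs is likewise an illustrative assumption about how the OCM gear classification could be consumed.

```python
import hashlib
import json

# Hypothetical choice of "similarity" fields. Volatile values such as timestamps,
# pod names or node names are left out on purpose so the ID stays stable.
SIMILARITY_FIELDS = ("rule", "proc.name", "container.image.repository")


def similarity_id(event: dict) -> str:
    """Deterministic ID: identical similarity fields always give the same ID."""
    key = {name: event.get(name) for name in SIMILARITY_FIELDS}
    # sort_keys and fixed separators make the serialization canonical.
    canonical = json.dumps(key, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def needs_new_ticket(event: dict, false_positive_ids: set) -> bool:
    """Skip ticket creation for groups an operator already marked as false positive."""
    return similarity_id(event) not in false_positive_ids
```

Unlike the random ID of case (1), this ID depends only on the event content, so a recurrence of an already triaged group maps to the existing ticket.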

In this context, the granularity of reporting events to the OCM gear shall be flexible: single cluster, multiple clusters, single project, multiple projects, etc. We can envision use cases for all of them.
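Flexible granularity could be modelled by making the reporting scope part of the grouping key. This is a sketch under assumptions: the `project`, `cluster` and `rule` field names, the three granularity levels in `SCOPE_FIELDS`, and the `group_by_scope` helper are all hypothetical, and cluster names are assumed to be unique only within a project.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical granularities: which event fields define the reporting scope.
# Cluster names are assumed to be unique only within a project.
SCOPE_FIELDS = {
    "cluster": ("project", "cluster"),
    "project": ("project",),
    "global": (),  # one group per rule across all projects and clusters
}


def group_by_scope(events: Iterable[dict], granularity: str) -> Dict[Tuple, List[dict]]:
    """Group events by the chosen reporting scope plus the Falco rule name."""
    scope_fields = SCOPE_FIELDS[granularity]
    groups: Dict[Tuple, List[dict]] = defaultdict(list)
    for ev in events:
        scope = tuple(ev[f] for f in scope_fields)
        groups[scope + (ev["rule"],)].append(ev)
    return dict(groups)
```

A coarser granularity yields fewer, larger groups, and therefore fewer tickets for the same set of events.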

ccwienk commented 3 days ago

@marwinski: lgtm. I would like to emphasise that, for OCM-Gear's issue management to work correctly, it is important for those IDs to be stable.