Open yanivagman opened 2 years ago
On events pipeline in tracee-ebpf, drop events with low priority when required
that might be too late; if we're defining the desired solution here, I think we want to drop it in eBPF (should_trace)
On rules engine in tracee-rules, don't send events to rules with low priority when required
isn't this redundant if we're dropping the events in tracee-ebpf?
Implement load monitoring in tracee-ebpf
Implement load monitoring in tracee-rules
Expose an API to provide statistics of tracee-ebpf and tracee-rules dropped events/rules
related to #887
Expose an API to set events/rules priority
related to #636
On events pipeline in tracee-ebpf, drop events with low priority when required
that might be too late; if we're defining the desired solution here, I think we want to drop it in eBPF (should_trace)
Yes, I was just about to update this issue with the following suggestion: set bpf map with the events to drop and drop in bpf code.
On rules engine in tracee-rules, don't send events to rules with low priority when required
isn't this redundant if we're dropping the events in tracee-ebpf?
No. There might be rules that use events with high importance, for example execve, yet the rule itself might not be that important
Implement load monitoring in tracee-ebpf
Implement load monitoring in tracee-rules
Expose an API to provide statistics of tracee-ebpf and tracee-rules dropped events/rules
related to #887
Expose an API to set events/rules priority
related to #636
Yes, I was just about to update this issue with the following suggestion: set bpf map with the events to drop and drop in bpf code.
So what you are suggesting here is to create a new bpf map should_drop, and if an event is defined there then we won't call events_perf_submit?
Yes, I was just about to update this issue with the following suggestion: set bpf map with the events to drop and drop in bpf code.
So what you are suggesting here is to create a new bpf map should_drop, and if an event is defined there then we won't call events_perf_submit?
Maybe that won't be necessary if we use the already existing chosen_events map
Sounds good. I am just concerned about concurrency. I believe we should start implementing synchronization mechanisms for our maps, at least from user space.
After talking it over with @yanivagman, a first approach would be to expose an API, DropLoad, which will go over the events in chosen_events and remove them if their priority is greater than the minimal one, then update the current minimal priority to the old one minus 1.
The DropLoad API can be used manually, and/or in the future we can have a self-healing mechanism that gets statistics from a monitoring engine (e.g. OpenTelemetry) and, if the system is overwhelmed, makes tracee-ebpf call DropLoad automatically. Something similar can be done with tracee-rules.
WDYT? @itaysk
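The DropLoad proposal above can be sketched in userspace Go roughly as follows. This is a minimal sketch, not tracee's actual code: the `tracer` struct, the `eventID` type, and the convention that a larger number means a less important event are all assumptions made for illustration, and a plain Go map stands in for the chosen_events bpf map. The mutex reflects the concurrency concern raised earlier in the thread.

```go
package main

import (
	"fmt"
	"sync"
)

// eventID is a hypothetical stand-in for tracee's event identifiers.
type eventID int

// tracer is a hypothetical stand-in for tracee-ebpf's userspace state.
type tracer struct {
	mu           sync.Mutex       // guards the fields below against concurrent callers
	priority     map[eventID]int  // assumption: larger value = less important
	chosenEvents map[eventID]bool // stands in for the chosen_events bpf map
	minPriority  int              // the "current minimal priority" from the proposal
}

// DropLoad removes every chosen event whose priority is greater than the
// current minimal priority, then lowers the minimal priority by one.
// It returns the IDs that were removed.
func (t *tracer) DropLoad() []eventID {
	t.mu.Lock()
	defer t.mu.Unlock()

	var dropped []eventID
	for id := range t.chosenEvents {
		if t.priority[id] > t.minPriority {
			delete(t.chosenEvents, id) // deleting during range is safe in Go
			dropped = append(dropped, id)
		}
	}
	t.minPriority--
	return dropped
}

func main() {
	t := &tracer{
		priority:     map[eventID]int{1: 0, 2: 2, 3: 3},
		chosenEvents: map[eventID]bool{1: true, 2: true, 3: true},
		minPriority:  2,
	}
	fmt.Println("dropped:", t.DropLoad(), "remaining:", len(t.chosenEvents))
	fmt.Println("dropped:", t.DropLoad(), "remaining:", len(t.chosenEvents))
}
```

Because each call only decrements, this version has no way to restore events once load subsides, and repeated calls will eventually drop everything.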
SGTM, a couple of suggestions:
we need to be able to keep track of what events the user chose (which is what chosen events was originally meant to do) in addition to what events we actually trace (may change due to implicit events, or now overload). we should be able to always refer back to what the user originally asked for.
the api IMO should take a target threshold instead of a decrement. I'd suggest SetPriorityThreshold(int)
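Both suggestions can be illustrated together. The following is a rough sketch under assumptions, not an existing tracee API: `eventSelector`, its fields, and the example priorities are hypothetical, and only the name SetPriorityThreshold comes from the comment above. The user's original request is never mutated, so the active set can be recomputed for any threshold, including a higher one after load subsides.

```go
package main

import (
	"fmt"
	"sort"
)

// eventSelector is a hypothetical type that keeps the user's request
// separate from the set of events actually being traced.
type eventSelector struct {
	requested map[string]int  // event name -> priority, as chosen by the user (never mutated)
	active    map[string]bool // events currently traced
}

func newEventSelector(requested map[string]int) *eventSelector {
	s := &eventSelector{requested: requested}
	s.active = make(map[string]bool, len(requested))
	for name := range requested {
		s.active[name] = true
	}
	return s
}

// SetPriorityThreshold recomputes the active set from the original request.
// Assumption: a larger number means a less important event, so events with
// priority above the threshold are left out.
func (s *eventSelector) SetPriorityThreshold(threshold int) {
	active := make(map[string]bool, len(s.requested))
	for name, prio := range s.requested {
		if prio <= threshold {
			active[name] = true
		}
	}
	s.active = active
}

// Active returns the currently traced event names in sorted order.
func (s *eventSelector) Active() []string {
	names := make([]string, 0, len(s.active))
	for name := range s.active {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

func main() {
	// Priorities here are made up for the example.
	s := newEventSelector(map[string]int{"execve": 0, "openat": 2, "stat": 3})
	s.SetPriorityThreshold(1)
	fmt.Println(s.Active())
	s.SetPriorityThreshold(3) // load is back to normal: everything returns
	fmt.Println(s.Active())
}
```

Taking a target value makes the call idempotent: setting the same threshold twice leaves the active set unchanged, which a decrement-style call cannot guarantee.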
we need to be able to keep track of what events the user chose (which is what chosen events was originally meant to do) in addition to what events we actually trace (may change due to implicit events, or now overload). we should be able to always refer back to what the user originally asked for.
This is true, but remember that we already keep track of what events the user chose with t.eventsToTrace in userspace. The chosen_events bpf map was indeed equal to this userspace map (for entries with value set to true), but the intention was to avoid sending irrelevant events to userspace. So actually, the bpf code does not need to know which events were chosen by the user, only which events are required to be submitted to the perf buffer. So we might want to rename this bpf map to something like events_to_submit, and then the purpose of this map will be clear.
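The relationship described above, where the bpf map holds only the entries of the userspace map whose value is true, might look like this in Go. It is a sketch only: the `map[int32]bool` shape of the userspace map and the `eventsToSubmit` helper are assumptions for illustration, and a Go map stands in for the bpf map.

```go
package main

import "fmt"

// eventsToSubmit derives the contents of the (proposed) events_to_submit
// bpf map from the userspace map: only entries whose value is true are
// kept. Entries set to false are assumed to be traced for internal use
// but not submitted to the perf buffer.
func eventsToSubmit(eventsToTrace map[int32]bool) map[int32]bool {
	out := make(map[int32]bool)
	for id, submit := range eventsToTrace {
		if submit {
			out[id] = true
		}
	}
	return out
}

func main() {
	// 59 is execve on x86_64; 1000 is a made-up ID for an implicit event.
	eventsToTrace := map[int32]bool{59: true, 1000: false}
	m := eventsToSubmit(eventsToTrace)
	fmt.Println(len(m), m[59], m[1000])
}
```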
@NDStrahilevitz this is one issue you should keep track of (for the major 'filtering improvement' effort you're handling).
When system load is high, we might be required to drop some events/rules. Currently we have neither a mechanism to prioritize events/rules nor a mechanism to reduce the load consumed by tracee-ebpf and tracee-rules. To improve system performance under high load, the following can be implemented: