MatrixAI / Overwatch

Distributed Infrastructure Telemetry

Find out how to switch on/off BPF code for sampling purposes #10

Open CMCDragonkai opened 6 years ago

CMCDragonkai commented 6 years ago

The BCC code example appears to compile the BPF code on the fly and load it into the kernel, then call a poll function to receive events. The default timeout is -1, which makes the call block until an event arrives.

This means kprobe_poll() blocks until there is perf event data to read on the file descriptor.
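The timeout on BCC's poll call (kprobe_poll() here; newer BCC releases name it perf_buffer_poll()) ends up in a poll-style wait, so it follows the usual poll(2) convention: -1 blocks indefinitely, 0 returns immediately, and a positive value waits up to that many milliseconds. The sketch below shows that convention with Python's select.poll on a plain pipe standing in for the perf event file descriptor. The pipe and the helper poll_once are assumptions for illustration, not BCC code.

```python
import os
import select


def poll_once(fd: int, timeout_ms: int) -> bool:
    """Wait for fd to become readable; timeout_ms=-1 blocks, 0 returns at once."""
    poller = select.poll()
    poller.register(fd, select.POLLIN)
    # poll() returns a list of (fd, event) pairs that are ready; empty on timeout.
    return bool(poller.poll(timeout_ms))


if __name__ == "__main__":
    r, w = os.pipe()  # hypothetical stand-in for a perf buffer file descriptor
    print(poll_once(r, 0))   # False: nothing written yet, returns immediately
    os.write(w, b"event")
    print(poll_once(r, -1))  # True: data is ready, so even a blocking wait returns
    os.close(r)
    os.close(w)
```

A small positive timeout lets a monitoring loop wake up periodically to check whether its sampling window has ended, instead of blocking forever.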

We want to sample events rather than run the monitoring code all the time, because monitoring adds overhead. We should quantify that overhead and look for existing resources on the performance cost of running BPF code.
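A minimal sketch of how the overhead could be quantified: time a workload with the probe detached, then attached, and compare the medians. measure_overhead, enable and disable are hypothetical names; enable/disable would wrap whatever attaches and detaches the BPF program. The workload has to exercise the probed code path (for example, the traced syscall), otherwise the measured overhead will be close to zero.

```python
import statistics
import time
from typing import Callable, Dict


def measure_overhead(workload: Callable[[], None],
                     enable: Callable[[], None],
                     disable: Callable[[], None],
                     repeats: int = 5) -> Dict[str, float]:
    """Median wall-clock time of `workload` with monitoring off, then on."""

    def timed_runs() -> float:
        samples = []
        for _ in range(repeats):
            start = time.perf_counter()
            workload()
            samples.append(time.perf_counter() - start)
        # The median is less sensitive to one-off scheduling noise than the mean.
        return statistics.median(samples)

    baseline = timed_runs()
    enable()  # hypothetical hook: attach the BPF program
    try:
        monitored = timed_runs()
    finally:
        disable()  # hypothetical hook: detach it again, even on error
    return {
        "baseline_s": baseline,
        "monitored_s": monitored,
        "overhead_pct": 100.0 * (monitored - baseline) / baseline,
    }
```

Running all baseline samples before all monitored samples can pick up drift (CPU frequency, cache warm-up); interleaving the two conditions would make the comparison more robust.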

Once the Python BPF program exits, it closes its file descriptors, and closing them "unloads" the BPF code (via bpf_module_destroy()).

The kernel documentation at https://www.kernel.org/doc/Documentation/networking/filter.txt describes attaching and detaching filters at runtime.
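filter.txt describes this for socket filters: a classic BPF program is attached with setsockopt(SO_ATTACH_FILTER) and removed with SO_DETACH_FILTER. The sketch below assumes Linux; Python's socket module may not export these constants, so it falls back to the asm-generic values 26 and 27 (an assumption that holds on x86-64 and arm64). The helpers attach_filter, detach_filter and demo, and the one-instruction drop-all program, are hypothetical and only for illustration. This covers socket filters; kprobe programs are attached and detached through different calls.

```python
import ctypes
import socket
import struct

# Linux asm-generic values; used only if the socket module does not export them.
SO_ATTACH_FILTER = getattr(socket, "SO_ATTACH_FILTER", 26)
SO_DETACH_FILTER = getattr(socket, "SO_DETACH_FILTER", 27)
BPF_RET_K = 0x06  # BPF_RET | BPF_K: return the constant k

# "ret #0" accepts 0 bytes of each packet, i.e. drops everything.
DROP_ALL = [(BPF_RET_K, 0, 0, 0)]


def attach_filter(sock, program):
    """Attach a classic BPF program, given as (code, jt, jf, k) tuples."""
    # struct sock_filter { __u16 code; __u8 jt; __u8 jf; __u32 k; }
    insns = b"".join(struct.pack("HBBI", *insn) for insn in program)
    buf = ctypes.create_string_buffer(insns, len(insns))
    # struct sock_fprog { unsigned short len; struct sock_filter *filter; }
    fprog = struct.pack("HP", len(program), ctypes.addressof(buf))
    # The kernel copies the program during this call, so buf may be freed after.
    sock.setsockopt(socket.SOL_SOCKET, SO_ATTACH_FILTER, fprog)


def detach_filter(sock):
    sock.setsockopt(socket.SOL_SOCKET, SO_DETACH_FILTER, 0)


def demo():
    """Send one datagram while filtered and one after detaching."""
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        rx.bind(("127.0.0.1", 0))
        rx.settimeout(0.2)
        results = []
        attach_filter(rx, DROP_ALL)
        tx.sendto(b"filtered", rx.getsockname())
        try:
            results.append(rx.recv(64))
        except socket.timeout:
            results.append(None)  # dropped by the attached filter
        detach_filter(rx)
        tx.sendto(b"unfiltered", rx.getsockname())
        results.append(rx.recv(64))
        return results
    finally:
        rx.close()
        tx.close()


if __name__ == "__main__":
    print(demo())  # [None, b'unfiltered']
```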

The main question is which is more performant: attaching a filter, running our sample and then detaching the filter; or leaving the filter in the kernel and sampling by choosing when to read (or not read) the file descriptor. Does BPF overwrite old data that hasn't been read? We need to run some benchmarks here.
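One way to run the "attach, sample, detach" strategy is a duty cycle: attach the probe, poll for a fixed window, detach, then stay idle. The sketch below is hypothetical: run_duty_cycle and its parameters are made-up names, and the BPF side is injected as plain callables so the loop can be tested without a kernel. With BCC, the callables could presumably be wired to b.attach_kprobe(event=..., fn_name=...), b.detach_kprobe(event=...) and b.perf_buffer_poll(timeout=ms) on an already constructed BPF object b, so the program is compiled once and only the probe attachment is toggled.

```python
import time
from typing import Callable


def run_duty_cycle(attach: Callable[[], None],
                   detach: Callable[[], None],
                   poll: Callable[[int], None],
                   on_seconds: float,
                   off_seconds: float,
                   cycles: int,
                   poll_timeout_ms: int = 100,
                   clock: Callable[[], float] = time.monotonic,
                   sleep: Callable[[float], None] = time.sleep) -> int:
    """Attach, poll for on_seconds, detach, idle for off_seconds; repeat.

    Returns the number of poll calls made across all cycles.
    """
    polls = 0
    for i in range(cycles):
        attach()
        try:
            deadline = clock() + on_seconds
            while clock() < deadline:
                poll(poll_timeout_ms)  # bounded wait, never blocks forever
                polls += 1
        finally:
            detach()  # always remove the probe, even if polling fails
        if i < cycles - 1:
            sleep(off_seconds)  # probe detached: no per-event overhead here
    return polls
```

For the "leave it attached" alternative, the loop would skip attach/detach and simply stop calling poll during the off window. BCC's open_perf_buffer() accepts a lost_cb callback, which the benchmark could use to count events lost while nobody is reading.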