coroot / coroot-node-agent

A Prometheus exporter based on eBPF that gathers comprehensive container metrics
https://coroot.com/docs/metrics/node-agent
Apache License 2.0

Map Divergence across multiple nodes #115

Closed: dorkamotorka closed this issue 1 month ago

dorkamotorka commented 1 month ago

Hey, this is more of a question than an issue. Looking at the eBPF L7 code, there's an active_l7_requests eBPF map that is shared between the functions triggered by write syscalls (e.g. https://github.com/coroot/coroot-node-agent/blob/c40380d5963e06c50c6e2701f489a8ca94a1c880/ebpftracer/ebpf/l7/l7.c#L305) and those on the read side (e.g. https://github.com/coroot/coroot-node-agent/blob/c40380d5963e06c50c6e2701f489a8ca94a1c880/ebpftracer/ebpf/l7/l7.c#L412).

But shouldn't the write syscalls happen on the client side and the read syscalls on the server side? The two aren't always running on the same node. For example, if two pods on two different nodes are communicating, wouldn't they be updating, looking up, and deleting entries in two different eBPF maps?

As far as I understand, this would imply that each node keeps its own records of active L7 requests.

Did I understand this correctly? If so, how does it work at all?

def commented 1 month ago

The idea is to detect an L7 request when a write syscall occurs. We track this request in active_l7_requests, keyed by PID and file descriptor. When a read syscall happens, we check for an active request with that PID/FD and look for the response in the payload. This approach allows us to monitor L7 requests solely from the client side, without involving the server side.
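The flow described above can be sketched in userspace C. This is not coroot's code: the real logic is eBPF C in l7.c and handles many protocols. Here a fixed-size array stands in for one node's BPF hash map, and the struct layouts, the HTTP-only request and response checks, and the names on_write and on_read are hypothetical, chosen only to show the (PID, FD) correlation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical key: the connection as seen by the client process. */
struct l7_key {
    uint32_t pid;
    uint32_t fd;
};

/* Hypothetical value: when the request was written. */
struct l7_request {
    uint64_t start_ns;
};

#define MAX_ACTIVE 64

/* Userspace stand-in for one node's BPF hash map (like active_l7_requests).
   Each node's kernel has its own copy; nothing is shared across nodes. */
static struct {
    int used;
    struct l7_key key;
    struct l7_request req;
} active[MAX_ACTIVE];

static int key_eq(struct l7_key a, struct l7_key b) {
    return a.pid == b.pid && a.fd == b.fd;
}

/* Rough request detector: only a few HTTP/1.x methods are recognized here. */
static int looks_like_http_request(const char *buf, size_t len) {
    static const char *const methods[] = {"GET ", "POST ", "PUT ", "DELETE ", "HEAD "};
    for (size_t i = 0; i < sizeof methods / sizeof methods[0]; i++) {
        size_t m = strlen(methods[i]);
        if (len >= m && memcmp(buf, methods[i], m) == 0)
            return 1;
    }
    return 0;
}

/* write() path: if the payload is a request, remember it under (pid, fd). */
int on_write(uint32_t pid, uint32_t fd, const char *buf, size_t len, uint64_t now_ns) {
    if (!looks_like_http_request(buf, len))
        return 0;
    struct l7_key k = {pid, fd};
    int slot = -1;
    for (int i = 0; i < MAX_ACTIVE; i++) {
        if (active[i].used && key_eq(active[i].key, k)) {
            slot = i; /* replace an older pending request on the same fd */
            break;
        }
        if (!active[i].used && slot < 0)
            slot = i;
    }
    if (slot < 0)
        return 0; /* table full: the request is simply not tracked */
    active[slot].used = 1;
    active[slot].key = k;
    active[slot].req.start_ns = now_ns;
    return 1;
}

/* read() path: only reads on a (pid, fd) with a pending request are examined.
   If the payload looks like a response, the entry is removed and the request's
   start time is returned through start_ns. */
int on_read(uint32_t pid, uint32_t fd, const char *buf, size_t len, uint64_t *start_ns) {
    struct l7_key k = {pid, fd};
    for (int i = 0; i < MAX_ACTIVE; i++) {
        if (!active[i].used || !key_eq(active[i].key, k))
            continue;
        if (len < 7 || memcmp(buf, "HTTP/1.", 7) != 0)
            return 0; /* tracked fd, but not a response (yet) */
        *start_ns = active[i].req.start_ns;
        active[i].used = 0;
        return 1;
    }
    return 0; /* no pending request: e.g. a server reading an incoming request */
}
```

In the cross-node case, the client's write and the client's read both hit the table on the client's node, so the server's node never needs a matching entry.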

dorkamotorka commented 1 month ago

Hey, thanks for the explanation. But if the client is on a different node than the server, then when the server looks up the PID/FD, the active request won't be there. Does it just skip in that case, so that this only works when both are on the same node?

def commented 1 month ago

Imagine running curl https://google.com. You can measure the request latency, check its status, and gather other metrics without any access to Google's servers. Similarly, eBPF lets you collect L7 request statistics exclusively from the client side: the client process both writes the request and reads the response on its own node, so the lookup in active_l7_requests never needs data from the server's node.
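To make the curl analogy concrete, here is a small C sketch of what a client-side observer can derive from its own events: the time the request was written, the time the response was read, and the first bytes of the response. The names client_side_sample and parse_http_status, and the restriction to HTTP/1.x status lines, are illustrative assumptions rather than coroot's API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-request metrics, derived only from client-side events. */
struct l7_sample {
    int status;          /* HTTP status code, or -1 if not recognized */
    uint64_t latency_ns; /* response read time minus request write time */
};

/* Parse the 3-digit code from a status line such as "HTTP/1.1 200 OK".
   Returns -1 if the payload does not start like an HTTP/1.x status line. */
static int parse_http_status(const char *buf, size_t len) {
    if (len < 12 || memcmp(buf, "HTTP/1.", 7) != 0 || buf[8] != ' ')
        return -1;
    int code = 0;
    for (size_t i = 9; i < 12; i++) {
        if (buf[i] < '0' || buf[i] > '9')
            return -1;
        code = code * 10 + (buf[i] - '0');
    }
    return code;
}

/* Combine the two client-side timestamps and the response payload. */
struct l7_sample client_side_sample(uint64_t request_written_ns,
                                    uint64_t response_read_ns,
                                    const char *resp, size_t resp_len) {
    struct l7_sample s;
    s.status = parse_http_status(resp, resp_len);
    s.latency_ns = response_read_ns >= request_written_ns
                       ? response_read_ns - request_written_ns
                       : 0; /* guard against out-of-order timestamps */
    return s;
}
```

Both timestamps are taken on the client's host, so the latency needs no clock synchronization with the server's node.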

dorkamotorka commented 1 month ago

Thanks for your patience, I think I got it :) Closing.