aurae-runtime / aurae

Distributed systems runtime daemon written in Rust.
https://aurae.io
Apache License 2.0

Support for `GetPosixSignalsStream` from nested daemons #366

Closed by JeroenSoeters 1 year ago

JeroenSoeters commented 1 year ago

https://github.com/aurae-runtime/aurae/pull/336 introduces the first Observe gRPC endpoint. This endpoint produces a stream of POSIX signals traced on the host using an eBPF probe. However, the endpoint doesn't take a cell name parameter yet, so it only works on the host daemon.

Note: if we forward the Observe methods to the nested daemons, we need to make sure host-level data is correctly translated to namespaced data whenever that data lives in an unshared namespace.

dmah42 commented 1 year ago

it's vital that we do forward these calls on, as otherwise we don't get the full view of the node. how we report this might take a bit of thought beyond the namespacing (which is already an excellent point).

JeroenSoeters commented 1 year ago

For logging endpoints I totally see this. For eBPF probes it is a bit more nuanced. Take the signals endpoint: the host daemon is really the only daemon that can produce this response stream. The eBPF probe will only ever produce host PIDs, and the nested daemons would, by definition, have no means of mapping those host PIDs to their local namespace PIDs.

dmah42 commented 1 year ago

why wouldn't nested PIDs receive signals? (i may not be understanding the eBPF signals endpoint)

JeroenSoeters commented 1 year ago

They would receive signals, but the tracepoint in the kernel has no notion of namespaces, so the eBPF probe will only ever surface host PIDs. A nested daemon could listen for these events, but it would have no means of mapping those PIDs to the PIDs in its own namespace.

JeroenSoeters commented 1 year ago

Specifically for GetPosixSignalsStream, we could look up the namespace PID for every signal using the procfs crate: https://docs.rs/procfs/latest/procfs/process/struct.Status.html#structfield.nspid
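To make the lookup concrete, here is a minimal sketch of the idea, not the code that ended up in aurae. The procfs crate exposes this data as `Status.nspid`; to stay dependency-free, the sketch instead parses the `NSpid:` line of `/proc/<pid>/status` directly (available since Linux 4.1). The function names `parse_nspid`, `innermost_pid`, and `nspid_for_host_pid` are hypothetical.

```rust
/// Extracts the `NSpid:` entries from the text of `/proc/<pid>/status`.
/// Entries are ordered from the outermost PID namespace (as seen by the
/// procfs mount) to the innermost one.
/// Returns `None` if the line is missing or cannot be parsed.
fn parse_nspid(status: &str) -> Option<Vec<i32>> {
    // Find the line that starts with "NSpid:" and keep the remainder.
    let line = status.lines().find_map(|l| l.strip_prefix("NSpid:"))?;
    // The remainder is a whitespace-separated list of PIDs.
    let pids: Result<Vec<i32>, _> =
        line.split_whitespace().map(str::parse::<i32>).collect();
    match pids {
        Ok(v) if !v.is_empty() => Some(v),
        _ => None,
    }
}

/// PID of the process as seen from its innermost PID namespace.
fn innermost_pid(status: &str) -> Option<i32> {
    parse_nspid(status)?.last().copied()
}

/// Hypothetical helper: reads the status file for a host PID reported by
/// the eBPF probe. Returns `None` if the process has already exited or the
/// file cannot be read.
fn nspid_for_host_pid(host_pid: i32) -> Option<Vec<i32>> {
    let status = std::fs::read_to_string(format!("/proc/{host_pid}/status")).ok()?;
    parse_nspid(&status)
}
```

For a process with host PID 4242 that is PID 7 inside a nested PID namespace, the status file would contain `NSpid: 4242 7`. Which entry is the right one to report depends on how deeply the requesting cell is nested, and the lookup can race with a short-lived process exiting, so a real implementation would need to handle the `None` case.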

JeroenSoeters commented 1 year ago

I'm going to scope this issue to the GetPosixSignalsStream endpoint only. It looks like the actual "forwarding" wouldn't be needed there. I'll create another issue to finish wiring up the logging, and in that issue we could take care of the forwarding as well. Is anyone actively working on those logging endpoints? It seems like a while since that code was last touched.

JeroenSoeters commented 1 year ago

Closing this as this PR has been merged.