aurae-runtime / aurae

Distributed systems runtime daemon written in Rust.
https://aurae.io
Apache License 2.0

Allow multiple cell nesting #302

Closed future-highway closed 1 year ago

future-highway commented 1 year ago

We are having issues with nesting cells at a depth greater than 1. This PR is meant to fix that.

The issue stems from the "no internal processes" rule for v2 cgroups:

...with the exception of the root cgroup, processes may reside only in leaf nodes (cgroups that do not themselves contain child cgroups).

Essentially, our nested auraeds, each of which is itself a process, are not allowed to create cgroups. With this PR, only the root auraed will be responsible for creating cgroups, and each nested auraed will be responsible for spawning executables.
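To make the rule concrete, here is a toy in-memory model that applies the quoted sentence literally. It is only a sketch: `Cgroup`, `RuleError`, `add_child`, and `add_process` are hypothetical names for illustration, not aurae types. The kernel's real enforcement is tied to whether controllers are enabled for child cgroups, so this does not reproduce exact kernel behavior.

```rust
use std::collections::BTreeMap;

/// Toy in-memory cgroup node (hypothetical, for illustration only).
#[derive(Debug, Default)]
struct Cgroup {
    is_root: bool,
    pids: Vec<u32>,
    children: BTreeMap<String, Cgroup>,
}

#[derive(Debug, PartialEq)]
enum RuleError {
    /// A non-root cgroup that holds processes cannot gain child cgroups.
    HasProcesses,
    /// A non-root cgroup that has child cgroups cannot hold processes.
    HasChildren,
}

impl Cgroup {
    /// The root cgroup is exempt from the rule.
    fn root() -> Self {
        Cgroup { is_root: true, ..Default::default() }
    }

    /// Create (or fetch) a child cgroup, refusing if this node is a
    /// non-root cgroup that already contains processes.
    fn add_child(&mut self, name: &str) -> Result<&mut Cgroup, RuleError> {
        if !self.is_root && !self.pids.is_empty() {
            return Err(RuleError::HasProcesses);
        }
        Ok(self.children.entry(name.to_string()).or_default())
    }

    /// Place a process in this cgroup, refusing if this node is a
    /// non-root cgroup that is not a leaf.
    fn add_process(&mut self, pid: u32) -> Result<(), RuleError> {
        if !self.is_root && !self.children.is_empty() {
            return Err(RuleError::HasChildren);
        }
        self.pids.push(pid);
        Ok(())
    }
}
```

In this model, a nested auraed placed in `cell-a` makes `cell-a` a non-leaf candidate the moment it tries `add_child`, which returns `RuleError::HasProcesses`. The root cgroup can hold pid 1 and still create cells, which is why cgroup creation moves to the root auraed.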

I want to use this PR to also fix graceful shutdown. We haven't been shutting down the spawned executables, which is causing issues with cleaning up the cgroups.


edit: GracefulShutdown now broadcasts stop to all executables (which sends a SIGKILL), allowing the cgroup to be cleaned up.
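As a rough illustration of the "stop everything before cleaning up" idea, the sketch below tracks spawned children in a registry and kills and reaps each one on shutdown. It uses only the standard library. The `Executables` type and its `insert` and `stop_all` methods are hypothetical and are not the PR's GracefulShutdown implementation, which may use a different broadcast mechanism.

```rust
use std::collections::HashMap;
use std::io;
use std::process::{Child, ExitStatus};

/// Hypothetical registry of executables spawned by one auraed.
#[derive(Default)]
struct Executables {
    running: HashMap<String, Child>,
}

impl Executables {
    /// Track a spawned child under a name.
    fn insert(&mut self, name: &str, child: Child) {
        self.running.insert(name.to_string(), child);
    }

    /// Stop every tracked executable and reap it.
    /// Returns one (name, wait result) pair per executable, best effort.
    fn stop_all(&mut self) -> Vec<(String, io::Result<ExitStatus>)> {
        let mut stopped = Vec::new();
        for (name, mut child) in self.running.drain() {
            // On Unix, Child::kill sends SIGKILL. Any error here is
            // ignored so the remaining children still get stopped.
            let _ = child.kill();
            // Reap the child so it does not linger as a zombie.
            let status = child.wait();
            stopped.push((name, status));
        }
        stopped
    }
}
```

Once `stop_all` has returned, the registry is empty and none of the tracked processes is still running, so the caller can go on to remove the cgroup.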


NOTE: The API for CellService's start and stop has been updated, making cell_name optional. A null cell_name means auraed should spawn the process in its own cgroup. For the auraed that is true pid 1, that means the executable is not in a cell (it is in the root cgroup).
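A minimal sketch of how a handler might interpret the optional field, assuming a request shape like the one below. `StartRequest`, its `command` field, `SpawnTarget`, and `resolve_target` are hypothetical names for illustration; only the optional `cell_name` comes from this PR.

```rust
/// Hypothetical request shape; only the optional cell_name is from the PR.
#[derive(Debug)]
struct StartRequest {
    cell_name: Option<String>,
    command: String,
}

/// Where the receiving auraed should spawn the executable.
#[derive(Debug, PartialEq)]
enum SpawnTarget {
    /// No cell_name: spawn in the cgroup this auraed itself runs in.
    /// For the auraed that is true pid 1, that is the root cgroup.
    OwnCgroup,
    /// A cell_name was given: the executable belongs to that cell.
    Cell(String),
}

/// Map the optional cell_name onto a spawn target.
fn resolve_target(request: &StartRequest) -> SpawnTarget {
    match &request.cell_name {
        None => SpawnTarget::OwnCgroup,
        Some(name) => SpawnTarget::Cell(name.clone()),
    }
}
```

Modeling the two cases as an enum keeps the "no cell" path explicit, so it cannot be confused with an empty or placeholder cell name.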