eclipse-ankaios / ankaios

Eclipse Ankaios provides workload and container orchestration for automotive High Performance Computing (HPC) software.
https://eclipse-ankaios.github.io/ankaios/
Apache License 2.0

Recognize stopped workloads immediately #112

Open christoph-hamm opened 9 months ago

christoph-hamm commented 9 months ago

Description

At the moment, the Ankaios agent calls podman ps once every second and uses the result to check whether podman/podman-kube workloads are still running. Hence it can take up to one second between an application stopping and Ankaios being informed about the state change.

To be informed immediately when a workload stops, the Ankaios agent can take the PID of the workload from the podman ps result and use pidfd_open to obtain a file descriptor, which can then be waited on using select, poll or epoll.
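The idea can be sketched in a few lines. The example below uses Python only for brevity (the agent itself would use the equivalent system calls), and the function name `wait_for_exit` and its `timeout_s` parameter are hypothetical. It assumes the PID has already been extracted from the podman output, and that Python 3.9+ and Linux 5.3+ are available, since `os.pidfd_open` requires both.

```python
import os
import select
from typing import Optional


def wait_for_exit(pid: int, timeout_s: Optional[float] = None) -> bool:
    """Block until `pid` exits; return False if the timeout expires first.

    Hypothetical helper for illustration, not part of Ankaios.
    """
    # os.pidfd_open raises OSError (for example ProcessLookupError)
    # if the process is already gone or pidfd is unsupported.
    pidfd = os.pidfd_open(pid)
    try:
        poller = select.poll()
        # A pidfd becomes readable once the process it refers to terminates.
        poller.register(pidfd, select.POLLIN)
        timeout_ms = None if timeout_s is None else int(timeout_s * 1000)
        # poll() returns an empty list on timeout, a non-empty one on exit.
        return bool(poller.poll(timeout_ms))
    finally:
        os.close(pidfd)
```

Because the pidfd is an ordinary file descriptor, an agent could register many of them in one epoll instance or event loop instead of blocking on a single workload as this sketch does.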

Final result

Summary

To be filled when the final solution is sketched.

krucod3 commented 9 months ago

We should clarify whether this constrains the supported OSs too much, or add a fallback in case pidfd is not supported.
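One possible shape for such a fallback is a runtime probe: try pidfd once, and switch to periodic polling if it is unavailable (pidfd_open was added in Linux 5.3). The sketch below is in Python for brevity; all function names are hypothetical, and the signal-0 probe merely stands in for whatever polling the agent would keep, presumably the existing podman ps check.

```python
import os
import select
import time
from typing import Optional


def pidfd_supported() -> bool:
    """Probe pidfd support by opening a pidfd for our own process."""
    if not hasattr(os, "pidfd_open"):
        return False  # Python < 3.9 or a non-Linux platform
    try:
        os.close(os.pidfd_open(os.getpid()))
        return True
    except OSError:
        return False  # e.g. ENOSYS on kernels older than 5.3


def wait_by_polling(pid: int, interval_s: float = 1.0,
                    timeout_s: Optional[float] = None) -> bool:
    """Fallback: check periodically whether the PID still exists."""
    deadline = None if timeout_s is None else time.monotonic() + timeout_s
    while True:
        try:
            os.kill(pid, 0)  # signal 0 checks existence, sends nothing
        except ProcessLookupError:
            return True
        except PermissionError:
            pass  # process exists but belongs to another user
        # Note: a zombie still counts as existing until its parent reaps it.
        if deadline is not None and time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


def wait_for_exit(pid: int, timeout_s: Optional[float] = None) -> bool:
    """Use pidfd when available, otherwise fall back to polling."""
    if not pidfd_supported():
        return wait_by_polling(pid, timeout_s=timeout_s)
    try:
        pidfd = os.pidfd_open(pid)
    except ProcessLookupError:
        return True  # process was already gone
    try:
        poller = select.poll()
        poller.register(pidfd, select.POLLIN)  # readable once pid exits
        timeout_ms = None if timeout_s is None else int(timeout_s * 1000)
        return bool(poller.poll(timeout_ms))
    finally:
        os.close(pidfd)
```

With this structure the detection latency degrades to the polling interval on older systems, while callers see the same interface in both cases.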