eclipse-ankaios / ankaios

Eclipse Ankaios provides workload and container orchestration for automotive High Performance Computing (HPC) software.
https://eclipse-ankaios.github.io/ankaios/
Apache License 2.0

Agent provides basic node resource availability #282

Open krucod3 opened 3 months ago

krucod3 commented 3 months ago

Description

To implement a dynamic scheduler as a plugin in Ankaios, one would use the operator pattern: a workload (call it the composer) runs in the cluster and instructs it, via the Ankaios server, to stop and start workloads on specific nodes. To start services where resources are available, the composer must know the resource availability on every node in the cluster. This can be achieved by running a monitoring workload on each node, but the Ankaios agent could also provide the information itself via an event sent to the server (and distributed from there to interested workloads).

The resource data could also be useful for general monitoring and logging, and during development.
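
To make the composer's decision concrete, here is a minimal sketch of how it might pick a node once it has per-node resource reports. Everything in it is an assumption for illustration: the `NodeResources` type, its field names, the `pick_node` helper and the 80% CPU threshold are hypothetical and not part of any Ankaios API.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class NodeResources:
    """Hypothetical per-node report; field names are assumptions, not Ankaios API."""
    cpu_usage_percent: float  # 0.0 .. 100.0
    free_memory_bytes: int


def pick_node(
    reports: Dict[str, NodeResources],
    required_memory_bytes: int,
    max_cpu_usage_percent: float = 80.0,
) -> Optional[str]:
    """Return the least CPU-loaded node that satisfies the limits, or None."""
    candidates = [
        (res.cpu_usage_percent, name)
        for name, res in reports.items()
        if res.free_memory_bytes >= required_memory_bytes
        and res.cpu_usage_percent <= max_cpu_usage_percent
    ]
    if not candidates:
        return None
    # Tuples compare by CPU usage first, then by name, so ties are deterministic.
    return min(candidates)[1]
```

A composer would call `pick_node` with the latest reports and the workload's memory requirement, then ask the server to start the workload on the returned node, or hold off if the result is `None`.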

Goals

Final result

Summary

To be filled when the final solution is sketched.

Tasks

krucod3 commented 3 months ago

The changes proposed here would be a good supplement to the ank get agents command planned with #155.

krucod3 commented 2 months ago

We can do the sampling according to a configured interval and treat an interval of 0 as "don't do checks". @christoph-hamm proposed making it possible to disable it as a feature.
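
A rough sketch of the "0 means no checks" idea, assuming a single hypothetical `interval_secs` setting. The `run_sampler` function, its parameters and the injectable `sleep` are made up for illustration and do not reflect the actual agent code.

```python
import time
from typing import Callable, Optional


def run_sampler(
    interval_secs: float,
    sample: Callable[[], None],
    max_samples: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `sample` every `interval_secs`; an interval of 0 disables sampling.

    Returns the number of samples taken. `max_samples` bounds the loop so the
    sketch can terminate; a real agent would presumably run until shutdown.
    """
    if interval_secs < 0:
        raise ValueError("interval_secs must not be negative")
    if interval_secs == 0:
        # Sampling is switched off by configuration: do no checks at all.
        return 0
    taken = 0
    while max_samples is None or taken < max_samples:
        sample()
        taken += 1
        sleep(interval_secs)
    return taken
```

With this shape, turning the checks off is purely a configuration change and needs no separate code path in the caller.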