eclipse-ankaios / ankaios

Eclipse Ankaios provides workload and container orchestration for automotive High Performance Computing (HPC) software.
https://eclipse-ankaios.github.io/ankaios/
Apache License 2.0

Configurable workload health checks #109

Open: krucod3 opened this issue 9 months ago

krucod3 commented 9 months ago

Description

What counts as a "running" workload is a matter of interpretation and is very specific to the use case and the workload. For some use cases, it is enough that the container is running (the current behavior). For others, the application(s) inside the container must be ready. At the same time, not all applications and workloads support active readiness polling or pushing, and not all of them can be changed to support it.

To cover all use cases, Ankaios shall be extended to support health check configurations per workload. If no specific configuration is set, the default (current) behavior is used.
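A minimal sketch of what an optional per-workload health check with a default fallback could look like. All type, field, and variant names here (`HealthCheck`, `WorkloadSpec`, `effective_health_check`, the `Command` and `Http` check kinds) are hypothetical illustrations and not part of the Ankaios API or manifest format.

```rust
use std::time::Duration;

/// Hypothetical per-workload health check kinds (not the Ankaios API).
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq, Default)]
enum HealthCheck {
    /// Current behavior: the workload counts as running when the runtime says so.
    #[default]
    RuntimeState,
    /// Assumed: run a command inside the workload; exit code 0 means healthy.
    Command { cmd: Vec<String>, interval: Duration },
    /// Assumed: poll an HTTP endpoint; a 2xx status means healthy.
    Http { url: String, interval: Duration },
}

/// Hypothetical workload description with an optional health check.
#[allow(dead_code)]
#[derive(Debug, Default)]
struct WorkloadSpec {
    name: String,
    runtime: String,
    health_check: Option<HealthCheck>,
}

impl WorkloadSpec {
    /// Returns the configured check, or the default (current) behavior if none is set.
    fn effective_health_check(&self) -> HealthCheck {
        self.health_check.clone().unwrap_or_default()
    }
}
```

Keeping the field optional means existing workload configurations without a health check keep today's semantics unchanged.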

Note: It is also very important to define when a workload is stopped. If one workload must be stopped before another can be stopped (e.g. so that the first can finish writing data to the second), the stopped state must be reported accurately as well.
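To illustrate why an accurate stopped state matters for ordered shutdown, here is a small sketch under the assumption that a workload may only be stopped once every workload that must stop before it reports a fully `Stopped` state, not merely `Stopping`. The names `ExecState`, `may_stop`, and `must_stop_first` are hypothetical and do not correspond to Ankaios internals.

```rust
use std::collections::HashMap;

/// Hypothetical execution states. The key distinction is between
/// "stop in progress" (Stopping) and "actually stopped" (Stopped).
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum ExecState {
    Running,
    Stopping,
    Stopped,
}

/// Returns true if `workload` may be stopped now: every workload listed in
/// `must_stop_first[workload]` (e.g. one that still writes data to it)
/// must already be fully Stopped. Workloads without entries may always stop.
fn may_stop(
    workload: &str,
    must_stop_first: &HashMap<&str, Vec<&str>>,
    states: &HashMap<&str, ExecState>,
) -> bool {
    must_stop_first
        .get(workload)
        .map(|deps| deps.iter().all(|d| states.get(d) == Some(&ExecState::Stopped)))
        .unwrap_or(true)
}
```

If the stopped state were reported too early (for example, as soon as the stop was requested), `may_stop` would let the second workload shut down while the first is still writing to it.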

Goals

Extend Ankaios with configurable health checks per workload.

Final result

Summary

To be filled when the final solution is sketched.

Tasks

Initial task list

krucod3 commented 9 months ago

The current assumption is that there are no readiness-only checks. In Ankaios, the workload state must always be monitored, so health checks are performed as liveness probes.
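Since health checks would act as liveness probes for the whole lifetime of a workload rather than as a one-time readiness gate, the probe results have to be evaluated continuously. A minimal sketch, assuming a consecutive-failure threshold similar in spirit to Kubernetes liveness probes; `LivenessTracker`, `Health`, and `failure_threshold` are hypothetical names, not Ankaios code.

```rust
/// Hypothetical health states derived from probe results.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Health {
    Unknown,
    Healthy,
    Unhealthy,
}

/// Tracks periodic liveness probe results for one workload.
struct LivenessTracker {
    failure_threshold: u32,
    consecutive_failures: u32,
    health: Health,
}

impl LivenessTracker {
    fn new(failure_threshold: u32) -> Self {
        Self { failure_threshold, consecutive_failures: 0, health: Health::Unknown }
    }

    /// Records one probe result and returns the resulting health.
    /// A success resets the failure counter; the workload only becomes
    /// Unhealthy after `failure_threshold` consecutive failures.
    fn record(&mut self, probe_ok: bool) -> Health {
        if probe_ok {
            self.consecutive_failures = 0;
            self.health = Health::Healthy;
        } else {
            self.consecutive_failures += 1;
            if self.consecutive_failures >= self.failure_threshold {
                self.health = Health::Unhealthy;
            }
        }
        self.health
    }
}
```

The threshold avoids flapping between healthy and unhealthy on a single slow or lost probe, while still detecting a workload that stops responding after it was once healthy.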

krucod3 commented 9 months ago

For completeness, here are the links posted by @windsource in the dependency issue:

inf17101 commented 5 months ago

For now, Ankaios by default treats a workload as running if the runtime reports it as running. The inter-workload dependency feature relies on the supported definitions of "running", so any change made under this issue must take inter-workload dependencies into account.
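One possible way to keep inter-workload dependencies consistent is to let a "wait until running" condition consider the health check only when one is configured. The sketch below assumes that design; `RuntimeState`, `Health`, `Observed`, and `running_condition_met` are hypothetical names and not the actual Ankaios dependency implementation.

```rust
/// Hypothetical health states produced by a configured health check.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum Health {
    Unknown,
    Healthy,
    Unhealthy,
}

/// Hypothetical state as reported by the container runtime.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum RuntimeState {
    Pending,
    Running,
    Succeeded,
    Failed,
}

/// Observed state of a workload; `health` is None if no health check is configured.
struct Observed {
    runtime: RuntimeState,
    health: Option<Health>,
}

/// Is a "running" dependency on this workload satisfied?
fn running_condition_met(o: &Observed) -> bool {
    o.runtime == RuntimeState::Running
        && match o.health {
            // No health check configured: keep the current default behavior.
            None => true,
            // Health check configured: additionally require a healthy result.
            Some(h) => h == Health::Healthy,
        }
}
```

With this shape, workloads without a health check behave exactly as today, so existing dependency definitions are not affected.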