eclipse-ankaios / ankaios

Eclipse Ankaios provides workload and container orchestration for automotive High Performance Computing (HPC) software.
https://eclipse-ankaios.github.io/ankaios/
Apache License 2.0

Unsupported runtime not handled properly #370

Open krucod3 opened 2 months ago

krucod3 commented 2 months ago

Currently, setting the runtime of a workload to an unsupported or unknown runtime name is not handled correctly.

Current Behavior

If a workload is initially scheduled with a wrong runtime name, the workload stays in the Pending(Initial) state. Additionally, if the runtime of a running workload is changed to a non-existent one, the workload is retried (20 times) against the new, non-existent runtime.

The ank CLI also did not exit after the workload was updated back to an existing runtime, as it never received a removed state for the old workload.

Expected Behavior

The starting of the workload shall fail.

Steps to Reproduce

  1. create a workload with a non-existent runtime name, e.g., "not_existing"
  2. observe the workload states

Or

  1. update a running workload to use a non-existent runtime name, e.g., "not_existing"
  2. observe the workload states

Context (Environment)

Logs

Additional Information

Final result

To be filled by the one closing the issue.

krucod3 commented 2 months ago

The initial definition of a workload with an unknown runtime can be fixed by adding the following to the runtime_manager.rs add_workload method:

            self.update_state_tx
                .report_workload_execution_state(
                    &workload_spec.instance_name,
                    ExecutionState::starting_failed(format!(
                        "Runtime '{}' not found.",
                        workload_spec.runtime
                    )),
                )
                .await;
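
For illustration, the idea behind this snippet can be sketched as a small standalone program. This is only a sketch under assumptions, not the Ankaios implementation: the simplified `ExecutionState`, `WorkloadSpec` and `RuntimeManager` types below are hypothetical stand-ins, the real method is async and reports through `update_state_tx`, and here the reported states are simply collected in a `Vec`. It also covers only the initial definition of a workload, not the update of a running one.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified stand-in for a workload execution state.
#[derive(Debug, Clone, PartialEq)]
enum ExecutionState {
    Running,
    StartingFailed(String),
}

/// Hypothetical, simplified workload specification.
struct WorkloadSpec {
    instance_name: String,
    runtime: String,
}

/// Hypothetical runtime manager. `runtimes` holds the names of the known
/// runtimes, `reported` records every state that would be reported.
struct RuntimeManager {
    runtimes: HashSet<String>,
    reported: Vec<(String, ExecutionState)>,
}

impl RuntimeManager {
    fn new(runtime_names: &[&str]) -> Self {
        RuntimeManager {
            runtimes: runtime_names.iter().map(|name| name.to_string()).collect(),
            reported: Vec::new(),
        }
    }

    /// Reports a failed start instead of leaving the workload pending
    /// when the requested runtime is unknown.
    fn add_workload(&mut self, spec: &WorkloadSpec) {
        let state = if self.runtimes.contains(&spec.runtime) {
            // Assumption: a known runtime starts the workload successfully.
            ExecutionState::Running
        } else {
            ExecutionState::StartingFailed(format!(
                "Runtime '{}' not found.",
                spec.runtime
            ))
        };
        self.reported.push((spec.instance_name.clone(), state));
    }
}

fn main() {
    let mut manager = RuntimeManager::new(&["podman"]);
    let spec = WorkloadSpec {
        instance_name: "nginx".to_string(),
        runtime: "not_existing".to_string(),
    };
    manager.add_workload(&spec);
    for (name, state) in &manager.reported {
        println!("{}: {:?}", name, state);
    }
}
```

With an explicit failure state reported for the unknown runtime, the workload no longer has to remain in Pending(Initial), which matches the expected behavior described above.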

krucod3 commented 2 weeks ago

Let's shift this to v0.6, as it is a minor issue and we don't have time for it now.