eclipse-ankaios / ankaios

Eclipse Ankaios provides workload and container orchestration for automotive High Performance Computing (HPC) software.
https://eclipse-ankaios.github.io/ankaios/
Apache License 2.0

ank apply does not detect workloads deleted while it is running #276

Open inf17101 opened 3 months ago

inf17101 commented 3 months ago

If a workload is deleted while the ank apply command is still applying it, the table output of ank apply shows the workload with the workload state Removed, but the command hangs and the spinner keeps running.

Current Behavior

ank apply hangs when a workload it touches is deleted while the command is running.

Expected Behavior

The ank apply command shall not hang; it shall return if a workload it touches is deleted.
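The expected behavior can be illustrated with a minimal Python sketch of a wait loop that treats Removed as a terminal state. This is not the Ankaios implementation: `wait_until_settled`, `fetch_states`, `max_polls`, and every state name other than Removed are hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, Iterable, Set, Tuple

# Assumption: illustrative state names. Only "Removed" is taken from this issue.
TERMINAL_STATES = {"Running", "Succeeded", "Failed", "Removed"}


def wait_until_settled(
    workloads: Iterable[str],
    fetch_states: Callable[[], Dict[str, str]],
    max_polls: int = 100,
) -> Tuple[Dict[str, str], Set[str]]:
    """Poll until every workload reaches a terminal state or polls run out.

    fetch_states is a hypothetical callback returning a snapshot that maps
    workload names to their current state.
    Returns (settled workloads with their state, workloads still pending).
    """
    pending = set(workloads)
    settled: Dict[str, str] = {}
    for _ in range(max_polls):
        if not pending:
            break
        snapshot = fetch_states()
        for name in list(pending):
            state = snapshot.get(name)
            # A workload deleted by someone else ends up in "Removed";
            # counting it as terminal lets the loop finish instead of hanging.
            if state in TERMINAL_STATES:
                settled[name] = state
                pending.discard(name)
    return settled, pending
```

With a state source that first reports a pending workload and then Removed, the loop returns after the second poll with the workload marked as Removed and nothing left pending.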

Steps to Reproduce

The situation can be easily reproduced by forcing retries when initially starting a workload, because this leaves enough time for a human to enter the delete command while ank apply is running. The bug can also be reproduced without the retry feature by scripting the commands.

  1. Start the Ankaios server: ./ank-server
  2. Start the Ankaios agent: ./ank-agent --name agent_A
  3. Apply the manifest invalidworkload.yml shown under Additional Information: ./ank apply invalidworkload.yml
  4. Open a second shell and delete the workload defined in the manifest: ./ank delete workload invalidworkload
  5. Switch back to the ank apply terminal window. The ank apply command is still running and stuck, although the workload already has the workload state Removed because of the delete request from the other shell.
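A scripted variant of steps 3 to 5 could look roughly like the following Python sketch. It assumes the server and agent from steps 1 and 2 are already running and that the ank binary is in the current directory. The helper names and the `delay_s` and `hang_timeout_s` values are assumptions for illustration, not part of Ankaios.

```python
import subprocess
import time
from typing import List

ANK = "./ank"  # path to the CLI as used in the steps above


def apply_command(manifest: str) -> List[str]:
    """Command line for step 3."""
    return [ANK, "apply", manifest]


def delete_command(workload: str) -> List[str]:
    """Command line for step 4."""
    return [ANK, "delete", "workload", workload]


def reproduce(
    manifest: str = "invalidworkload.yml",
    workload: str = "invalidworkload",
    delay_s: float = 2.0,          # assumed head start for ank apply
    hang_timeout_s: float = 30.0,  # assumed time after which we call it a hang
) -> bool:
    """Return True if ank apply is still running hang_timeout_s after the delete."""
    apply_proc = subprocess.Popen(apply_command(manifest))
    try:
        time.sleep(delay_s)
        # Delete the workload while ank apply is still waiting for it.
        subprocess.run(delete_command(workload), check=True)
        try:
            apply_proc.wait(timeout=hang_timeout_s)
            return False  # ank apply returned on its own
        except subprocess.TimeoutExpired:
            return True   # ank apply is still running: the hang described here
    finally:
        # Clean up a hanging ank apply so the script itself terminates.
        if apply_proc.poll() is None:
            apply_proc.kill()
            apply_proc.wait()
```

Calling `reproduce()` from the directory that contains the binaries and the manifest should return True as long as the bug is present. The timeout only approximates "hangs", so it may need tuning on slow machines.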

Context (Environment)

All supported platforms are affected. Detected on Linux amd64 (64 bit) with the ank apply CLI command.

Logs

Hanging ank apply screenshot: image

Additional Information

# invalidworkload.yml
apiVersion: v0.1
workloads:
  invalidworkload:
    runtime: podman
    agent: agent_A
    runtimeConfig: |
      image: docker.io/busybox:latest
      commandOptions: ["--invalidoption"]

Final result

To be filled by the one closing the issue.

inf17101 commented 3 months ago

In general, it must be discussed how ank apply should react to workloads that change while it is running (not only when a workload is deleted), and whether we consider this a bug or expected behavior.