Closed inf17101 closed 5 months ago
The latest podman docs (v5.0.1, https://docs.podman.io/en/v5.0.1/markdown/podman-run.1.html) do not explicitly state that `podman run --detach` blocks until the container has actually started.
However, when testing Ankaios with a quick run command, an immediate delete, and the same run command again, both the behavior and the end result appear to be correct.
Commands:
Ank server:
./ank-server
Ank agent:
./ank-agent --name agent_A
Ank cli:
./ank run workload nginx_1 --runtime podman --agent agent_A --config $'image: docker.io/nginx:latest\ncommandOptions: ["-p", "8081:80"]'; ./ank delete workload nginx_1; ./ank run workload nginx_1 --runtime podman --agent agent_A --config $'image: docker.io/nginx:latest\ncommandOptions: ["-p", "8081:80"]';
It seems like the command is blocking.
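One rough way to cross-check this observation outside of Ankaios is to measure how long a CLI call blocks before it returns. The helper below is a minimal sketch, not part of Ankaios; the name `time_command` is hypothetical, and it only measures wall-clock time until the child process exits, which says nothing about the container state afterwards.

```rust
use std::io;
use std::process::{Command, Stdio};
use std::time::{Duration, Instant};

/// Hypothetical helper: runs `program` with `args`, waits for it to exit,
/// and reports how long the call blocked and whether it exited successfully.
pub fn time_command(program: &str, args: &[&str]) -> io::Result<(Duration, bool)> {
    let started = Instant::now();
    let status = Command::new(program)
        .args(args)
        .stdout(Stdio::null()) // discard output, only the timing matters here
        .stderr(Stdio::null())
        .status()?; // blocks until the child process exits
    Ok((started.elapsed(), status.success()))
}
```

Calling `time_command("podman", &["run", "--detach", "docker.io/nginx:latest"])` once with the image already pulled and once after removing it would show whether the image pull is included in the time the detached call takes to return.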
If Ankaios creates a workload through the podman CLI (`podman run`) and receives a fast delete of that workload, the delete is executed before the workload has started, so the delete operation inside the Ankaios agent runs too early. In addition, it must be checked whether podman kube and the other runtimes show the same non-blocking behavior.
Current Behavior
The implementation of workload creation in PodmanCli does not block (https://github.com/eclipse-ankaios/ankaios/blob/main/agent/src/runtime_connectors/podman_cli.rs#L284). In detached mode, `podman run` immediately returns podman's internal workload id.
If an image download runs in the background and Ankaios receives a fast delete right after the create, the delete is executed immediately (because the create operation had already finished inside the `WorkloadControlLoop` due to the immediate return of the podman CLI in detached mode). The delete operation deletes the control interface, but the create is still running on podman. When podman has finished the image pull, it cannot start the workload because the control interface no longer exists. In the end the workload is not started, but the user sees a very strange error message (a "no such file or directory" error) in the logs.
Expected Behavior
The workload shall be deleted and not started if the delete is received. The create operation shall block, so that subsequent delete commands are correctly enqueued into the WorkloadControlLoop and executed only after the create has completely finished.
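The ordering guarantee described here can be illustrated with a small, self-contained model. This is a sketch under stated assumptions, not the actual Ankaios WorkloadControlLoop: the names `LoopCommand`, `Event`, `blocking_create`, and `run_control_loop` are hypothetical, and the sleep merely stands in for a runtime call that returns only once the workload has started.

```rust
use std::sync::mpsc::channel;
use std::thread;
use std::time::Duration;

/// Commands understood by this hypothetical per-workload control loop.
#[derive(Debug, Clone, PartialEq)]
pub enum LoopCommand {
    Create,
    Delete,
}

/// Events recorded by the loop, in the order they happen.
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    CreateStarted,
    CreateFinished,
    DeleteExecuted,
}

/// Stand-in for a blocking runtime call (assumption: it returns only after
/// the image pull and the container start have completed).
fn blocking_create(startup: Duration, log: &mut Vec<Event>) {
    log.push(Event::CreateStarted);
    thread::sleep(startup); // simulates image pull and container start
    log.push(Event::CreateFinished);
}

/// Sends all commands right away, then returns the events the loop recorded.
/// The loop handles one command at a time, so a delete sent during a create
/// waits in the channel until the create has returned.
pub fn run_control_loop(commands: Vec<LoopCommand>, startup: Duration) -> Vec<Event> {
    let (tx, rx) = channel::<LoopCommand>();
    let worker = thread::spawn(move || {
        let mut log = Vec::new();
        for cmd in rx {
            match cmd {
                LoopCommand::Create => blocking_create(startup, &mut log),
                LoopCommand::Delete => log.push(Event::DeleteExecuted),
            }
        }
        log
    });
    for cmd in commands {
        tx.send(cmd).expect("control loop stopped unexpectedly");
    }
    drop(tx); // closing the channel lets the loop terminate
    worker.join().expect("control loop panicked")
}
```

Because the create call blocks inside the loop, a `Delete` sent immediately after a `Create` is only handled once `CreateFinished` has been recorded. With a non-blocking create, the delete could instead run while the runtime is still pulling the image, which is the failure described under Current Behavior.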
Steps to Reproduce
agent_A
.backend
workload:./ank delete workload backend
Context (Environment)
Ankaios agent Podman CLI, podman 4.9.2
Logs
Additional Information
startupState.yaml
Final result
Currently, no changes are required. The podman-run command seems to block even in detached mode. Please see the last comments for more details.