Closed inf17101 closed 9 months ago
The delete command currently forces Ankaios to remove the workload from the current state immediately. The ank CLI depends on the workloads section in the current state returned by the get state command and ignores the state-only section if the workloads have no corresponding entry in the current state.
This is caused by overly simple logic in the Ankaios server. The workload should not be deleted from the current state directly, but moved to stopping and deleted only after a state message confirming the successful delete is received.
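The proposed behavior can be sketched roughly as follows. This is a minimal illustration only, not the Ankaios server code: `ExecState`, `ServerState`, `request_delete` and `on_state_message` are hypothetical names, and the real server keeps far more information per workload.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified execution states (not the real Ankaios enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ExecState {
    Running,
    Stopping,
    Removed,
}

/// Hypothetical stand-in for the server's view of the current state.
#[derive(Debug, Default)]
pub struct ServerState {
    pub workloads: HashMap<String, ExecState>,
}

impl ServerState {
    /// Handle a delete request: mark the workload as stopping but keep the entry.
    /// Returns false if the workload is unknown.
    pub fn request_delete(&mut self, name: &str) -> bool {
        match self.workloads.get_mut(name) {
            Some(state) => {
                *state = ExecState::Stopping;
                true
            }
            None => false,
        }
    }

    /// Handle a state message from an agent.
    pub fn on_state_message(&mut self, name: &str, reported: ExecState) {
        match (self.workloads.get(name).copied(), reported) {
            // The delete is confirmed: only now drop the entry.
            (Some(ExecState::Stopping), ExecState::Removed) => {
                self.workloads.remove(name);
            }
            // Still stopping: keep the entry until the delete is confirmed.
            (Some(ExecState::Stopping), _) => {}
            // Any other known workload: store the reported state.
            (Some(_), _) => {
                self.workloads.insert(name.to_string(), reported);
            }
            // Unknown workload: nothing to update in this sketch.
            (None, _) => {}
        }
    }
}
```

With this shape, a client that lists workloads between `request_delete` and the confirming state message still finds the entry, now in the stopping state.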
I have analyzed how the server handles deleting workloads. I can see three scenarios:
1. The user runs ank delete workload.
2. The user runs ank set state instead of ank delete workload.
3. The user runs ank set state, but the object mask points to the workload being deleted.

Case no. 1 in more detail.
The object mask is set to currentState. In other words, the CLI asks the server to update the whole state.
The server:
- Copies the currentState to the newState.
- Replaces the newState with the received update.
ank delete workload is only a special case of the ank set state command. With ank set state we can set the object mask in such a way that it references only a part of the complete state (and therefore this step replaces only a part of the new state). See the next scenarios.

Case no. 2 in more detail.
This scenario is the same for the server. It differs only in how the user triggers it in the CLI. In this case the user sends ank set state with the object mask currentState, so the server gets the same kind of request over the interface.
This case can be used when the user wants to make more changes to the config, for example to change or delete several workloads at once.
Case no. 3 in more detail. The update config (updateState.yaml):
requestId: ank-cli
startupState:
  workloads: {}
  configs: {}
  cronJobs: {}
currentState:
  workloads:
    hello1:
      agent: agent_B
      name: hello1
      tags:
      - key: owner
        value: Ankaios team
      dependencies: {}
      updateStrategy: AT_MOST_ONCE
      restart: true
      accessRights:
        allow: []
        deny: []
      runtime: podman
      runtimeConfig: |
        image: alpine:latest
        commandOptions: [ "--rm"]
        commandArgs: [ "echo", "Hello Ankaios"]
    hello2:
      agent: agent_B
      name: hello2
      tags:
      - key: owner
        value: Ankaios team
      dependencies: {}
      updateStrategy: AT_MOST_ONCE
      restart: true
      accessRights:
        allow: []
        deny: []
      runtime: podman
      runtimeConfig: |
        image: alpine:latest
        commandArgs: [ "echo", "Hello Ankaios"]
    nginx:
      agent: agent_A
      name: nginx
      tags:
      - key: owner
        value: Ankaios team
      dependencies: {}
      updateStrategy: AT_MOST_ONCE
      restart: true
      accessRights:
        allow: []
        deny: []
      runtime: podman
      runtimeConfig: |
        image: docker.io/nginx:latest
        commandOptions: ["-p", "8081:80"]
  configs: {}
  cronJobs: {}
workloadStates:
- workloadName: nginx
  agentName: agent_A
  executionState: ExecRunning
- workloadName: hello1
  agentName: agent_B
  executionState: ExecRemoved
- workloadName: hello2
  agentName: agent_B
  executionState: ExecSucceeded
- workloadName: hello-pod
  agentName: agent_B
  executionState: ExecRunning
The difference (compared to the start config) is that the workload hello-pod is deleted.
./ank set state --file updateState.yaml currentState.workloads.hello-pod
It is important that the object mask refers to the workload hello-pod, which has been removed in the update config. This way we delete the workload hello-pod and nothing else.
Now the server must do something slightly different compared to the previous two cases:
- Copies the currentState to the newState.
- Removes hello-pod from the new state.

The key logic described in the previous comment is in the server, in update_state.rs, in the function update_state.
I agree with Kaloyan that the server shall not remove the workload directly, but set its state to stopping instead. The tricky part is that the function update_state can remove a workload in two ways: explicitly and implicitly.
The function deletes the workload explicitly with the code:
} else if new_state.remove(&field).is_err() {
    return Err(UpdateStateError::FieldNotFound(field.into()));
}
This code is used in scenario no. 3, when the object mask points to a workload that has been deleted in the update.
The implicit (or, better said, "silent") way of deleting a workload is done with the code:
if new_state.set(&field, field_from_update.to_owned()).is_err() {
    return Err(UpdateStateError::FieldNotFound(field.into()));
}
This code is used in scenarios no. 1 and 2. The object mask is set to currentState (i.e. to the root) and the deleted workload is only a sub-part of the received complete state (the update), so it silently disappears when the whole field is overwritten.
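To make the two deletion paths easier to see, here is a small self-contained sketch. It is not the code from update_state.rs: the `Node` tree, its `get`/`set`/`remove` methods, the helper `state_with` and the simplified `update_state` signature are all assumptions made for illustration, and the real object type and error handling differ.

```rust
use std::collections::BTreeMap;

/// Hypothetical tree model of a state document (not the real Ankaios object type).
#[derive(Debug, Clone, PartialEq)]
pub enum Node {
    Leaf(String),
    Map(BTreeMap<String, Node>),
}

impl Node {
    pub fn map() -> Node {
        Node::Map(BTreeMap::new())
    }

    /// Look up the node at `path`; an empty path is the node itself.
    pub fn get(&self, path: &[&str]) -> Option<&Node> {
        match path.split_first() {
            None => Some(self),
            Some((head, rest)) => match self {
                Node::Map(children) => children.get(*head)?.get(rest),
                Node::Leaf(_) => None,
            },
        }
    }

    /// Replace the node at `path`, creating intermediate maps as needed.
    pub fn set(&mut self, path: &[&str], value: Node) {
        match path.split_first() {
            None => *self = value,
            Some((head, rest)) => {
                if !matches!(self, Node::Map(_)) {
                    *self = Node::map();
                }
                if let Node::Map(children) = self {
                    children
                        .entry(head.to_string())
                        .or_insert_with(Node::map)
                        .set(rest, value);
                }
            }
        }
    }

    /// Remove the node at `path`; returns false if it does not exist.
    pub fn remove(&mut self, path: &[&str]) -> bool {
        let Some((head, rest)) = path.split_first() else { return false };
        let Node::Map(children) = self else { return false };
        if rest.is_empty() {
            children.remove(*head).is_some()
        } else {
            children.get_mut(*head).map_or(false, |child| child.remove(rest))
        }
    }
}

/// Start from a copy of the current state; for each object mask either copy the
/// field from the update (implicit deletion of missing children) or, if the
/// update lacks the field, remove it (explicit deletion).
pub fn update_state(current: &Node, update: &Node, masks: &[&[&str]]) -> Result<Node, String> {
    let mut new_state = current.clone();
    for mask in masks {
        match update.get(mask) {
            Some(field) => new_state.set(mask, field.clone()),
            None => {
                if !new_state.remove(mask) {
                    return Err(format!("field not found: {}", mask.join(".")));
                }
            }
        }
    }
    Ok(new_state)
}

/// Hypothetical helper: a state with the given workloads under currentState.workloads.
pub fn state_with(names: &[&str]) -> Node {
    let mut root = Node::map();
    root.set(&["currentState", "workloads"], Node::map());
    for name in names {
        root.set(&["currentState", "workloads", *name], Node::Leaf("podman".to_string()));
    }
    root
}
```

In this model, a mask of `currentState` drops hello-pod through the `set` branch without ever naming it, while a mask of `currentState.workloads.hello-pod` goes through the `remove` branch. Moving a workload to stopping instead of deleting it would have to cover both branches.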
I have to think about how to fix the bug reported here while supporting all scenarios described above. It probably means reimplementing both functions in update_state.rs: update_state and prepare_update_workload.
Status update: I have discussed the current implementation with Kaloyan. The change here depends on two other issues.
In this ticket we have to:
- Leave the handling of StateChangeCommand::UpdateState in ankaios_server.rs as it is. Handling of this event is being changed in the other two pull requests.
- StateChangeCommand::UpdateWorkloadState shall remain as it is here (remove the workload from the current state). Additionally, we shall remove the workload state from the "workload state db".
- Change ank get workloads in the CLI. The table with workloads shall use all information from the complete state. Currently the table takes only the workloads in the current state. But now we shall do this:
We had another discussion with Kaloyan and Christoph. We agreed that we shall not change the behavior of StateChangeCommand::UpdateWorkloadState (as described in the previous comment). The workload shall not be removed from the current state when the container has the autoremove flag.
In other words, we shall make the change only in ank get workloads.
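As a rough illustration of that CLI-side change, the table could be driven by the workload states of the complete state rather than by the current state alone. The sketch below is an assumption about the shape of such a function, not the actual CLI code: `WorkloadState`, `Row`, `build_rows` and the `runtimes` lookup are hypothetical, and execution states are modeled as plain strings.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified entry of the workloadStates list.
#[derive(Debug, Clone, PartialEq)]
pub struct WorkloadState {
    pub workload_name: String,
    pub agent_name: String,
    pub execution_state: String,
}

/// Hypothetical row of the printed workloads table.
#[derive(Debug, Clone, PartialEq)]
pub struct Row {
    pub name: String,
    pub agent: String,
    pub runtime: String,
    pub execution_state: String,
}

/// Build table rows from every reported workload state except "ExecRemoved".
/// `runtimes` maps workload name to runtime, taken from currentState.workloads;
/// a workload that is no longer in the current state gets an empty runtime.
pub fn build_rows(runtimes: &BTreeMap<String, String>, states: &[WorkloadState]) -> Vec<Row> {
    let mut rows: Vec<Row> = states
        .iter()
        .filter(|s| s.execution_state != "ExecRemoved")
        .map(|s| Row {
            name: s.workload_name.clone(),
            agent: s.agent_name.clone(),
            runtime: runtimes.get(&s.workload_name).cloned().unwrap_or_default(),
            execution_state: s.execution_state.clone(),
        })
        .collect();
    // Sort by name so the output is stable.
    rows.sort_by(|a, b| a.name.cmp(&b.name));
    rows
}
```

Under these assumptions, a workload that has already left the current state but is still reported as ExecStopping keeps its row in the table, which is the behavior requested in this issue.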
When a workload disappears from the runtime, it shall be handled with an extra state and not in the same way as if Ankaios had deleted the workload. These changes will be made outside of the current PR.
The PR has been merged into main -> closing the ticket.
When executing "ank get workloads" while a container is in the podman state "Stopping", the Ankaios CLI does not output the workload with its state "Stopping".
If "ank get state" is executed instead, it lists the ExecutionState "ExecStopping" while podman reports the state as "Stopping".
Related PR for improvement of execution states: #127
Current Behavior
"ank get workloads" does not output the execution state "Stopping" (only "Running" is output). This differs from "ank get state".
Expected Behavior
"ank get workloads" shall output all execution states besides "Removed".
Steps to Reproduce
Use the current main branch at 8964d65.
Ankaios Server startConfig.yaml:
1. Run ank-server -c /tmp/startConfig.yaml
2. Run ank delete workload hello1 to delete the workload hello1.
3. Run ank get workloads to see the absence of workload "hello1" with execution state "Stopping" (the delete takes a few seconds, so this state should be visible).
4. Run ank get state instead of ank get workloads to see that this time the execution state is output correctly while podman deletes the workload.
Context (Environment)
Podman 4.6.2 Linux
Logs
Additional Information
Final result
The CLI has been fixed to show the stopping workload. See #154