Closed by windsource 6 months ago
Interesting article about dependency management in systemd: https://unix.stackexchange.com/questions/331693/how-can-a-systemd-service-flag-that-it-is-ready-so-that-other-services-can-wait
Here is an article on configuring Liveness, Readiness and Startup Probes in Kubernetes.
There are two sub-use-cases when handling the dependencies:
To start directly with dependency management, the server shall send the workload states first, before sending the list of desired workloads (i.e., reverse the current order).
We also should think about circular dependencies. If we have one, we should reject the configuration.
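A minimal sketch of how such a check could look, assuming the configuration has already been reduced to a map from workload name to the names of its dependencies. The function name `find_cycle` and the map layout are assumptions for illustration, not part of Ankaios. It uses a depth-first search that reports a workload reached again while it is still being visited.

```rust
use std::collections::HashMap;

/// Visit state of a workload during the depth-first search.
#[derive(Clone, Copy)]
enum Mark {
    InProgress,
    Done,
}

/// Returns the name of one workload that lies on a dependency cycle,
/// or `None` if the configuration is acyclic.
/// `deps` maps a workload name to the workloads it depends on (hypothetical layout).
fn find_cycle(deps: &HashMap<String, Vec<String>>) -> Option<String> {
    fn visit(
        name: &str,
        deps: &HashMap<String, Vec<String>>,
        marks: &mut HashMap<String, Mark>,
    ) -> Option<String> {
        match marks.get(name) {
            Some(Mark::Done) => return None,
            // Reached a workload that is still on the current path: cycle.
            Some(Mark::InProgress) => return Some(name.to_string()),
            None => {}
        }
        marks.insert(name.to_string(), Mark::InProgress);
        if let Some(children) = deps.get(name) {
            for child in children {
                if let Some(found) = visit(child, deps, marks) {
                    return Some(found);
                }
            }
        }
        marks.insert(name.to_string(), Mark::Done);
        None
    }

    let mut marks = HashMap::new();
    let mut names: Vec<&String> = deps.keys().collect();
    names.sort(); // deterministic start order
    for name in names {
        if let Some(found) = visit(name, deps, &mut marks) {
            return Some(found);
        }
    }
    None
}
```

A configuration for which `find_cycle` returns `Some(..)` would be rejected. The recursion depth is bounded by the number of workloads, which should be unproblematic for the workload counts discussed here.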
The standard case is relatively straightforward. We can build queues (HashMap) for workloads that don't have their dependencies met and look through the queues every time an UpdateWorkloadState arrives.
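The idea can be sketched as follows. All names here (`WaitingWorkloads`, `ExecState`, `on_update_workload_state`) are hypothetical, and the state enum is a simplified stand-in for the real execution states. Every state update triggers a recheck of the waiting workloads, and those whose dependencies are all met are released.

```rust
use std::collections::HashMap;

/// Simplified, hypothetical subset of execution states.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ExecState {
    Running,
    Succeeded,
    Failed,
}

/// Hypothetical holder for workloads whose dependencies are not yet met.
#[derive(Default)]
struct WaitingWorkloads {
    /// Waiting workload name -> (dependency name, expected state) pairs.
    waiting: HashMap<String, Vec<(String, ExecState)>>,
    /// Last known state per workload name.
    states: HashMap<String, ExecState>,
}

impl WaitingWorkloads {
    /// Puts a workload with unmet dependencies on the waiting list.
    fn add(&mut self, name: &str, deps: Vec<(String, ExecState)>) {
        self.waiting.insert(name.to_string(), deps);
    }

    /// Records a state update and returns the (sorted) names of all waiting
    /// workloads whose dependencies are now met; they leave the waiting list.
    fn on_update_workload_state(&mut self, name: &str, state: ExecState) -> Vec<String> {
        self.states.insert(name.to_string(), state);
        let states = &self.states;
        let mut ready: Vec<String> = self
            .waiting
            .iter()
            .filter(|(_, deps)| {
                deps.iter()
                    .all(|(dep, expected)| states.get(dep) == Some(expected))
            })
            .map(|(workload, _)| workload.clone())
            .collect();
        ready.sort();
        for workload in &ready {
            self.waiting.remove(workload);
        }
        ready
    }
}
```

This variant scans all waiting workloads on every update, which matches the "simple approach" and should be acceptable for a small number of workloads per node.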
This case is more complicated. If there are workloads running, we have to take care of their dependencies too. For each found workload that is supposed to run, see if the dependencies are met:
reusing
queue and proceed with the other found workloads; mark the queue as changed; when finished with all workloads:
All in all, the simple approach is probably good enough, taking into account that no more than 100 workloads will be executed on a node.
I think an open point is also "how to define/configure the inter dependencies of the workloads concretely"?
@lingnoi: the current definition is, per workload, a list of dependencies where each dependency has a workload name and an execution state. Do you have some other ideas here?
To not make things over-complicated we need only the cycle detection for the server and could add something like a reverse dependency list: a hash table which stores per workload name a list of workloads that have dependencies on that workload.
I would go with the simple approach first and implement the cycle detection to check for invalid states, but without the whole graph implementations.
We first check the config for cycles. When we receive new workload states from the state checker, we can go through this queue again and check whether all of the workloads specified in the dependency lists are now running (the current default behavior); if so, we can start the workload.
This can be optimized if we can search for the workload that now has a new state, e.g., in a hash map.
I am just wondering whether an UpdateWorkload would imply some action as well when dependency management comes into play. If we have a workload A running with dependencies B and C and someone updates the dependency B via the Ankaios CLI, shall we take an action, too? Maybe workload A is broken by the update. Or do we want to say it is a user failure if the update crashes workload A in this case? (Just as an additional throw-in to consider)
You are completely right here, we also need to properly handle updates and shutdowns in reverse order. This actually means that the server would also need to do some extra work calculating delete operations for other workloads that have dependencies on the updated/deleted one.
I was checking the WorkloadSpec and all the information we need is already in there. I was thinking about using an Rc or Arc, maybe put inside a VecDeque, to reuse existing specs because they are already pushed into a data structure per runtime. Maybe we can use just a reference, so there is no need to store the same information again. VecDeque is more suitable for queue-like data structures compared to HashMaps with the key / hash calculations.
Queues are great for preserving order or if no search operations are needed. The idea of having "a hash table which stores per workload name a list of workloads that have dependencies on that workload" is that the key is a workload name and the value is the list of all workloads that depend on it. When we get an update workload state, we can do a lookup of the workloads we should take care of instead of making an exhaustive search.
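A minimal sketch of such a reverse lookup table, assuming the dependencies are available as a map from workload name to the names it depends on. The function name `build_reverse_dependencies` and the input layout are assumptions for illustration only.

```rust
use std::collections::HashMap;

/// Builds the reverse lookup: dependency name -> names of the workloads
/// that declare a dependency on it.
/// The input maps a workload name to its dependencies (hypothetical layout).
fn build_reverse_dependencies(
    deps: &HashMap<String, Vec<String>>,
) -> HashMap<String, Vec<String>> {
    let mut reverse: HashMap<String, Vec<String>> = HashMap::new();
    for (workload, dependencies) in deps {
        for dependency in dependencies {
            reverse
                .entry(dependency.clone())
                .or_default()
                .push(workload.clone());
        }
    }
    // Sort each list so the result does not depend on hash map iteration order.
    for dependents in reverse.values_mut() {
        dependents.sort();
    }
    reverse
}
```

When a state update for workload `B` arrives, a lookup of `"B"` in the returned map yields exactly the workloads that need rechecking, so the exhaustive search is avoided.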
As it seems we don't have any technical bottlenecks that would play a major role in the design of the feature. Even if we need to build a spanning forest of the dependency graph, the implementation wouldn't be that complicated and is not as such blocking in any way. I'll start collecting use-cases to completely clarify the problem space now. After that we can do it the test driven way and directly write some system (robot) tests covering the use-cases. We can then write the major design points down as requirements and after that start thinking on exact technologies for the implementation.
Ok I thought order is needed because we wanted to have a "queue" like described above, and we can then do a simple push/pop, without having the hash stuff on top. But is fine for me... I think we can discuss the implementation details later.
I understand the confusion now, queue as a place where elements are waiting, not as a data structure.
For all use-cases we should consider both the case where the workloads are on the same node and the case where they are on different nodes.
@lingnoi: the current definition is, per workload, a list of dependencies where each dependency has a workload name and an execution state. Do you have some other ideas here?
No, that's fine, thanks!
I have researched how systemd handles dependency management. In summary, it has fine-grained dependency management with a lot of options. There are two primary dependency management topics: requirement dependencies and ordering dependencies.
Requirement dependencies:
Ordering dependencies: Before/After: Order dependencies, specifying the order in which units should start or stop in relation to each other.
When only requirement dependencies are specified, without an explicit ordering dependency, systemd assumes the best and starts the services in parallel (and the action of the keyword is applied). If we set an ordering dependency in addition, then of course the start follows that order.
But there are more fine-grained features that can be used when choosing a combination of those dependency management tools. As an example, when using Requires= and a unit on the right-hand side is explicitly stopped (for example through systemctl stop), then the defining unit is also stopped. This means systemd indeed has a more complex and fine-grained startup/shutdown behavior compared to just "basic" dependency management.
Here is a blog with examples: https://seb.jambor.dev/posts/systemd-by-example-part-2-dependencies/
I think we need to choose the scenarios that fit automotive use cases. There is no need to support all possible combinations. Hard dependencies are primarily needed, and then we need to discuss what we need on top.
For hard dependencies A -> B we would need to stop A before stopping B.
To make things a bit simpler for the server, we could internally write to B that it is needed by A. This way the agent can decide alone that it cannot stop B before A is gone.
Edit: In the current interface this is handled by the server by sending a list of delete dependencies with the delete message. This workflow will work too, but if we implement an ordered shutdown, the server would need to send delete messages for all workloads incl. their dependencies instead of one shutdown with all_workloads=true.
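To illustrate the ordered shutdown for hard dependencies, here is a small sketch that computes a stop order from a reverse dependency map (workload name to the workloads depending on it). The function name `stop_order` and the map layout are assumptions. The sketch also assumes the graph is acyclic, which would have to be guaranteed by a cycle check on the configuration.

```rust
use std::collections::{HashMap, HashSet};

/// Returns the order in which workloads must be stopped so that `target`
/// is stopped only after everything that (transitively) depends on it.
/// `dependents` maps a workload name to the workloads depending on it.
/// Assumes the dependency graph is acyclic.
fn stop_order(target: &str, dependents: &HashMap<String, Vec<String>>) -> Vec<String> {
    fn visit(
        name: &str,
        dependents: &HashMap<String, Vec<String>>,
        seen: &mut HashSet<String>,
        order: &mut Vec<String>,
    ) {
        if !seen.insert(name.to_string()) {
            return; // already handled via another path
        }
        if let Some(children) = dependents.get(name) {
            for child in children {
                visit(child, dependents, seen, order);
            }
        }
        // Post-order: all dependents of `name` are already in the list.
        order.push(name.to_string());
    }

    let mut seen = HashSet::new();
    let mut order = Vec::new();
    visit(target, dependents, &mut seen, &mut order);
    order
}
```

For A -> B (A depends on B), `stop_order("B", ..)` lists A before B, which corresponds to the requirement that A is stopped before B.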
I now also reviewed the basic configuration of crinit and really like the ideas there.
Considering the behavior of systemd and crinit, we can do the following to support both hard and soft dependencies:
First, we can extend the ExpectedState in the proto API with the following 2 values:
This will allow us to support the following use-cases:
... dependencies: { B:running }
A will be started only after B has reached a running workload execution state
To stop B, Ankaios will first stop A, wait until A is in the stopped state and only then trigger the stopping of B
... dependencies: {B:starting}
A's start will be triggered by Ankaios only after the start of B is triggered at the runtime
To stop B, Ankaios will first queue in the stop of A and then queue in the stop of B
failed A -> B: useful to start a service only if another service failed (this is also something we can provide later)
A's config: ... dependencies: {B:failed}
A will be started only if B goes into a failed state
If B starts again, A remains there
If we need it, we can also add the following dependency types later:
DEP_STARTING, but ignores dependencies on shutdown
DEP_RUNNING, but ignores dependencies on shutdown
To be able to handle all dependencies properly we would also need a transitional state stopping for the workload execution states. There is currently also a bug (#123) about not handling Podman states correctly. It would be best to add the new execution state as a feature in a PR for this issue and after that fix the Podman states by mapping to the new state.
I like the idea of using hard and soft dependencies. I would not do more initially, because the other, more complex combinations offered by the services we have checked (systemd, crinit) do not fit the automotive use case very well at the moment. We can learn in the future whether something is needed in addition. But to start, I would go with the hard/soft dependencies with priority on hard (like you said). Dependency starting and running is good. For failed, I think we do not need it.
I am working on the Agent implementation of inter-workload dependencies now, meaning waiting list for added and deleted workloads. I have checked out a new branch and started with the system tests first for dependencies.
Shall we go with the following on agent restarts?
If agents are restarted and the reuse workload procedure must run, the dependencies shall not be considered when the runtime config of a workload has not changed. In this case we resume the workloads.
If agents are restarted and the runtime config of a workload has changed, we must do a replace. In this case we consider the dependencies of that workload. If all execution states are fulfilled for that workload, we can immediately replace the workload. If not, we put it on the internal waiting queue and wait until all dependencies are fulfilled. Considering the dependencies for changed workloads implies the following: if the workload has a dependency on an existing (reusable) workload with execution state succeeded, for example, but this workload has the wrong state, for example failed, then there are two possibilities:
I think we should not add extra logic to restart the dependency. But I am also wondering whether this problem exists. If the dependency also exists (is a reusable workload), the agent should restart this dependency anyhow.
If this dependency exists but is in another, unexpected state, we resume it (reusing procedure). This means just the state checker is started and it would report the failed state mentioned in the example above. In the reusing feature we do not restart workloads in general. We replace a workload if the runtime config has changed or we resume it otherwise. This means the behavior you mentioned can only happen if the dependency's runtime config has changed, too.
But if the runtime config has not changed and the dependency is just resumed, then the workload depending on it would stay in the queue forever, unless the user takes action and changes something in the dependency so that it runs again and goes into the succeeded execution state.
I would agree with your suggestion that the agent shall not care about this dependency in this case, because this would add more complexity to the reusing feature in combination with the inter-workload dependencies feature. In this case the user shall handle it.
Ok I have implemented the dependencies handling in the reusing feature like the following:
If an agent is restarted, the startup state could contain: workloads with no runtime config change (unchanged workloads), workloads with runtime config changes (changed workloads), and new workloads; it can also contain fewer workloads than before the restart.
The unchanged workloads are resumed and the dependencies are not considered. We just start the state checker but do nothing else with those existing workloads => Resume!
For the changed workloads the dependencies are considered. Here are two sub cases:
The dependencies are considered for new workloads like without reusing feature. If a workload has dependencies it is put on the waiting queue and only started if all its dependencies are fulfilled. Workloads without dependencies are started immediately.
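The decision described in this comment can be summarized in a small sketch. The enum `StartupAction`, the function `startup_action` and its three boolean inputs are hypothetical names for illustration. The sketch is simplified in that it assumes the agent already knows whether the dependencies of a workload are fulfilled at startup.

```rust
/// What the agent does with a workload from the startup state
/// (hypothetical names, simplified).
#[derive(Debug, PartialEq, Eq)]
enum StartupAction {
    /// Unchanged workload: only the state checker is started.
    Resume,
    /// Changed workload whose dependencies are fulfilled.
    Replace,
    /// Changed workload that waits on the queue before being replaced.
    WaitThenReplace,
    /// New workload without pending dependencies.
    Start,
    /// New workload that waits on the queue before being started.
    WaitThenStart,
}

/// Maps the situation after an agent restart to an action.
/// Dependencies are deliberately ignored for unchanged workloads.
fn startup_action(
    exists_on_runtime: bool,
    config_changed: bool,
    deps_fulfilled: bool,
) -> StartupAction {
    match (exists_on_runtime, config_changed, deps_fulfilled) {
        (true, false, _) => StartupAction::Resume,
        (true, true, true) => StartupAction::Replace,
        (true, true, false) => StartupAction::WaitThenReplace,
        (false, _, true) => StartupAction::Start,
        (false, _, false) => StartupAction::WaitThenStart,
    }
}
```

For example, `startup_action(true, false, false)` yields `StartupAction::Resume`: an unchanged workload is resumed even if its dependencies are currently not fulfilled.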
Naturally, the question arises why we cannot immediately replace a workload that has a dependency managed by the same agent (point 2 above). The reason is that the existing interface that gives the RuntimeManager the existing workloads on the runtime does not deliver the current workload states. It returns only the instance names. That was fine without the dependency feature. To change that, the interface must be changed internally (relatively low effort), the concrete implementation of that interface in the podman runtime connector must be changed (relatively low effort), and the concrete implementation of that interface in the podman kube runtime connector must be changed (relatively high effort => we need to consider the volumes used for remembering config and specs).
So there are three possibilities:
I am also open to another way.
Since the summary was not fitting 100% anymore to the actual implementation and the past discussions, I have updated it and tried to include everything which is still important and was suitable for the context.
Description
Some workloads must only be started when other workloads are already running. The Ankaios state definition already contains a field to define dependencies for a workload but it has not been implemented yet.
It also needs to be defined what it means when a workload is up and running. Is it sufficient that a container is running or should we also support something like lifecycle hooks in Kubernetes?
Goals
Ankaios shall support dependencies of workloads such that workloads are started after all dependencies have been started.
Final result
Summary
Ankaios enables users to configure dependencies between different workloads. Since the dependencies rely on the workload states, the ExecutionStates were updated and changed. There are now major states and substates to handle the creation and deletion of workloads having inter-workload dependencies properly.
New execution states (major states):
And there are new sub states within the proto file as well:
The running workload state is a bit special, as getting a state from podman that the container is running does not mean that the app deployed with it is also running and ready. What running means must be specified and will be handled separately with #109. For now, Running means that the container was started and the runtime says it is running. Later this will be extended to other health checks.
Dependencies between workloads can be of two types: explicit and implicit.
Explicit dependencies are configured by the user within a workload's configuration. Ankaios considers these dependencies when starting workloads, ensuring they only start when all dependencies are met. Users can define dependency types such as running, succeeded, or failed.
The so-called AddConditions were added to the Ankaios.proto:
Implicit dependencies are defined internally by Ankaios to prevent workloads from failing or entering undesired states when a dependency is deleted. These dependencies are automatically set and cannot be configured by the user. Ankaios does not stop dependent workloads of a dependency. It delays the delete until the dependent workload has reached the workload state matching the delete condition.
The proto file also contains an internal message for specifying DeleteConditions:
Ankaios ensures that manifests and workload configurations don't have cyclic dependencies, forming a directed acyclic graph. It also handles cases where workloads have dependencies that currently don't exist in the Ankaios state. In this case the workload creation is delayed.
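The delayed delete for implicit dependencies described above could be checked roughly as in the following sketch. The enums `State` and `DeleteCondition`, their variants, and the function names are hypothetical and not taken from the Ankaios proto file. The treatment of a dependent without a known state is also an assumption.

```rust
use std::collections::HashMap;

/// Simplified, hypothetical workload states.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum State {
    Pending,
    Running,
    Succeeded,
    Failed,
}

/// Hypothetical delete conditions placed on a dependent workload.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum DeleteCondition {
    /// The dependent must be running.
    Running,
    /// The dependent must be neither pending nor running.
    NotPendingNorRunning,
}

/// Checks one delete condition against the last known state of the dependent.
fn condition_met(condition: DeleteCondition, state: Option<State>) -> bool {
    match condition {
        DeleteCondition::Running => state == Some(State::Running),
        // Assumption: a dependent without any known state counts as gone.
        DeleteCondition::NotPendingNorRunning => {
            !matches!(state, Some(State::Pending) | Some(State::Running))
        }
    }
}

/// The delete may proceed only if every delete condition is fulfilled;
/// otherwise it stays delayed and is rechecked on the next state update.
/// `conditions` maps a dependent workload name to its delete condition.
fn can_delete(
    conditions: &HashMap<String, DeleteCondition>,
    states: &HashMap<String, State>,
) -> bool {
    conditions
        .iter()
        .all(|(dependent, condition)| condition_met(*condition, states.get(dependent).copied()))
}
```

Note that nothing in this check stops the dependent workload; it only decides whether the pending delete may be executed yet, in line with the behavior described in the summary.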
Tasks
initial, waiting_to_start and waiting_to_stop workload states #144
starting and stopping when receiving an add or delete workload (running shall not be sent anymore after that) #189, #125, #191, #149
waiting_to_start state; each received UpdateWorkloadState triggers rechecking of the waiting workloads; if dependencies are met the starting state is emitted and the workload is scheduled at the runtime, #189
waiting_to_stop state; rechecking from above is also checking the to-be-deleted list; if dependencies are met the stopping state is emitted and the workload is scheduled for deletion at the runtime, #189