argoproj / argo-workflows

Workflow Engine for Kubernetes
https://argo-workflows.readthedocs.io/
Apache License 2.0

Better memory management in workflow-controller for pending workflows #8983

Open domderen opened 2 years ago

domderen commented 2 years ago

Summary

At the moment, all workflows in the system are stored inside the workflow-controller's memory, regardless of whether they are in the Pending, Active, or Succeeded/Failed/Error states. There are existing solutions that manage this memory usage for finished workflows, archiving them and automatically deleting them from memory.

I would like to have the possibility of better memory management for Pending workflows. As far as I understand the code, at the moment the workflow-controller creates an in-memory copy of every workflow it gets notified about from Kubernetes, even if it doesn't have the capacity to process it.

I'm wondering if there might be a better approach here, for example keeping only a small amount of information for each pending workflow (for example, only the UID and priority) and getting the rest from Kubernetes when the workflow moves from the Pending to the Active state?
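To make that first idea concrete, here's a purely illustrative Go sketch (not Argo's internals; the struct and field names are made up) of keeping only a UID and priority per pending workflow in an in-memory max-heap:

```go
// Hypothetical sketch: a minimal record kept per pending workflow, ordered by
// priority via container/heap. Field names are illustrative, not Argo's internals.
package main

import (
	"container/heap"
	"fmt"
)

// pendingWorkflow holds only what is needed to decide which workflow to admit next.
type pendingWorkflow struct {
	UID      string
	Priority int32
}

// pendingQueue is a max-heap on Priority.
type pendingQueue []pendingWorkflow

func (q pendingQueue) Len() int            { return len(q) }
func (q pendingQueue) Less(i, j int) bool  { return q[i].Priority > q[j].Priority }
func (q pendingQueue) Swap(i, j int)       { q[i], q[j] = q[j], q[i] }
func (q *pendingQueue) Push(x interface{}) { *q = append(*q, x.(pendingWorkflow)) }
func (q *pendingQueue) Pop() interface{} {
	old := *q
	n := len(old)
	item := old[n-1]
	*q = old[:n-1]
	return item
}

func main() {
	q := &pendingQueue{}
	heap.Init(q)
	heap.Push(q, pendingWorkflow{UID: "wf-a", Priority: 1})
	heap.Push(q, pendingWorkflow{UID: "wf-b", Priority: 10})

	// When capacity frees up, pop the highest-priority UID and fetch the full
	// object from the Kubernetes API only at that point.
	next := heap.Pop(q).(pendingWorkflow)
	fmt.Println("admit workflow", next.UID)
}
```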

Or possibly, another approach might be to offload all pending workflows into a disk-based queue rather than a memory-based one?
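And a similarly illustrative sketch of the disk-based variant, spooling pending references to a JSON-lines file instead of keeping them in memory (a real implementation would need compaction, priority ordering, and crash safety; the file path is a placeholder):

```go
// Hypothetical sketch of the disk-based idea: spool pending workflow references
// to a JSON-lines file instead of holding them in memory. Purely illustrative.
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"os"
)

type pendingRef struct {
	UID      string `json:"uid"`
	Priority int32  `json:"priority"`
}

// enqueue appends one pending reference to the spool file.
func enqueue(path string, ref pendingRef) error {
	f, err := os.OpenFile(path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
	if err != nil {
		return err
	}
	defer f.Close()
	line, err := json.Marshal(ref)
	if err != nil {
		return err
	}
	_, err = f.Write(append(line, '\n'))
	return err
}

// drain reads all spooled references back into memory for scheduling.
func drain(path string) ([]pendingRef, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	var refs []pendingRef
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		var r pendingRef
		if err := json.Unmarshal(sc.Bytes(), &r); err != nil {
			return nil, err
		}
		refs = append(refs, r)
	}
	return refs, sc.Err()
}

func main() {
	_ = enqueue("/tmp/pending-workflows.jsonl", pendingRef{UID: "wf-a", Priority: 5})
	refs, _ := drain("/tmp/pending-workflows.jsonl")
	fmt.Println(refs)
}
```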

I'm creating this issue to see what others think about such a solution. Am I the only one with this problem? Do my solution proposals sound doable? Are there other approaches to this problem that others are using?

Use Cases

I would like to be able to enqueue a large number of workflows for processing, even if this amount is MUCH larger than the processing capacity of my Argo Workflows cluster. We are running into a case where we might have 100k workflows created at one time, but are able to process only 1k in parallel. That means we must have a MASSIVE memory allocation for the workflow-controller, even though it is mostly being used to store objects that are just waiting to be processed.

We could build some kind of drip mechanism around Argo Workflows that adds new entries whenever there is room for them in the cluster, but that would require rebuilding not only the queueing mechanism but also priority handling. It sounds like a better approach would be to handle this in Argo Workflows directly.

Thanks in advance for all comments!


Message from the maintainers:

Love this enhancement proposal? Give it a 👍. We prioritise the proposals with the most 👍.

agilgur5 commented 5 months ago

Conceptually similar to #12287

As far as I understand the code, at the moment the workflow-controller creates an in-memory copy of every workflow it gets notified about from Kubernetes, even if it doesn't have the capacity to process it.

Broadly speaking, yes; it uses k8s Informers to stay in sync with all Workflow resources in the cluster. Notably, even without Argo storing these structs, the Informer and the k8s control plane + etcd will still end up using a lot of memory for all of those resources.
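To illustrate, here is a rough client-go sketch (not the workflow-controller's actual wiring; the kubeconfig handling and resync period are placeholders) of a dynamic shared informer on Workflow CRs, whose cache holds every object it lists or watches:

```go
// Sketch of watching Workflow CRs with a client-go shared informer; the
// informer's in-memory store ends up holding every listed/watched object,
// regardless of its phase. Illustrative only.
package main

import (
	"fmt"
	"time"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/dynamic/dynamicinformer"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client, err := dynamic.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	gvr := schema.GroupVersionResource{Group: "argoproj.io", Version: "v1alpha1", Resource: "workflows"}

	// Every Workflow returned by the initial LIST and subsequent WATCH events
	// lands in this informer's in-memory cache.
	factory := dynamicinformer.NewFilteredDynamicSharedInformerFactory(client, 10*time.Minute, "", nil)
	informer := factory.ForResource(gvr).Informer()
	informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			wf := obj.(*unstructured.Unstructured)
			fmt.Println("cached workflow:", wf.GetNamespace()+"/"+wf.GetName())
		},
	})

	stop := make(chan struct{})
	defer close(stop)
	factory.Start(stop)
	cache.WaitForCacheSync(stop, informer.HasSynced)
	<-stop
}
```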

and getting the rest from Kubernetes when the workflow moves from the Pending to the Active state?

This is theoretically possible, but it would substantially increase network I/O compared to the subscriptions that an Informer uses: you'd have a GET request for each of these workflows, which would also increase processing latency. The implementation might also be a bit convoluted, as you'd still want to be subscribed to active workflows but not pending ones.
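Roughly speaking, each lazy fetch would be an extra API round-trip like the following dynamic-client call (illustrative only; namespace and name are placeholders):

```go
// Illustrative only: fetching a single Workflow on demand once it is admitted,
// instead of relying on the informer cache. One GET per workflow.
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client, err := dynamic.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	gvr := schema.GroupVersionResource{Group: "argoproj.io", Version: "v1alpha1", Resource: "workflows"}

	// "my-namespace" and "my-workflow" are placeholders.
	wf, err := client.Resource(gvr).Namespace("my-namespace").Get(context.Background(), "my-workflow", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Println("fetched workflow:", wf.GetName())
}
```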

Or possibly, another approach might be to offload all pending workflows into a disk-based queue rather than a memory-based one?

This has been discussed in upstream k8s before but is not currently possible with Informers (you'd need to write an entirely separate implementation, or contribute disk caches to upstream).

We could build some kind of drip mechanism around Argo Workflows that adds new entries whenever there is room for them in the cluster, but that would require rebuilding not only the queueing mechanism but also priority handling. It sounds like a better approach would be to handle this in Argo Workflows directly.

Honestly, that's not a bad solution. Decoupling a load-based queue as a separate component may very well result in less complexity and more reusability than attempting to do it within Argo.
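As a sketch of what such an external drip feeder could look like (assumptions: workflows expose their phase in `status.phase`; the namespace, capacity cap, and backlog source are placeholders; a real feeder would filter server-side and paginate rather than listing everything):

```go
// Rough sketch of an external "drip feeder": keep at most maxRunning Workflows
// in flight and submit more from a backlog as capacity frees up.
// The backlog source, namespace, and cap are placeholders.
package main

import (
	"context"
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

const maxRunning = 1000

var gvr = schema.GroupVersionResource{Group: "argoproj.io", Version: "v1alpha1", Resource: "workflows"}

// countActive counts Workflows that are not yet finished, based on status.phase.
// A production version would filter server-side and use pagination.
func countActive(ctx context.Context, client dynamic.Interface, ns string) (int, error) {
	list, err := client.Resource(gvr).Namespace(ns).List(ctx, metav1.ListOptions{})
	if err != nil {
		return 0, err
	}
	active := 0
	for _, item := range list.Items {
		phase, _, _ := unstructured.NestedString(item.Object, "status", "phase")
		if phase == "Running" || phase == "Pending" || phase == "" {
			active++
		}
	}
	return active, nil
}

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client, err := dynamic.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	backlog := loadBacklog() // placeholder: e.g. queued Workflow manifests from a database or object store
	for len(backlog) > 0 {
		active, err := countActive(context.Background(), client, "my-namespace")
		if err != nil {
			panic(err)
		}
		for active < maxRunning && len(backlog) > 0 {
			wf := backlog[0]
			backlog = backlog[1:]
			if _, err := client.Resource(gvr).Namespace("my-namespace").Create(context.Background(), wf, metav1.CreateOptions{}); err != nil {
				panic(err)
			}
			active++
		}
		fmt.Println("active workflows:", active)
		time.Sleep(30 * time.Second)
	}
}

// loadBacklog is a stand-in for wherever the queued Workflow manifests live.
func loadBacklog() []*unstructured.Unstructured { return nil }
```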

Also for reference, Argo uses the upstream k8s workqueue for both of the mechanisms you mentioned.
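For completeness, minimal generic usage of that upstream workqueue package looks like this (illustrative only, not the controller's actual wiring):

```go
// Minimal illustration of the upstream k8s.io/client-go/util/workqueue package;
// this is generic usage, not the workflow-controller's actual wiring.
package main

import (
	"fmt"

	"k8s.io/client-go/util/workqueue"
)

func main() {
	// A rate-limited queue of workflow keys ("namespace/name").
	queue := workqueue.NewRateLimitingQueue(workqueue.DefaultControllerRateLimiter())
	defer queue.ShutDown()

	queue.Add("my-namespace/my-workflow")

	// A worker pops keys and processes them; Done must be called when finished.
	key, shutdown := queue.Get()
	if shutdown {
		return
	}
	defer queue.Done(key)
	fmt.Println("processing", key)
}
```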