argoproj / argo-workflows

Workflow Engine for Kubernetes
https://argo-workflows.readthedocs.io/
Apache License 2.0
Global agent pod instead of one per workflow #7891

Open wujayway opened 2 years ago

wujayway commented 2 years ago

Summary

Hi all, I see that Argo Workflows has supported plugins since 3.3. An agent pod loads the plugins to serve non-pod tasks. In #5544, the agent pod was designed to be one per workflow. I'm wondering whether the agent pod could be a global one instead. The agent pod could watch all workflowsets and send requests to the plugins. A global agent pod would save k8s resources and also improve performance.

Use Cases

In our production environment, we create hundreds of thousands of workflows every day. Most of them are short-running tasks. We would use plugins for non-pod tasks (e.g. email notifications or light calculations). One global agent pod would meet our requirements and also shorten workflow execution time.


Message from the maintainers:

Love this enhancement proposal? Give it a 👍. We prioritise the proposals with the most 👍.

alexec commented 2 years ago

One enhancement: we should not load plugins unless the workflow has plugins.

Plugins should be lightweight. If you need a heavyweight one, run a normal deployment and service, and have the plugin proxy to the service.

wujayway commented 2 years ago

Hi alexec, I understand your concern. Our k8s system may be a little unusual: it is customized, and creating a pod can take 10 to 15 seconds, even when the pod is lightweight. Small tasks, however, execute in seconds. A global agent that executes non-pod tasks would save the cost of pod creation, which is a big improvement when a workflow consists of several small tasks. This looks like a need specific to us, since pod creation in most k8s systems is not so expensive. We would like to implement a global agent in our internal version.

alexec commented 2 years ago

Implementation

Currently the agent runs with the same service account as the workflow. This behaviour must be maintained.

So we cannot run one agent per namespace, but we can run one per service account per namespace.

Workflows could be smart: if there is no agent pod in the namespace, then one would be started.

Creation is actually upsert, and upsert uses the pod name for identity. So we can name the agent pod argo-agent-${serviceAccountName}.
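A minimal sketch of the naming idea, assuming an in-memory map in place of the Kubernetes API. The `Pod` struct, `fakeCluster`, and `ensureAgent` are hypothetical names used only for illustration; only the `argo-agent-${serviceAccountName}` naming scheme comes from the comment above.

```go
package main

import "fmt"

// Pod is a simplified stand-in for a Kubernetes pod (hypothetical type).
type Pod struct {
	Namespace          string
	Name               string
	ServiceAccountName string
}

// agentPodName derives a deterministic name, so every workflow that uses the
// same service account resolves to the same agent pod.
func agentPodName(serviceAccountName string) string {
	return "argo-agent-" + serviceAccountName
}

// fakeCluster is an in-memory store keyed by "namespace/name". It is an
// assumption for this sketch, not the real API client.
type fakeCluster map[string]Pod

// ensureAgent creates the agent pod only if no pod with that name exists in
// the namespace, and reports whether it created one.
func (c fakeCluster) ensureAgent(namespace, serviceAccountName string) (Pod, bool) {
	name := agentPodName(serviceAccountName)
	key := namespace + "/" + name
	if existing, ok := c[key]; ok {
		return existing, false
	}
	pod := Pod{Namespace: namespace, Name: name, ServiceAccountName: serviceAccountName}
	c[key] = pod
	return pod, true
}

func main() {
	cluster := fakeCluster{}
	_, created := cluster.ensureAgent("default", "workflow-sa")
	fmt.Println("first call created pod:", created)
	_, created = cluster.ensureAgent("default", "workflow-sa")
	fmt.Println("second call created pod:", created)
}
```

Because the pod name is the identity, a second workflow with the same service account finds the existing agent instead of creating another. A real implementation would also need to keep the prefixed name within the Kubernetes pod name length limit.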

https://github.com/argoproj/argo-workflows/blob/437b3764783b48a304034cc4291472c6e490689b/workflow/controller/agent.go#L75

The agent will need to be recreated if its spec changes (e.g. a new plugin is loaded). We can annotate the pod with a hash of the spec to do this:

https://github.com/argoproj-labs/argo-dataflow/blob/6a6dbae9882ad6fd93341a81d35be709ae6c7a9a/manager/controllers/step_controller.go#L136

The way that works is simple:
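As a rough sketch (not the argo-dataflow code linked above), the hash check might look like the following. `AgentSpec`, the annotation key `example.com/spec-hash`, and the function names are assumptions made for this example.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// AgentSpec is a hypothetical, simplified description of the agent pod.
type AgentSpec struct {
	Image   string   `json:"image"`
	Plugins []string `json:"plugins"`
}

// hashAnnotation is an assumed annotation key, not one defined by Argo.
const hashAnnotation = "example.com/spec-hash"

// specHash returns a hex-encoded SHA-256 digest of the JSON-encoded spec.
func specHash(spec AgentSpec) (string, error) {
	data, err := json.Marshal(spec)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:]), nil
}

// needsRecreate reports whether the hash stored on the running pod differs
// from the hash of the desired spec. A missing annotation counts as a change.
func needsRecreate(annotations map[string]string, desired AgentSpec) (bool, error) {
	want, err := specHash(desired)
	if err != nil {
		return false, err
	}
	return annotations[hashAnnotation] != want, nil
}

func main() {
	running := AgentSpec{Image: "agent:v1", Plugins: []string{"email"}}
	hash, err := specHash(running)
	if err != nil {
		panic(err)
	}
	annotations := map[string]string{hashAnnotation: hash}

	desired := AgentSpec{Image: "agent:v1", Plugins: []string{"email", "slack"}}
	recreate, err := needsRecreate(annotations, desired)
	if err != nil {
		panic(err)
	}
	fmt.Println("recreate agent:", recreate)
}
```

In this sketch the plugin order affects the hash, so the list would need to be sorted first if order is not meaningful.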

Finally, when do we delete the agent? We don't want to leave agents running if they are not needed. At the end of a workflow, we can't just delete the agent; we must also check whether any other workflows still need it.
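The deletion check could be sketched as below, assuming the controller can list workflows along with their namespace, service account, and completion state. The `Workflow` struct and `agentStillNeeded` are hypothetical and greatly simplified.

```go
package main

import "fmt"

// Workflow holds only the fields this sketch needs (hypothetical type).
type Workflow struct {
	Name               string
	Namespace          string
	ServiceAccountName string
	UsesAgent          bool // true if the workflow has tasks served by the agent
	Completed          bool
}

// agentStillNeeded reports whether any workflow other than the one that just
// finished still relies on the shared agent for this namespace and service
// account.
func agentStillNeeded(workflows []Workflow, namespace, serviceAccountName, finished string) bool {
	for _, wf := range workflows {
		if wf.Namespace != namespace || wf.ServiceAccountName != serviceAccountName {
			continue // served by a different agent
		}
		if wf.Name == finished {
			continue // the workflow that triggered this check
		}
		if wf.UsesAgent && !wf.Completed {
			return true
		}
	}
	return false
}

func main() {
	workflows := []Workflow{
		{Name: "wf-a", Namespace: "default", ServiceAccountName: "workflow-sa", UsesAgent: true, Completed: true},
		{Name: "wf-b", Namespace: "default", ServiceAccountName: "workflow-sa", UsesAgent: true, Completed: false},
	}
	if agentStillNeeded(workflows, "default", "workflow-sa", "wf-a") {
		fmt.Println("keep agent: another workflow still needs it")
	} else {
		fmt.Println("delete agent")
	}
}
```

A real implementation would also have to handle the case where a new workflow starts between this check and the delete. The start-if-missing behaviour described above would cover that by recreating the agent.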

@wujayway would you like to implement this?

wujayway commented 2 years ago

Hi @alexec, thanks for the guidance. The description of the lifecycle of an agent per service account per namespace is clear. One question remains: the agent executor is currently bound to a single workflow, so I think AgentExecutor should be refactored to handle all workflows. Is that right?

alexec commented 2 years ago

Correct. You'll need to update the list watch to do this.

krmayankk commented 2 years ago

A Kubernetes pod per workflow is better. We do not want one workflow to affect another. It would also be nice to control, in this way, the resources for different workflows, i.e. the pods executing those workflows.

kpiroddi-yieldstreet commented 1 year ago

@wujayway was this ever implemented?

kissycn commented 5 days ago

Same scenario, same needs

@alexec Does the community have plans to promote this feature?