argoproj / argo-workflows

Workflow Engine for Kubernetes
https://argo-workflows.readthedocs.io/
Apache License 2.0
15.11k stars 3.21k forks source link

support cancellation #13717

Closed chengjoey closed 1 week ago

chengjoey commented 1 month ago

Summary

support cancellation for workflow

Use Cases

Cancel the running workflow. If the current user wants to cancel the executed workflow, they can only delete it, so that the current environment cannot be retained. It is expected that the user can choose to cancel the workflow.

What I thought of is to add a field cancelled in the spec. When it is true, kill all containers through the wait container

The controller injects the downward api volume into the wait container. When the user cancels the workflow, the controller injects values into the annotations (workflows.argoproj.io/cancellation) of the workflow pods. wait listens to specific files under the downward api path. When the file generates content, it calls killContainers to cancel all tasks.

draft pr #13708, test case can refer to e2e test cancel_test


Message from the maintainers:

Love this feature request? Give it a ๐Ÿ‘. We prioritise the proposals with the most ๐Ÿ‘.

Joibel commented 1 month ago

There is already a ShutdownStrategy so I'd suggest that we shouldn't add a new field, just add a new value to that.

What is the approach with exit handlers going to be?

chengjoey commented 1 month ago

There is already a ShutdownStrategy so I'd suggest that we shouldn't add a new field, just add a new value to that.

okay

What is the approach with exit handlers going to be?

exit handlers by wait container killContainers

Joibel commented 1 month ago

What is the approach with exit handlers going to be?

exit handlers by wait container killContainers

I'm not sure what you're trying to say. My question wasn't well phrased, sorry.

I was trying to ask: Will exit-handlers be run in the case of a cancelled workflow? I think there are arguments for both options, and I wondered if you'd thought through the cases and come up with a suggestion. My inclination is to say they should run.

chengjoey commented 1 month ago

hi @Joibel , I got what you mean, I havenโ€™t considered exit-handlers yet. I also tend to run exit-handlers. I havenโ€™t added corresponding e2e tests in draft-pr. I will take this into consideration when it is fully implemented.

agilgur5 commented 1 month ago

If the current user wants to cancel the executed workflow, they can only delete it,

Workflows can already be "Stopped" or "Terminated" via the API/CLI/UI.

I'm not sure how "cancelled" is different from either of those? In fact, there is an existing issue for a Canceled status, #11849 , to reflect the state after "Stop" or "Terminate". Currently the Workflow is considered Failed, which is a bit inaccurate

What is the approach with exit handlers going to be?

Alan asked this because "Stop" and "Terminate" are different based on how they interact with exit handlers, like SIGTERM and SIGKILL, respectively. See also https://github.com/argoproj/argo-workflows/pull/11626

The controller injects the downward api volume into the wait container. When the user cancels the workflow, the controller injects values into the annotations (workflows.argoproj.io/cancellation) of the workflow pods. wait listens to specific files under the downward api path. When the file generates content, it calls killContainers to cancel all tasks.

This is an interesting alternative that may make #13307 possible ๐Ÿ‘€ Although it might still be necessary to use pods/exec for killing injected sidecars

github-actions[bot] commented 3 weeks ago

This issue has been automatically marked as stale because it has not had recent activity and needs more information. It will be closed if no further activity occurs.

github-actions[bot] commented 1 week ago

This issue has been closed due to inactivity and lack of information. If you still encounter this issue, please add the requested information and re-open.