Open hajapy opened 2 years ago
Hi there, wondering if there's been any progress on this one? (:
This is a feature we need currently for our use case. Cheers
There has been no progress on this one, but it is infact doable. Only thing is we need async support.
i have a custom task plugin which creates an AWS resource. what i want is, when i execute this task as part of workflow, and i terminate the workflow, the AWS resource should be deleted. i would assume that when a workflow is aborted a SIGTERM is sent to the pod running that task (lmk if thats not the case?) and i am trying to define a shutdown hook which will run some cleanup code.
i am not receiving a SIGTERM or a SIGINT in my code. how can i attach a shutdown hook that is run when the workflow is terminated
SIGTERM should be sent to the container: https://cloud.google.com/blog/products/containers-kubernetes/kubernetes-best-practices-terminating-with-grace
Some reasons for why you may not be catching it: https://stackoverflow.com/questions/72544506/sigterm-not-sent-on-pod-delete
It’s probably easier to just do a docker run with the same command that you can copy from the flyte Pod spec to try things locally… I would check the running processes in the container to see what’s process 1… etc. is it the shell or is it the python process
the shell command is PID 1. hmm, so the SIGTERM signal is sent to the the pyflyte_execute and its not forwarded to the python process.
Work is to be done to make pyflyte execute handle sigint and sigterm
Thanks @kumare3 for the updates. I'm not sure where the voting booth is, but I'd like to cast a ➕1 for this. I imagine orphaned resources from early workflow termination is a common occurrence.
Hello 👋, This issue has been inactive for over 9 months. To help maintain a clean and focused backlog, we'll be marking this issue as stale and will close the issue if we detect no activity in the next 7 days. Thank you for your contribution and understanding! 🙏
Hello 👋, This issue has been inactive for over 9 months and hasn't received any updates since it was marked as stale. We'll be closing this issue for now, but if you believe this issue is still relevant, please feel free to reopen it. Thank you for your contribution and understanding! 🙏
This is a pretty cool idea that increases the utility of plugins. I'd like to mention that as all graceful shutdown mechanisms, this one is also best effort (so not guarantees that it will execute 100% of the time).
Hello 👋, this issue has been inactive for over 9 months. To help maintain a clean and focused backlog, we'll be marking this issue as stale and will engage on it to decide if it is still applicable. Thank you for your contribution and understanding! 🙏
Motivation: Why do you think this is important?
As a flytekit plugin author, I would like to be able to define cleanup actions (in python) that should be run when a flyte task or workflow is aborted (or terminates unexpectedly). For example, this could be used to let the plugin cancel long-running operations in external services if a user aborts a workflow execution.
Goal: What should the final outcome look like, ideally?
The flytekit PythonTask interface could expose a new method (analogous to execute) that would be called upon termination of the task via an external signal.
Describe alternatives you've considered
An effort was made to use the stdlib's signal module to register handlers from the custom plugin module, but this didn't ultimately work (the signal seemed to be intercepted by pyflyte-execute).
Propose: Link/Inline OR Additional context
This suggestion was discussed in a flyte slack thread.
Are you sure this issue hasn't been raised already?
Have you read the Code of Conduct?