Open elokaac opened 1 year ago
Hmm, this seems somewhat related to the proposals around cancelling in-flight activities, doesn't it @cgillum?
@davidmrdavid yes (and terminating sub-orchestrations). One thing we hadn't talked about yet was a potential programming model for orchestrations to do this, which I think is what this issue proposes.
/cc @jviau
Yeah, I would love to see this. I have not done this myself, but I think you can get pretty close to a cooperative cancellation model via orchestration events. As in, send an event which sets a `CancellationTokenSource` within your orchestration implementation to `Canceled`. I don't know how this will affect sub-orchestrations or activities though. I suppose that is where framework-level support would be important.
I think this would be a great feature to add to DTFx. The main differentiator from `Termination` would be that this is cooperative cancellation: an orchestration can elect to perform cleanup work, or, if it is past some point of no return, it can ignore the cancellation and continue.
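To make the cooperative part concrete, here is a rough, framework-agnostic sketch in Python. It is not DTFx code: `CancellationFlag`, `run_orchestration`, `cancel_at`, and `point_of_no_return` are all hypothetical names, and the "event" is simulated by flipping a flag before a given step. It only illustrates the two behaviours described above: clean up when cancelled early, ignore the request once past the point of no return.

```python
from dataclasses import dataclass


@dataclass
class CancellationFlag:
    """Hypothetical stand-in for a token source that an external event sets."""
    requested: bool = False

    def cancel(self) -> None:
        self.requested = True


def run_orchestration(steps, cancel_at, point_of_no_return):
    """Run `steps` in order and return (status, log).

    cancel_at: index of the step before which a cancel event arrives (None = never).
    point_of_no_return: index from which cancellation requests are ignored.
    """
    flag = CancellationFlag()
    log = []
    for i, step in enumerate(steps):
        if cancel_at == i:
            flag.cancel()  # simulate delivery of the external cancel event
        if flag.requested and i < point_of_no_return:
            # The orchestration elects to stop and run its own cleanup.
            log.append("cleanup")
            return "cancelled", log
        # Either no cancellation was requested, or it arrived too late to honour.
        log.append(step)
    return "completed", log
```

With steps `["reserve", "charge", "ship"]` and `point_of_no_return=2`, a cancel arriving before step 1 yields `("cancelled", ["reserve", "cleanup"])`, while one arriving before step 2 is ignored and the run completes. Termination, by contrast, would give the orchestration no chance to make either choice.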
I came here looking for how I was expected to implement graceful shutdown in my orchestrations and activities. I have a query that takes 17 minutes and, although SIGKILL isn't the end of the world, in general, I would like to have some notice/control of shutdown when the container orchestrator decides to scale in.
Now, I've only spent an hour looking through the code so this will be rough.
TaskHubWorker already supports graceful shutdown and it's propagated down into the dispatchers. Each layer in this propagation has a CancellationTokenSource specifically for shutdown. I see two options for implementing this that won't create a breaking change:
The first is the most work and requires some thought, as it appears the dispatcher just fires and forgets. I followed the second approach for activities and it looks like a relatively easy change. For orchestrations it looks a little more complicated, as the OrchestrationContext is created in the constructor of TaskOrchestrationExecutor. I followed that up a bit but didn't get to the point where this constructor and processWorkItem converge.
@JohnWFlaherty `TaskHubWorker` shutdown / cancellation is not related to this feature. All active work items on a worker during shutdown are expected to resume on another instance, so we do not want to be propagating the cancellation token from the dispatcher to work items. Passing the shutdown token would undo the durability of these work items.
I see the original issue post asks for that, but it is not what we would go with in the end. The design would be a cancellation token available in an orchestration but triggering that cancellation would be done by explicitly sending a cancel message to the orchestration.
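A toy model of that distinction, in Python and not based on any DTFx API: the queue, the message shapes, and `drain` are all hypothetical. Worker shutdown is simulated by stopping after `max_items` messages; it leaves the remaining messages queued for another worker and never touches the instance's cancellation state. Only an explicit `"cancel"` message does that.

```python
from collections import deque


def drain(queue, instances, max_items=None):
    """Process queued (kind, instance_id) messages and return how many were handled.

    Stopping after max_items stands in for a worker shutting down: unprocessed
    messages stay on the queue so another worker can pick them up.
    """
    processed = 0
    while queue and (max_items is None or processed < max_items):
        kind, instance_id = queue.popleft()
        state = instances.setdefault(
            instance_id, {"cancel_requested": False, "runs": 0}
        )
        if kind == "cancel":
            # Explicit cancel message: the only path that requests cancellation.
            state["cancel_requested"] = True
        elif kind == "run":
            state["runs"] += 1
        processed += 1
    return processed


queue = deque([("run", "a"), ("run", "a"), ("cancel", "a")])
instances = {}
drain(queue, instances, max_items=1)  # first worker shuts down after one message
drain(queue, instances)               # a second worker resumes the remaining work
```

After the first call the instance has not been asked to cancel and two messages are still queued; the cancellation request only appears once the second worker processes the explicit `"cancel"` message.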
Similar to #565, it would be great if TaskActivities/TaskOrchestrations could be dispatched with a developer-provided cancellation token.
For example, if we could provide a cancellation token as part of `StartAsync` on a single worker node, it could be propagated to dispatched orchestrations/activities to give developers more control over the termination of running activities/orchestrations. An example use case would be interface activities: