Azure / durabletask

Durable Task Framework allows users to write long running persistent workflows in C# using the async/await capabilities.

Support Graceful Termination of Orchestrations and Activities using CancellationToken #784

Open elokaac opened 1 year ago

elokaac commented 1 year ago

Similar to #565, it would be great if TaskActivities/TaskOrchestrations could be dispatched with a developer-provided cancellation token.

For example, if we could provide a cancellation token as part of StartAsync on a single worker node, it could be propagated to dispatched orchestrations/activities to give developers more control over the termination of running activities/orchestrations.

taskHubWorker.StartAsync(cancellationToken);

An example use case would be interface activities:

public interface IInterfaceTask
{
    Task WorkAsync(CancellationToken cancellationToken);
}

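public class Orchestration : TaskOrchestration<bool, bool>
{
    public override async Task<bool> RunTask(OrchestrationContext context, bool input)
    {
        var interfaceTask = context.CreateClient<IInterfaceTask>();
        // context.CancellationToken is the proposed API; it does not exist on OrchestrationContext today.
        await interfaceTask.WorkAsync(context.CancellationToken);
        return true;
    }
}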
davidmrdavid commented 1 year ago

Hmm, this seems somewhat related to the proposals around cancelling in-flight activities, doesn't it @cgillum?

cgillum commented 1 year ago

@davidmrdavid yes (and terminating sub-orchestrations). One thing we hadn't talked about yet was a potential programming model for orchestrations to do this, which I think is what this issue proposes.

/cc @jviau

jviau commented 1 year ago

Yeah, I would love to see this. I have not done this myself, but I think you can get pretty close to a cooperative cancellation model via orchestration events. That is, send an event whose handler cancels a CancellationTokenSource held by your orchestration implementation. I don't know how this will affect sub-orchestrations or activities though. I suppose that is where framework-level support would be important.

I think this would be a great feature to add to DTFx. The main differentiator from Termination here would be this is cooperative cancellation. So an orchestration can elect to perform cleanup work. Or if the orchestration is past some point of no return, it can ignore the cancellation and continue.
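The event-based workaround described above could look roughly like the following. This is a minimal, untested sketch against DurableTask.Core, not a framework feature: the "Cancel" event name, the activity classes, and the step names are all hypothetical. It relies on `OnEvent` being invoked for raised events (including during replay, so the token state is rebuilt each time the orchestration replays), and it only checks the token between activities; it does not interrupt an activity that is already running.

```csharp
using System.Threading;
using System.Threading.Tasks;
using DurableTask.Core;

// Hypothetical activities used only for illustration.
public class StepActivity : TaskActivity<string, string>
{
    protected override string Execute(TaskContext context, string input) => $"done: {input}";
}

public class CleanupActivity : TaskActivity<string, string>
{
    protected override string Execute(TaskContext context, string input) => $"cleaned up before: {input}";
}

public class CancellableOrchestration : TaskOrchestration<string, string>
{
    // Held per orchestration instance; cancelled when a "Cancel" event arrives.
    private readonly CancellationTokenSource cts = new CancellationTokenSource();

    public override void OnEvent(OrchestrationContext context, string name, string input)
    {
        // "Cancel" is an assumed event name, not something DTFx defines.
        if (name == "Cancel")
        {
            cts.Cancel();
        }
    }

    public override async Task<string> RunTask(OrchestrationContext context, string input)
    {
        foreach (var step in new[] { "extract", "transform", "load" })
        {
            // Cooperative check between steps: the orchestration decides how to react.
            if (cts.IsCancellationRequested)
            {
                await context.ScheduleTask<string>(typeof(CleanupActivity), step);
                return "Canceled";
            }

            await context.ScheduleTask<string>(typeof(StepActivity), step);
        }

        return "Completed";
    }
}
```

Because the check is explicit, an orchestration that is past its point of no return can simply stop consulting the token and run to completion.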

JohnWFlaherty commented 5 months ago

I came here looking for how I was expected to implement graceful shutdown in my orchestrations and activities. I have a query that takes 17 minutes and, although SIGKILL isn't the end of the world, in general, I would like to have some notice/control of shutdown when the container orchestrator decides to scale in.

Now, I've only spent an hour looking through the code so this will be rough.

TaskHubWorker already supports graceful shutdown and it's propagated down into the dispatchers. Each layer in this propagation has a CancellationTokenSource specifically for shutdown. I see two options for implementing this that won't create a breaking change:

  1. Propagate the design into the orchestrations/activities. That is, add StopAsync(bool isForced) to the contracts.
  2. Pass the dispatcher's shutdownCancellationTokenSource.Token to Func<T, Task> processWorkItem and add it to the TaskContext/OrchestrationContext.

The first is the most work and requires some thought, as it appears the dispatcher just fires and forgets. I followed the second approach for activities and it looks like a relatively easy change. For orchestrations it looks a little more complicated, as the OrchestrationContext is created in the constructor of TaskOrchestrationExecutor. I followed that up a bit but didn't get to the point where this constructor and processWorkItem converge.

jviau commented 5 months ago

@JohnWFlaherty TaskHubWorker shutdown / cancellation is not related to this feature. All active work items on a worker during shutdown are expected to resume on another instance, so we do not want to be propagating the cancellation token from the dispatcher to work items. Passing the shutdown token would undo the durability of these work items.

I see the original issue post asks for that, but it is not what we would go with in the end. The design would be a cancellation token available in an orchestration, with cancellation triggered by explicitly sending a cancel message to the orchestration.
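Until a dedicated cancel message exists, the closest approximation of "explicitly sending a cancel message" is probably an external event raised from the client. A minimal sketch, assuming the target orchestration handles a hypothetical "Cancel" event in its `OnEvent` override; the helper class and method names here are made up for illustration, while `TaskHubClient.RaiseEventAsync` is the existing DurableTask.Core call:

```csharp
using System.Threading.Tasks;
using DurableTask.Core;

// Hypothetical helper; not part of DurableTask.Core.
public static class CancelSender
{
    // Raises the assumed "Cancel" event on a running orchestration instance.
    // The orchestration itself decides whether and when to honor it.
    public static Task RequestCancelAsync(
        TaskHubClient client,
        OrchestrationInstance instance,
        string reason)
    {
        return client.RaiseEventAsync(instance, "Cancel", reason);
    }
}
```

Unlike the worker shutdown token, this targets a single orchestration instance and is recorded in its history, so it does not interfere with work items resuming on another worker.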