Open dhiaayachi opened 2 months ago
Thank you for reporting this issue.
The behavior you're describing, where the existing logic slows down workflow task retries and impacts features like Query, is a known issue that is being actively worked on. There is no immediate workaround, so we recommend keeping Temporal's default workflow task retry behavior until a fix is available.
We appreciate your patience and understanding as we work to resolve this.
Thank you for reporting this issue. The behavior you're experiencing with workflow task retries and its impact on features like Query is a known issue. We are actively working on addressing this in future releases.
In the meantime, a potential workaround is to use a custom retry policy with a NextRetryDelay based on the number of attempts in your Activity code. This lets you set the retry interval dynamically. Note that NextRetryDelay governs Activity retries, not workflow task retries, so it only helps where the failing logic can live in an Activity.
Here's an example of how to implement this:
    using System;
    using System.Threading.Tasks;
    using Temporalio.Activities;
    using Temporalio.Exceptions;

    // Sketch assuming the Temporal .NET SDK (Temporalio package); check names against your SDK version.
    public class MyActivities
    {
        private const int MaxRetryAttempts = 5;
        private const int InitialRetryIntervalMs = 1000;
        private const double BackoffCoefficient = 2;
        private static readonly TimeSpan MaximumRetryInterval = TimeSpan.FromMinutes(5);

        [Activity]
        public async Task<string> MyActivityAsync()
        {
            // ... your activity logic ...
            // Attempt is 1-based: the first execution reports Attempt == 1.
            int attempt = ActivityExecutionContext.Current.Info.Attempt;
            if (attempt < MaxRetryAttempts)
            {
                // Exponential backoff, capped at MaximumRetryInterval.
                double delayMs = InitialRetryIntervalMs * Math.Pow(BackoffCoefficient, attempt - 1);
                TimeSpan retryDelay = TimeSpan.FromMilliseconds(
                    Math.Min(delayMs, MaximumRetryInterval.TotalMilliseconds));
                throw new ApplicationFailureException(
                    "Retry due to error", errorType: "RetryError", nextRetryDelay: retryDelay);
            }
            // Max attempts reached: mark the failure non-retryable so the server stops retrying.
            throw new ApplicationFailureException(
                "Max retries reached", errorType: "MaxRetryError", nonRetryable: true);
        }
    }
This approach allows your Activity to dynamically adjust the retry interval based on the number of attempts, potentially preventing the workflow task timeout from escalating unnecessarily.
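To make the delay schedule concrete, here is a minimal standalone sketch of the same capped exponential backoff calculation, with no Temporal dependency. The `RetryBackoff` class and `DelayForAttempt` method are hypothetical names used only for illustration; they are not part of any Temporal SDK.

```csharp
using System;

// Hypothetical helper for illustration only; not part of any Temporal SDK.
public static class RetryBackoff
{
    // Returns the delay to wait after the given 1-based attempt:
    // initialInterval * backoffCoefficient^(attempt - 1), capped at maximumInterval.
    public static TimeSpan DelayForAttempt(
        int attempt,
        TimeSpan initialInterval,
        double backoffCoefficient,
        TimeSpan maximumInterval)
    {
        if (attempt < 1)
        {
            throw new ArgumentOutOfRangeException(nameof(attempt), "Attempt is 1-based.");
        }

        double delayMs = initialInterval.TotalMilliseconds
            * Math.Pow(backoffCoefficient, attempt - 1);

        // Math.Min also covers the case where the power overflows to +Infinity.
        double cappedMs = Math.Min(delayMs, maximumInterval.TotalMilliseconds);
        return TimeSpan.FromMilliseconds(cappedMs);
    }
}
```

With an initial interval of 1 second, a coefficient of 2, and a 5-minute cap, attempts 1 through 5 yield delays of 1, 2, 4, 8, and 16 seconds, and attempt 10 (512 seconds uncapped) is limited to 5 minutes.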
We appreciate your feedback and will keep you updated on the progress of this issue.
Is your feature request related to a problem? Please describe.
The existing logic slows down workflow task retries by increasing the workflow task start-to-close timeout (up to 10 minutes) and relies on the fact that the SDK won't respond with a workflow task failure if the workflow task has attempt > 1.
However, this doesn't work well with features that need to wait for a pending workflow task to complete. One example is Query. If there's a pending workflow task, a query must wait for it to complete before it can be dispatched. Until then, the query is buffered in memory in the workflow's mutable state. Since the workflow task can now take a long time to complete, the mutable state may get evicted from the cache and the query API will fail with an Unavailable error, or the query API call itself can time out.
Describe the solution you'd like
A better way of handling workflow task retries is needed.