dhiaayachi / temporal

Temporal service
https://docs.temporal.io
MIT License

Use different task queue for activity and workflow task retries #334

Open dhiaayachi opened 3 weeks ago

dhiaayachi commented 3 weeks ago

Is your feature request related to a problem? Please describe.

Some applications require different retry behavior based on error codes. For example (source):

Hi, we are currently trying to design retry policy for our activity. The activity is calling http server and we want to retry based on the http status code.

By default, 5xx errors are retryable with a shorter backoff, e.g. 2 sec, as these are mostly intermittent.

For 429 (rate limiter) errors we want to retry with a longer backoff, e.g. 1 min.

A 400 error might happen because a resource on the server is currently being ‘paused’ by a human operator for maintenance. This maintenance is expected to take < 10 mins, after which the resource is available again. We want to auto-retry this request with a long backoff, e.g. 10 mins.

Other 4xx errors will NOT be retryable.
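The rules quoted above can be written down as a small lookup. This is a minimal sketch in plain Python, not Temporal SDK code: `RetryDecision` and `classify_http_status` are hypothetical names, and treating non-error statuses as "no retry" is an assumption the quoted post does not state.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class RetryDecision:
    """Hypothetical result type: whether to retry, and after how long."""
    retryable: bool
    backoff: Optional[timedelta] = None  # None when the call is not retried


def classify_http_status(status: int) -> RetryDecision:
    """Map an HTTP status code to the retry behavior described in the post."""
    if 500 <= status <= 599:
        # Mostly intermittent server errors: short backoff.
        return RetryDecision(True, timedelta(seconds=2))
    if status == 429:
        # Rate limited: longer backoff.
        return RetryDecision(True, timedelta(minutes=1))
    if status == 400:
        # Resource possibly paused for maintenance: long backoff.
        return RetryDecision(True, timedelta(minutes=10))
    if 400 <= status <= 499:
        # Any other 4xx is not retryable.
        return RetryDecision(False)
    # Assumption: 1xx/2xx/3xx are not failures, so nothing to retry.
    return RetryDecision(False)
```

A per-call lookup like this only decides the delay for one activity. It does nothing about the aggregate request rate across many activities, which is the gap the feature request is about.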

Providing different retry options will partially help. For rate-limit errors, however, changing the backoff of an individual activity will not help, because the aggregate rate across all activities can still be high.

Describe the solution you'd like

Use a separate, rate-limited task queue (possibly with a dynamic rate limiter) to retry activities in certain scenarios. Retries of activities that failed with a specific error code would then be scheduled on that separate task queue, which allows other activities to execute and retry without being limited.

The queue need not be exposed directly to users; it could remain an implementation detail.
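To illustrate the idea, here is a minimal sketch of a retry queue that releases tasks at a bounded aggregate rate using a token bucket. It is plain Python and says nothing about how Temporal would implement this internally: `RateLimitedRetryQueue`, its parameters, and the explicit `now` argument (used instead of a real clock so the behavior is easy to follow) are all assumptions made for illustration.

```python
from collections import deque
from typing import Deque, List


class RateLimitedRetryQueue:
    """Hypothetical retry queue: releases at most `rate_per_sec` tasks per second."""

    def __init__(self, rate_per_sec: float, burst: int = 1) -> None:
        self.rate = rate_per_sec      # may be changed at runtime ("dynamic" limit)
        self.capacity = float(burst)  # maximum number of stored tokens
        self.tokens = float(burst)    # start with a full bucket
        self.last_refill = 0.0        # time of the previous poll, in seconds
        self.pending: Deque[str] = deque()

    def add(self, task: str) -> None:
        """Queue a failed task for a rate-limited retry."""
        self.pending.append(task)

    def poll(self, now: float) -> List[str]:
        """Return the tasks that are allowed to start at time `now`."""
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_refill = now
        released: List[str] = []
        # Each released task consumes one token, oldest task first.
        while self.pending and self.tokens >= 1.0:
            self.tokens -= 1.0
            released.append(self.pending.popleft())
        return released
```

Because every retry that failed with the rate-limit error passes through the same bucket, the aggregate retry rate stays bounded no matter how many activities are retrying, while activities on other queues are unaffected.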

dhiaayachi commented 1 week ago

Thanks for the feature request!

This is a great idea and would provide a lot of flexibility.

In the meantime, you can define a separate activity for each class of error and give each one its own Retry Policy, so that each class gets its own retry behavior.

For rate limiting, you can run each type of activity on its own Task Queue and configure a rate limit per Task Queue.

Let us know if you have any other questions or feature requests.

dhiaayachi commented 1 week ago

Thank you for the feature request!

While Temporal doesn't currently support routing retries to different task queues based on error codes, you can achieve a similar outcome with custom logic in your Workflow.

Here's a workaround:

  1. Create a dedicated retry activity: This activity re-attempts the call that failed with the specific error code you want to route to a separate queue.
  2. Implement routing logic in your Workflow: When the original activity fails, this logic checks the error code and, if it matches, calls the dedicated activity.
  3. Register the dedicated activity with a separate Worker: This Worker polls a different task queue reserved for retrying activities that failed with that error code.

This approach lets you control the retry behavior for different error types and ensures that other activities are not impacted.
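The routing decision in step 2 might look like the sketch below. It is plain Python rather than Temporal SDK code: `execute` stands in for scheduling the activity on a given task queue and returning the HTTP status it observed, and the queue names, `run_with_rerouting`, and the attempt limit are hypothetical. Only the 429 case is rerouted here; any other result ends the loop and is left to the normal Retry Policy.

```python
from typing import Callable, List, Tuple

DEFAULT_QUEUE = "http-activities"              # hypothetical queue name
THROTTLED_QUEUE = "http-activities-throttled"  # hypothetical rate-limited queue


def run_with_rerouting(
    execute: Callable[[str], int], max_attempts: int = 3
) -> List[Tuple[str, int]]:
    """Run `execute(queue)`; after a 429, move further attempts to the throttled queue.

    Returns the (queue, status) pair of every attempt, in order.
    """
    history: List[Tuple[str, int]] = []
    queue = DEFAULT_QUEUE
    for _ in range(max_attempts):
        status = execute(queue)
        history.append((queue, status))
        if status != 429:
            # Success, or a failure this sketch does not reroute.
            break
        # Rate limited: schedule the next attempt on the separate queue.
        queue = THROTTLED_QUEUE
    return history
```

In a real Workflow, `execute` would correspond to scheduling the activity with the task queue set in its activity options, and the Worker polling the throttled queue would be the one from step 3.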

We're open to considering your feature request for a future release.

dhiaayachi commented 1 week ago

Thanks for the feature request! It's a great idea to have different retry behavior based on error codes.

Here are two ways to work around the missing feature:

  1. You can use the existing Retry Policy feature to control backoff and to mark specific error types as non-retryable. Note that a Retry Policy applies the same backoff settings to every retryable error, so on its own it cannot give one error code a longer backoff than another, and it does not move retries to a different task queue.
  2. You could also implement custom error handling: Activity code can inspect the error code and retry the call itself after a specific backoff duration, and Workflow code can catch the failure and reschedule the activity on a different task queue.

The second approach requires a more custom implementation within your Activity and Workflow code, but it could provide the desired flexibility.