Open dhiaayachi opened 3 weeks ago
CompleteByID requests to fail Activities when a retry is pending
Is your feature request related to a problem? Please describe.
This is a follow-up to this issue regarding handling CompleteByID requests to fail activities when a retry is pending.
Currently, if an async activity fails with a retryable error and the Temporal server receives a CompleteByID request to fail the activity before the retry attempt starts, the server may handle the request incorrectly.
This situation can occur when other servers attempt to fail the activity while the retry is pending. The server may not be able to differentiate between a legitimate failure and a transient error, potentially leading to an unexpected activity failure and blocking the workflow.
Describe the solution you'd like
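For requests to fail an activity using CompleteByID:
1) If the request is to force fail an activity, the server should fail the activity if the attempt for a retryable error has not started yet.
2) If the request is to fail an activity due to a non-retryable error, the server should fail the activity.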
Describe alternatives you've considered
Introducing a separate API or a flag for clients to indicate a force fail request for an activity is a potential alternative. However, this could lead to API complexity and potential misuse.
Additional context
This enhancement aims to provide more precise control over activity failures when a retry is pending, ensuring that the Temporal server correctly interprets and handles these scenarios. This will improve the robustness and predictability of workflow execution when dealing with asynchronous activities and potential retry situations.
Relevant references:
Thank you for reporting this issue! This feature request is related to a previous issue https://github.com/temporalio/temporal/issues/987.
As a workaround for this issue, you may create a new API to distinguish force fail requests from requests to fail the activity due to non-retryable errors.
Thanks for reporting this issue!
This is a known issue and you can find more information here: https://github.com/temporalio/temporal/pull/5724.
If you have any other questions, please let me know.
Thanks for reporting this issue.
This is a known issue related to the CompleteByID API and retryable errors. You can find more information in the following documentation:
My understanding is that you are asking for a separate API to handle the case of failing an activity due to an error.
During this async call, the client passes in any error it wants the server to know about (see the client call here). The request is then converted to the corresponding request to fail the activity (for instance here). In this case, the server behaves accordingly.
Is your feature request related to a problem? Please describe.
This is a follow-up feature of this issue.
For an async call, the activity may fail due to a bug or design drawback after a call to an external server. However, the Temporal server may receive a CompleteByID request from other servers before the new attempt for a retryable error of the activity has started. If the request is to complete the activity, we complete the activity even though the new attempt has not started yet, so that the workflow is unblocked (refer to this PR). If the request is to fail the activity, however, we need an appropriate way to handle it, because we cannot be sure whether the request is meant to fail the activity or whether it reports a transient error and the activity should be attempted again.
Describe the solution you'd like
For the request to fail an activity:
1) If the request is to force fail an activity, we should fail the activity if the attempt for a retryable error has not started yet.
2) If the request is to fail an activity due to a non-retryable error, we should fail the activity.
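The two rules for fail requests can be sketched as a small decision function. This is only an illustration under stated assumptions: `FailRequest`, its `ForceFail` and `NonRetryable` fields, and `decideWhileRetryPending` are hypothetical names, not Temporal server code, and the default branch (keep the pending retry for a retryable, non-forced failure) is an assumption that the request does not spell out.

```go
package main

import "fmt"

// FailRequest is a hypothetical, simplified view of a CompleteByID call
// that asks the server to fail an activity.
type FailRequest struct {
	ForceFail    bool // hypothetical flag: the caller wants the activity failed outright
	NonRetryable bool // the reported error is non-retryable
}

// Outcome describes what the server does with the request.
type Outcome string

const (
	FailActivity     Outcome = "fail activity"
	KeepPendingRetry Outcome = "keep pending retry"
)

// decideWhileRetryPending applies the two proposed rules to an activity whose
// next attempt (after a retryable error) has not started yet.
func decideWhileRetryPending(req FailRequest) Outcome {
	switch {
	case req.ForceFail:
		// Rule 1: a force fail request fails the activity even though a retry is pending.
		return FailActivity
	case req.NonRetryable:
		// Rule 2: a non-retryable error fails the activity.
		return FailActivity
	default:
		// Assumption: a retryable, non-forced failure is treated as transient,
		// so the pending retry is left in place.
		return KeepPendingRetry
	}
}

func main() {
	fmt.Println(decideWhileRetryPending(FailRequest{ForceFail: true}))    // fail activity
	fmt.Println(decideWhileRetryPending(FailRequest{NonRetryable: true})) // fail activity
	fmt.Println(decideWhileRetryPending(FailRequest{}))                   // keep pending retry
}
```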
Describe alternatives you've considered
We may introduce a separate API or a flag for a client to tell the server that it would like a request to force fail the activity.