Do not let API call timeout if workflow can't be locked

dhiaayachi / temporal

Temporal service

https://docs.temporal.io

MIT License

Do not let API call timeout if workflow can't be locked #383

Open dhiaayachi opened 1 month ago

dhiaayachi commented 1 month ago

Currently, the entire context timeout can be spent trying to lock the workflow before any operation is performed. If the workflow is very busy and can't be locked within the given context timeout, the caller sees a context deadline exceeded error and has no clue why the API call timed out.

We should return early with a special error type (or maybe just a resource exhausted error with a workflow busy cause?) if the workflow cannot be locked.

That way, the user latency calculation can also work properly across API calls.

dhiaayachi commented 4 weeks ago

Thank you for reporting this issue!

We understand the concern about the lack of clarity when a workflow fails to lock due to being too busy. Returning a more specific error code like "Resource Exhausted" with a "workflow busy" cause would be helpful for users to identify and troubleshoot the problem.

Currently, we don't have a specific error type for this scenario. However, you could potentially implement a workaround by introducing a custom error type within your workflow code. This error type could be raised if locking fails due to context timeout and would provide more specific information to the caller.

We will consider adding a dedicated error type for this scenario in future releases.

dhiaayachi commented 4 weeks ago

Thank you for reporting this issue! This is definitely something we want to address.

We understand the current behavior can make it challenging to troubleshoot workflow lock timeouts. We are actively exploring options to improve the error messaging in this scenario.

In the meantime, you can try these workarounds:

- Increase the ContextTimeout: You can try increasing the ContextTimeout to allow more time for the workflow to lock. However, this might not be a suitable solution for all cases.
- Use retries with exponential backoff: Retry the workflow operation with increasing delays. This approach can help prevent flooding the system with requests when the workflow is heavily loaded.

We'll keep you updated on the progress of addressing this issue.