MicrosoftDocs / azure-docs

Open source documentation of Microsoft Azure
https://docs.microsoft.com/azure
Creative Commons Attribution 4.0 International

How to wait until orchestrator finishes in service bus triggered activity #109903

Status: Open · danyltsiv opened this issue 1 year ago

danyltsiv commented 1 year ago

This page doesn't explain how to wait until the orchestrator finishes in a Service Bus-triggered activity, nor does it say that this is impossible.


Naveenommi-MSFT commented 1 year ago

@danyltsiv Thanks for your feedback! We will investigate and update as appropriate.

cgillum commented 1 year ago

Hi @danyltsiv, there is no mention of Service Bus in this document. Also, it's not recommended to wait for an orchestration to complete as part of a queue-triggered function.

Workflows can run for long periods of time, and you don't want to hold a lock on a queue message while waiting for an orchestration to complete. If you do, you run the risk of losing the lock on the queue message and having it redelivered and fire the trigger multiple times. Orchestrations are reliable and are backed by their own internal queues, so you don't need to rely on the trigger queue to ensure that an orchestration runs to completion. Instead, your queue-triggered function should start an orchestration as shown in the examples in this document, without waiting for it to complete.
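Below is a minimal sketch of the start-and-return pattern described above, written in TypeScript. It deliberately avoids the real SDK. `InMemoryOrchestrationClient`, `onQueueMessage`, and `processOrder` are hypothetical stand-ins. They model only one behavior: starting an orchestration returns an instance ID as soon as the instance is scheduled, while the workflow itself keeps running. In the durable-functions v3 package for Node.js, the corresponding real call is `df.getClient(context).startNew(orchestratorName, { input })`.

```typescript
// Hypothetical in-memory stand-in for a Durable Functions client. This is NOT the
// durable-functions SDK; it only models "startNew returns once the instance is
// scheduled, while the orchestration keeps running on its own".
type RuntimeStatus = "Running" | "Completed" | "Failed";

class InMemoryOrchestrationClient {
  private nextId = 1;
  private readonly status = new Map<string, RuntimeStatus>();
  private readonly running = new Map<string, Promise<void>>();

  // Starts `orchestrator` in the background and resolves with its instance ID
  // without waiting for the orchestrator to finish.
  async startNew(orchestrator: (input: unknown) => Promise<unknown>, input: unknown): Promise<string> {
    const instanceId = `instance-${this.nextId++}`;
    this.status.set(instanceId, "Running");
    const run = orchestrator(input).then(
      () => { this.status.set(instanceId, "Completed"); },
      () => { this.status.set(instanceId, "Failed"); },
    );
    this.running.set(instanceId, run);
    return instanceId;
  }

  getStatus(instanceId: string): RuntimeStatus | undefined {
    return this.status.get(instanceId);
  }

  // Helper for demos/tests only: waits until a background instance has settled.
  async drain(instanceId: string): Promise<void> {
    await this.running.get(instanceId);
  }
}

// Hypothetical long-running workflow (simulated with a short timer).
async function processOrder(_input: unknown): Promise<void> {
  await new Promise<void>((resolve) => setTimeout(resolve, 50));
}

// Queue-triggered handler: start the workflow and return right away. In a real
// queue trigger, returning promptly lets the message be completed while its lock
// is still held, instead of holding the lock for the whole workflow.
async function onQueueMessage(
  client: InMemoryOrchestrationClient,
  message: { orderId: string },
): Promise<string> {
  return client.startNew(processOrder, message);
}
```

Because the handler returns as soon as `startNew` resolves, the trigger is done with the message almost immediately. Progress and failures are then tracked through the orchestration's own status rather than through the trigger queue.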

danyltsiv commented 1 year ago

Hi @cgillum, thanks for the quick answer. That makes sense. But then, how to handle an exception if it's unhandled by the orchestrator? By unhandled, I don't mean there is no try/catch block inside the orchestrator. There is one, but the exception still somehow slips past it.

We recently had a case where activities threw a low-level network exception and the orchestrator then failed instantly as well. I still cannot figure out what those exceptions mean. They appeared for a short period, about 1-2 seconds, and then disappeared. The activity threw: Worker encountered event stream error: Error: 14 UNAVAILABLE: read ECONNRESET. Then the orchestrator instantly threw: DurableTask.Core.Exceptions.OrchestrationFailureException.

So even though we have a try/catch in the orchestrator, it seems to have failed instantly inside the catch block. That shut down execution without proper handling. It also left no dead-lettered message, because the starter function had already completed the message. As a result, the outer process that expects output from that function is stuck in progress forever.

Is there anything built into Durable Functions for JavaScript that might help handle such cases?

And by the way, what could possibly make the orchestrator fail? There is nothing special in it apart from calling activities, and I am sure we are following the orchestrator code constraints. Also, what could "Error: 14 UNAVAILABLE: read ECONNRESET" mean here? The only thing the failed instances have in common is the Azure Storage account used for event sourcing. Could that be the cause? (We monitored the storage account for that period and found no interruptions.)

cgillum commented 1 year ago

Thanks @danyltsiv for the additional context. Regarding the orchestration not being able to catch the exception, I'd recommend opening a bug in the Durable Functions JS repo so that the JS team can triage and investigate. The repo is here: https://github.com/Azure/azure-functions-durable-js. It's not expected that exceptions can escape try/catch, or that an orchestration can fail on its own due to internal failures. That specific error sounds like a generic network connectivity error.

Regarding this question:

how to handle an exception if it's unhandled by the orchestrator?

One option is to run your orchestrator as a sub-orchestration of a small parent orchestrator. When calling the sub-orchestration, you can specify retry options that allow it to be retried multiple times in case of an unexpected failure. The same can be applied to individual activities. More information here: https://learn.microsoft.com/azure/azure-functions/durable/durable-functions-error-handling?tabs=javascript-v4#automatic-retry-on-failure.
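Below is a sketch of the retry semantics described above. It assumes a policy shaped roughly like the durable-functions `RetryOptions` class: a first retry interval, a maximum number of attempts, and an optional backoff coefficient and maximum interval. The names `RetryPolicy`, `retryDelays`, and `runWithRetry` are hypothetical, and this is plain TypeScript, not the library. Inside an orchestrator, use the built-in `context.df.callSubOrchestratorWithRetry` or `context.df.callActivityWithRetry` with a `df.RetryOptions` instance instead. A timer-based loop like this one would violate the orchestrator code constraints.

```typescript
// Hypothetical policy shape, loosely modeled on durable-functions' RetryOptions.
interface RetryPolicy {
  firstRetryIntervalMs: number;
  maxNumberOfAttempts: number; // total attempts, including the first one
  backoffCoefficient?: number; // defaults to 1 (fixed interval)
  maxRetryIntervalMs?: number; // optional cap on any single delay
}

// Delay before each retry; there are (maxNumberOfAttempts - 1) retries.
function retryDelays(policy: RetryPolicy): number[] {
  const coefficient = policy.backoffCoefficient ?? 1;
  const cap = policy.maxRetryIntervalMs ?? Number.POSITIVE_INFINITY;
  const delays: number[] = [];
  for (let retry = 0; retry < policy.maxNumberOfAttempts - 1; retry++) {
    delays.push(Math.min(policy.firstRetryIntervalMs * Math.pow(coefficient, retry), cap));
  }
  return delays;
}

// Runs `operation`, retrying on failure according to `policy`, and rethrows the
// last error once all attempts are used. `sleep` is injectable so tests need not wait.
async function runWithRetry<T>(
  operation: (attempt: number) => Promise<T>,
  policy: RetryPolicy,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  const delays = retryDelays(policy);
  let lastError: unknown;
  for (let attempt = 1; attempt <= policy.maxNumberOfAttempts; attempt++) {
    try {
      return await operation(attempt);
    } catch (err) {
      lastError = err;
      if (attempt < policy.maxNumberOfAttempts) {
        await sleep(delays[attempt - 1]);
      }
    }
  }
  throw lastError;
}
```

Take `firstRetryIntervalMs: 5000`, `maxNumberOfAttempts: 4`, and `backoffCoefficient: 2`. The waits between attempts are then 5, 10, and 20 seconds. After the fourth failed attempt, the last error is rethrown to the caller.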