Azure / azure-functions-durable-js

JavaScript library for using the Durable Functions bindings
https://www.npmjs.com/package/durable-functions
MIT License

Orchestrations are failing to find the orchestration but it's listed in the known orchestrators #563

Closed: teresahoes closed this issue 6 months ago

teresahoes commented 6 months ago

Describe the bug Since last Thursday, a percentage of orchestrations in our production Azure Functions environment have been failing with an error saying a specific orchestrator cannot be found, even though it is listed in the known orchestrators. It seems like something might have changed in one of the instances in our environment, but I have no idea what might trigger this error or how to troubleshoot it further.


To Reproduce Steps to reproduce the behavior:

Randomly happening - seems to be affecting one or more instances but not all


Expected behavior Orchestration would process as normal

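Actual behavior A percentage of orchestrations fail with an error saying the orchestrator function doesn't exist, is disabled, or is not an orchestrator function, even though it appears in the list of known orchestrator functions.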

Screenshots Here is an example error

"orchestrationStatus": {
  "name": "orchestration_process_form",
  "instanceId": "b40d4329-578f-4337-9d42-e307f2f18039",
  "runtimeStatus": "Failed",
  "customStatus": null,
  "output": "Orchestrator function 'orchestration_process_form' failed: The function 'orchestration_process_form' doesn't exist, is disabled, or is not an orchestrator function. Additional info: The following are the known orchestrator functions: 'orchestration_monitor_form', 'orchestration_retry_test', 'orchestration_process_form'.",
  "createdTime": "2023-12-12T12:38:28Z",
  "lastUpdatedTime": "2023-12-12T12:38:58Z",
  "historyEvents": [
    {
      "EventType": "ExecutionStarted",
      "Input": "",
      "Correlation": null,
      "ParentTraceContext": null,
      "ScheduledStartTime": null,
      "Generation": 0,
      "Timestamp": "2023-12-12T12:38:28.5641747Z",
      "FunctionName": "orchestration_process_form"
    },
    {
      "EventType": "ExecutionCompleted",
      "OrchestrationStatus": "Failed",
      "Result": "Orchestrator function 'orchestration_process_form' failed: The function 'orchestration_process_form' doesn't exist, is disabled, or is not an orchestrator function. Additional info: The following are the known orchestrator functions: 'orchestration_monitor_form', 'orchestration_retry_test', 'orchestration_process_form'.",
      "FailureDetails": null,
      "Timestamp": "2023-12-12T12:38:58.1718479Z"
    }
  ]
}

Known workarounds Just keep retrying and eventually the run hits one of the orchestration functions that is working. But this is affecting at least 100 function runs a day, so it is untenable to reprocess these manually.
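To avoid eyeballing each failed run before retrying it, the status payload could be checked for this specific failure. The following is only a sketch: `classifyOrchestratorFailure` is a hypothetical helper (not part of durable-functions), and it assumes the `output` text keeps the wording shown in the example error above.

```javascript
// Hypothetical helper, not part of the durable-functions library.
// Assumes the error wording matches the example status payload in this issue.
function classifyOrchestratorFailure(status) {
  const result = { matches: false, listedAsKnown: false, known: [] };
  if (!status || status.runtimeStatus !== "Failed" || typeof status.output !== "string") {
    return result;
  }

  // The phrase the host uses when it cannot resolve the orchestrator.
  const marker = "doesn't exist, is disabled, or is not an orchestrator function";
  result.matches = status.output.includes(marker);
  if (!result.matches) {
    return result;
  }

  // Everything after this phrase is the quoted list of known orchestrators.
  const listSection = status.output.split("known orchestrator functions:")[1] || "";
  result.known = [...listSection.matchAll(/'([^']+)'/g)].map((m) => m[1]);

  // True when the "missing" orchestrator is nevertheless in the known list,
  // which is the contradictory situation described in this issue.
  result.listedAsKnown = result.known.includes(status.name);
  return result;
}
```

A run where both `matches` and `listedAsKnown` are true is a candidate for automatic resubmission instead of manual reprocessing; how the resubmission happens is left out here because it depends on the app.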


teresahoes commented 6 months ago

After reviewing the trace logs for the orchestrations that failed, I found that they were trying to run on the staging slot, where everything is disabled, so now the error makes sense. The queue function (which starts the orchestrations) must have been stuck in an 'enabled' state even though it showed as disabled, and it was picking up records from the Service Bus queue and trying to start them on the staging environment. Shutting off the staging environment entirely has cleared the issue.
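For anyone who needs to keep the staging slot running, one possible safeguard is for the queue-triggered starter to check which slot it is in before starting an orchestration. This is a minimal sketch, not the fix used in this issue. `shouldStartOrchestrations` and the function name `queue_starter` are hypothetical, and it assumes the platform exposes the slot name in the `WEBSITE_SLOT_NAME` app setting and the disabled flag in `AzureWebJobs.<FUNCTION_NAME>.Disabled`.

```javascript
// Hypothetical guard, not part of durable-functions.
// Assumptions: WEBSITE_SLOT_NAME holds the current slot name ("Production" for
// the production slot), and AzureWebJobs.<name>.Disabled is "true" when the
// starter function has been disabled through app settings.
function shouldStartOrchestrations(env, starterFunctionName) {
  // Assumption: an unset slot name (for example a local run) is treated as production.
  const slot = String(env.WEBSITE_SLOT_NAME || "Production").toLowerCase();
  const disabledFlag = String(
    env[`AzureWebJobs.${starterFunctionName}.Disabled`] || ""
  ).toLowerCase();

  // Only start orchestrations from the production slot, and only when the
  // starter is not flagged as disabled.
  return slot === "production" && disabledFlag !== "true";
}

// Example call inside a starter; "queue_starter" is an illustrative name.
const allowed = shouldStartOrchestrations(process.env, "queue_starter");
```

If the guard returns false, the starter could throw so that the Service Bus message is not completed and remains available for the production slot, though the right handling depends on the queue's retry and dead-letter settings.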