Closed: kanupriya15025 closed this issue 1 year ago.
Hi @kanupriya15025. I took a look at the telemetry we have for Kanupriya_Schedule_Load_30138_11:07:2022:09:30:30
(thanks for providing these details) and I was able to confirm that there was about an 18-minute gap between the time you started this workflow and the time it actually started and transitioned into the Running status.
What I noticed is that there seems to be a significant delay for all messaging events that go through your control queues. The work-item queue, which is responsible for triggering activity functions, has no delays. This makes me think the problem has something to do with how your app processes orchestrations. I'm also seeing a significant delay between the time our extension starts running an orchestration and the time the orchestration completes. This should also be close to zero when the system is healthy (assuming your orchestrator is following all the expected rules about not doing I/O, etc.).
You mentioned that this is a Python app. Which version of Python are you using? My best guess at this point is that you're using one of the older versions, which has very strict concurrency limits enabled. This is the most common cause for the exact behavior you're seeing. More information on this can be found here.
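For anyone landing here later: Durable Functions concurrency is tuned in host.json via the durableTask extension. A sketch of the relevant settings (the setting names are the standard durableTask extension settings; the numbers are illustrative only, not recommendations):

```json
{
  "version": "2.0",
  "extensions": {
    "durableTask": {
      "maxConcurrentOrchestratorFunctions": 100,
      "maxConcurrentActivityFunctions": 100
    }
  }
}
```

For Python apps specifically, the `FUNCTIONS_WORKER_PROCESS_COUNT` app setting is also worth checking, since Python function executions are distributed across worker processes.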
If after researching your concurrency settings you're still not able to resolve this issue, I suggest opening a support ticket with Azure Functions. An engineer should be able to dig further into your case to understand what's happening.
Thanks @cgillum. It was my bad: autoscaling was disabled in my app's configuration. After enabling it, that issue went away. However, another issue has started showing up since then.
Now some of the tasks are stuck in the Running state in the storage table even though they are complete. Attaching a screenshot of the same.
Thanks for confirming, @kanupriya15025.
The new issue you're describing is commonly seen when tasks are started in an orchestrator function but never completed. For example, if you have a durable timer, an activity, or a sub-orchestrator task that was started but not yielded (or canceled, in the case of durable timers), then the orchestration will remain in the Running state until those tasks have completed in the background. We mention this in the context of timers in our documentation here.
Could this be the cause of the issue you're seeing?
@cgillum In my case, we are using "yield" before calling all timers.
e.g.:
yield context.create_timer(next_wait_time)
Is that correct?
@kanupriya15025: That yield seems fine. However, are you cancelling those timers in case they don't fire before the orchestrator returns?
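For later readers, here is a sketch of the timeout pattern being discussed: race an activity against a durable timer, and cancel the timer if the activity wins. The shape (`create_timer`, `call_activity`, `task_any`, and `cancel()` on a timer task) mirrors the azure-functions-durable Python API, but the `Stub*` classes and the driver at the bottom are stand-ins so the snippet runs without the library:

```python
from datetime import datetime, timedelta

# --- Stand-ins so this sketch runs without azure-functions-durable. ---
# In a real app, `context` would be df.DurableOrchestrationContext.
class StubTimerTask:
    def __init__(self):
        self.cancelled = False

    def cancel(self):
        self.cancelled = True

class StubContext:
    current_utc_datetime = datetime(2022, 7, 11, 9, 30)

    def __init__(self):
        self.timers = []

    def create_timer(self, fire_at):
        timer = StubTimerTask()
        self.timers.append(timer)
        return timer

    def call_activity(self, name, payload=None):
        return ("activity", name)  # opaque task handle for this demo

    def task_any(self, tasks):
        return tasks  # the driver below decides which task "wins"

def orchestrator(context):
    """Timeout pattern: race an activity against a durable timer and
    cancel the timer when the activity wins, so no outstanding timer
    keeps the instance stuck in the Running state."""
    deadline = context.current_utc_datetime + timedelta(minutes=5)
    timer_task = context.create_timer(deadline)
    activity_task = context.call_activity("ProcessItem")
    winner = yield context.task_any([timer_task, activity_task])
    if winner is activity_task:
        timer_task.cancel()  # without this, the instance stays Running
        return "completed"
    return "timed out"

# Drive the generator once, pretending the activity finished first.
ctx = StubContext()
gen = orchestrator(ctx)
tasks = next(gen)            # orchestrator awaits task_any(...)
try:
    gen.send(tasks[1])       # the activity task "wins" the race
except StopIteration as stop:
    print(stop.value, ctx.timers[0].cancelled)  # -> completed True
```

The key point is the `timer_task.cancel()` call: a timer that was created but never fired and never cancelled counts as an outstanding task, which matches the "stuck in Running" symptom described above.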
Closing due to inactivity
Description
We are running some load tests on our function app, which require ~1000 workflows running concurrently. I noticed that for a significant number of orchestrators, the storage table shows "Pending" for a long time (30-40 minutes). However, the logs show that the orchestrator instance completed successfully, although that execution was also delayed.
App Details
azure-functions==1.11.2
azure-functions-durable==1.1.6