I've found a possible suspect here. I have an orchestration client that runs every minute and starts the FileOrchestrator orchestrator as a singleton. Since this orchestrator generally restarts (`ContinueAsNew`) at the top of the hour (which closely matches when I observe the issue), the client seems to think the process has terminated, so it slips into the execution queue and starts a new instance of the same orchestrator. When the original singleton instance then restarts, my guess is that both instances run in tandem, perhaps sharing the same task hub queue items and causing some sort of conflict that destroys the scheduling messages needed to continue.
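For context, my client follows the standard singleton pattern from the Durable Functions docs: check the instance status and only start when nothing is running. A minimal sketch of that pattern (function and instance names here are illustrative, not my exact code); the race I suspect is that an instance mid-`ContinueAsNew` can briefly look "not running" to this check:

```csharp
[FunctionName("FileOrchestrator_Start")]
public static async Task Run(
    [TimerTrigger("0 */1 * * * *", RunOnStartup = true)] TimerInfo timer,
    [DurableClient] IDurableOrchestrationClient client,
    ILogger log)
{
    const string instanceId = "FileOrchestratorSingleton";
    var status = await client.GetStatusAsync(instanceId);

    // Start a new instance only if no instance is currently running.
    if (status == null
        || status.RuntimeStatus == OrchestrationRuntimeStatus.Completed
        || status.RuntimeStatus == OrchestrationRuntimeStatus.Failed
        || status.RuntimeStatus == OrchestrationRuntimeStatus.Terminated)
    {
        await client.StartNewAsync("FileOrchestrator", instanceId);
        log.LogInformation($"Started singleton orchestration {instanceId}.");
    }
}
```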
The workaround I'm implementing is to simply not run the client at the top of the hour:
```csharp
// Old: fires every minute, including minute 0
[TimerTrigger("0 */1 * * * *", RunOnStartup = true)] TimerInfo timer

// New: skips minute 0, avoiding the top-of-the-hour ContinueAsNew window
[TimerTrigger("0 1-59 * * * *", RunOnStartup = true)] TimerInfo timer
```
That being said, if my hypothesis is correct, the Durable Functions SDK should either (a) throw an exception when a client attempts to launch an orchestration instance between two eternal executions, or (b) throw an exception in the orchestration instance after `ContinueAsNew` runs if another instance with the same ID is already executing.
I'll report back after letting this run for a while to see if this does the trick!
Yep, the timer change appeared to mitigate the issue - my functions have been running for a while without failing to restart. Hope this helps with root cause diagnosis!
Actually, scratch my last - the issue returned.
However, I think this was caused by another change I made - upon investigating my eternal function run history, I found the following:
Here's what my history looks like for a function that is running correctly:
And here's what it looks like for a function that is not restarting:
When looking at the relevant snippet of my orchestration code, I believe I found the problem:
```csharp
List<Task> setTasks = new List<Task>
{
    context.CallEntityAsync(entityId, nameof(DispatchInstance.SetFragmentBuffer), lineBuffer),
    context.CallEntityAsync(entityId, nameof(DispatchInstance.SetReadLength), filePosition),
};
await Task.WhenAll(setTasks).ConfigureAwait(true);
```
The `.ConfigureAwait(true)` calls were added to ensure the orchestration runs on a single thread. However, given that multiple entity operations are awaited together, there appears to be a sporadic race condition where returning to the same thread produces a deadlock!
This pattern does not seem to deadlock when calling activities, but I suppose other single-threaded contexts such as entities or sub-orchestrations could face this issue?
In the meantime, I've mitigated the issue by calling the two entity updates in sequence.
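Concretely, the mitigation just awaits each entity call before starting the next, so the orchestration never waits on two entity responses at once (a sketch of the change, not my exact code):

```csharp
// Sequential entity updates: only one entity call is in flight at a time,
// avoiding the suspected race between two concurrent entity callbacks.
await context.CallEntityAsync(entityId, nameof(DispatchInstance.SetFragmentBuffer), lineBuffer);
await context.CallEntityAsync(entityId, nameof(DispatchInstance.SetReadLength), filePosition);
```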
Following up on this - I've started encountering a similar problem on some other eternal orchestrations. Each time, the issue is specifically with `CallEntityAsync`: the durable entity is updated successfully, but the callback to the parent orchestration function never runs.
Could there perhaps be an issue with the task hub queueing mechanism in this particular instance?
After following this up with Azure Support, my issue appears to be related to this one: https://github.com/Azure/azure-functions-durable-extension/issues/1094
I will try upgrading to v2.1 when it is released to see if this resolves my issue.
Description
I have an eternal function that, after about 2.5 hours of running successfully (roughly two loops of the parent eternal function), does not appear to be restarting on completion. I checked my storage account for the status of the hanging instance: it is currently marked as Running, but there are no matching messages in the task hub storage queues.
Expected behavior
The eternal function should continue executing after restarting via `ContinueAsNew`.
Actual behavior
Function stops executing.
Relevant source code snippets
Known workarounds
I'll probably have to create a function that terminates hanging orchestration instances.
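A sketch of what that cleanup function might look like, using the Durable Functions 2.x query API (the 3-hour stall threshold and function name are assumptions for illustration):

```csharp
[FunctionName("TerminateHangingOrchestrations")]
public static async Task Run(
    [TimerTrigger("0 0 * * * *")] TimerInfo timer,
    [DurableClient] IDurableOrchestrationClient client,
    ILogger log)
{
    // Query for instances still marked Running.
    var condition = new OrchestrationStatusQueryCondition
    {
        RuntimeStatus = new[] { OrchestrationRuntimeStatus.Running },
    };
    var result = await client.ListInstancesAsync(condition, CancellationToken.None);

    foreach (var instance in result.DurableOrchestrationState)
    {
        // Treat instances that haven't been updated in hours as hung.
        if (DateTime.UtcNow - instance.LastUpdatedTime > TimeSpan.FromHours(3))
        {
            log.LogWarning($"Terminating stalled instance {instance.InstanceId}.");
            await client.TerminateAsync(instance.InstanceId, "Stalled eternal orchestration");
        }
    }
}
```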
App Details
If deployed to Azure