Open thomasrosdahl opened 1 year ago
Hi @thomasrosdahl, thanks for reaching out.
We'll need a bit more information to debug this. One question that comes to mind: does your affected orchestrator have any pending sub-tasks (sub-orchestrators or Activities) that still need to complete by the time your orchestrator reaches its `return` statement? If so, that could explain why you see the orchestrator remain as "Running": we only reach the "Completed" state when all sub-tasks have completed as well.
Additionally, does this occur locally or only on Azure? If it's only on Azure, could you please provide us with your orchestrator's instanceID? Thanks!
Hi @davidmrdavid,
The orchestration has one activity: `CreateTenantDashboardDataSet`. Looking at the attached screenshot of the History table, it appears that the activity also completed successfully (row 6). We've only observed this in Azure, and it's not very frequent. However, when it does happen, it requires manual intervention.
Here's the code for our orchestrator function:
```csharp
[FunctionName(nameof(BuildTenantDashboard))]
[Disable("DisableDashboardBuilder")]
public async Task BuildTenantDashboard(
    [OrchestrationTrigger] IDurableOrchestrationContext context)
{
    var retryOptions = new RetryOptions(TimeSpan.FromMinutes(1), 10)
    {
        BackoffCoefficient = 2,
        MaxRetryInterval = TimeSpan.FromMinutes(10)
    };
    var tenantId = context.GetInput<string>();
    await context.CallActivityWithRetryAsync(nameof(CreateTenantDashboardDataSet), retryOptions, tenantId);
}

[FunctionName(nameof(CreateTenantDashboardDataSet))]
[Disable("DisableDashboardBuilder")]
public async Task CreateTenantDashboardDataSet(
    [ActivityTrigger] string tenantId,
    [Table("DashboardData", Connection = "StorageConnection")] CloudTable table,
    [Blob("operations", Connection = "StorageConnection")] CloudBlobContainer blobContainer)
{
    await _dashboardBuilderService.BuildAsync(tenantId, table, blobContainer);
}
```
Do you have an email where I can send the orchestration ID?
Thanks!
@thomasrosdahl - you can reach me at
@davidmrdavid, you've got mail sir!
hey @thomasrosdahl, I'm just posting here for visibility that we've been discussing this issue directly via email. Did you get a chance to consider Netherite or MSSQL as alternative backends to circumvent this issue?
@davidmrdavid Not yet. It would introduce additional moving parts for us which we'd prefer to avoid if possible. Any ETA on the fix for the Azure Storage backend?
Thanks!
The root problem will take time to fix, but we're discussing a few tactical fixes that could be executed faster. I can't provide a concrete ETA just yet, but I plan to link a PR here once we have a prototype fix.
I'll aim to keep this thread posted as updates come.
Description
Orchestration stuck in Running state even though execution completed successfully according to History table.
Expected behavior
Orchestration should transition to Completed state after successful execution.
Actual behavior
The orchestration is stuck in the Running state, and there is no way to recover without manually deleting the instance from the "Instances" table. `TerminateAsync` and `RestartAsync` do not work.
Relevant source code snippets
Known workarounds
Manually deleting the instance record from the "Instances" table.
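For anyone who needs to script this workaround, here is a minimal sketch using the `Azure.Data.Tables` client. It rests on assumptions not confirmed in this thread: that the Instances table is named `<TaskHubName>Instances`, and that each row is keyed by the instance ID as `PartitionKey` with an empty `RowKey`. Please verify both against your own storage account before running it. The class name, method name, and parameters are hypothetical.

```csharp
using System.Threading.Tasks;
using Azure.Data.Tables;

public static class StuckInstanceCleanup
{
    // Hypothetical helper: removes the row for one orchestration instance
    // from the task hub's Instances table.
    public static async Task DeleteInstanceRowAsync(
        string connectionString, // storage account connection string
        string taskHubName,      // task hub name configured in host.json
        string instanceId)       // ID of the stuck orchestration instance
    {
        // Assumption: the table is named "<TaskHubName>Instances".
        var table = new TableClient(connectionString, $"{taskHubName}Instances");

        // Assumption: PartitionKey is the instance ID and RowKey is empty.
        await table.DeleteEntityAsync(instanceId, string.Empty);
    }
}
```

This only removes the row in the Instances table, as the workaround describes. It does not touch the History table or any queued messages.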
App Details
Screenshots
If deployed to Azure