qazq opened this issue 2 years ago
Hi! My team and I are experiencing similar issues, where execution becomes slower and slower as the number of ExecutionPointers increases. In my application each evaluation, shown on the x-axis, adds another 10 or so execution pointers:

As you can see, the application steadily slows down as the number of execution pointers increases.
Update: I changed my persistence provider to remove "Cancelled" and "Completed" steps. After doing this, continuing the test above showed much better behavior:
Changes:

```csharp
public async Task PersistWorkflow(WorkflowInstance workflow, CancellationToken cancellationToken = default)
{
    // Start of change: drop pointers that are no longer needed so the
    // persisted instance stops growing without bound.
    // (Materialize with ToList() first so we can safely remove while iterating.)
    var cancelledPointers = workflow.ExecutionPointers.FindByStatus(PointerStatus.Cancelled).ToList();
    var completedPointers = workflow.ExecutionPointers.FindByStatus(PointerStatus.Complete).ToList();

    foreach (var pointer in cancelledPointers)
        workflow.ExecutionPointers.Remove(pointer);

    foreach (var pointer in completedPointers)
        workflow.ExecutionPointers.Remove(pointer);
    // End of change

    await WorkflowInstances.ReplaceOneAsync(x => x.Id == workflow.Id, workflow, cancellationToken: cancellationToken);
}
```
@dthemg This issue is well known. I commented in a couple of earlier discussions. Retrieving active EPs only is a way around the problem with iterative loops. But what about parallel loops?
I have also encountered this problem: the jump between nodes gets slower and slower. After restarting the program it works normally again. How can I solve this?
I stress-tested the same process as follows. There are three nodes in the process; the second pauses for 3 seconds and then throws an exception:
```csharp
public class ConveyorBeginJob : WorkflowStepBody
{
    public override async Task<ExecutionResult> RunAsync(IStepExecutionContext context)
    {
        PassingData.ExecutionStartTime = DateTime.Now;
        return await Task.FromResult(ExecutionResult.Next());
    }
}
```
```csharp
public class ThrowExceptionJobV1 : WorkflowStepBody
{
    public override async Task<ExecutionResult> RunAsync(IStepExecutionContext context)
    {
        // Pause for 3 seconds, then fail the step. (The original try/catch
        // only rethrew, and the return after the throw was unreachable.)
        await Task.Delay(TimeSpan.FromSeconds(3));
        throw new Exception("My Exception");
    }
}
```
```csharp
public class ConveyorEndJob : WorkflowStepBody
{
    public override async Task<ExecutionResult> RunAsync(IStepExecutionContext context)
    {
        PassingData.ExecutionEndTime = DateTime.Now;
        return await Task.FromResult(ExecutionResult.Next());
    }
}
```
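For reference, the three steps chain into one definition. This is a minimal sketch: the workflow id and data type are illustrative, and it assumes the custom `WorkflowStepBody` base class in the snippets above satisfies workflow-core's `IStepBody` contract:

```csharp
using WorkflowCore.Interface;

// Hypothetical definition chaining the three steps above.
public class ConveyorWorkflow : IWorkflow
{
    public string Id => "ConveyorWorkflow";   // illustrative id
    public int Version => 1;

    public void Build(IWorkflowBuilder<object> builder)
    {
        builder
            .StartWith<ConveyorBeginJob>()
            .Then<ThrowExceptionJobV1>()  // pauses 3 s, then throws
            .Then<ConveyorEndJob>();
    }
}
```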
```csharp
public class WorkflowErrorCommonHandler : IWorkflowErrorHandler
{
    public WorkflowErrorHandling Type => WorkflowErrorHandling.Terminate;

    public void Handle(WorkflowInstance workflow, WorkflowDefinition def, ExecutionPointer pointer,
        WorkflowStep step, Exception exception, Queue<ExecutionPointer> bubbleUpQueue)
    {
        try
        {
            var workflowStepPassingData = (WorkflowStepPassingData)workflow.Data;
            workflowStepPassingData.EndTime = DateTime.Now;

            foreach (var stepItem in workflow.ExecutionPointers)
            {
                if (!string.IsNullOrWhiteSpace(stepItem.StepName) && stepItem.StartTime != null)
                {
                    long stepTotalMilliseconds = -1;
                    if (stepItem.EndTime != null)
                    {
                        stepTotalMilliseconds = stepItem.EndTime.ToTotalMilliseconds(stepItem.StartTime);
                    }

                    var jobElapsedTimeName = $"({stepItem.StepName})-({stepItem.Status})-({stepItem.StartTime?.ToLocalTime().ToString("yyyy-MM-dd HH:mm:ss.fff")})-({stepItem.EndTime?.ToLocalTime().ToString("yyyy-MM-dd HH:mm:ss.fff")})";
                    workflowStepPassingData.JobElapsedTime.TryAdd(jobElapsedTimeName, stepTotalMilliseconds);
                }
            }
            // Logging workflow
        }
        catch (Exception ex)
        {
            _log4NetService.WriteLog(LogNameKey.SystemError, ex);
        }
    }
}
```
At 500 concurrent workflows, process execution takes a long time. Two observations:

1. The jump between nodes takes a long time.
2. It takes a long time to enter the IWorkflowErrorHandler.
```json
{
  "id": "7199437576333133140",
  "createTime": "2024-05-23 15:54:37.279",
  "executionStartTime": "2024-05-23 15:54:44.796",
  "executionEndTime": null,
  "endTime": "2024-05-23 15:55:36.984",
  "jobElapsedTime": {
    "(ConveyorBeginJob)-(Complete)-(2024-05-23 15:54:44.796)-(2024-05-23 15:54:44.796)": "0",
    "(ThrowExceptionJobV1)-(Failed)-(2024-05-23 15:54:47.174)-()": "-1"
  },
  "condition": {},
  "data": {}
}
```

```json
{
  "id": "7199459854026519216",
  "equipmentCode": "1112",
  "createTime": "2024-05-23 17:23:08.695",
  "executionStartTime": "2024-05-23 17:23:41.791",
  "executionEndTime": null,
  "endTime": "2024-05-23 17:26:25.098",
  "jobElapsedTime": {
    "(ConveyorBeginJob)-(Complete)-(2024-05-23 17:23:41.791)-(2024-05-23 17:23:41.791)": "0",
    "(ThrowExceptionJobV1)-(Failed)-(2024-05-23 17:24:37.686)-()": "-1"
  },
  "condition": {},
  "data": {}
}
```
I don't have persistence needs. Can I use in-memory mode and clear terminated workflows to prevent memory leaks and improve performance?
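If the instances live only in memory, one option is a periodic sweep that drops finished instances from whatever dictionary backs your store. A minimal sketch, assuming you maintain your own `ConcurrentDictionary` of instances (the stock `MemoryPersistenceProvider` does not, as far as I know, expose a public purge API):

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using WorkflowCore.Models; // WorkflowInstance, WorkflowStatus

public class TerminatedWorkflowPurger
{
    private readonly ConcurrentDictionary<string, WorkflowInstance> _instances;

    public TerminatedWorkflowPurger(ConcurrentDictionary<string, WorkflowInstance> instances)
        => _instances = instances;

    // Periodically remove instances that can no longer run.
    public async Task RunAsync(TimeSpan interval, CancellationToken ct)
    {
        while (!ct.IsCancellationRequested)
        {
            var finished = _instances
                .Where(kv => kv.Value.Status == WorkflowStatus.Complete
                          || kv.Value.Status == WorkflowStatus.Terminated)
                .Select(kv => kv.Key)
                .ToList();

            foreach (var id in finished)
                _instances.TryRemove(id, out _);

            await Task.Delay(interval, ct);
        }
    }
}
```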
Hi,
I use workflow-core to control a manufacturing process. One product might spend 1-2 months in the process, so the number of execution pointers grows to almost 3000. At that point, executing one step takes a long time (1-2 seconds). I used sample.10 to reproduce this issue.
I added debug messages to EntityFrameworkPersistenceProvider and WorkflowConsumer:

- In ProcessItem, t1 is the time to get the workflow instance from the database. It takes a long time because there are so many execution pointers.
- In ProcessItem, t3 is the time to persist the workflow. The detailed breakdown is shown in PersistWorkflow as p1~p4: p2 is the time to query the workflow from the database, p3 is ToPersistable(), and p4 is SaveChangesAsync().

There are 1595 execution pointers in the database, and the number of Children is large (string length is 29489).
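Measurements like t1/t3 can also be taken without editing the provider, by wrapping it in a timing decorator. A sketch (only `PersistWorkflow` shown; the other `IPersistenceProvider` members would delegate the same way):

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using WorkflowCore.Interface;
using WorkflowCore.Models;

public class TimedPersistenceProvider
{
    private readonly IPersistenceProvider _inner;

    public TimedPersistenceProvider(IPersistenceProvider inner) => _inner = inner;

    // Measures the whole persist call (the "t3" above); the finer-grained
    // p1~p4 timings still require debug messages inside the provider.
    public async Task PersistWorkflow(WorkflowInstance workflow, CancellationToken ct = default)
    {
        var sw = Stopwatch.StartNew();
        await _inner.PersistWorkflow(workflow, ct);
        Console.WriteLine($"PersistWorkflow({workflow.Id}): {sw.ElapsedMilliseconds} ms, " +
                          $"{workflow.ExecutionPointers.Count()} pointers");
    }
}
```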
Maybe we can optimize the growth of Children. (I'm not sure about the purpose of Children, but it looks like Scope could be used instead?) Thanks!
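Building on the pruning idea earlier in this thread, a narrower mitigation might be to clear just the Children lists of completed pointers before persisting, rather than removing the pointers themselves. An untested sketch; verify that nothing downstream (parallel joins, compensation) still reads the child ids:

```csharp
using WorkflowCore.Models;

public static class ExecutionPointerTrimmer
{
    // Shrinks the serialized instance by dropping child-pointer ids
    // from completed pointers (assumption: nothing re-reads Children
    // after a pointer completes).
    public static void TrimCompleted(WorkflowInstance workflow)
    {
        foreach (var pointer in workflow.ExecutionPointers
                     .FindByStatus(PointerStatus.Complete))
        {
            pointer.Children.Clear();
        }
    }
}
```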