danielgerlag / workflow-core

Lightweight workflow engine for .NET Standard
MIT License

Performance issues with Sql Server Persistence Provider #837

Open laffebaco opened 3 years ago

laffebaco commented 3 years ago

I noticed that the performance gap between the built-in memory persistence provider and the SQL Server persistence provider grows when there are many steps to execute and/or the data persisted for each step is not small. On top of that, the performance of the SQL Server persistence provider seems to degrade over the lifetime of a workflow, because with each executed step more and more rows are queried from the database. It also seems that each record contains JSON data that has to be deserialized, which costs resources as well.

Since the read operations in particular seem to be resource-intensive, I was thinking that a hybrid setup that uses both a memory and a SQL Server persistence provider could improve performance considerably. The memory persistence provider would handle the regular reads and writes, while the SQL Server persistence provider would only be written to. The data in the database would then be used to initialize the memory persistence provider when the workflow engine is restarted (for example, after the server goes down).
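To make the hybrid idea concrete, here is a minimal, language-neutral sketch in Python. It is not Workflow Core's API: `DurableStore` and `HybridProvider` are hypothetical names, and the durable store is a plain dictionary of JSON strings standing in for the SQL Server tables. Every write goes to the durable store, every read is served from memory, and the durable store is read only once, at startup.

```python
import copy
import json
from typing import Dict, Optional


class DurableStore:
    """Hypothetical stand-in for the SQL Server provider: keeps JSON rows."""

    def __init__(self) -> None:
        self.rows: Dict[str, str] = {}
        self.reads = 0  # counts full reads, to show how rarely they happen

    def write(self, workflow_id: str, data: dict) -> None:
        self.rows[workflow_id] = json.dumps(data)

    def read_all(self) -> Dict[str, dict]:
        self.reads += 1
        return {wid: json.loads(raw) for wid, raw in self.rows.items()}


class HybridProvider:
    """Hypothetical write-through wrapper: memory for reads, durable for writes."""

    def __init__(self, durable: DurableStore) -> None:
        self._durable = durable
        # Warm the in-memory copy once, e.g. after an engine restart.
        self._cache: Dict[str, dict] = durable.read_all()

    def persist(self, workflow_id: str, data: dict) -> None:
        # Write to the durable store first so a crash cannot lose the update.
        self._durable.write(workflow_id, data)
        self._cache[workflow_id] = copy.deepcopy(data)

    def get(self, workflow_id: str) -> Optional[dict]:
        # Reads never touch the durable store or deserialize JSON.
        return self._cache.get(workflow_id)
```

This sketch assumes a single engine instance. With several nodes sharing one database, each node's in-memory copy could go stale, so some form of invalidation or locking would be needed.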

What are your thoughts about this?

VKAlwaysWin commented 3 years ago

Hi @laffebaco, you may want to use the Workflow Purger, which will clean up SQL Server workflows.

Will that work for you? If you run the cleanup every hour or so, performance shouldn't be affected much.

laffebaco commented 3 years ago

Hi @VKAlwaysWin, thanks for your suggestion.

However, this will probably not work in our case, as we need the workflow information in the database for a "management dashboard" we intend to build to see which workflows are running, have failed, etc.

vladimir-kovalyuk commented 3 years ago

@danielgerlag we experienced a slowdown in scenarios where a workflow traversed the child records of a parent record in a ForEach loop. Just 300 records! The root of the problem turned out to be that WFC creates another ExecutionPointer for every step it executes. The persistence provider loads ALL related ExecutionPointer records when reading a Workflow record, regardless of their status, which makes the situation worse. Scenarios like handling 1 million records in chunks of 100 take days to run, which is unbearable from a performance perspective and a showstopper for us.

I don't see the need to read, serialize and write back all the ExecutionPointers that have already reached a terminal status. I think the implementation of the persistence providers should be altered a bit to skip terminal pointers.

To improve the overall performance of the framework, I think ExecutionPointer should work similarly to a CPU instruction pointer: it should just point to the step currently being executed. If we need an audit trail, that should be a separate entity. If we do something in parallel, the number of pointers increases; Join() would remove pointers.
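A rough sketch of the "skip terminal pointers" suggestion, written in Python for brevity. It does not reflect Workflow Core's actual code: the `ExecutionPointer` fields, the `load_active_pointers` helper, and the set of status names treated as terminal are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Assumed set of terminal statuses; the real enum values may differ.
TERMINAL_STATUSES = frozenset({"Complete", "Cancelled", "Compensated"})


@dataclass
class ExecutionPointer:
    """Simplified, hypothetical pointer record."""
    id: str
    step_id: int
    status: str


def load_active_pointers(rows: Iterable[ExecutionPointer]) -> List[ExecutionPointer]:
    """Return only the pointers that can still change state.

    In a real provider this filter would belong in the database query,
    so that terminal rows are never fetched or deserialized at all.
    """
    return [p for p in rows if p.status not in TERMINAL_STATUSES]


# A ForEach over 300 records leaves 300 finished pointers and one live one.
history = [ExecutionPointer(str(i), 1, "Complete") for i in range(300)]
history.append(ExecutionPointer("300", 2, "Running"))
active = load_active_pointers(history)
```

Whether the engine can run correctly without the terminal pointers in memory (for example, when it checks whether all children of a scope have finished) is an open question that this sketch does not answer.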

LingDian2019 commented 6 months ago

> Hi @laffebaco, you may want to use the Workflow Purger, which will clean up SQL Server workflows.
>
> Will that work for you? If you run the cleanup every hour or so, performance shouldn't be affected much.

I ran the relevant stress tests, but the performance was very poor. Please refer to: https://github.com/danielgerlag/workflow-core/issues/1028