NitorCreations / nflow

Embeddable JVM-based workflow engine with high availability, fault tolerance, and support for multiple databases. Additional libraries are provided for visualization and REST API.

db inserts/update/delete with large number of rows needs to be batched #560

Closed: akashkumar809 closed this issue 1 year ago

akashkumar809 commented 2 years ago

Issue: in the nFlow maintenance workflow, rows from the nflow_workflow tables are archived into their respective archive tables. We have continuously running workflows which can create millions of rows in the action table, and once the maintenance workflow tries to archive these workflows, it can lead to a huge amount of insertion or deletion in a single query (we had approx. 400,000 rows in the action table for a single workflow). In our Galera cluster we have limited the maximum number of rows per write set, so that a big chunk of inserts/updates does not choke the database. Hence the database query for the maintenance job fails to execute if the number of rows is huge.

Possible suggestion for this fix: batch the inserts/updates/deletes when the number of affected rows is large, so that a single maintenance query stays within the cluster's write-set limits.

efonsell commented 2 years ago

This is a known issue: https://github.com/NitorCreations/nflow/wiki/FAQ#can-the-amount-of-workflow-actions-and-state-variables-archived-per-batch-be-controlled

The recommended solution is to configure workflow instance history cleanup to reduce the number of actions that will get archived. Of course this is not a valid solution if you must keep all actions, but that is usually not the case. The cleanup can be enabled using WorkflowSettings.setHistoryDeletableAfter.
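Roughly something like this in the workflow definition's settings (just a sketch: the exact builder signature, and whether it takes a Joda-Time period, depends on the nFlow version, so double-check the WorkflowSettings javadoc):

```java
// Sketch only: build WorkflowSettings with a history retention limit and pass
// them to the workflow definition's constructor so the maintenance workflow
// may delete old actions and state variables. The period type is an
// assumption - verify against your nFlow version.
import org.joda.time.Period;

import io.nflow.engine.workflow.definition.WorkflowSettings;

public class HistoryCleanupSettings {

  public static WorkflowSettings settingsWithCleanup() {
    return new WorkflowSettings.Builder()
        // actions and state variables older than 30 days become deletable
        .setHistoryDeletableAfter(Period.days(30))
        .build();
  }
}
```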

akashkumar809 commented 2 years ago

Thanks for the comments. We have already implemented point 4 from the FAQ ("Implement a custom workflow that deletes old actions and state variables before archiving") as you suggested, but just to be more sure about this DB write-set issue: is there a plan to introduce batching in nFlow for queries with a huge row count?

gmokki commented 2 years ago

Only the maintenance workflow would have problems with Galera, and even there only in the corner case you managed to hit, where a workflow had too many actions compared to the Galera limit on changes in a transaction.

Do you think you will need a 100k+ action history on a single workflow in the future? If not, you just need to do the cleanup manually once and then enable the automatic history cleanup on that specific workflow, as @efonsell already explained.

If you do need to keep the full history of a workflow, and it happens too often for manual cleanup, then we might need to come up with some hack that can be enabled for Galera. The problem is that we want the move from the main tables to the archive tables to be atomic: if we move the actions in batches and the node crashes mid-way, some of the actions end up in the archive table without any reference to a valid workflow.

To keep the atomicity guarantees, we could teach nFlow to detect the too-many-changes-in-transaction error (at least for Galera) and automatically retry with a batch size of 1, as sketched below. That still would not work if a single workflow needs that many action history entries, but so far those have been the result of misconfiguration.
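Very roughly, the idea would be something like this (nothing here is existing nFlow code, the names are hypothetical, and the error check is only a guess based on the message text):

```java
// Purely illustrative sketch of the fallback idea described above; none of
// these names exist in nFlow. The batched operation is tried once as a whole,
// and if the database rejects the write set as too large (e.g. Galera's
// wsrep_max_ws_rows / wsrep_max_ws_size limits), it is retried one workflow
// at a time, each in its own transaction.
import java.util.List;

public class BatchRetrySketch {

  /** Hypothetical hook that archives the given workflow ids in one transaction. */
  interface ArchiveBatch {
    void archive(List<Long> workflowIds);
  }

  static void archiveWithFallback(List<Long> workflowIds, ArchiveBatch batch) {
    try {
      batch.archive(workflowIds); // normal path: one atomic move
    } catch (RuntimeException e) {
      if (!looksLikeWriteSetTooLarge(e)) {
        throw e;
      }
      // fallback: one workflow per transaction, which keeps each workflow's
      // move atomic but gives up atomicity of the batch as a whole
      for (Long id : workflowIds) {
        batch.archive(List.of(id));
      }
    }
  }

  /** Heuristic only: the exact error code and message depend on the Galera version. */
  static boolean looksLikeWriteSetTooLarge(RuntimeException e) {
    String message = String.valueOf(e.getMessage()).toLowerCase();
    return message.contains("writeset") || message.contains("wsrep_max_ws");
  }
}
```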

Do you think you could survive with the configured action history self-cleanup in the future after you have manually removed the excess actions? Would you be ok with a configuration option that does not guarantee atomicity when archiving in smaller batches?

akashkumar809 commented 2 years ago

I agree with your point that atomicity is needed to guarantee there won't be any dangling actions left, so it was a bad idea to break the action archiving into batches. We had already implemented a custom workflow for cleaning up excess actions. I will probably make that more robust and clean up more of the unnecessary actions, or even whole workflows that may not need archiving at all.
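For anyone else hitting this, something along these lines could work as a one-off cleanup (plain JDBC sketch; the table and column names assume the default nFlow schema and may differ in your version, and nflow_workflow_state rows reference action ids, so those need to be cleaned up consistently as well):

```java
// Rough sketch of a one-off batched cleanup run outside nFlow, using plain JDBC.
// ASSUMPTIONS: default nFlow schema (nflow_workflow_action with workflow_id and
// execution_end columns), MariaDB/Galera (DELETE ... LIMIT syntax), arbitrary
// 30-day cutoff and 10k batch size. Verify everything against your own schema.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class OneOffActionCleanup {

  public static void main(String[] args) throws Exception {
    long workflowId = Long.parseLong(args[0]);
    int batchSize = 10_000; // keep each transaction well below wsrep_max_ws_rows

    try (Connection connection = DriverManager.getConnection(
        "jdbc:mariadb://localhost:3306/nflow", "nflow", "nflow")) {
      connection.setAutoCommit(true); // each DELETE commits as its own small transaction
      try (PreparedStatement delete = connection.prepareStatement(
          "delete from nflow_workflow_action"
              + " where workflow_id = ? and execution_end < now() - interval 30 day"
              + " limit " + batchSize)) {
        delete.setLong(1, workflowId);
        int deleted;
        do {
          deleted = delete.executeUpdate();
        } while (deleted > 0);
      }
    }
  }
}
```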

efonsell commented 2 years ago

You can also limit the workflow definition types that are processed via MaintenanceConfiguration. You could, for example, add another MaintenanceWorkflow instance that deletes certain types of workflows before they would get archived by the default maintenance workflow instance.
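Roughly like this (the builder and setter names below are approximations from memory, so check the MaintenanceConfiguration and MaintenanceService javadocs of your nFlow version for the real API):

```java
// Approximate sketch only - the exact builder/setter names and parameter types
// may differ between nFlow versions. MaintenanceConfiguration is the class
// referred to in the comment above; using MaintenanceService to run it is an
// assumption.
import java.util.Set;

import org.joda.time.Period;

import io.nflow.engine.service.MaintenanceConfiguration;
import io.nflow.engine.service.MaintenanceService;

public class TypeSpecificCleanup {

  private final MaintenanceService maintenanceService;

  public TypeSpecificCleanup(MaintenanceService maintenanceService) {
    this.maintenanceService = maintenanceService;
  }

  public void deleteOldWorkflowsOfType(String workflowType) {
    // delete (instead of archive) finished workflows of this type once they are
    // older than 7 days, before the default maintenance would archive them
    MaintenanceConfiguration configuration = new MaintenanceConfiguration.Builder()
        .withDeleteWorkflows(new MaintenanceConfiguration.ConfigurationItem.Builder()
            .setOlderThanPeriod(Period.days(7))
            .setWorkflowTypes(Set.of(workflowType))
            .build())
        .build();
    maintenanceService.cleanupWorkflows(configuration);
  }
}
```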

gmokki commented 2 years ago

@akashkumar809: A few questions, since you seem to have an interesting use case:

  1. Do you need all the historical action entries for these longer-running workflows?
  2. If you do, would you also like to archive all of them, or would it be enough to move only the more recent actions and delete the older ones?

akashkumar809 commented 2 years ago
  1. No, we do not need all the historical actions for these longer-running workflows.
  2. Yes, it would be enough to move the more recent actions and delete the older ones.