flowable / flowable-engine

A compact and highly efficient workflow and Business Process Management (BPM) platform for developers, system admins and business users.
https://www.flowable.org
Apache License 2.0

Different behaviour between parallel and inclusive gateways #1741

Open wberges opened 5 years ago

wberges commented 5 years ago

Hello,

It seems that Parallel and Inclusive Gateways don't behave the same way in the same configuration.

Tests were done on processes with 3 parallel branches (1 branch with a catching message event). I used 2 test sets: one using Parallel gateways, one using Inclusive gateways. The Inclusive gateway always takes all 3 branches, so it should be equivalent to the Parallel gateway (there is no default branch in this test).

Test set 1: Parallel gateways (image)

Test set 2: Inclusive gateways (image)

For each set, different settings on the Join gateway (4):

What is strange to me is that we do not get the same result with parallel and inclusive gateways. With Parallel gateways, 2 scenarios run without an OptimisticLock exception. But with Inclusive gateways, all scenarios generate an OptimisticLock exception! And that's really problematic (to me at least :)).

For now, I see 2 "workarounds" to avoid OptimisticLockExceptions:

Here's the post in the forum: https://forum.flowable.org/t/different-behaviour-between-parallel-and-inclusive-gateways-tests-using-async-and-or-exclusive-flags/3848

I attach the project I used for my tests (it includes 8 BPMN files and a Java test file): UnitTesting-ParallelGateway.zip

Best regards, William

tijsrademakers commented 5 years ago

Hi William,

Thanks a lot for the detailed description. We will look into this asap and come back with our findings.

Best regards,

Tijs

wberges commented 5 years ago

Hello,

No news about this strange behaviour?

Here's a log (generated by running my 5 branches test file): TEST_JUNIT_INCLUSIVE_ASYNC_EXCLUSIVE 5 branches.log

Thanks. Best regards, William

tijsrademakers commented 5 years ago

Hi William,

No, not yet, sorry. It's still high on the todo list, so it will be picked up soon.

Thanks,

Tijs

tijsrademakers commented 5 years ago

Hi William,

We looked into this and the problem is that the inclusive and parallel gateway behaviours lock the process execution entity as well, separately from the locking that happens in the async and exclusive job handling. Because some async jobs are set to exclusive = false, the branches really execute in parallel and there's a big chance of collision. The same happens for the parallel gateway, but the logic implemented in the parallel gateway is a lot less complex, and therefore the time window in which a collision can happen is a lot smaller.
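As a rough illustration of the kind of collision described here, the following sketch models optimistic locking with a revision counter on a single stored row. It is not Flowable's implementation: `OptimisticLockSketch`, `Row` and `StaleRevisionException` are hypothetical names, and the in-memory `AtomicReference` only stands in for a database row with a revision column.

```java
import java.util.concurrent.atomic.AtomicReference;

public class OptimisticLockSketch {
    /** Immutable snapshot of a stored row: a state plus a revision counter. */
    static final class Row {
        final int revision;
        final String state;

        Row(int revision, String state) {
            this.revision = revision;
            this.state = state;
        }
    }

    /** Hypothetical stand-in for an optimistic locking failure. */
    static class StaleRevisionException extends RuntimeException {
        StaleRevisionException(String message) {
            super(message);
        }
    }

    // Stands in for the shared row that every branch wants to update.
    private final AtomicReference<Row> stored =
            new AtomicReference<>(new Row(1, "waiting"));

    Row read() {
        return stored.get();
    }

    /** Succeeds only if nobody has updated the row since 'seen' was read. */
    void update(Row seen, String newState) {
        Row next = new Row(seen.revision + 1, newState);
        if (!stored.compareAndSet(seen, next)) {
            throw new StaleRevisionException("revision " + seen.revision + " is stale");
        }
    }

    public static void main(String[] args) {
        OptimisticLockSketch table = new OptimisticLockSketch();

        // Two branches read the same revision before either one writes.
        Row seenByBranchA = table.read();
        Row seenByBranchB = table.read();

        table.update(seenByBranchA, "branch A arrived");
        try {
            table.update(seenByBranchB, "branch B arrived");
        } catch (StaleRevisionException e) {
            System.out.println("collision: " + e.getMessage());
        }
    }
}
```

The longer a branch holds on to the snapshot it read before writing, the more likely another branch writes first, which matches the point that more complex gateway logic widens the collision window.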

We've been discussing that a separate job lock table might fix this issue, but that will need some more thinking and experimenting.

There is an optimistic lock exception, but in the end the process still finishes correctly. The job is just executed more than once. Is this causing issues on your end?

Best regards,

Tijs

wberges commented 5 years ago

Hello Tijs,

First of all, thanks to your team for the bug analysis 😁

About the issue: it is a problem when the thread rejected by the optimistic lock is the one delivering the external event. In this case, if the external system doesn't send the event again, it is lost, and unfortunately that is my case.

I can try to add an asynchronous task between the event and the gateway (equivalent to an Asynchronous After on the Receive Event, which doesn't exist in Flowable) to force a potential retry by the engine. But it is a workaround (not tested, so I'm not sure it will work), not a solution. Do you think that using a triggerable task in the event branch (in place of both send and receive tasks) would solve the problem?

Best regards, and thanks again for your wonderful work on Flowable.

William
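One possible mitigation for the lost external event, separate from the workarounds discussed in this thread, is to retry the delivery on the caller's side when it is rejected. The sketch below is only an assumption about how that could look: `RetryOnConflictSketch`, `ConflictException` and `retryOnConflict` are hypothetical names, and the simulated delivery stands in for whatever engine call actually hands the message to the process.

```java
import java.util.function.Supplier;

public class RetryOnConflictSketch {
    /** Hypothetical stand-in for the engine's optimistic locking exception. */
    static class ConflictException extends RuntimeException {
        ConflictException(String message) {
            super(message);
        }
    }

    /** Runs the action, retrying with a growing pause while it fails with a conflict. */
    static <T> T retryOnConflict(int maxAttempts, long backoffMillis, Supplier<T> action) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        ConflictException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.get();
            } catch (ConflictException e) {
                last = e;
                if (attempt < maxAttempts) {
                    pause(backoffMillis * attempt);
                }
            }
        }
        // Every attempt collided: surface the last conflict to the caller.
        throw last;
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting to retry", e);
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated delivery: collides twice, then goes through.
        String result = retryOnConflict(5, 10, () -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new ConflictException("collision on attempt " + calls[0]);
            }
            return "delivered after " + calls[0] + " attempts";
        });
        System.out.println(result);
    }
}
```

A retry like this only helps if delivering the same event twice is harmless, so the receiving side would need to tolerate a repeated delivery.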

wberges commented 4 years ago

Hello,

Is there still no solution for this bug concerning parallel execution with optimistic lock exceptions? What is strange to me is that I seem to be the only one raising this problem, when (real) parallel executions with events received from external systems should be used often... Currently, the parallel gateway is useless if I have to use the same thread for all branches.

Thanks for your feedback.

Best regards

cdeneux commented 4 years ago

Hi all, As proposed by @wberges, I have checked that this problem can be worked around by replacing all the steps of each branch with a dedicated call activity. The process definition is now as follows: with-callactivity, where the process definitions are:

To easily identify the sub-process that was started by the call activity and is waiting for the event, a unique business key can be used. Take care to propagate it correctly through the call activity.

I attach the initial test project, updated with tests for this workaround: UnitTesting-ParallelGateway.zip

This workaround works fine with Flowable 6.3.1, but a new concurrency error occurs with Flowable 6.4.2 for the tests TEST_JUNIT_INCLUSIVE_ASYNC_CALLACTIVITY and TEST_JUNIT_INCLUSIVE_ASYNC_EXCLUSIVE_CALLACTIVITY:

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@ TEST_JUNIT_INCLUSIVE_ASYNC_CALLACTIVITY @@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
Tue Dec 10 09:22:51 CET 2019 ASYNC Step11 - before catch event
Tue Dec 10 09:22:51 CET 2019 ASYNC Step2 - execution before sleeping 3s
Tue Dec 10 09:22:51 CET 2019 ASYNC Step3 - execution before sleeping 10s
Tue Dec 10 09:22:52 CET 2019 EVENT Step12 - ##### MESSAGE RECEIVED #####
Tue Dec 10 09:22:54 CET 2019 ASYNC Step2 - execution after sleep
Tue Dec 10 09:23:01 CET 2019 ASYNC Step3 - execution after sleep
09:23:01,954 [flowable-async-job-executor-thread-2] ERROR org.flowable.common.engine.impl.interceptor.CommandContext  - Error while closing command context
java.util.ConcurrentModificationException
    at java.util.HashMap$HashIterator.nextNode(HashMap.java:1445)
    at java.util.HashMap$ValueIterator.next(HashMap.java:1474)
    at org.flowable.engine.impl.agenda.ExecuteInactiveBehaviorsOperation.run(ExecuteInactiveBehaviorsOperation.java:58)
    at org.flowable.engine.impl.interceptor.CommandInvoker.executeOperation(CommandInvoker.java:88)
    at org.flowable.engine.impl.interceptor.CommandInvoker.executeOperations(CommandInvoker.java:72)
    at org.flowable.engine.impl.interceptor.CommandInvoker.execute(CommandInvoker.java:62)
    at org.flowable.engine.impl.interceptor.BpmnOverrideContextInterceptor.execute(BpmnOverrideContextInterceptor.java:25)
    at org.flowable.common.engine.impl.interceptor.TransactionContextInterceptor.execute(TransactionContextInterceptor.java:53)
    at org.flowable.common.engine.impl.interceptor.CommandContextInterceptor.execute(CommandContextInterceptor.java:72)
    at org.flowable.common.engine.impl.interceptor.LogInterceptor.execute(LogInterceptor.java:30)
    at org.flowable.common.engine.impl.cfg.CommandExecutorImpl.execute(CommandExecutorImpl.java:56)
    at org.flowable.common.engine.impl.cfg.CommandExecutorImpl.execute(CommandExecutorImpl.java:51)
    at org.flowable.job.service.impl.asyncexecutor.ExecuteAsyncRunnable.executeJob(ExecuteAsyncRunnable.java:128)
    at org.flowable.job.service.impl.asyncexecutor.ExecuteAsyncRunnable.run(ExecuteAsyncRunnable.java:116)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
09:23:01,958 [flowable-async-job-executor-thread-2] ERROR org.flowable.job.service.impl.asyncexecutor.DefaultAsyncRunnableExecutionExceptionHandler  - Job 304 failed

A HashMap seems to be used in a concurrent context. @tijsrademakers, shouldn't this map be created as a ConcurrentHashMap in org.flowable.engine.impl.util.CommandContextUtil#addInvolvedExecution(...)?
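For readers unfamiliar with this exception, here is a minimal sketch of how iterating over `HashMap` values fails when the map is modified during the iteration, and how `ConcurrentHashMap` tolerates the same pattern. It only reproduces the exception type in a single thread. It does not claim to reproduce the engine's scenario or the fix that was eventually applied, and `InvolvedExecutionsSketch` and its keys are hypothetical.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class InvolvedExecutionsSketch {
    /**
     * Fills the map, then adds one more entry while iterating over its values.
     * Returns true if the iteration fails with a ConcurrentModificationException.
     */
    static boolean failsWhenModifiedDuringIteration(Map<String, String> executions) {
        executions.put("exec-1", "branch 1");
        executions.put("exec-2", "branch 2");
        executions.put("exec-3", "branch 3");

        boolean added = false;
        try {
            for (String ignored : executions.values()) {
                if (!added) {
                    // Structural modification while an iterator is still active.
                    executions.put("exec-4", "added during iteration");
                    added = true;
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("HashMap fails: "
                + failsWhenModifiedDuringIteration(new HashMap<>()));
        System.out.println("ConcurrentHashMap fails: "
                + failsWhenModifiedDuringIteration(new ConcurrentHashMap<>()));
    }
}
```

`HashMap` iterators are fail-fast, while `ConcurrentHashMap` iterators are weakly consistent and never throw this exception, at the cost of possibly not reflecting entries added after the iterator was created.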

Regards

tijsrademakers commented 4 years ago

Hi Christoph,

Thanks for the detailed analysis, we'll check the HashMap and see if it should be changed to a ConcurrentHashMap.

Thanks

realitix commented 4 years ago

Hello @tijsrademakers, do you know when you will be able to check this?

Thanks

tijsrademakers commented 4 years ago

Hi,

Can't promise anything yet, but hopefully within the next couple of days

tijsrademakers commented 4 years ago

Hi Christoph,

Thanks for providing the test project, that made it really easy to reproduce the issue. We have applied a fix for the concurrent modification issue:

https://github.com/flowable/flowable-engine/commit/eb650424cec2a018b9538856986aa857ca0af01c

Let us know if you encounter any issues.

cdeneux commented 4 years ago

Hi @tijsrademakers, Thanks for the fix. I applied it on version 6.4.2. It solved the concurrency issue.

wberges commented 4 years ago

Hello,

Glad to see that this workaround can now be used without this new HashMap bug. :) But the question remains the same: is there still no true solution that allows executing (real) parallel branches without an optimistic lock exception (and a systematic replay of the whole branch, with all the problems that involves...)? And it is strange that I'm again alone in raising this problem, when (real, not simulated using the same thread) parallel executions should be part of the basic Flowable features...

Thanks for your work in any case :)

Regards

wberges commented 4 years ago

Hi, I have a question about this behaviour: if we set the "join" gateway (the final <+> of the parallel gateway) as "Async" and "Exclusive", is it only the "join" job that will be replayed in case of an optimistic lock exception? If so, it's less problematic, because we don't replay a task (including its action/event), just the final gateway (storage). Thanks for your help. Regards

wberges commented 4 years ago

BTW, there's a bug, at least in the 6.4.2 modeler: when I set both the Async and Exclusive flags on the join parallel gateway and then export/import the BPMN, both flags are lost...

wberges commented 1 year ago

Hello,

I'm coming back to this problem because we still have it. The new example is the following: (image)

I have an inclusive gateway with several branches activated. The branches don't end in a join inclusive gateway (which I could set as ASYNC and EXCLUSIVE to avoid the optimistic locking), but each on an End event. In this case, I run into the locking problem again. And there's no way to define a synchronization between branches except by adding an artificial inclusive join gateway that can be set as ASYNC and EXCLUSIVE. But for the design of the workflow, it's a pity... Do you have any tips to avoid this workaround?

Certainly linked to this issue: https://github.com/flowable/flowable-engine/issues/3577

Thanks a lot again for your work :)

Best regards

PinoEire commented 1 year ago

Hi there. I’m looking at Flowable reliability and support to evaluate its implementation in a critical system.

I’m worried about finding bugs like this one that have no answer and are still open after a long time. Is there a rationale behind this?

wberges commented 1 month ago

Hello, No news about this issue? Thanks