camunda-community-hub / micronaut-camunda-platform-7

Integration between Micronaut and Camunda (Workflow Engine). We configure Camunda with sensible defaults, so that you can get started with minimum configuration: simply add a dependency in your Micronaut project to embed the workflow engine!
https://camunda.com/blog/2021/07/automate-any-process-on-micronaut/
Apache License 2.0
75 stars 31 forks source link

Regression on fetchAndLock long poll (asyncResponseTimeout waits until timeout) #438

Closed datakurre closed 2 years ago

datakurre commented 2 years ago

The last version where asyncResponseTimeout for REST API external task long poll works properly is 2.7.2

Minimal working demo: demo-2.7.2.zip

Here I start a long poll and then start a new instance of a process and get response immediately after starting the instance, which is much less than 10 seconds of asyncResponseTimeout:

$ date --iso-8601=ns && curl -X POST http://localhost:8080/engine-rest/external-task/fetchAndLock -H "Content-Type: application/json" -d'{"maxTasks": 1, "workerId": "aWorkerId", "asyncResponseTimeout": 10000, "topics": [{"topicName": "Example", "lockDuration": 10000}]}' && printf "\n" && date --iso-8601=ns
2022-10-25T22:21:11,938346745+03:00
[{"activityId":"Activity_1d1ps3k","activityInstanceId":"Activity_1d1ps3k:31db367d-549a-11ed-a352-0242a7641d86","errorMessage":null,"errorDetails":null,"executionId":"31db367c-549a-11ed-a352-0242a7641d86","id":"31dbd2be-549a-11ed-a352-0242a7641d86","lockExpirationTime":"2022-10-25T22:21:25.683+0300","processDefinitionId":"Example:1:31bd0019-549a-11ed-a352-0242a7641d86","processDefinitionKey":"Example","processDefinitionVersionTag":null,"processInstanceId":"31d961ba-549a-11ed-a352-0242a7641d86","retries":null,"suspended":false,"workerId":"aWorkerId","topicName":"Example","tenantId":null,"variables":{},"priority":0,"businessKey":"default","extensionProperties":{}}]
2022-10-25T22:21:15,711268318+03:00

Minimal failing demo: demo-2.10.0.zip

Here I do the same, but I only get the response after asyncResponseTimeout of 10000 has exceeded:

$ date --iso-8601=ns && curl -X POST http://localhost:8080/engine-rest/external-task/fetchAndLock -H "Content-Type: application/json" -d'{"maxTasks": 1, "workerId": "aWorkerId", "asyncResponseTimeout": 10000, "topics": [{"topicName": "Example", "lockDuration": 10000}]}' && printf "\n" && date --iso-8601=ns
2022-10-25T22:22:52,782968742+03:00
[{"activityId":"Activity_1d1ps3k","activityInstanceId":"Activity_1d1ps3k:6df40a98-549a-11ed-86ec-0242a7641d86","errorMessage":null,"errorDetails":null,"executionId":"6df40a97-549a-11ed-86ec-0242a7641d86","id":"6df458b9-549a-11ed-86ec-0242a7641d86","lockExpirationTime":"2022-10-25T22:23:13.015+0300","processDefinitionId":"Example:1:6dd92f94-549a-11ed-86ec-0242a7641d86","processDefinitionKey":"Example","processDefinitionVersionTag":null,"processInstanceId":"6df235d5-549a-11ed-86ec-0242a7641d86","retries":null,"suspended":false,"workerId":"aWorkerId","topicName":"Example","tenantId":null,"variables":{},"priority":0,"businessKey":"default","extensionProperties":{}}]
2022-10-25T22:23:03,038642485+03:00
datakurre commented 2 years ago

This time I remembered to build and run demo projects from JAR files (because asyncResponseTimeout does not work on Maven project otherwise, as discussed in the original asyncResponseTimeout issue long time ago).

tobiasschaefer commented 2 years ago

Hi @datakurre , Just to be sure: release 2.8.0 fails for you too, right?

datakurre commented 2 years ago

@tobiasschaefer Yes. I started from 2.10.0 and tried all versions down to 2.7.2, which worked.

tobiasschaefer commented 2 years ago

The broken version 2.8.0 basically only contains one commit which zu updates Micronaut from 3.4.4 to 3.5.2: https://github.com/camunda-community-hub/micronaut-camunda-platform-7/commit/eb512522aa6fa6736e3e526b314fd85f06d38af2

I'll take a closer look.

datakurre commented 2 years ago

I double checked that the issue existed already on 2.8.0. It does.

datakurre commented 2 years ago

@tobiasschaefer I spent some time with debugger and found the following difference between 2.7.2 and 2.8.0

When a process enters external service task, the following transaction listener is fired in 2.7.2 but no longer in 2.8.0 (the method is called to register the listener, but it never fires), and therefore long poll request does not get the signal about new task and waits until the timeout:

https://github.com/camunda/camunda-bpm-platform/blob/13756c32033cf820ae48553c5116fa16d20c8fd9/engine/src/main/java/org/camunda/bpm/engine/impl/persistence/entity/ExternalTaskManager.java#L179

tobiasschaefer commented 2 years ago

Thanks for that detailed analysis. It must be related to Micronaut's transaction handling which changed in 2.8.0.

I'll have a look.

datakurre commented 2 years ago

🤩

datakurre commented 2 years ago

Thanks to this, I know now much better, how Camunda long polling works. I was not aware that only new tasks are immediately delivered, because of that event being triggered. Tasks with silently expiring lock, are only redelivered after asyncResponseTimeout (or with tasks that fired that event). Therefore, optimal asyncResponseTimeout might be surprisingly short. A compromise between not polling all the time, but poll often enough to also deliver those exceptions in timely manner.

tobiasschaefer commented 2 years ago

Hey @datakurre ,

can you please test release v2.11.1. It fixes the issue for me.

Note to myself:

Process Model: diagram_1.bpmn.txt

datakurre commented 2 years ago

@tobiasschaefer I can confirm the fix. Thanks again!