apache / incubator-kie-kogito-runtimes

Kogito Runtimes - Kogito is a cloud-native business automation technology for building cloud-ready business applications.
http://kogito.kie.org
Apache License 2.0

serverless workflow - unable to lookup and resume existing process after a restart #3467

Closed: deepakkapoor23 closed this issue 2 months ago

deepakkapoor23 commented 2 months ago

Describe the bug

Given a serverless workflow (SW) whose events define a correlation attribute, which is added to the CloudEvent as a custom extension.

From the workflow definition: two events that use the same correlation attribute, correlationid:

    "events": [
      {
        "name": "NewApplicantEvent",
        "source": "",
        "type": "applicants",
        "kind": "consumed",
        "correlation": [
          { "contextAttributeName": "correlationid" }
        ]
      },
      {
        "name": "ManualConfirmationEvent",
        "source": "",
        "type": "confirmations",
        "kind": "consumed",
        "correlation": [
          { "contextAttributeName": "correlationid" }
        ]
      }
    ]

cloud events:

CloudEvent{id='32f24a1c-9c60-4edc-bb92-9c7abe76d9ab', source=http://localhost:8080, type='applicants', data=JsonCloudEventData{node={"name":"Ricardo","position":"Frontend Developer","office":"Atlanta","salary":300000}}, extensions={correlationid=Ricardo}}

CloudEvent{id='a7f8b789-ef00-489b-b7f5-321b4c437072', source=http://localhost:8080, type='confirmations', data=JsonCloudEventData{node={"name":"Ricardo","confirmation":"approved"}}, extensions={correlationid=Ricardo}}
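Correlation in this setup depends on both events carrying the same value in the correlationid extension. As an illustration only (a hypothetical model, not Kogito's code), the sketch derives a correlation key from an incoming event using the contextAttributeName entries of the matching event definition. The names EventDef, IncomingEvent and correlationKey are made up for this example.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical model of the workflow's event definitions; these records are
// illustrative and are not Kogito classes.
public class CorrelationKeyExample {

    record EventDef(String name, String type, List<String> correlationAttributes) {}

    // Minimal stand-in for a received CloudEvent: its type plus its extension attributes.
    record IncomingEvent(String type, Map<String, String> extensions) {}

    static final List<EventDef> EVENTS = List.of(
            new EventDef("NewApplicantEvent", "applicants", List.of("correlationid")),
            new EventDef("ManualConfirmationEvent", "confirmations", List.of("correlationid")));

    // Returns the correlation key for an event, or empty if its type is unknown
    // or one of the required context attributes is missing.
    static Optional<String> correlationKey(IncomingEvent event) {
        for (EventDef def : EVENTS) {
            if (!def.type().equals(event.type())) {
                continue;
            }
            StringBuilder key = new StringBuilder();
            for (String attr : def.correlationAttributes()) {
                String value = event.extensions().get(attr);
                if (value == null) {
                    return Optional.empty();
                }
                if (key.length() > 0) {
                    key.append('|');
                }
                key.append(attr).append('=').append(value);
            }
            return Optional.of(key.toString());
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        IncomingEvent applicant = new IncomingEvent("applicants", Map.of("correlationid", "Ricardo"));
        IncomingEvent confirmation = new IncomingEvent("confirmations", Map.of("correlationid", "Ricardo"));
        // Both events resolve to the same key, so they should reach the same instance.
        System.out.println(correlationKey(applicant).orElse("<none>"));
        System.out.println(correlationKey(confirmation).orElse("<none>"));
    }
}
```

Both calls print correlationid=Ricardo, which is why the confirmation event is expected to resume the instance started by the applicant event.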

Process document stored in MongoDB:

    {
      "_id": { "$oid": "6615670752062b46bf0d39fa" },
      "processType": "SW",
      "processId": "applicantworkflow",
      "id": "15b59c77-00cd-4dc7-8ab2-fa3d61ddf4e5",
      "description": "Applicant Workflow",
      "state": 1,
      "startDate": "1712678663485",
      "signalCompletion": true,
      "sla": { "slaCompliance": 0 },
      "context": {
        "variable": [
          {
            "name": "workflowdata",
            "dataType": "com.fasterxml.jackson.databind.node.ObjectNode",
            "value": {
              "@type": "type.googleapis.com/org.kie.kogito.serialization.process.protobuf.JsonNode",
              "content": "{\r\n \"name\" : \"Ricardo\",\r\n \"position\" : \"Frontend Developer\",\r\n \"office\" : \"Atlanta\",\r\n \"salary\" : 300000,\r\n \"decision\" : \"Approved\"\r\n}"
            }
          }
        ],
        "nodeInstance": [
          {
            "id": "65b7aeb8-bff4-4734-868e-9cde5e242c11",
            "nodeId": "11",
            "content": {
              "@type": "type.googleapis.com/org.kie.kogito.serialization.process.protobuf.EventNodeInstanceContent"
            },
            "level": 1,
            "triggerDate": "1712678663524",
            "sla": { "slaCompliance": 0 }
          }
        ],
        "iterationLevels": [
          { "id": "7", "level": 1 },
          { "id": "_jbpm-unique-9", "level": 1 }
        ]
      },
      "completedNodeIds": [
        "_jbpm-unique-1",
        "_jbpm-unique-2",
        "_jbpm-unique-4",
        "_jbpm-unique-3",
        "_jbpm-unique-7",
        "_jbpm-unique-14"
      ]
    }

Expected behavior

The Kogito runtime should be able to look up the existing process instance using the correlation id in the cloud event, regardless of whether the runtime has been restarted.

Actual behavior

After a restart, the Kogito runtime is unable to look up the existing process instance based on the correlation id in the cloud event.

The following message is seen in the logs and the event is skipped.

2024-04-09 12:24:17,643 INFO [org.kie.kog.eve.imp.ProcessEventDispatcher] (vert.x-eventloop-thread-0) No matches found for trigger confirmations in process applicantworkflow. Skipping consumed message CloudEventWrapDataEvent [cloudEvent=CloudEvent{id='a7f8b789-ef00-489b-b7f5-321b4c437072', source=http://localhost:8080, type='confirmations', data=JsonCloudEventData{node={"name":"Ricardo","confirmation":"approved"}}, extensions={correlationid=Ricardo}}]

How to Reproduce?

Steps to reproduce:

  1. Start the Kogito runtime
  2. Send an applicant event with some name, e.g. "Ricardo" (used as the correlation id). This starts a new workflow instance
  3. Check the db for an active instance record
  4. Send a confirmation event with the same name, "Ricardo"
  5. Notice that the workflow resumes and finishes, and the record in the db is removed

Now repeat the same steps except with a restart in between.

  1. Start the Kogito runtime
  2. Send an applicant event with some name, e.g. "Ricardo" (used as the correlation id). This starts a new workflow instance
  3. Check the db for an active instance record
  4. Restart the Kogito runtime
  5. Send a confirmation event with the same name, "Ricardo"
  6. Notice that the workflow does not resume after the restart, and the record in the db is still present
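One way to drive these steps is to post each event over HTTP in CloudEvents binary mode, where the custom correlationid extension travels as a ce-correlationid header and the JSON payload is the request body. The sketch uses only java.net.http. The endpoint http://localhost:8080/ is an assumption about how the application's incoming events are exposed, and SendCorrelatedEvents and cloudEvent are hypothetical names; the payloads and ce-source value are copied from the events above.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.UUID;

public class SendCorrelatedEvents {

    // ASSUMPTION: the application receives CloudEvents over HTTP at this URL.
    // Adjust it to match how the workflow's incoming events are configured.
    static final URI ENDPOINT = URI.create("http://localhost:8080/");

    static final String APPLICANT_JSON =
            "{\"name\":\"Ricardo\",\"position\":\"Frontend Developer\",\"office\":\"Atlanta\",\"salary\":300000}";
    static final String CONFIRMATION_JSON =
            "{\"name\":\"Ricardo\",\"confirmation\":\"approved\"}";

    // Builds a binary-mode CloudEvent request: context attributes, including the
    // custom correlationid extension, travel as ce-* headers; the payload is the body.
    static HttpRequest cloudEvent(String type, String correlationId, String jsonData) {
        return HttpRequest.newBuilder(ENDPOINT)
                .header("ce-specversion", "1.0")
                .header("ce-id", UUID.randomUUID().toString())
                .header("ce-source", "http://localhost:8080")
                .header("ce-type", type)
                .header("ce-correlationid", correlationId)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonData))
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Pass "applicants" for step 2 and "confirmations" for the final step
        // (run the latter after restarting the runtime for the second scenario).
        if (args.length != 1) {
            System.out.println("usage: java SendCorrelatedEvents applicants|confirmations");
            return;
        }
        HttpRequest request = switch (args[0]) {
            case "applicants" -> cloudEvent("applicants", "Ricardo", APPLICANT_JSON);
            case "confirmations" -> cloudEvent("confirmations", "Ricardo", CONFIRMATION_JSON);
            default -> throw new IllegalArgumentException("unknown event type: " + args[0]);
        };
        HttpResponse<String> response =
                HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(args[0] + " -> HTTP " + response.statusCode());
    }
}
```

Sending each event from a separate invocation leaves room to restart the runtime between the two sends.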

Output of uname -a or ver

No response

Output of java -version

java version "17.0.5" 2022-10-18 LTS
Java(TM) SE Runtime Environment (build 17.0.5+9-LTS-191)
Java HotSpot(TM) 64-Bit Server VM (build 17.0.5+9-LTS-191, mixed mode, sharing)

GraalVM version (if different from Java)

No response

Kogito version or git rev (or at least Quarkus version if you are using Kogito via Quarkus platform BOM)

1.43.0.Final

Build tool (ie. output of mvnw --version or gradlew --version)

Apache Maven 3.9.4 (dfbb324ad4a7c8fb0bf182e6d91b0ae20e3d2dd9)
Maven home: C:\Users\dkapoor.m2\wrapper\dists\apache-maven-3.9.4-bin\32a55694\apache-maven-3.9.4
Java version: 17.0.5, vendor: Oracle Corporation, runtime: C:\Build\depot\3rd_Party\jsdk\17.0.5\Windows
Default locale: en_US, platform encoding: Cp1252
OS name: "windows 11", version: "10.0", arch: "amd64", family: "windows"

Additional information

The Kogito runtime seems to rely on some in-memory information to look up the existing process instance, and that information is lost after a restart.

Also notice that the record saved in MongoDB contains no correlation information, which is strange because the correlation id is needed to look the record up later. I wonder how the runtime is able to look it up without a restart, given that no correlation id is explicitly saved in the db.

When using PostgreSQL, I can see the correlation id saved explicitly in a separate table. However, the same issue occurs regardless of which database is used: Infinispan, MongoDB, or PostgreSQL.
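The in-memory hypothesis can be illustrated with a small model. This is a hypothetical sketch, not Kogito's implementation: a correlation index held only in the runtime's heap (InMemoryIndex) starts empty after a restart, so the incoming event finds no match and is skipped, much like the "No matches found" log line, whereas an index backed by external storage (DurableIndex, with a shared map standing in for a database table) still resolves the waiting instance. All class and method names are made up; the instance id is the one from the MongoDB document. The model does not explain why the reporter also sees the failure on PostgreSQL.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative model only: shows why a correlation index that lives solely in
// memory cannot survive a restart. It does not reproduce Kogito's code.
public class CorrelationIndexExample {

    interface CorrelationIndex {
        void register(String correlationKey, String processInstanceId);
        Optional<String> find(String correlationKey);
    }

    // Index held in the runtime's heap; it is lost whenever the JVM restarts.
    static final class InMemoryIndex implements CorrelationIndex {
        private final Map<String, String> entries = new ConcurrentHashMap<>();
        public void register(String key, String id) { entries.put(key, id); }
        public Optional<String> find(String key) { return Optional.ofNullable(entries.get(key)); }
    }

    // Index backed by external storage; the shared map stands in for a database table.
    static final class DurableIndex implements CorrelationIndex {
        private final Map<String, String> table;
        DurableIndex(Map<String, String> table) { this.table = table; }
        public void register(String key, String id) { table.put(key, id); }
        public Optional<String> find(String key) { return Optional.ofNullable(table.get(key)); }
    }

    // Resumes the matching instance if the index knows the key, otherwise skips the event.
    static String dispatch(CorrelationIndex index, String key) {
        return index.find(key)
                .map(id -> "resume " + id)
                .orElse("no match, skipping event");
    }

    public static void main(String[] args) {
        String key = "correlationid=Ricardo";
        String instanceId = "15b59c77-00cd-4dc7-8ab2-fa3d61ddf4e5";
        Map<String, String> database = new HashMap<>();

        // Before the restart: the applicant event registers the waiting instance.
        CorrelationIndex memoryBefore = new InMemoryIndex();
        CorrelationIndex durableBefore = new DurableIndex(database);
        memoryBefore.register(key, instanceId);
        durableBefore.register(key, instanceId);

        // Simulated restart: fresh runtime objects; only the "database" survives.
        CorrelationIndex memoryAfter = new InMemoryIndex();
        CorrelationIndex durableAfter = new DurableIndex(database);

        System.out.println("in-memory: " + dispatch(memoryAfter, key));
        System.out.println("durable:   " + dispatch(durableAfter, key));
    }
}
```

Running it prints "in-memory: no match, skipping event" followed by "durable:   resume 15b59c77-00cd-4dc7-8ab2-fa3d61ddf4e5".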

fjtirado commented 2 months ago

The correlation feature is only expected to work persistently on PostgreSQL.