camunda / camunda-bpm-platform

Flexible framework for workflow and decision automation with BPMN and DMN. Integration with Quarkus, Spring, Spring Boot, CDI.
https://camunda.com/
Apache License 2.0
4.03k stars 1.53k forks source link

Process variables turn unexpectedly null when jobs are executed in parallel #4324

Closed tasso94 closed 2 weeks ago

tasso94 commented 2 months ago

Environment (Required on creation)

Most likely, all Camunda versions are affected.

Description (Required on creation; please attach any relevant screenshots, stacktraces, log files, etc. to the ticket)

Process variables turn unexpectedly null when jobs are executed in parallel.

Steps to reproduce (Required on creation)

  1. Deploy a process with a flow node (script/service task) that is asyncBefore, which first reads process variable A and second updates process variable A.
  2. State
    • There is a process variable A of type Object.
    • The flow node has two tokens within the same process instance during runtime.
  3. The job executor executes both resulting jobs in parallel.

Observed Behavior (Required on creation)

Process variables turn null.

Expected behavior (Required on creation)

Process variables don't turn null.

Root Cause (Required on prioritization)

  1. T1 executes job 1 that starts executing token 1.
  2. T2 executes job 2 that starts executing token 2, and selects the process variable A, including the referenced byte array row from the database.
  3. T1 continues and updates the process variable A as defined in the script/service task.
    • Since the variable is of type Object, the value is stored in the byte array table, and the row in the runtime variable table references the row in the byte array table.
    • Updating the variable value leads to deleting the byte array row, inserting a new one, and updating the runtime variable row with the new reference.
  4. T1 is done and commits all modifications to the database.
  5. T2 continues and selects the byte array row from the previously selected process variable A. - happens during SetVariable.
    • The process variable A was selected before the byte array reference was updated => T2 still uses the state of the process variable A with the outdated byte array reference.
    • The variable A returns null since it doesn't exist anymore.

Solution Ideas

Reuse the byte array row by updating it. Like this, the variable value cannot turn null.

Hints

See the prototypical implementation that was validated with the customer: https://github.com/camunda/camunda-bpm-platform/pull/4313

Links

Breakdown

### camunda-bpm-platform PR
- [ ] https://github.com/camunda/camunda-bpm-platform/pull/4433
### Backport PR
- [ ] https://github.com/camunda/camunda-bpm-platform-maintenance/pull/1240
- [ ] https://github.com/camunda/camunda-bpm-platform-maintenance/pull/1241
- [ ] https://github.com/camunda/camunda-bpm-platform-maintenance/pull/1242
- [ ] https://github.com/camunda/camunda-bpm-platform-maintenance/pull/1247

Dev2QA handover

psavidis commented 1 month ago

Unit Test Investigation

Problem: Currently the reproduction of the issue using a unit test is not possible.

Step 5 - Using the RuntimeService to read the variable and RuntimeService#SetVariable does not reproduce the issue at step 5.

psavidis commented 1 month ago

Ιnvestigation Update

If the conditions are not identified, it doesn't give a lot of value to add a Unit Test that reproduces the example with arbitrary steps or private calls.

Next steps:

psavidis commented 1 month ago

Investigate Update

psavidis commented 1 month ago

Decision

The inability to reproduce the problem with a unit test it accepted. Moving on with backporting of the fix.

psavidis commented 4 weeks ago

Investigation Update

Context: Not Delete the ByteArray upon Update Introduction: The VariableInstanceEntity can store the variable values either:

Observation: In the scenario where the type changes, e.g from long to object:

Note: The public API correctly returns the correct type and value.

Potential Problem: The above raises concerns of potential side-effects introduced.

Solution Ideas: A mechanism to delete the byte-array on type change is needed

Kudos to @yanavasileva for coming up with this scenario.

psavidis commented 4 weeks ago

Hey @tasso94,

After the latest update, i went ahead and added unit-tests for ensuring the right state population of the variable on type change (see the PR notes for the reviewer).

After the PR is finalised and approved, i'll proceed with back-porting. Will assign the issue to you once the CI passes against theci:all-db profile.

psavidis commented 4 weeks ago

All pipelines passed against the ci:all-db profile.

Assigning to @tasso94 for Review.

tasso94 commented 3 weeks ago

Hi @psavidis, can you please assign this review to somebody else on the team? Let's try to spread the knowledge among engineers.

psavidis commented 3 weeks ago

Assigning to @yanavasileva for the Review

psavidis commented 3 weeks ago

Assigning for a final review round of the backports

yanavasileva commented 2 weeks ago

❗ backport PRs don't have the latest commits from master.

psavidis commented 2 weeks ago

❗ backport PRs don't have the latest commits from master.

They were outdated after the last changes to avoid deserialising the variables during type checks. The backports now are force-pushed with the new commit

psavidis commented 2 weeks ago

Merged the main fix and backports. Closing the issue.

psavidis commented 2 weeks ago

The PR for 7.20 was pointing to older changes due to an un-synced local branch.

Merged the latest behaviour with this PR

tasso94 commented 5 days ago

Potential labels were not replaced by version labels which broke the process of informing the customer.