Completed timer start events of event subprocesses rescheduled after instance migration

TimoStolz commented 1 year ago

Environment (Required on creation)

All environments, all databases, all supported versions.

Description (Required on creation; please attach any relevant screenshots, stacktraces, log files, etc. to the ticket)

When I migrate a process instance to an updated definition, timer start events of event subprocesses are rescheduled even when "updateEventTrigger": false is set in the migration instructions. This applies only to timer start events that have already been completed.

Here is an example; see the screenshots below. A process instance is waiting at a long-running task. The first timer start event fired after 30 seconds.

[Screenshot]

After 30 seconds, I migrated the process instance to an updated process definition. As the migration instruction below shows, I set "updateEventTrigger": false to prevent the timers from being rescheduled.

    {
      "sourceActivityIds": [
        "thirthy-seconds"
      ],
      "targetActivityIds": [
        "thirthy-seconds"
      ],
      "updateEventTrigger": false
    }
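The intended effect of the flag on a timer that is still pending can be sketched in plain Java. This is a simplified, hypothetical model, not the Camunda API: PendingTimer, migrate and the activity id one-hour are illustrative names, and the model assumes that an updated trigger takes its new due date from the migration time.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical model (not the Camunda API) of a timer job that is still waiting to fire
final class PendingTimer {
  final String activityId;
  final Instant dueDate;

  PendingTimer(String activityId, Instant dueDate) {
    this.activityId = activityId;
    this.dueDate = dueDate;
  }

  // Maps the job onto the target timer. With updateEventTrigger == false the original
  // due date is kept; with true it is recomputed from the target timer's duration,
  // assuming (for illustration only) that the migration time is the new reference point.
  PendingTimer migrate(String targetActivityId, Duration targetDuration,
                       Instant migrationTime, boolean updateEventTrigger) {
    Instant newDueDate = updateEventTrigger ? migrationTime.plus(targetDuration) : dueDate;
    return new PendingTimer(targetActivityId, newDueDate);
  }
}
```

For the 1-hour timer of the example, migrating 30 seconds after the start with the flag set to false keeps the original due date (start + 1 h); with the flag set to true, this sketch would move it to migration time + 1 h.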

At that point, the 1-hour timer is still pending. As expected with "updateEventTrigger": false, the 1-hour timer is NOT updated.

The 30-second timer had already completed before the migration ran, yet it is scheduled once again. Please look at the Jobs tab below: the job "30 seconds" was created after the job "1 hour". More precisely, it was created during the migration of the process instance.

[Screenshot: Jobs tab]

As a result, the first event subprocess has two tokens instead of just one.

[Screenshot: the first event subprocess with two tokens]

Steps to reproduce (Required on creation)

1. Please download and deploy the attached BPMN file: debug-timer-start-events.zip
2. Start a process instance.
3. Wait 30 seconds for the first timer to complete.
4. Update the process definition. (For testing, it is enough to set a new version number.)
5. Migrate the instance.
6. Observe that the first timer is scheduled once more.

Observed Behavior (Required on creation)

The completed timer is being scheduled once again after migration.

Expected behavior (Required on creation)

The completed timer should not be scheduled again.

Root Cause (Required on prioritization)

1. Migration parsing looks for the current jobs of the process instance to migrate:
   - Only the waiting timer job is found (due after 1h).
   - The job is correctly marked as migrating.
2. Migration parsing looks for emerging jobs, including timers:
   - The 30s timer is found and checked: "Has it triggered already?"
   - Only non-interrupting boundary events are considered in this check; event subprocess start events are not.
   - The timer is therefore considered a newly emerging trigger (i.e., as if it had been added in the target process definition).
   - Newly emerging triggers are instantiated after the migration.
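The two parsing steps above can be sketched as a small, self-contained Java model. All names here (MigrationParsingSketch, TimerDecl, the HandlerType enum standing in for the job handler TYPE constants, and the activity id one-hour) are hypothetical; the sketch only mirrors the logic as described, not the actual engine classes.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simplified, hypothetical model of the parsing steps (not the Camunda classes)
final class MigrationParsingSketch {

  // Stand-ins for TimerExecuteNestedActivityJobHandler.TYPE and TimerStartEventSubprocessJobHandler.TYPE
  enum HandlerType { NESTED_ACTIVITY, EVENT_SUBPROCESS_START }

  static final class TimerDecl {
    final String activityId;
    final HandlerType handlerType;
    final boolean interrupting;

    TimerDecl(String activityId, HandlerType handlerType, boolean interrupting) {
      this.activityId = activityId;
      this.handlerType = handlerType;
      this.interrupting = interrupting;
    }
  }

  // "Has it triggered already?" as described: only the boundary (nested activity) type is checked
  static boolean triggeredAlready(TimerDecl target, Set<String> remainingSources,
                                  Map<String, String> instructions) {
    if (target.interrupting || target.handlerType != HandlerType.NESTED_ACTIVITY
        || remainingSources.isEmpty()) {
      return false;
    }
    for (String sourceId : remainingSources) {
      // a source timer without a pending job, mapped onto this target, has fired already
      if (target.activityId.equals(instructions.get(sourceId))) {
        return true;
      }
    }
    return false;
  }

  // Returns the activity ids of timer jobs that would be created after the migration
  static List<String> emergingTimerJobs(List<TimerDecl> targetTimers, Set<String> sourceTimerIds,
                                        Set<String> pendingJobIds, Map<String, String> instructions) {
    // Step 1: pending jobs are migrated; their source and target timers are settled
    Set<String> remainingSources = new HashSet<>(sourceTimerIds);
    Set<String> migratedTargets = new HashSet<>();
    for (String pending : pendingJobIds) {
      remainingSources.remove(pending);
      String target = instructions.get(pending);
      if (target != null) {
        migratedTargets.add(target);
      }
    }
    // Step 2: every other target timer is "emerging" unless it has triggered already
    List<String> emerging = new ArrayList<>();
    for (TimerDecl target : targetTimers) {
      if (!migratedTargets.contains(target.activityId)
          && !triggeredAlready(target, remainingSources, instructions)) {
        emerging.add(target.activityId);
      }
    }
    return emerging;
  }
}
```

With both timers of the example modeled as non-interrupting event subprocess start timers, a mapping for each, and only the 1-hour job still pending, emergingTimerJobs returns [thirthy-seconds]: the timer that has already fired is treated as new. Modeling the same timers as boundary timers (NESTED_ACTIVITY) returns an empty list.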

Solution Ideas

Add the timer start event subprocess job handler type to the "Has it triggered?" evaluation in the ActivityInstanceJobHandler:

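// Skip the "Has it triggered already?" check for interrupting timers, for job handler types other
// than boundary timers (nested activity) and event subprocess start timers, and for empty source scopes
String jobHandlerType = targetTimerDeclaration.getJobHandlerType();
if (targetTimerDeclaration.isInterruptingTimer() ||
    !(TimerExecuteNestedActivityJobHandler.TYPE.equals(jobHandlerType) || TimerStartEventSubprocessJobHandler.TYPE.equals(jobHandlerType)) ||
    sourceTimerDeclarationsInEventScope.isEmpty()) {
  return false;
}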
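The proposed check can be modeled in isolation as follows. This is a hypothetical sketch rather than the engine code: the enum values stand in for TimerExecuteNestedActivityJobHandler.TYPE and TimerStartEventSubprocessJobHandler.TYPE (OTHER represents any remaining handler type), and the class and method names are illustrative.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical, self-contained model of the proposed check (not the Camunda classes)
final class TriggeredAlreadyCheck {

  enum HandlerType { NESTED_ACTIVITY, EVENT_SUBPROCESS_START, OTHER }

  // Returns true if a non-interrupting target timer corresponds to a source timer that
  // no longer has a pending job, i.e. one that fired before the migration.
  static boolean hasTriggeredAlready(String targetActivityId, HandlerType handlerType,
                                     boolean interrupting, Set<String> remainingSourceTimerIds,
                                     Map<String, String> instructions) {
    boolean supportedHandler = handlerType == HandlerType.NESTED_ACTIVITY
        || handlerType == HandlerType.EVENT_SUBPROCESS_START; // the proposed addition
    if (interrupting || !supportedHandler || remainingSourceTimerIds.isEmpty()) {
      return false;
    }
    for (String sourceId : remainingSourceTimerIds) {
      // a remaining source timer mapped onto this target timer has fired already
      if (targetActivityId.equals(instructions.get(sourceId))) {
        return true;
      }
    }
    return false;
  }
}
```

With the proposed addition, the 30-second event subprocess timer counts as already triggered, so it would no longer be treated as an emerging job; interrupting timers, other handler types, and scopes without remaining source timers still yield false.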

tmetzke commented 1 year ago

Thanks, @TimoStolz, for the detailed description. This is really helpful! We'll look into this as soon as possible and come back with our evaluation.

Thanks again and stay tuned!

Cheers, Tobias

tmetzke commented 1 year ago

Hi @TimoStolz,

thanks again for bringing this up.

I can reproduce the behavior you are experiencing. As far as I can tell, I also consider this a bug at this point. I will add a root-cause analysis as well as a solution proposal in a follow-up comment on this issue. We will review this in the team and get back here with a decision as soon as we can.

Cheers, Tobias

tmetzke commented 1 year ago

Hey @TimoStolz, thanks again for bringing this up. I have identified the root cause of this issue and also created a solution proposal.

Would you be interested in making a contribution to fix this yourself? This would speed up the process tremendously. If you do, please reference this issue in your Pull Request so we can easily track it back.

I will put this issue into our general issue backlog until then.

Cheers, Tobias

camunda / camunda-bpm-platform