Extend directly-follows EMD with parallelism

david-chapela commented 1 year ago

Extend the implementation of directly-follows EMD by taking concurrency into account when building the n-grams.

Use a concurrency oracle (e.g. Heuristics Miner) to extract concurrency relations between activities (e.g. B is concurrent with C).
Build the n-grams aware of this concurrency. For example, if the sequence is ABCD and B is concurrent with C, the trigrams would be ABD and ACD.
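
A minimal sketch of the two steps above, to make the proposal concrete. It is not the repository's implementation: the function names `successors` and `concurrency_aware_ngrams` are hypothetical, the concurrency relation is passed in as a plain set of activity pairs instead of being computed by an oracle, and no start/end padding is applied to the trace.

```python
from typing import FrozenSet, List, Set, Tuple


def successors(trace: List[str], concurrent: Set[FrozenSet[str]]) -> List[List[int]]:
    """For each position in the trace, return the positions that directly
    follow it once concurrent activities are skipped."""

    def is_concurrent(a: str, b: str) -> bool:
        return frozenset((a, b)) in concurrent

    result = []
    for i, activity in enumerate(trace):
        following: List[int] = []
        for j in range(i + 1, len(trace)):
            if is_concurrent(activity, trace[j]):
                continue  # concurrent with the current event: not a successor
            if following and not all(is_concurrent(trace[k], trace[j]) for k in following):
                break  # trace[j] comes after a successor already found
            following.append(j)
        result.append(following)
    return result


def concurrency_aware_ngrams(
    trace: List[str], concurrent: Set[FrozenSet[str]], n: int = 3
) -> List[Tuple[str, ...]]:
    """Enumerate the n-grams as paths of length n over the successor relation."""
    succ = successors(trace, concurrent)
    ngrams: List[Tuple[str, ...]] = []

    def extend(path: List[int]) -> None:
        if len(path) == n:
            ngrams.append(tuple(trace[p] for p in path))
            return
        for j in succ[path[-1]]:
            extend(path + [j])

    for start in range(len(trace)):
        extend([start])
    return ngrams


# B is concurrent with C (assumed to come from the concurrency oracle).
print(concurrency_aware_ngrams(list("ABCD"), {frozenset("BC")}))
# [('A', 'B', 'D'), ('A', 'C', 'D')]
```

With this sketch the interleaving ACBD yields the same two trigrams (in a different order), which is the intended effect of making the n-grams concurrency-aware.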

marlondumas commented 1 year ago

This improvement is a bit dangerous because it could lead to equating a model sequence(A, choice(B, C), D) with the model sequence(A, parallel(B, C), D), despite the obvious differences between them. In my experience, there are some nasty interactions between concurrency oracles and Markovian abstractions, and it is better not to mix them up.

david-chapela commented 1 year ago

There would be a small difference between sequence(A, choice(B, C), D) and sequence(A, parallel(B, C), D). In the first one, each execution of the structure will lead to two arcs (A -> B and B -> D, or A -> C and C -> D), while the second one will always lead to four arcs: A -> B, A -> C, B -> D, and C -> D. Hence:

100 executions of sequence(A, choice(B, C), D) result in trigrams ABD and ACD, each with frequency 50 (assuming both branches are equally likely).
100 executions of sequence(A, parallel(B, C), D) result in the same trigrams, but each with frequency 100.

I'm not sure if this difference is enough to differentiate these structures for a complete (and maybe complex) process.
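
The counts above can be checked with a small sketch. The per-execution trigrams are hard-coded from the description in this comment, the 50/50 split between the choice branches is an assumption, and `histogram` and `normalise` are hypothetical helper names, not functions from the repository.

```python
from collections import Counter

# Trigrams contributed by a single execution of each structure.
CHOICE_B = [("A", "B", "D")]                      # choice took branch B
CHOICE_C = [("A", "C", "D")]                      # choice took branch C
PARALLEL = [("A", "B", "D"), ("A", "C", "D")]     # parallel yields both


def histogram(executions):
    """Absolute trigram frequencies over a list of executions."""
    counts = Counter()
    for trigrams in executions:
        counts.update(trigrams)
    return counts


def normalise(counts):
    """Relative trigram frequencies (each count divided by the total)."""
    total = sum(counts.values())
    return {gram: count / total for gram, count in counts.items()}


choice_log = [CHOICE_B] * 50 + [CHOICE_C] * 50    # assumed 50/50 split
parallel_log = [PARALLEL] * 100

choice_counts = histogram(choice_log)
parallel_counts = histogram(parallel_log)

print(dict(choice_counts))     # ABD: 50, ACD: 50
print(dict(parallel_counts))   # ABD: 100, ACD: 100
print(normalise(choice_counts) == normalise(parallel_counts))  # True
```

The absolute counts differ by a factor of two, but the relative frequencies are identical. If the histograms are normalised before the EMD is computed (an assumption about the measure, not something confirmed in this thread), the two structures would become indistinguishable on this fragment.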

AutomatedProcessImprovement / log-distance-measures