AutomatedProcessImprovement / log-distance-measures

Python package with event log distance and similarity metrics
Apache License 2.0
5 stars 1 forks source link

Extend directly-follows EMD with parallelism #4

Closed david-chapela closed 1 year ago

david-chapela commented 1 year ago

Extend the implementation of directly-follows EMD by building the n-grams taking into account concurrency.

  1. Use a concurrency oracle (e.g. Heuristics Miner) to extract concurrency relations between activities (e.g. B is concurrent with C).
  2. Build the n-grams aware of this concurrency. For example, if the sequence is ABCD, the trigrams would be ABDand ACD.
marlondumas commented 1 year ago

This improvement is a bit dangerous because it could lead to equating a model sequence(A, choice(B, C), D) with the model sequence(A, parallel(B, C), D), despite the obvious differences between them. In my experience, there are some nasty interactions between concurrency oracles and Markovian abstractions, and it is better not to mix them up.

david-chapela commented 1 year ago

There would be a small difference between sequence(A, choice(B, C), D) and sequence(A, parallel(B, C), D). In the first one, each execution of the structure will lead to two arcs (A -> B and B -> D, or A -> C and A -> D), while the second one will always lead to four arcs: A -> B, A -> C, B -> D, and C -> D. Hence:

I'm not sure if this difference is enough to differentiate these structures for a complete (and maybe complex) process.