gertjanssenswillen / processmapR

!! repository moved to https://github.com/bupaverse/processmapR !! This repo is read-only from now on.

Negative durations for overlapping/parallel activity periods #33

Closed JesseVent closed 5 years ago

JesseVent commented 5 years ago

Hello, I have a couple questions in regards to how best to handle activities that occur at the same time as other activities or overlap with other activities.

I'm trying to convert a series of customer and staff activities that can occur as part of a case, but often they can be created and/or completed at the same time.

All in all it's quite a complex process with a variety of activities and users (over 100 different activity types). When I filter the activities by frequency and then try to visualise the performance process maps, I get a large number of negative durations on the edges between activities, and I don't know how I should be handling them.

My questions are:

I've replicated a similar issue in one of my repos using the loan application event log data set, which is available here: jessevent/loan-app-process. Funnily enough, it also causes processanimateR tokens to traverse backwards and float off edges to different activities, which is actually how I first noticed something odd was going on.

This is essentially the format/code I'm using to transform my activity instances into the event log.

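library(dplyr)   # mutate(), filter(), %>%
library(tidyr)   # gather()
library(bupaR)   # eventlog()

example_log_4 %>%
    # One row per activity instance, so give each row its own instance id
    mutate(activity_instance = 1:nrow(.)) %>%
    # Reshape the schedule/start/complete columns into one row per lifecycle event
    gather(status, timestamp, schedule, start, complete)  %>%
    # Drop lifecycle steps that have no timestamp
    filter(!is.na(timestamp)) %>%
    eventlog(
        case_id = "patient",
        activity_id = "activity",
        activity_instance_id = "activity_instance",
        lifecycle_id = "status",
        timestamp = "timestamp",
        resource_id = "resource"
    )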

Thanks so much for any assistance. The whole bupaR framework is an outstanding piece of work and an amazing achievement. Personally, I've spent a long time looking for a framework like this and am quite excited about the progress and the future to come!

gertjanssenswillen commented 5 years ago

Hi @JesseVent , thanks for your message.

The current process map visualization is not really designed for activities that happen at the same time or overlap, and the negative times are essentially telling you this. If activity B starts while activity A is still running, there will be a flow from A to B (telling you that B started after A started), but the time indication will be negative (telling you that B started before A finished). Indeed, the time shown by default is the time between the end of the source activity and the start of the target activity.
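To make the arithmetic concrete, here is a minimal base-R sketch with made-up timestamps. The data frame and its column names are assumptions for illustration only, not processmapR internals; it simply applies the rule described above (start of the target minus completion of the source).

```r
# Hypothetical activity instances for one case; B starts while A is still running.
instances <- data.frame(
  activity = c("A", "B"),
  start    = as.POSIXct(c("2019-01-01 09:00:00", "2019-01-01 09:20:00"), tz = "UTC"),
  complete = as.POSIXct(c("2019-01-01 09:30:00", "2019-01-01 09:45:00"), tz = "UTC")
)

# Order by start time: B started after A started, so the flow goes A -> B.
instances <- instances[order(instances$start), ]

# Edge time = start of target - completion of source.
edge_mins <- as.numeric(
  difftime(instances$start[2], instances$complete[1], units = "mins")
)
edge_mins  # -10: B started 10 minutes before A finished
```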

NB: if activities start at exactly the same time, there will still be a flow between them. In that case the order is decided by the order in which the events appeared in the data when you created the event log (unless you configured another order at that point).

So, you are not doing anything wrong; the process map is simply not advanced enough for this kind of data. And just dropping the arcs with negative durations (as is typically done in commercial process mining tools) would not be very transparent.

What you basically need are real discovery algorithms that can detect and show AND and XOR splits. Unfortunately, these are currently not included in bupaR. You can have a look at other tools for this, such as Apromore (http://apromore.org/) or ProM (http://www.promtools.org/doku.php).

Another possibility is to transform the event data you have towards a higher level of abstraction, where you have clearly delineated activities which cannot overlap. However, that might not be ideal, depending on the questions you are having.

Hopefully there will be other algorithms included in the future which can discover BPMN-like models. If you have any suggestions on how to tackle this issue in the current process maps yourself, please feel free to let me know.

frkbr commented 5 years ago

Adding to Gert's point on more advanced discovery algorithms that detect AND splits: it would be great if more open source developers in the process mining field added some of their best work to the bupaR ecosystem. Felix's work on processanimateR is the best example of this. Another algorithm that deserves attention and would fit nicely into the bupaR world is the Inductive Visual Miner. So far it's only available in ProM. R has so many advantages over other platforms in terms of reproducibility and integrated data preparation.

fmannhardt commented 5 years ago

I wonder whether we can leverage some of the work done by pm4py since it would be possible to use Python code via the reticulate package. However, this is still very much under construction and we would need stable versions.

Of course, when using real process discovery algorithms, some more problems pop up: having to deal with alignments for performance measures / animations for instance :-)

I started a quick and dirty implementation of the Heuristics Miner internally, which would remove the clutter of edges when parallelism is present. However, it does not yet implement the XOR/AND split detection. I will try to put it up as a package soon.

JesseVent commented 5 years ago

Thanks for the advice. I ended up identifying the overlapping activities, ranking them into sequences and then concatenating the activity IDs together. I seem to be getting pretty good results so far: I'm no longer seeing any negative durations, and the result probably reflects my process reasonably well given how many activities get completed all at once.

Example code is available at loan-app-process/02-parallel-activities
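For readers without access to that repository, the idea can be sketched in base R as below. This is a minimal illustration under stated assumptions, not the code from the linked repo: the `merge_overlaps()` helper, the column names, and the `+` separator are all hypothetical. Instances of a case that overlap in time are collapsed into one combined instance whose activity name is the concatenation of the overlapping activity names.

```r
# Hypothetical activity instances for a single case (column names are assumptions).
instances <- data.frame(
  case     = "c1",
  activity = c("A", "B", "C", "D"),
  start    = as.POSIXct(c("2019-01-01 09:00:00", "2019-01-01 09:20:00",
                          "2019-01-01 10:00:00", "2019-01-01 10:00:00"), tz = "UTC"),
  complete = as.POSIXct(c("2019-01-01 09:30:00", "2019-01-01 09:45:00",
                          "2019-01-01 10:15:00", "2019-01-01 10:05:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: collapse overlapping instances of ONE case.
merge_overlaps <- function(df) {
  df <- df[order(df$start, df$complete), ]
  # Latest completion time among all earlier instances (-Inf for the first row).
  prev_end <- c(-Inf, head(cummax(as.numeric(df$complete)), -1))
  # A new block begins when an instance starts at or after everything before it ended.
  block <- cumsum(as.numeric(df$start) >= prev_end)
  do.call(rbind, lapply(split(df, block), function(g) {
    data.frame(
      case     = g$case[1],
      activity = paste(sort(g$activity), collapse = "+"),
      start    = min(g$start),
      complete = max(g$complete),
      stringsAsFactors = FALSE
    )
  }))
}

# Apply the helper per case and stack the results.
merged <- do.call(rbind, lapply(split(instances, instances$case), merge_overlaps))

# Time between consecutive merged instances (start of next - completion of previous).
gaps_mins <- as.numeric(
  difftime(merged$start[-1], head(merged$complete, -1), units = "mins")
)
```

With this toy data, A and B collapse into `A+B` and C and D into `C+D`, and the remaining gap between the two merged instances is non-negative. The merged table could then be reshaped and passed to `eventlog()` in the same way as in the original question.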