How to extract edges values as a data frame?

bupaverse / processmapR

Visualize event logs using directed graphs, i.e. process maps.

https://bupaverse.github.io/processmapR/

How to extract edges values as a data frame? #37

Open lejarx opened 4 years ago

lejarx commented 4 years ago

Hi,

First of all, thank you so much for this wonderful package. There are other tools out there, but as an R user I definitely want to give bupaR a try first, and I hope to use only this package for process mining analysis.

I would need some help with extracting values from different edges into a tibble.

For example, in the screenshot, the edge from Open to Pending User Info shows 46.44 hours, and the edge from Work in Progress to Closed shows 1.28 hours.

Thank you

lejarx commented 4 years ago

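# The functions used below come from several bupaverse packages, not bupaR alone
library(bupaR)        # core event log classes
library(edeaR)        # filter_activity_presence()
library(eventdataR)   # patients example event log
library(processmapR)  # process_map()
library(dplyr)        # glimpse()

# Keep only cases containing neither "X-Ray" nor "MRI SCAN", then build the map
res <- patients %>%
  filter_activity_presence(c("X-Ray", "MRI SCAN"), method = "none") %>%
  process_map()

glimpse(res)

# The underlying data is stored as attributes on the returned object
attr(res, 'base_precedence')
attr(res, 'edges')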

I can extract it this way

gertjanssenswillen commented 4 years ago

You can indeed use the attributes. Alternatively, you can use the function get_flows to extract the edge information (instead of reading the edges attribute directly) - this will probably be more stable in the future. The base_precedence data is more detailed, so with it you can still change the way the performance metrics are computed (mean, median, etc.).

It's on the to-do list to make these functions clearer and better documented.
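A minimal sketch of the get_flows route, assuming it accepts the object returned by process_map() and returns a data frame of edges. The patients log from eventdataR and the choice of median hours are illustrative only, and the exact column names of the result may differ between processmapR versions.

```r
library(bupaR)
library(eventdataR)   # patients example event log (illustrative choice)
library(processmapR)

# Build a performance map; the aggregation (median) and units (hours) are set here
pm <- patients %>%
  process_map(type = performance(FUN = median, units = "hours"))

# Assumption: get_flows() takes the process map object and returns one row per edge
flows <- get_flows(pm)
print(flows)
```

If the returned columns are not what you need, the edges and base_precedence attributes shown earlier in the thread remain available on the same object.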

lejarx commented 4 years ago

@gertjanssenswillen thanks for the information on get_flows. I notice that the performance metrics are calculated on calendar time, meaning they are simply based on the differences between timestamps.

Are there any examples of how I can write the FUN function so that the performance metrics are calculated based on business hours in a day (8 hours) instead of the default 24-hour window?

gertjanssenswillen commented 4 years ago

@lejarx not straightforwardly at the moment. How would such a calculation work for you, ideally? It is "easy" to not count the days in a weekend, because they are known. How would you suggest counting only business hours?

Taking into account business hours, like from 9 to 5? Or only counting a maximum of 8 hours a day?
What if something happens outside business hours (or do we assume that is not possible)?

Certainly something we can add, but let's take a minute to get the idea right.

lejarx commented 4 years ago

@gertjanssenswillen, let me know if we can create a new thread on this time calculation. It's good you ask.

It's probably more complex due to the possibility of pausing a task (e.g. while pending external information) and different business time zones, among others.

But yes, I think for a start, we can assume a blanket rule of business hours from 9 to 5 in a single time zone. By limiting the count to 9 to 5 we already avoid overcounting, and the result will be closer to the actual performance than the plain difference between the end and start of activities.
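The blanket rule described above could be sketched in base R roughly as follows. This is a standalone illustration, not part of processmapR: the function name business_hours, its arguments, and the decision to skip Saturdays and Sundays are all assumptions, and it does not handle paused tasks, holidays, or multiple time zones.

```r
# Hypothetical helper: hours between two timestamps that fall inside
# Monday-Friday, open:00 to close:00, in a single time zone.
business_hours <- function(start, end, open = 9, close = 17, tz = "UTC") {
  start <- as.POSIXct(start, tz = tz)
  end   <- as.POSIXct(end, tz = tz)
  if (end <= start) return(0)

  # Every calendar day touched by the interval
  days  <- seq(as.Date(start, tz = tz), as.Date(end, tz = tz), by = "day")
  total <- 0
  for (i in seq_along(days)) {
    d <- days[i]
    # wday: 0 = Sunday, 6 = Saturday; weekends contribute nothing
    if (as.POSIXlt(d)$wday %in% c(0, 6)) next
    midnight  <- as.POSIXct(format(d), tz = tz)
    day_open  <- midnight + open * 3600
    day_close <- midnight + close * 3600
    # Overlap of [start, end] with this day's business window
    overlap <- as.numeric(difftime(min(end, day_close),
                                   max(start, day_open),
                                   units = "hours"))
    total <- total + max(0, overlap)
  }
  total
}

# Friday 16:00 to Monday 10:00 is 66 calendar hours
calendar <- as.numeric(difftime(as.POSIXct("2021-03-08 10:00:00", tz = "UTC"),
                                as.POSIXct("2021-03-05 16:00:00", tz = "UTC"),
                                units = "hours"))
business <- business_hours("2021-03-05 16:00:00", "2021-03-08 10:00:00")
print(c(calendar = calendar, business = business))
```

Anything that happens entirely outside the 9 to 5 window counts as zero under this rule, which is one of the open questions raised earlier in the thread. How such a calculation would be wired into the package's performance metrics is not settled here.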