gertjanssenswillen / processmapR

!! repository moved to https://github.com/bupaverse/processmapR !! This repo is read-only from now on.

Performance improvement in process_map, avoid the use of recode and r… #16

Closed fmannhardt closed 6 years ago

fmannhardt commented 6 years ago

…eplace by two separate slice operations. This part still feels far too slow for what it is doing, since almost all of the time in process_map (except for rendering) is spent adding the endpoints.

As an indication, this is the result of a microbenchmark with several solutions:

[Image: microbenchmark results for the candidate solutions]

fmannhardt commented 6 years ago

The second commit is about visualizing custom attributes on the nodes/edges. Feel free to adapt the code. I did not manage to dynamically generate the list of attributes for `select`; somehow `select_vars` did not apply the renaming when the list was generated by a function.

fmannhardt commented 6 years ago

I just discovered that something is wrong with the two full_joins in:

```r
suppressWarnings(base_log %>%
    ungroup() %>%
    mutate(act = ordered(act, levels = c("Start", as.character(sort(activity_labels(eventlog))), "End"))) %>%
    group_by(case) %>%
    arrange(start_time, min_order) %>%
    mutate(next_act = lead(act),
           next_start_time = lead(start_time),
           next_end_time = lead(end_time)) %>%
    full_join(base_nodes, by = c("act" = "act")) %>%
    rename(from_id = node_id) %>%
    full_join(base_nodes, by = c("next_act" = "act")) %>%
    rename(to_id = node_id) %>%
    select(-n.x, -n.y) %>%
    filter(!is.na(act)) %>% # FM, something is wrong with the joins here
    ungroup() -> base_precedence)
```

As a result, an NA row was generated. I'm not sure whether I caused that with my patches. I don't have time at the moment, so I just put a filter statement in there. I will have a look later.
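One way such a row can arise is sketched below on a made-up three-activity log. The `precedence` and `base_nodes` tables here are hypothetical stand-ins, not the real `base_log` or `base_nodes`, and whether this is what happens in the dataset mentioned above is an assumption. A `full_join` keeps unmatched rows from both sides, and "Start" never occurs as a `next_act`, so the second join appends a row for it in which `act` is NA.

```r
library(dplyr)

# Hypothetical miniature node table (one row per activity).
base_nodes <- tibble(act = c("Start", "A", "End"), node_id = 1:3)

# Hypothetical precedence table for a single case; the last event has
# no successor, so lead() would leave next_act as NA there.
precedence <- tibble(
  case     = c(1, 1, 1),
  act      = c("Start", "A", "End"),
  next_act = c("A", "End", NA)
)

joined <- precedence %>%
  full_join(base_nodes, by = c("act" = "act")) %>%
  rename(from_id = node_id) %>%
  full_join(base_nodes, by = c("next_act" = "act")) %>%
  rename(to_id = node_id)

# The unmatched "Start" node shows up as an extra row with act == NA.
extra_rows <- filter(joined, is.na(act))

# A left_join on the second step keeps only rows that exist in the log.
left_joined <- precedence %>%
  left_join(base_nodes, by = c("act" = "act")) %>%
  rename(from_id = node_id) %>%
  left_join(base_nodes, by = c("next_act" = "act")) %>%
  rename(to_id = node_id)
```

In this sketch the `filter(!is.na(act))` workaround and the `left_join` variant give the same three rows. Whether a `left_join` is acceptable in process_map depends on whether nodes without incoming edges still need to appear in the result, which this example does not settle.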

gertjanssenswillen commented 6 years ago

Thanks for noticing. I'm having a look at it now, so I'll try to find the problem.

gertjanssenswillen commented 6 years ago

@fmannhardt Do you have a reproducible example of the generated NA row?

Besides, if one uses the custom perspective for edges, the depicted values will be based on the attribute values related to the start node of the edge, right? Just to make sure the documentation is correct on this part.

fmannhardt commented 6 years ago

Regarding the NA, I will try this weekend if I manage. It may be a problem in my dataset, which is difficult to share right now.

Yes, I take the first attribute value encountered for each activity_instance_id: `!!attribute := first(!!attribute)) -> base_log`. It would be nice for this to be customizable; maybe it should actually be the difference (absolute or not) between both values. That would be nice, wouldn't it? You could have something like the change in temperature or whatever.
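A minimal sketch of how the aggregation could be made customizable, assuming the attribute name arrives as a string. The helper `summarise_attribute()`, its `fun` argument and the toy `events` table are hypothetical and not part of processmapR.

```r
library(dplyr)

# Hypothetical event data: two events per activity instance.
events <- tibble(
  activity_instance_id = c(1, 1, 2, 2),
  temperature          = c(20, 25, 30, 28)
)

# Hypothetical helper: collapse an attribute to one value per activity
# instance, using a caller-supplied aggregation function.
summarise_attribute <- function(log, attribute, fun = dplyr::first) {
  attribute <- rlang::sym(attribute)
  log %>%
    group_by(activity_instance_id) %>%
    summarize(!!attribute := fun(!!attribute))
}

# Behaviour described above: first value encountered per instance.
first_value <- summarise_attribute(events, "temperature")

# Alternative: change between the last and the first value.
change <- summarise_attribute(
  events, "temperature",
  fun = function(x) dplyr::last(x) - dplyr::first(x)
)
```

Passing the aggregation as a function keeps the current default while leaving room for differences, means or other summaries.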

gertjanssenswillen commented 6 years ago

Customizable would be nice indeed, but I will leave it as is for now. I have added the option to configure the color scale to all perspective functions. It might be useful to choose a different one than the default red and blue for a custom attribute (green to red for cost or whatever).

Overall, thanks for the addition! It's a very useful feature, people have been asking for it but I never got time to do something with it.