bupaverse / bupaR

Core R package for business process analysis
http://www.bupar.net

Event log creation takes a long time to complete #34

Closed felipecr00 closed 1 year ago

felipecr00 commented 3 years ago

Hi everyone!

I have a data frame of about 15 million rows. I have correctly defined the case_id, activity, and timestamp columns, among others. My problem is that for large datasets, creating the event log takes too long; it can take hours.

Is there any way to make this transformation run faster?

Thank you! Felipe

fmannhardt commented 3 years ago

Try adding validate = FALSE to the call that creates the event log: https://github.com/bupaverse/bupaR/blob/5fcb7540be4c0f166f352842bbe0810245594932/R/eventlog.R#L61

This disables certain consistency checks that take quite a while on large logs.
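
For illustration, here is a minimal sketch of what such a call might look like. The toy data frame and its column names are assumptions made up for this example, not taken from the original report; only the validate argument comes from the suggestion above. The call is guarded so the snippet still runs when bupaR is not installed.

```r
# Toy data standing in for the real 15-million-row data frame.
# All column names below are hypothetical.
events <- data.frame(
  case = c("c1", "c1", "c2"),
  activity = c("register", "approve", "register"),
  activity_instance = c("1", "2", "3"),
  status = "complete",
  timestamp = as.POSIXct(
    c("2021-01-01 09:00:00", "2021-01-01 10:00:00", "2021-01-02 09:00:00"),
    tz = "UTC"
  ),
  resource = c("r1", "r2", "r1"),
  stringsAsFactors = FALSE
)

# Only build the event log if bupaR is available.
if (requireNamespace("bupaR", quietly = TRUE)) {
  log <- bupaR::eventlog(
    events,
    case_id = "case",
    activity_id = "activity",
    activity_instance_id = "activity_instance",
    lifecycle_id = "status",
    timestamp = "timestamp",
    resource_id = "resource",
    validate = FALSE  # skip the consistency checks that are slow on large logs
  )
}
```

With validate = FALSE the checks are skipped entirely, so it is up to the caller to make sure the identifier columns are consistent before building the log.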

felipecr00 commented 3 years ago

It works! This reduces the log creation time considerably. Just to be clear, what does disabling validation affect?

Thank you!

fmannhardt commented 3 years ago

The validation checks (the list is not complete):

You can see the code here: https://github.com/bupaverse/bupaR/blob/5fcb7540be4c0f166f352842bbe0810245594932/R/eventlog.R#L224

The problem with this validation is that it is not very efficient. We left the parameter set to TRUE by default for backwards compatibility. I think it should be better documented that validation is a showstopper for large event logs.

fmannhardt commented 3 years ago

@gertjanssenswillen I patched the website documentation here: https://github.com/bupaverse/website/pull/1