Closed by felipecr00 1 year ago
Try adding validate = FALSE to the call when creating the event log:
https://github.com/bupaverse/bupaR/blob/5fcb7540be4c0f166f352842bbe0810245594932/R/eventlog.R#L61
This disables certain consistency checks that take quite a while on large logs.
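A minimal sketch of what that call might look like, assuming a data frame with one row per event. The toy data and column names here are hypothetical; only the validate argument comes from the suggestion above.

```r
library(bupaR)

# Hypothetical toy data standing in for a large event table.
events <- data.frame(
  case_id = c("c1", "c1", "c2", "c2"),
  activity = c("register", "review", "register", "review"),
  activity_instance = 1:4,
  status = "complete",
  timestamp = as.POSIXct(
    c("2020-01-01 09:00:00", "2020-01-01 10:00:00",
      "2020-01-02 09:00:00", "2020-01-02 11:00:00"),
    tz = "UTC"
  ),
  resource = c("r1", "r2", "r1", "r2"),
  stringsAsFactors = FALSE
)

# validate = FALSE skips the consistency checks, so the caller is
# responsible for making sure the data is already consistent.
log <- eventlog(
  events,
  case_id = "case_id",
  activity_id = "activity",
  activity_instance_id = "activity_instance",
  lifecycle_id = "status",
  timestamp = "timestamp",
  resource_id = "resource",
  validate = FALSE
)
```

To see how much the checks cost on your own data, you could wrap the same call in system.time() once with validate = TRUE and once with validate = FALSE.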
It works! This reduces the log creation time considerably. Just to be clear, what does disabling validation affect?
Thank you!
The validation checks (the list is not complete):
You can see the code here: https://github.com/bupaverse/bupaR/blob/5fcb7540be4c0f166f352842bbe0810245594932/R/eventlog.R#L224
The problem with this validation is that it is not very efficient. We left the parameter set to TRUE by default for backwards compatibility. I think it should be better documented that this is a real showstopper for large event logs.
@gertjanssenswillen I patched the website documentation here: https://github.com/bupaverse/website/pull/1
Hi everyone!
I have a data frame of about 15 million rows. I have correctly defined the case_id, activity, and timestamp, among others. My problem is that for large datasets creating the event log takes too long, sometimes up to hours.
Is there any way I can run this transformation faster?
Thank you! Felipe