Improve performance with dtplyr

HSLdevcom / mal-effect-calculations

Impact assessment scripts for the Helsinki region MAL planning process

https://www.hsl.fi/hsl/mal

European Union Public License 1.2


Improve performance with dtplyr #74

Closed johpiip closed 1 year ago

johpiip commented 1 year ago

This PR introduces dtplyr in the places where performance is poor. There, we calculate per-agent summaries of agents' tours, which means roughly 1.3-1.7 million groups (agents) with about 1.7 rows (tours) per group on average. As suggested in https://github.com/tidyverse/dplyr/issues/5017, I replaced dplyr with dtplyr in the key functions that I identified as slow.
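For readers unfamiliar with the pattern, here is a minimal sketch of a per-agent grouped summary in plain dplyr and the same pipeline routed through dtplyr. It is not taken from this repository: the data frame `tours` and the columns `agent_id` and `tour_length_km` are hypothetical, and the toy data is far too small to show any speed difference. The final check only illustrates how the two results can be compared.

```r
library(dplyr)
library(dtplyr)

# Hypothetical toy data: one row per tour, a few tours per agent.
tours <- tibble::tibble(
  agent_id = c(1L, 1L, 2L, 3L, 3L, 3L),
  tour_length_km = c(5.0, 12.5, 3.0, 8.0, 1.5, 20.0)
)

# Plain dplyr: grouped summary with one group per agent.
summary_dplyr <- tours %>%
  group_by(agent_id) %>%
  summarise(
    n_tours = n(),
    total_km = sum(tour_length_km)
  )

# dtplyr: wrap the data in lazy_dt() so the same verbs are translated
# to data.table code, then materialise the result with as_tibble().
summary_dtplyr <- tours %>%
  lazy_dt() %>%
  group_by(agent_id) %>%
  summarise(
    n_tours = n(),
    total_km = sum(tour_length_km)
  ) %>%
  as_tibble()

# Compare the two results, analogous to checking old against new output.
stopifnot(isTRUE(all.equal(
  as.data.frame(summary_dplyr),
  as.data.frame(summary_dtplyr)
)))
```

Only the `lazy_dt()` and `as_tibble()` calls differ between the two pipelines, which is why the switch can be limited to the functions that are actually slow.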

Runtime decreased by 98 %, from 210 minutes to only 4 minutes.

Closes #62

johpiip commented 1 year ago

I tested this PR by running both the old and the new codebase and comparing the results.

hsl-petrhaj commented 1 year ago

Should be fine to merge then.