Public-Health-Scotland / phsmethods

An R package to standardise methods used in Public Health Scotland (https://public-health-scotland.github.io/phsmethods/)
https://public-health-scotland.github.io/phsmethods/
54 stars 13 forks source link

Speed-ups for the CHI functions #41

Closed Moohan closed 4 years ago

Moohan commented 4 years ago

Changes can be made without losing any functionality to increase the speed of the CHI functions chi_check and chi_pad. The speedup is useful, particularly when running against a large number of CHIs.

I've already created a branch chi-speed-up. Below are the improvements when running on this list of CHIs (used in the tests) chis <- c("0101011237", "0101201234", "3201201234", "0113201234", "3213201234", "123456789", "12345678900", "010120123?", "1904851231")

A roughly 10X speedup overall on the chi_check function: check_bm <- microbenchmark::microbenchmark( check_old = chi_check(chis), check_new = chi_check_new(chis), check = "identical", times = 1000 )

check_bm

Unit: microseconds expr min lq mean median uq max neval check_old 2995.849 3105.486 3453.5949 3160.845 3286.9545 58056.93 1000 check_new 299.747 319.190 484.2312 336.472 360.2365 124137.35 1000

ggplot2::autoplot(check_bm) chi_check

And a >2X speedup for chi_pad: pad_bm <- microbenchmark::microbenchmark( pad_old = chi_pad(chis), pad_new = chi_pad_new(chis), check = "identical", times = 1000 )

pad_bm

Unit: microseconds expr min lq mean median uq max neval pad_old 32.405 34.025 38.27963 35.106 36.186 2678.280 1000 pad_new 11.341 11.882 15.47622 13.503 14.043 2212.726 1000

ggplot2::autoplot(pad_bm) chi_pad

graemegowans commented 4 years ago

Looks good, much faster! I had made a small change to this function as well after a suggestion from @alan-y to give more useful output when CHI is missing (currently returns as "invalid character") see #39. It hasn't been approved/merged yet so we might need to figure out the best way to incorporate both?

Moohan commented 4 years ago

I think I checked and there weren't any merge conflicts but I guess the best way to do it is to have yours approved and merged first, then either merge mine or make any changes to make it work before merging.