Public-Health-Scotland / phsmethods

An R package to standardise methods used in Public Health Scotland (https://public-health-scotland.github.io/phsmethods/)

A large speedup to `extract_fin_year` #96

Closed: Moohan closed this 1 year ago

Moohan commented 1 year ago

I was working on a similar function for one of my projects and realised the improvements I made there were also applicable here.

This is essentially a full rewrite of the function, but its behaviour is unchanged (all the existing tests still pass).
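For context, `extract_fin_year()` labels each date with the UK financial year (1 April to 31 March) it falls in. The sketch below shows one vectorised way to compute such a label. It assumes the April-start year and the "YYYY/YY" format documented for `extract_fin_year()`. `fin_year_sketch()` is a hypothetical name, and this is not the code from this PR.

```r
# Hypothetical sketch, NOT the implementation in this PR: a vectorised
# UK financial-year label (1 April to 31 March) in "YYYY/YY" format.
fin_year_sketch <- function(date) {
  date <- as.Date(date)
  lt <- as.POSIXlt(date)
  # POSIXlt years count from 1900 and months are 0-based, so Jan-Mar are 0-2;
  # those months belong to the financial year that started the previous April
  start_year <- lt$year + 1900L - (lt$mon < 3L)
  # Two-digit end year, so 1999 gives "1999/00" rather than "1999/100"
  out <- sprintf("%d/%02d", start_year, (start_year + 1L) %% 100L)
  # sprintf() turns NA into the string "NA/NA"; restore a real NA
  out[is.na(date)] <- NA_character_
  out
}

fin_year_sketch(as.Date(c("2019-03-31", "2019-04-01", NA)))
# "2018/19" "2019/20" NA
```

Working on whole vectors of year and month components avoids per-element loops, which is the general kind of change that helps most on large inputs.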

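This gives roughly a 70x speedup for a single date and about a 2x speedup for 10 million dates. In between, the relative speedup falls as the vector grows (about 19x at 1,000 dates and 2x at 100,000). Importantly, the new version also allocates roughly 2.5-3x less memory for inputs of 1,000 or more dates.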
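The benchmark that follows calls a helper, `create_dates()`, whose definition is not included in the PR. A hypothetical stand-in is sketched here so the benchmark can be reproduced. It assumes that uniformly random dates are a reasonable test input; the `from` and `to` bounds are illustrative choices, not values taken from the original.

```r
# Hypothetical stand-in for the create_dates() helper used in the benchmark;
# the real definition is not shown in the PR.
create_dates <- function(n,
                         from = as.Date("2000-01-01"),
                         to = as.Date("2030-12-31")) {
  # Draw n dates uniformly at random (with replacement) between `from` and `to`;
  # sample() keeps the Date class because it subsets the Date vector
  sample(seq(from, to, by = "day"), size = n, replace = TRUE)
}

set.seed(123)  # make the random dates reproducible
dates <- create_dates(1e5)
```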

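```r
library(magrittr)   # provides %>%
library(phsmethods) # provides extract_fin_year(), the current version

# create_dates() and extract_fin_year_new() (the rewrite) were defined in the
# author's session and are not shown in this PR.
bench::press(
  n = c(1, 1e3, 1e5, 1e7),
  {
    dates <- create_dates(n)
    bench::mark(
      original = extract_fin_year(dates),
      new = extract_fin_year_new(dates),
      relative = TRUE,
      min_time = 3
    )
  }
) %>%
  print() %>%
  ggplot2::autoplot()
```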
```
Running with:
         n
1        1
2     1000
3   100000
4 10000000
# A tibble: 8 × 14
  expression        n   min median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result    memory     time      
  <bch:expr>    <dbl> <dbl>  <dbl>     <dbl>     <dbl>    <dbl> <int> <dbl>   <bch:tm> <list>    <list>     <list>    
1 original          1 71.3   69.1       1       Inf        1      776    15      2.84s <chr [2]> <Rprofmem> <bench_tm>
2 new               1  1      1        71.1     NaN        1.47  9996     4    515.3ms <chr [2]> <Rprofmem> <bench_tm>
3 original       1000 19.0   19.3       1         3.16     2.43   747    14      2.85s <chr>     <Rprofmem> <bench_tm>
4 new            1000  1      1        19.2       1        1     9996     4      1.98s <chr>     <Rprofmem> <bench_tm>
5 original     100000  2.17   2.05      1         2.77     4.09   223     4      2.91s <chr>     <Rprofmem> <bench_tm>
6 new          100000  1      1         2.16      1        1      493     1      2.98s <chr>     <Rprofmem> <bench_tm>
7 original   10000000  1.94   1.97      1         2.52     1.32     3     8      3.42s <chr>     <Rprofmem> <bench_tm>
8 new        10000000  1      1         2.02      1        1        6     6      3.38s <chr>     <Rprofmem> <bench_tm>
```

Because of `relative = TRUE`, the `min`, `median`, `itr/sec`, `mem_alloc` and `gc/sec` columns are ratios within each `n`, scaled so the smaller of the two values is 1; `n_itr`, `n_gc` and `total_time` are absolute. For `n = 1` the new version recorded 0 bytes allocated, which is why `mem_alloc` shows `Inf` and `NaN`.

(Figure: `ggplot2::autoplot()` plot of the benchmark results.)

Tina815 commented 1 year ago

Hi James, thanks very much for the improvement! I really like the clever approach, and it makes the method much more efficient.