business-science / timetk

Time series analysis in the `tidyverse`
https://business-science.github.io/timetk/
Improve speed of tk_augment_lags() #143

Open PabloCanovas opened 1 year ago

PabloCanovas commented 1 year ago

When creating several lags of a variable at once, I've found that a `map()` + `partial()` structure is around 7 times faster than `tk_augment_lags()` on big datasets with multiple lags (I tried with 16M rows and 10 lags). It could be worth checking out.

For your reference, this is the function I built:

    calculate_lags <- function(df, var, lags){
      map_lag <- lags %>% map(~partial(lag, n = .x))
      return(df %>% mutate(across(.cols = {{var}}, .fns = map_lag, .names = "{.col}_lag{lags}")))
    }

Edit: I don't know why it doesn't respect indentation...
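For anyone who wants to try the idea on a small example, here is a minimal sketch of the same map+partial approach. It is not timetk code: `calculate_lags_named()` and the `toy` data are hypothetical names made up for illustration. The one change from the function above is that the list of lag functions is named by lag value, so the new column names come from `{.fn}` rather than from the `lags` vector. As far as I can tell this also lets it handle more than one column, though that is an assumption I have only checked on toy data.

```r
library(dplyr)
library(purrr)

# Hypothetical variant of calculate_lags(): one partialised dplyr::lag() per
# lag value, with the list named by lag so across() can use {.fn} in .names.
calculate_lags_named <- function(df, var, lags) {
  lag_fns <- setNames(lags, lags) %>%
    map(~ partial(dplyr::lag, n = .x))

  df %>%
    mutate(across(.cols = {{ var }}, .fns = lag_fns, .names = "{.col}_lag{.fn}"))
}

# Toy data (made up) to check the output shape.
toy <- tibble(id = 1:5, value = c(10, 20, 30, 40, 50))
result <- calculate_lags_named(toy, value, lags = 1:2)
print(result)
```

This adds `value_lag1` and `value_lag2` next to the original columns. It says nothing about speed; the 7x figure above would need to be re-measured on a large dataset against `tk_augment_lags()`.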

spsanderson commented 1 year ago

I was literally exploring the same thing this morning.