business-science / modeltime

Modeltime unlocks time series forecast models and machine learning in one framework
https://business-science.github.io/modeltime/

Add feature to combine global models with nested models #138

Open LeoTimmermans opened 2 years ago

LeoTimmermans commented 2 years ago

When running many time series, we want to experiment. Using both global models and nested models is a natural thing to do when there are not too many time series. It would be nice to have a simple way to compare performance and pick the best (top n) models by id, regardless of whether each one is a nested model or a global model.

It would be good to mix and match.

kransom14 commented 2 years ago

I'm attempting to do this same thing. Has it been implemented?

mdancho84 commented 2 years ago

Not yet. I know a bunch of students want this, but it's going to be a while before I can tackle it. Any help would be appreciated.

CC @AlbertoAlmuinha

AlbertoAlmuinha commented 2 years ago

Hey,

Could you give an example of what you want to achieve? This way it will be easier for me to try to implement it.

kransom14 commented 2 years ago

Sure, here's what I have so far. I fit an ARIMA model using nesting and a DeepAR model as a global model. Then I calculate the error metrics for each method by ID and join the two accuracy tables to compare performance. Ideally, we could combine the two approaches so that everything lives together and requires only one train/test split. Perhaps the arima_reg() parsnip function could take an id argument, similar to deep_ar()?

library(tidymodels)
library(modeltime) # arima_reg(), nested fitting, calibration, and accuracy functions
library(modeltime.gluonts)
library(tidyverse)
library(timetk)

walmart_sales_weekly

# weekly sales data from 7 departments
data <- walmart_sales_weekly %>% 
  select(id, Date, Weekly_Sales) %>%
  set_names(c("ID", "date", "value"))

data

# nested arima, bad model just for use as example
splits_nest <- data %>% 
  extend_timeseries(.id_var = ID, .length_future = 25, .date_var = date) %>% 
  nest_timeseries(.id_var = ID, .length_future = 25) %>% 
  split_nested_timeseries(.length_test = 25)

rec_arima <- recipe(value ~ date, extract_nested_train_split(splits_nest))

wflw_arima <- workflow() %>%
  add_model(
    arima_reg(non_seasonal_ar = 2,
              non_seasonal_differences = 1,
              non_seasonal_ma = 1) %>% 
      set_engine("arima") 
  ) %>% 
  add_recipe(rec_arima)

nested_modeltime_tbl <- modeltime_nested_fit(nested_data = splits_nest, wflw_arima)

# accuracy by id
nested_modeltime_tbl %>% 
  extract_nested_test_accuracy() %>% # nesting doesn't require calibration? It's all included?
  group_by(ID)

# global model with deep ar
# requires different data set up
# create training and testing splits
splits <- time_series_split(
  data = data,
  assess = 25, 
  cumulative = TRUE)

splits %>% 
  tk_time_series_cv_plan() %>% 
  plot_time_series_cv_plan(date, value)

splits

# deep AR model
fit_deepar_gluonts <- deep_ar(
  id = "ID",
  freq = "w", # 1 week frequency
  prediction_length = 25, # 25 weeks
  lookback_length = 50,  # 50 weeks
  epochs = 10) %>% 
  set_engine("gluonts_deepar") %>% 
  fit(value ~ ID + date, data = training(splits))

# calibrate by id
calib_tbl <- modeltime_table( # modeltime table stores list of fitted models
  fit_deepar_gluonts
) %>% 
  modeltime_calibrate(testing(splits), id = "ID") 

calib_tbl

# accuracy by id
calib_tbl %>% 
  modeltime_accuracy(acc_by_id = TRUE)

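# combine results by id from both approaches for comparison by ID
# bind_rows() matches columns by name; .id records which approach each row came from
# (both tables use .model_id = 1, so the extra column keeps them apart)
all_test_results <- bind_rows(
  nested = nested_modeltime_tbl %>% 
    extract_nested_test_accuracy(),
  global = calib_tbl %>% 
    modeltime_accuracy(acc_by_id = TRUE),
  .id = "approach")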

LeoTimmermans commented 2 years ago

I agree with @kransom14. An alternative would be the ability to combine the modeltime tables, then calibrate, calculate accuracy, and select the best model (by id) from there.
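
Until something like that exists, the "select the best model by id" step can be approximated on a combined accuracy table with plain dplyr. The following is only a rough sketch, not a modeltime feature: the two small tables and their RMSE values are made up for illustration, the `approach` column name is an assumption, and real accuracy tables would carry more metric columns.

```r
library(dplyr)

# Hypothetical accuracy values, for illustration only (not real results).
nested_acc <- tibble(
  ID          = c("1_1", "1_3"),
  .model_desc = "ARIMA",
  rmse        = c(100, 300)
)

global_acc <- tibble(
  ID          = c("1_1", "1_3"),
  .model_desc = "DEEPAR",
  rmse        = c(150, 200)
)

# Stack both tables and record which approach each row came from.
all_acc <- bind_rows(nested = nested_acc, global = global_acc, .id = "approach")

# Keep the single model with the lowest RMSE for each ID.
best_by_id <- all_acc %>%
  group_by(ID) %>%
  slice_min(rmse, n = 1, with_ties = FALSE) %>%
  ungroup()

best_by_id
```

With these made-up numbers, the nested ARIMA is kept for ID 1_1 and the global DeepAR for ID 1_3. Raising `n` in `slice_min()` would give the top n models per id instead of a single winner.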