business-science / modeltime

Modeltime unlocks time series forecast models and machine learning in one framework
https://business-science.github.io/modeltime/

Refit Recursive Ensembles #89

Closed: mdancho84 closed this issue 3 years ago

mdancho84 commented 3 years ago

Refitting is managed in the modeltime.ensemble package. We need to add refit helpers to manage the refitting process for recursive ensembles.

mdancho84 commented 3 years ago

@AlbertoAlmuinha I need your help; I've exhausted my brain for today. I'm trying to get refitting working for panel (multi-series) ensembles. It works for a single time series, but with panel data I run into an error in .transform(temp_new_data):

#> Error in .transform(temp_new_data): argument "id" is missing, with no default

It's very odd since this only happens when refitting, and the objects

Error Example

Here's a reproducible example showing the error.

library(modeltime)
library(tidymodels)
library(tidyverse)
library(lubridate)
library(timetk)
library(slider)
library(dplyr)
library(modeltime.ensemble)

FORECAST_HORIZON <- 24

m4_extended <- m4_monthly %>%
    group_by(id) %>%
    future_frame(
        .length_out = FORECAST_HORIZON,
        .bind_data  = TRUE
    ) %>%
    ungroup()
#> .date_var is missing. Using: date

# TRANSFORM FUNCTION ----
# - NOTE - We create lags by group
lag_transformer_grouped <- function(data){
    data %>%
        group_by(id) %>%
        tk_augment_lags(value, .lags = 1:FORECAST_HORIZON) %>%
        ungroup()
}

m4_lags <- m4_extended %>%
    lag_transformer_grouped()

train_data <- m4_lags %>%
    drop_na()

future_data <- m4_lags %>%
    filter(is.na(value))

model_fit_lm <- linear_reg() %>%
    set_engine("lm") %>%
    fit(value ~ ., data = train_data)

model_fit_mars <- mars("regression") %>%
    set_engine("earth") %>%
    fit(value ~ ., data = train_data)

recursive_ensemble_p <- modeltime_table(
        model_fit_mars,
        model_fit_lm
    ) %>%
    ensemble_average(type = "median") %>%
    recursive(
        transform  = lag_transformer_grouped,
        train_tail = panel_tail(train_data, id, FORECAST_HORIZON),
        id = "id"
    )

fcast <- modeltime_table(
        recursive_ensemble_p
    ) %>%
    modeltime_forecast(
        new_data = future_data,
        actual_data = m4_lags,
        keep_data = TRUE
    )

fcast %>%
    group_by(id) %>%
    plot_modeltime_forecast(
        .interactive = TRUE,
        .conf_interval_show = FALSE
    )

# object <- recursive_ensemble_p
# 
# object$spec$train_tail %>%
#     dplyr::count(!! rlang::sym(object$spec$id)) %>%
#     dplyr::pull(n) %>%
#     stats::median(na.rm = TRUE)

obj_2 <- recursive_ensemble_p %>% mdl_time_refit(train_data) 

obj_2 %>% mdl_time_forecast(new_data = train_data)
#> Error in .transform(temp_new_data): argument "id" is missing, with no default

Created on 2021-04-01 by the reprex package (v1.0.0)
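For context on train_tail = panel_tail(train_data, id, FORECAST_HORIZON): panel_tail() keeps the last n rows of each series in a panel, so the grouped lag transformer has enough history per id to build lags during recursion. The base R sketch below only mimics that per-group tail on a toy panel; tail_by_group() is a hypothetical helper for illustration, not modeltime's implementation.

```r
# Hypothetical helper: keep the last n rows of each group (base R only).
# It mimics the idea behind panel_tail(), not its actual implementation.
tail_by_group <- function(data, id_col, n) {
  # split() keeps the original row order within each group
  pieces <- split(data, data[[id_col]])
  tails  <- lapply(pieces, function(df) utils::tail(df, n))
  out    <- do.call(rbind, tails)
  rownames(out) <- NULL
  out
}

# Toy panel: series "A" has 5 observations, series "B" has 4
panel <- data.frame(
  id    = c(rep("A", 5), rep("B", 4)),
  t     = c(1:5, 1:4),
  value = c(10, 11, 12, 13, 14, 20, 21, 22, 23)
)

# Last 2 rows of each series: A at t = 4, 5 and B at t = 3, 4
tail_by_group(panel, "id", 2)
```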

mdancho84 commented 3 years ago

Here's the full traceback:

Error in .transform(temp_new_data) : 
  argument "id" is missing, with no default 
15: .transform(temp_new_data)
14: tibble::rowid_to_column(., var = "..row_id")
13: dplyr::group_by(., !!..id)
12: dplyr::group_split(.)
11: purrr::map(., function(x) {
        dplyr::slice_tail(x, n = new_data_size) %>% .[slice_idx, ]
    })
10: list2(...)
9: dplyr::bind_rows(.)
8: dplyr::arrange(., ..row_id)
7: dplyr::select(., -..row_id)
6: .transform(temp_new_data) %>% tibble::rowid_to_column(var = "..row_id") %>% 
       dplyr::group_by(!!..id) %>% dplyr::group_split() %>% purrr::map(function(x) {
       dplyr::slice_tail(x, n = new_data_size) %>% .[slice_idx, ] ... at modeltime-recursive.R#686
5: .transform(.temp_new_data, new_data_size, i, id) at modeltime-forecast.R#1094
4: mdl_time_forecast_recursive_ensemble_panel(object = object, calibration_data = calibration_data, 
       new_data = new_data, h = h, actual_data = actual_data, bind_actual = bind_actual, 
       keep_data = keep_data, arrange_index = arrange_index, ...) at modeltime-forecast.R#942
3: mdl_time_forecast.recursive_ensemble(., new_data = train_data) at modeltime-forecast.R#451
2: mdl_time_forecast(., new_data = train_data)
1: obj_2 %>% mdl_time_forecast(new_data = train_data)

mdancho84 commented 3 years ago

This is working now. The issue was that I was trying to forecast the training data itself, which doesn't work for a recursive model: recursion requires the training tail to sit behind (immediately before) the data being forecast.

The fix was to run:

> obj_2 %>% mdl_time_forecast(new_data = future_data)
# A tibble: 96 x 3
   .key       .index     .value
   <fct>      <date>      <dbl>
 1 prediction 2015-07-01  7617.
 2 prediction 2015-07-01  2543.
 3 prediction 2015-07-01  9998.
 4 prediction 2015-07-01  1291.
 5 prediction 2015-08-01  7280.
 6 prediction 2015-08-01  2187.
 7 prediction 2015-08-01  9986.
 8 prediction 2015-08-01  1364.
 9 prediction 2015-09-01  5847.
10 prediction 2015-09-01  2087.
# … with 86 more rows
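To illustrate why the data being forecast has to follow the training tail, here is a minimal base R sketch of recursive forecasting with a single lag on a toy series, using lm() as the model. recursive_forecast() is a hypothetical helper for illustration, not modeltime's internals: the lag-1 feature for the first future step comes from the last value of the tail, and each later step feeds the previous prediction back in.

```r
# Toy series that increases by exactly 2 each step
y <- c(5, 7, 9, 11, 13, 15, 17, 19)

# Build a lag-1 feature and fit a linear model on the training rows
train <- data.frame(value = y[-1], lag1 = y[-length(y)])
fit   <- lm(value ~ lag1, data = train)

# Hypothetical helper: forecast h steps ahead recursively.
# The only history it needs is the tail of the training series (here, the last value).
recursive_forecast <- function(fit, train_tail, h) {
  preds <- numeric(h)
  last  <- train_tail[length(train_tail)]
  for (i in seq_len(h)) {
    # lag1 for step i is the last observation (i = 1) or the previous prediction
    preds[i] <- predict(fit, newdata = data.frame(lag1 = last))
    last <- preds[i]
  }
  preds
}

# About 21, 23, 25 for this perfectly linear toy series
recursive_forecast(fit, train_tail = tail(y, 1), h = 3)
```

Each step only makes sense if the rows being predicted come directly after the tail. If the rows handed to the forecast overlap the training period instead, the tail is no longer the history immediately behind them, which is the mismatch described above.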