business-science / modeltime.ensemble

Time Series Ensemble Forecasting
https://business-science.github.io/modeltime.ensemble/

Unable to set parameter ranges in metalearners (misleading error message in "ensemble_model_spec") #20

Closed lg1000 closed 2 years ago

lg1000 commented 2 years ago

When using the standard modeling workflow, without stacked ensembles, I have no trouble setting individual parameter ranges like this:

xgb_grid <- grid_latin_hypercube( learn_rate(range = c(-5.0, -0.1)), size = 30 )

I also know how to update parameters and how to pull them from workflow objects.
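For context, this is roughly what I mean by pulling and updating parameters (a sketch only; the workflow `wflw` here is illustrative, and I am using the `extract_parameter_set_dials()` / `update()` functions from tune and dials):

```r
library(tidymodels)

# An illustrative workflow with a tunable parameter
wflw <- workflow() %>%
  add_model(
    boost_tree(learn_rate = tune(), mode = "regression") %>%
      set_engine("xgboost")
  ) %>%
  add_formula(value ~ .)

# Pull the parameter set from the workflow object ...
params <- extract_parameter_set_dials(wflw)

# ... and narrow the range of a single parameter
# (learn_rate ranges are on the log10 scale)
params <- update(params, learn_rate = learn_rate(range = c(-5.0, -0.1)))
```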

What I do not know is how this works with metalearner stacks. If I am not fundamentally mistaken, the argument "param_info" is meant for this purpose. The documentation of ensemble_model_spec states that param_info can take a dials parameter object as input. However, either I am not getting the concept of dials parameter objects right, or there is some problem with my code, because my attempt fails with "all models failed, see .notes column". This error message is not helpful: I do not have a tuned object created by, for example, a tune_grid function, so where would I even find a .notes column here? From this point on I am stuck, because I have no clear indication of the source of the error.
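My reading of the docs (an assumption on my part, not something the documentation confirms) is that param_info may expect a complete `dials::parameters()` set covering every `tune()` argument, roughly like this; note that `mtry()` ships with an unknown upper bound, so its range below is a placeholder I picked for illustration:

```r
library(dials)

# My guess at what a complete param_info object would look like
# (the mtry range is a placeholder; its default upper bound is unknown())
xgb_params <- parameters(list(
  trees(),
  tree_depth(),
  learn_rate(range = c(-0.5, -0.01)),
  loss_reduction(),
  min_n(),
  mtry(range = c(1L, 5L))
))
```

In the reprex below I only pass `learn_rate()`, which may be part of the problem, but the error message gives no hint either way.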

reprex:

# time series ML
suppressPackageStartupMessages(library(modeltime))
suppressPackageStartupMessages(library(tidymodels))
suppressPackageStartupMessages(library(modeltime.ensemble))
suppressPackageStartupMessages(library(modeltime.resample))
suppressPackageStartupMessages(library(timetk))
suppressPackageStartupMessages(library(tidyverse))
suppressPackageStartupMessages(library(doParallel))

data <- m4_monthly

H <- 6
# training + forecast
full_data_tbl <- data %>%
  group_by(id) %>%
  future_frame(
    .length_out = H,
    .bind_data  = TRUE
  ) %>%
  ungroup() %>%
  mutate(id = fct_drop(id))
# training and test
data_prepared_tbl <- full_data_tbl %>%
  filter(!is.na(value))
# forecast
future_tbl <- full_data_tbl %>%
  filter(is.na(value))
# splits
set.seed(544)
splits <- data_prepared_tbl %>%
  time_series_split(
    date_var   = date,
    assess     = H,
    cumulative = TRUE
  )
resamples_tscv <- data_prepared_tbl %>%
  time_series_cv(
    date_var    = date,
    assess      = "6 months",
    skip        = "6 months",
    cumulative  = TRUE,
    slice_limit = 3
  )
#recipe
recipe_spec_mars <- recipe(value ~ ., data = training(splits)) %>%
  update_role(date, new_role = "ID") %>%
  step_dummy(all_nominal(), one_hot = TRUE)
set.seed(522)
wflw_fit_mars <- workflow() %>%
  add_model(
    mars(num_terms = 5, prod_degree = 2, mode = "regression") %>%
      set_engine("earth")
  ) %>%
  add_recipe(recipe_spec_mars) %>%
  fit(training(splits))
# lasso -----------------------
set.seed(522)
wflw_fit_lasso <- workflow() %>%
  add_model(
    linear_reg(penalty = 0.1, mixture = 1, mode = "regression") %>%
      set_engine("glmnet")
  ) %>%
  add_recipe(recipe_spec_mars) %>%
  fit(training(splits))

#### STACK ------------------------
submodels_stacks <- modeltime_table(
  wflw_fit_lasso,
  wflw_fit_mars
)
# fit resamples
cores <- parallel::detectCores(logical = FALSE)
cl <- makePSOCKcluster(cores)
registerDoParallel(cl)
set.seed(234)
submodel_predictions <- submodels_stacks %>%
  modeltime_fit_resamples(
    resamples = resamples_tscv,
    control   = control_resamples(verbose = TRUE)
  )
stopCluster(cl)
# Metalearner XGBOOST
set.seed(123)
cores <- parallel::detectCores(logical = FALSE)
cl <- makePSOCKcluster(cores)
registerDoParallel(cl)
ensemble_fit_xgboost <- submodel_predictions %>%
  ensemble_model_spec(
    model_spec = boost_tree(
      trees          = tune(),
      tree_depth     = tune(),
      learn_rate     = tune(),
      loss_reduction = tune(),
      min_n          = tune(), 
      mtry           = tune()
    ) %>%
      set_engine("xgboost"),
    kfolds = 10,
    grid   = 30,
    param_info = tune::parameters(learn_rate(range = c(-0.5, -0.01))),
    control = control_grid(verbose = TRUE,
                           allow_par = TRUE)
  )
stopCluster(cl)