business-science / modeltime.resample

Resampling Tools for Time Series Forecasting with Modeltime
https://business-science.github.io/modeltime.resample/

Not possible to use ensemble #5

Open vidarsumo opened 3 years ago

vidarsumo commented 3 years ago

I wanted to include an ensemble as part of my time series cross-validation, but it did not work. Here is my code:

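```r
library(dplyr)               # %>%
library(tune)                # control_resamples()
library(timetk)              # time_series_cv()
library(modeltime)           # modeltime_table(), combine_modeltime_tables()
library(modeltime.ensemble)  # ensemble_average()
library(modeltime.resample)  # modeltime_fit_resamples(), modeltime_resample_accuracy()

# `ml_mtbl` (a modeltime table of five fitted models) and `train_data`
# are defined earlier and not shown here

# Create an average ensemble and add it to the modeltime table
ml_mtbl <- ml_mtbl %>%
    combine_modeltime_tables(
        ml_mtbl %>%
            ensemble_average() %>%
            modeltime_table()
    )

# Expanding-window time series CV: at most 20 slices, each assessed on 11 days
resamples_tscv <- time_series_cv(
    data        = train_data,
    assess      = "11 days",
    initial     = "730 days",
    skip        = 11,
    slice_limit = 20,
    cumulative  = TRUE
)

# Refit every model in the table on each resample
resamples_fitted <- ml_mtbl %>%
    modeltime_fit_resamples(
        resamples = resamples_tscv,
        control   = control_resamples(verbose = FALSE, allow_par = TRUE)
    )
```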

The results:

```
> resamples_fitted
# Modeltime Table
# A tibble: 6 x 4
  .model_id .model         .model_desc               .resample_results
      <int> <list>         <chr>                     <list>           
1         1 <workflow>     XGBOOST                   <rsmp[+]>        
2         2 <workflow>     RANGER                    <rsmp[+]>        
3         3 <workflow>     GLMNET                    <rsmp[+]>        
4         4 <workflow>     KERNLAB                   <rsmp[+]>        
5         5 <workflow>     KERNLAB                   <rsmp[+]>        
6         6 <ensemble [5]> ENSEMBLE (MEAN): 5 MODELS <lgl [1]>        
```

So when I check the accuracy, I only get results for the individual models, not the ensemble (note that the ensemble's `.resample_results` entry is `<lgl [1]>` rather than `<rsmp[+]>`). This code returns a tibble with the five individual models but no ensemble:

```r
resamples_fitted %>%
    modeltime_resample_accuracy()
```

AlbertoAlmuinha commented 2 years ago

Hi @vidarsumo,

The problem is that, as far as I can see, modeltime.resample has no dispatch methods for ensemble objects.

Keep in mind that this would be an expensive change, since it would require recombining the predictions of every individual model in each slice and then recalculating the associated metrics.
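Purely to illustrate what that recombination involves, here is a minimal base-R sketch. It assumes the out-of-sample predictions of every member model have already been collected into one long table; the column names (`resample_id`, `row`, `model_id`, `actual`, `pred`) and the toy numbers are hypothetical and do not reflect the structure modeltime.resample actually returns.

```r
# Minimal sketch (base R only): rebuild a simple-average ensemble from the
# out-of-sample predictions of its member models and score it per slice.
# ASSUMPTION: the column names below are hypothetical, not the ones used
# by modeltime.resample; the numbers are toy values.

# Toy long-format predictions: 2 slices x 2 rows x 2 member models
preds <- data.frame(
  resample_id = rep(c("Slice1", "Slice2"), each = 4),
  row         = rep(c(1, 1, 2, 2), times = 2),
  model_id    = rep(c(1, 2), times = 4),
  actual      = c(10, 10, 12, 12, 11, 11, 13, 13),
  pred        = c( 9, 11, 13, 12, 10, 14, 13, 15)
)

# Step 1: average the member predictions for every (slice, row) pair.
# `actual` is identical across members, so its mean is just the actual value.
ens <- aggregate(
  cbind(actual, pred) ~ resample_id + row,
  data = preds,
  FUN  = mean
)

# Step 2: recompute the error metrics for the ensemble within each slice
ens$abs_err <- abs(ens$actual - ens$pred)
ens$sq_err  <- (ens$actual - ens$pred)^2
ens_metrics <- aggregate(cbind(abs_err, sq_err) ~ resample_id, data = ens, FUN = mean)
names(ens_metrics) <- c("resample_id", "mae", "mse")
ens_metrics$rmse <- sqrt(ens_metrics$mse)

print(ens_metrics[, c("resample_id", "mae", "rmse")])
```

Because the ensemble averages the predictions before the error is computed, its metrics generally differ from the average of the member models' metrics, so they cannot simply be derived from the existing per-model accuracy table.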

I can give it some thought when I have time, but for now I would opt for other strategies, because I don't expect this change to land soon.

Regards,