Open tackes opened 2 years ago
Doing a bit of research now:
OK, I'm also looking into these references:
Revisiting this. It turns out that the probably
R package now has int_conformal_X
functions that can be used with tidymodels
.
Yes that would be awesome! +1
Yes! Amazing! +1
probably::int_conformal_split()
function can be used to calculate conformal intervals using the split method. probably
does not implement a conformal interval for a "refitted model". modeltime
's internal workflow can repurpose the conformal intervals after refitting the models to improve forecast accuracy on future unseen data.It's also worth mentioning that the global confidence intervals that Modeltime produces by default are similar to the default conformal split method's residuals. I didn't see a big difference in the Walmart data set forecast. If anything Modeltime is more conservative.
# MODELTIME CONFORMAL INTERVAL EVALUATION ----
# Goal: Assess using probably R package for conformal interval
# SETUP ----
library(tidymodels)
library(probably)
#>
#> Attaching package: 'probably'
#> The following objects are masked from 'package:base':
#>
#> as.factor, as.ordered
library(modeltime)
library(timetk)
library(tidyverse)
library(plotly)
#>
#> Attaching package: 'plotly'
#> The following object is masked from 'package:ggplot2':
#>
#> last_plot
#> The following object is masked from 'package:stats':
#>
#> filter
#> The following object is masked from 'package:graphics':
#>
#> layout
walmart_sales_tbl <- timetk::walmart_sales_weekly %>%
select(id, Date, Weekly_Sales) %>%
mutate(id = as_factor(id))
walmart_sales_tbl %>%
group_by(id) %>%
plot_time_series(
Date, Weekly_Sales,
.facet_ncol = 2,
.interactive = FALSE,
)
splits <- time_series_split(
walmart_sales_tbl,
assess = "1 year",
cumulative = TRUE
)
#> Using date_var: Date
#> Data is not ordered by the 'date_var'. Resamples will be arranged by `Date`.
#> Overlapping Timestamps Detected. Processing overlapping time series together using sliding windows.
new_data_tbl <- walmart_sales_tbl %>%
group_by(id) %>%
future_frame(.length_out = "1 year") %>%
ungroup()
#> .date_var is missing. Using: Date
recipe_ml <- recipe(Weekly_Sales ~ ., training(splits)) %>%
step_timeseries_signature(Date) %>%
step_rm(Date) %>%
step_dummy(all_nominal_predictors(), one_hot = TRUE)
# recipe_ml %>% prep() %>% juice() %>% glimpse()
model_xgb <- boost_tree("regression") %>%
set_engine("xgboost")
wflw_fit_xgb <- workflow() %>%
add_model(model_xgb) %>%
add_recipe(recipe_ml) %>%
fit(training(splits))
# wflw_fit_xgb %>% predict(testing(splits))
# CONFORMAL SPLIT ----
wflw_fit_xgb_conformal_split <- int_conformal_split(
wflw_fit_xgb,
testing(splits)
)
test_predictions <- wflw_fit_xgb_conformal_split %>%
predict(testing(splits), level = 0.95)
training(splits) %>%
bind_rows(
testing(splits) %>%
bind_cols(test_predictions)
) %>%
group_by(id) %>%
ggplot(aes(Date, Weekly_Sales, group = id)) +
geom_line() +
facet_wrap(~ id, ncol = 2, scales = "free_y") +
geom_ribbon(aes(ymin = .pred_lower, ymax = .pred_upper),
alpha = 0.5) +
geom_line(aes(y = .pred), color = "red") +
theme_light()
#> Warning: Removed 637 rows containing missing values (`geom_line()`).
# How to refit?
# - Solution is to carry the CI forward
# COMPARE TO MODELTIME'S DEFAULT CONF INTERVALS ----
calib_tbl <- modeltime_table(
wflw_fit_xgb
) %>%
modeltime_calibrate(testing(splits))
forecast_tbl <- calib_tbl %>%
modeltime_forecast(
new_data = testing(splits),
actual_data = walmart_sales_tbl,
conf_interval = 0.95,
keep_data = TRUE
)
forecast_tbl %>%
group_by(id) %>%
plot_modeltime_forecast(.facet_ncol = 2, .interactive = F)
#> Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
#> -Inf
#> Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
#> -Inf
#> Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
#> -Inf
#> Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
#> -Inf
#> Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
#> -Inf
#> Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
#> -Inf
#> Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
#> -Inf
future_forecast_tbl <- calib_tbl %>%
modeltime_refit(walmart_sales_tbl) %>%
modeltime_forecast(
new_data = new_data_tbl,
actual_data = walmart_sales_tbl,
conf_interval = 0.95,
keep_data = TRUE
)
future_forecast_tbl %>%
group_by(id) %>%
plot_modeltime_forecast(.facet_ncol = 2, .interactive = F)
#> Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
#> -Inf
#> Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
#> -Inf
#> Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
#> -Inf
#> Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
#> -Inf
#> Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
#> -Inf
#> Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
#> -Inf
#> Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
#> -Inf
# CONCLUSIONS ----
# 1. probably::int_conformal_split() function can be used to calculate
# conformal intervals using the split method
# 2. probably does not implement a conformal interval for a refitted model
# 3. modeltime's internal workflow can be used to repurpose the conformal
# intervals after refitting (not currently implemented in probably)
Created on 2023-08-03 with reprex v2.0.2
Upon further inspection, the process that modeltime uses is different than what probably
uses. Due to differences in the workflow, it seems that it's not as simple of an integration. However, the calculations can be used to replicate the int_conformal_split()
function.
The processes for conformal predictions could be implemented.
modeltime_calibrate()
modeltime_forecast()
int_conformal_split()
predict()
I can implement the prediction methodology used within int_conformal_split()
:
predict.int_conformal_split <- function(object, new_data, level = 0.95, ...) {
check_data(new_data, object$wflow)
rlang::check_dots_empty()
new_pred <- predict(object$wflow, new_data)
alpha <- 1 - level
q_ind <- ceiling(level * (object$n + 1))
q_val <- object$resid[q_ind]
new_pred$.pred_lower <- new_pred$.pred - q_val
new_pred$.pred_upper <- new_pred$.pred + q_val
new_pred
}
The Conformal Split Method has been integrated into modeltime_forecast()
. You can see it in action here. Not a whole lot of difference between the default quantile method.
# MODELTIME CONFORMAL INTERVAL EVALUATION ----
# Goal: Assess using probably R package for conformal interval
# SETUP ----
library(tidymodels)
library(probably)
library(modeltime)
library(timetk)
library(tidyverse)
walmart_sales_tbl <- timetk::walmart_sales_weekly %>%
select(id, Date, Weekly_Sales) %>%
mutate(id = as_factor(id))
walmart_sales_tbl %>%
group_by(id) %>%
plot_time_series(
Date, Weekly_Sales,
.facet_ncol = 2,
.interactive = FALSE,
)
splits <- time_series_split(
walmart_sales_tbl,
assess = "1 year",
cumulative = TRUE
)
#> Using date_var: Date
#> Data is not ordered by the 'date_var'. Resamples will be arranged by `Date`.
#> Overlapping Timestamps Detected. Processing overlapping time series together using sliding windows.
new_data_tbl <- walmart_sales_tbl %>%
group_by(id) %>%
future_frame(.length_out = "1 year") %>%
ungroup()
#> .date_var is missing. Using: Date
recipe_ml <- recipe(Weekly_Sales ~ ., training(splits)) %>%
step_timeseries_signature(Date) %>%
step_rm(Date) %>%
step_dummy(all_nominal_predictors(), one_hot = TRUE)
model_xgb <- boost_tree("regression") %>%
set_engine("xgboost")
wflw_fit_xgb <- workflow() %>%
add_model(model_xgb) %>%
add_recipe(recipe_ml) %>%
fit(training(splits))
calib_tbl <- modeltime_table(
wflw_fit_xgb
) %>%
modeltime_calibrate(testing(splits), id = "id")
# CONFORMAL (DEFAULT) ----
forecast_tbl <- calib_tbl %>%
modeltime_forecast(
new_data = testing(splits),
actual_data = walmart_sales_tbl,
conf_interval = 0.95,
conf_method = "conformal_default",
keep_data = TRUE
)
forecast_tbl %>%
group_by(id) %>%
plot_modeltime_forecast(
.facet_ncol = 2,
.interactive = F,
.title = "Conformal Default"
)
# NEW CONFORMAL SPLIT ----
forecast_tbl <- calib_tbl %>%
modeltime_forecast(
new_data = testing(splits),
actual_data = walmart_sales_tbl,
conf_interval = 0.95,
conf_method = "conformal_split",
keep_data = TRUE
)
forecast_tbl %>%
group_by(id) %>%
plot_modeltime_forecast(
.facet_ncol = 2,
.interactive = FALSE,
.title = "Conformal Split"
)
# FUTURE FORECAST ON REFITTED MODEL WITH CONFORMAL SPLIT PREDICTION ----
future_forecast_tbl <- calib_tbl %>%
modeltime_refit(walmart_sales_tbl) %>%
modeltime_forecast(
new_data = new_data_tbl,
actual_data = walmart_sales_tbl,
conf_interval = 0.95,
conf_method = "conformal_split",
# conf_by_id = TRUE,
keep_data = TRUE
)
future_forecast_tbl %>%
group_by(id) %>%
plot_modeltime_forecast(
.facet_ncol = 2,
.interactive = F,
.title = "Conformal Split Method"
)
Created on 2023-08-03 with reprex v2.0.2
I've added a new vignette on Conformal Prediction Intervals in Modeltime. Please let me know if any changes are needed.
https://business-science.github.io/modeltime/articles/modeltime-conformal-prediction.html
I've updaded modeltime 1.2.8.9000
(development version) to now include Conformal Prediction Intervals and print display throughout the process for the confidence interval and method used.
# NESTED FORECASTING: CONFORMAL PREDICTION METHOD AND INTERVAL INTEGRATION ----
library(tidymodels)
library(modeltime)
library(tidyverse)
library(timetk)
data_tbl <- walmart_sales_weekly %>%
select(id, Date, Weekly_Sales) %>%
set_names(c("id", "date", "value"))
# 1.0 NESTED STRUCTURE ----
nested_data_tbl <- data_tbl %>%
extend_timeseries(
.id_var = id,
.date_var = date,
.length_future = 52
) %>%
nest_timeseries(
.id_var = id,
.length_future = 52,
.length_actual = 52*2
) %>%
split_nested_timeseries(
.length_test = 52
)
nested_data_tbl
#> # A tibble: 7 x 4
#> id .actual_data .future_data .splits
#> <fct> <list> <list> <list>
#> 1 1_1 <tibble [104 x 2]> <tibble [52 x 2]> <split [52|52]>
#> 2 1_3 <tibble [104 x 2]> <tibble [52 x 2]> <split [52|52]>
#> 3 1_8 <tibble [104 x 2]> <tibble [52 x 2]> <split [52|52]>
#> 4 1_13 <tibble [104 x 2]> <tibble [52 x 2]> <split [52|52]>
#> 5 1_38 <tibble [104 x 2]> <tibble [52 x 2]> <split [52|52]>
#> 6 1_93 <tibble [104 x 2]> <tibble [52 x 2]> <split [52|52]>
#> 7 1_95 <tibble [104 x 2]> <tibble [52 x 2]> <split [52|52]>
# 2.0 MODELS ----
rec_xgb <- recipe(value ~ ., extract_nested_train_split(nested_data_tbl)) %>%
step_timeseries_signature(date) %>%
step_rm(date) %>%
step_zv(all_predictors()) %>%
step_dummy(all_nominal_predictors(), one_hot = TRUE)
wflw_xgb <- workflow() %>%
add_model(boost_tree("regression") %>% set_engine("xgboost")) %>%
add_recipe(rec_xgb)
# 3.0 CONFORMAL PREDICTION ----
nested_modeltime_tbl <- nested_data_tbl %>%
modeltime_nested_fit(
# Add workflows
wflw_xgb,
conf_interval = 0.95,
conf_method = "conformal_split"
)
#> Fitting models on training data... ====>-------------------------- 14% | ET...
#> Fitting models on training data... =========>--------------------- 29% | ET...
#> Fitting models on training data... =============>----------------- 43% | ET...
#> Fitting models on training data... =================>------------- 57% | ET...
#> Fitting models on training data... =====================>--------- 71% | ET...
#> Fitting models on training data... ==========================>---- 86% | ET...
# New: Conf Method and Interval NOW Shown in Print Output
nested_modeltime_tbl %>% extract_nested_test_forecast()
#> # Forecast Results
#>
#> Trained on: .splits | Forecast Errors: [0] | Conf Method: conformal_split |
#> Conf Interval: 0.95
#> # A tibble: 1,092 x 8
#> id .model_id .model_desc .key .index .value .conf_lo .conf_hi
#> <fct> <int> <chr> <fct> <date> <dbl> <dbl> <dbl>
#> 1 1_1 NA ACTUAL actual 2010-11-05 34239. NA NA
#> 2 1_1 NA ACTUAL actual 2010-11-12 19549. NA NA
#> 3 1_1 NA ACTUAL actual 2010-11-19 19553. NA NA
#> 4 1_1 NA ACTUAL actual 2010-11-26 18820. NA NA
#> 5 1_1 NA ACTUAL actual 2010-12-03 22518. NA NA
#> 6 1_1 NA ACTUAL actual 2010-12-10 31498. NA NA
#> 7 1_1 NA ACTUAL actual 2010-12-17 44913. NA NA
#> 8 1_1 NA ACTUAL actual 2010-12-24 55931. NA NA
#> 9 1_1 NA ACTUAL actual 2010-12-31 19125. NA NA
#> 10 1_1 NA ACTUAL actual 2011-01-07 15984. NA NA
#> # i 1,082 more rows
# Visualize Test Forecast
nested_modeltime_tbl %>%
extract_nested_test_forecast() %>%
group_by(id) %>%
plot_modeltime_forecast(
.facet_ncol = 2,
.interactive = FALSE
)
# 4.0 SELECT BEST MODELS----
# Select Best Forecasters
best_nested_modeltime_tbl <- nested_modeltime_tbl %>%
modeltime_nested_select_best(metric = "rmse")
# New: Conf Method and Interval NOW Shown in Print Output
best_nested_modeltime_tbl
#> # Nested Modeltime Table
#>
#> Trained on: .splits | Forecast Errors: [0] | Conf Method: conformal_split |
#> Conf Interval: 0.95
#> # A tibble: 7 x 5
#> id .actual_data .future_data .splits .modeltime_tables
#> <fct> <list> <list> <list> <list>
#> 1 1_1 <tibble [104 x 2]> <tibble [52 x 2]> <split [52|52]> <mdl_tm_t [1 x 5]>
#> 2 1_3 <tibble [104 x 2]> <tibble [52 x 2]> <split [52|52]> <mdl_tm_t [1 x 5]>
#> 3 1_8 <tibble [104 x 2]> <tibble [52 x 2]> <split [52|52]> <mdl_tm_t [1 x 5]>
#> 4 1_13 <tibble [104 x 2]> <tibble [52 x 2]> <split [52|52]> <mdl_tm_t [1 x 5]>
#> 5 1_38 <tibble [104 x 2]> <tibble [52 x 2]> <split [52|52]> <mdl_tm_t [1 x 5]>
#> 6 1_93 <tibble [104 x 2]> <tibble [52 x 2]> <split [52|52]> <mdl_tm_t [1 x 5]>
#> 7 1_95 <tibble [104 x 2]> <tibble [52 x 2]> <split [52|52]> <mdl_tm_t [1 x 5]>
# 5.0 REFIT ----
# Refitting reuses conformal method and interval
nested_modeltime_refit_tbl <- best_nested_modeltime_tbl %>%
modeltime_nested_refit()
#> Fitting models on actual data... ====>-------------------------- 14% | ETA:...
#> Fitting models on actual data... =========>--------------------- 29% | ETA:...
#> Fitting models on actual data... =============>----------------- 43% | ETA:...
#> Fitting models on actual data... =================>------------- 57% | ETA:...
#> Fitting models on actual data... =====================>--------- 71% | ETA:...
#> Fitting models on actual data... ==========================>---- 86% | ETA:...
# New: Conf Method and Interval NOW Shown in Print Output
nested_modeltime_refit_tbl
#> # Nested Modeltime Table
#>
#> Trained on: .actual_data | Forecast Errors: [0] | Conf Method: conformal_split
#> | Conf Interval: 0.95
#> # A tibble: 7 x 5
#> id .actual_data .future_data .splits .modeltime_tables
#> <fct> <list> <list> <list> <list>
#> 1 1_1 <tibble [104 x 2]> <tibble [52 x 2]> <split [52|52]> <mdl_tm_t [1 x 5]>
#> 2 1_3 <tibble [104 x 2]> <tibble [52 x 2]> <split [52|52]> <mdl_tm_t [1 x 5]>
#> 3 1_8 <tibble [104 x 2]> <tibble [52 x 2]> <split [52|52]> <mdl_tm_t [1 x 5]>
#> 4 1_13 <tibble [104 x 2]> <tibble [52 x 2]> <split [52|52]> <mdl_tm_t [1 x 5]>
#> 5 1_38 <tibble [104 x 2]> <tibble [52 x 2]> <split [52|52]> <mdl_tm_t [1 x 5]>
#> 6 1_93 <tibble [104 x 2]> <tibble [52 x 2]> <split [52|52]> <mdl_tm_t [1 x 5]>
#> 7 1_95 <tibble [104 x 2]> <tibble [52 x 2]> <split [52|52]> <mdl_tm_t [1 x 5]>
# Nested future forecast reuses conf method and interval
nested_modeltime_refit_tbl %>% extract_nested_future_forecast()
#> # Forecast Results
#>
#> Trained on: .actual_data | Forecast Errors: [0] | Conf Method: conformal_split
#> | Conf Interval: 0.95
#> # A tibble: 1,092 x 8
#> id .model_id .model_desc .key .index .value .conf_lo .conf_hi
#> <fct> <int> <chr> <fct> <date> <dbl> <dbl> <dbl>
#> 1 1_1 NA ACTUAL actual 2010-11-05 34239. NA NA
#> 2 1_1 NA ACTUAL actual 2010-11-12 19549. NA NA
#> 3 1_1 NA ACTUAL actual 2010-11-19 19553. NA NA
#> 4 1_1 NA ACTUAL actual 2010-11-26 18820. NA NA
#> 5 1_1 NA ACTUAL actual 2010-12-03 22518. NA NA
#> 6 1_1 NA ACTUAL actual 2010-12-10 31498. NA NA
#> 7 1_1 NA ACTUAL actual 2010-12-17 44913. NA NA
#> 8 1_1 NA ACTUAL actual 2010-12-24 55931. NA NA
#> 9 1_1 NA ACTUAL actual 2010-12-31 19125. NA NA
#> 10 1_1 NA ACTUAL actual 2011-01-07 15984. NA NA
#> # i 1,082 more rows
# Visualize
nested_modeltime_refit_tbl %>%
extract_nested_future_forecast() %>%
group_by(id) %>%
plot_modeltime_forecast(
.interactive = FALSE,
.facet_ncol = 2
)
# 6.0 FORECAST ADJUSTMENTS POST REFIT ----
# CHANGE CONF INTERVAL FROM 0.95 TO 0.80
# Modeltime Nested Forecast does NOT reuse Conf Interval and Method (user must specify)
adjusting_conf_interval <- nested_modeltime_refit_tbl %>%
modeltime_nested_forecast(
h = 52*2, # CHANGE TO 2 YEARS
conf_interval = 0.80, # CHANGE TO 80% CI
conf_method = "conformal_split"
)
#> Forecast predictions... =========>--------------------- 29% | ETA: 4s
#> Forecast predictions... =============>----------------- 43% | ETA: 3s
#> Forecast predictions... =================>------------- 57% | ETA: 2s
#> Forecast predictions... =====================>--------- 71% | ETA: 2s Forecast
#> predictions... ==========================>---- 86% | ETA: 1s
# New: Conf Method and Interval NOW Shown in Print Output
adjusting_conf_interval
#> # Forecast Results
#>
#> Trained on: | Forecast Errors: [0] | Conf Method: conformal_split | Conf
#> Interval: 0.8
#> # A tibble: 1,456 x 8
#> id .model_id .model_desc .key .index .value .conf_lo .conf_hi
#> <fct> <int> <chr> <fct> <date> <dbl> <dbl> <dbl>
#> 1 1_1 NA ACTUAL actual 2010-11-05 34239. NA NA
#> 2 1_1 NA ACTUAL actual 2010-11-12 19549. NA NA
#> 3 1_1 NA ACTUAL actual 2010-11-19 19553. NA NA
#> 4 1_1 NA ACTUAL actual 2010-11-26 18820. NA NA
#> 5 1_1 NA ACTUAL actual 2010-12-03 22518. NA NA
#> 6 1_1 NA ACTUAL actual 2010-12-10 31498. NA NA
#> 7 1_1 NA ACTUAL actual 2010-12-17 44913. NA NA
#> 8 1_1 NA ACTUAL actual 2010-12-24 55931. NA NA
#> 9 1_1 NA ACTUAL actual 2010-12-31 19125. NA NA
#> 10 1_1 NA ACTUAL actual 2011-01-07 15984. NA NA
#> # i 1,446 more rows
# Visualize
adjusting_conf_interval %>%
group_by(id) %>%
plot_modeltime_forecast(
.interactive = FALSE,
.facet_ncol = 2
)
Created on 2023-09-04 with reprex v2.0.2
I've attempted to make it more obvious of what Conformal Prediction method is being used, the confidence interval, and whether or not it's a global confidence or local confidence interval (i.e. takes into account the specific ID of the group).
# CONFORMAL FORECASTING ----
# Goal: New integration of Conformal Prediction
# SETUP ----
library(tidymodels)
library(modeltime)
library(timetk)
library(tidyverse)
walmart_sales_tbl <- timetk::walmart_sales_weekly %>%
select(id, Date, Weekly_Sales) %>%
mutate(id = as_factor(id))
walmart_sales_tbl %>%
group_by(id) %>%
plot_time_series(
Date, Weekly_Sales,
.facet_ncol = 2,
.interactive = FALSE,
)
splits <- time_series_split(
walmart_sales_tbl,
assess = "1 year",
cumulative = TRUE
)
#> Using date_var: Date
#> Data is not ordered by the 'date_var'. Resamples will be arranged by `Date`.
#> Overlapping Timestamps Detected. Processing overlapping time series together using sliding windows.
new_data_tbl <- walmart_sales_tbl %>%
group_by(id) %>%
future_frame(.length_out = "1 year") %>%
ungroup()
#> .date_var is missing. Using: Date
recipe_ml <- recipe(Weekly_Sales ~ ., training(splits)) %>%
step_timeseries_signature(Date) %>%
step_rm(Date) %>%
step_dummy(all_nominal_predictors(), one_hot = TRUE)
model_xgb <- boost_tree("regression") %>%
set_engine("xgboost")
wflw_fit_xgb <- workflow() %>%
add_model(model_xgb) %>%
add_recipe(recipe_ml) %>%
fit(training(splits))
calib_tbl <- modeltime_table(
wflw_fit_xgb
) %>%
modeltime_calibrate(
testing(splits),
id = "id"
)
# CONFORMAL SPLIT (GLOBAL CONFIDENCE) ----
# Global
test_forecast_global_tbl <- calib_tbl %>%
modeltime_forecast(
new_data = testing(splits),
actual_data = walmart_sales_tbl,
conf_interval = 0.95,
conf_method = "conformal_split",
conf_by_id = FALSE, # GLOBAL CONFIDENCE
keep_data = TRUE
)
# New: Displays Conf Method, Conf Interval and Conf by ID
test_forecast_global_tbl
#> # Forecast Results
#>
#> Conf Method: conformal_split | Conf Interval: 0.95 | Conf By ID: FALSE (GLOBAL
#> CONFIDENCE)
#> # A tibble: 1,365 x 10
#> .model_id .model_desc .key .index .value .conf_lo .conf_hi id
#> <int> <chr> <fct> <date> <dbl> <dbl> <dbl> <fct>
#> 1 NA ACTUAL actual 2010-02-05 24924. NA NA 1_1
#> 2 NA ACTUAL actual 2010-02-12 46039. NA NA 1_1
#> 3 NA ACTUAL actual 2010-02-19 41596. NA NA 1_1
#> 4 NA ACTUAL actual 2010-02-26 19404. NA NA 1_1
#> 5 NA ACTUAL actual 2010-03-05 21828. NA NA 1_1
#> 6 NA ACTUAL actual 2010-03-12 21043. NA NA 1_1
#> 7 NA ACTUAL actual 2010-03-19 22137. NA NA 1_1
#> 8 NA ACTUAL actual 2010-03-26 26229. NA NA 1_1
#> 9 NA ACTUAL actual 2010-04-02 57258. NA NA 1_1
#> 10 NA ACTUAL actual 2010-04-09 42961. NA NA 1_1
#> # i 1,355 more rows
#> # i 2 more variables: Date <date>, Weekly_Sales <dbl>
# Visualize
test_forecast_global_tbl %>%
group_by(id) %>%
plot_modeltime_forecast(
.facet_ncol = 2,
.interactive = F,
.title = "Test Prediction | Conformal Split | Global Confidence"
)
# LOCAL -----
# Global
test_forecast_local_tbl <- calib_tbl %>%
modeltime_forecast(
new_data = testing(splits),
actual_data = walmart_sales_tbl,
conf_interval = 0.95,
conf_method = "conformal_split",
conf_by_id = TRUE # LOCAL CONFIDENCE
)
# New: Displays Conf Method, Conf Interval and Conf by ID
test_forecast_local_tbl
#> # Forecast Results
#>
#> Conf Method: conformal_split | Conf Interval: 0.95 | Conf By ID: TRUE (LOCAL
#> CONFIDENCE)
#> # A tibble: 1,365 x 8
#> .model_id .model_desc .key .index .value .conf_lo .conf_hi id
#> <int> <chr> <fct> <date> <dbl> <dbl> <dbl> <fct>
#> 1 NA ACTUAL actual 2010-02-05 24924. NA NA 1_1
#> 2 NA ACTUAL actual 2010-02-12 46039. NA NA 1_1
#> 3 NA ACTUAL actual 2010-02-19 41596. NA NA 1_1
#> 4 NA ACTUAL actual 2010-02-26 19404. NA NA 1_1
#> 5 NA ACTUAL actual 2010-03-05 21828. NA NA 1_1
#> 6 NA ACTUAL actual 2010-03-12 21043. NA NA 1_1
#> 7 NA ACTUAL actual 2010-03-19 22137. NA NA 1_1
#> 8 NA ACTUAL actual 2010-03-26 26229. NA NA 1_1
#> 9 NA ACTUAL actual 2010-04-02 57258. NA NA 1_1
#> 10 NA ACTUAL actual 2010-04-09 42961. NA NA 1_1
#> # i 1,355 more rows
# Visualize
test_forecast_local_tbl %>%
group_by(id) %>%
plot_modeltime_forecast(
.facet_ncol = 2,
.interactive = F,
.title = "Test Prediction | Conformal Split Method | Local Confidence"
)
# FUTURE FORECAST ----
refit_tbl <- calib_tbl %>%
modeltime_refit(walmart_sales_tbl)
# Global
future_forecast_global_tbl <- refit_tbl %>%
modeltime_forecast(
new_data = new_data_tbl,
actual_data = walmart_sales_tbl,
conf_interval = 0.95,
conf_method = "conformal_split",
conf_by_id = FALSE, # GLOBAL CONFIDENCE
keep_data = TRUE
)
future_forecast_global_tbl
#> # Forecast Results
#>
#> Conf Method: conformal_split | Conf Interval: 0.95 | Conf By ID: FALSE (GLOBAL
#> CONFIDENCE)
#> # A tibble: 1,365 x 10
#> .model_id .model_desc .key .index .value .conf_lo .conf_hi id
#> <int> <chr> <fct> <date> <dbl> <dbl> <dbl> <fct>
#> 1 NA ACTUAL actual 2010-02-05 24924. NA NA 1_1
#> 2 NA ACTUAL actual 2010-02-12 46039. NA NA 1_1
#> 3 NA ACTUAL actual 2010-02-19 41596. NA NA 1_1
#> 4 NA ACTUAL actual 2010-02-26 19404. NA NA 1_1
#> 5 NA ACTUAL actual 2010-03-05 21828. NA NA 1_1
#> 6 NA ACTUAL actual 2010-03-12 21043. NA NA 1_1
#> 7 NA ACTUAL actual 2010-03-19 22137. NA NA 1_1
#> 8 NA ACTUAL actual 2010-03-26 26229. NA NA 1_1
#> 9 NA ACTUAL actual 2010-04-02 57258. NA NA 1_1
#> 10 NA ACTUAL actual 2010-04-09 42961. NA NA 1_1
#> # i 1,355 more rows
#> # i 2 more variables: Date <date>, Weekly_Sales <dbl>
future_forecast_global_tbl %>%
group_by(id) %>%
plot_modeltime_forecast(
.facet_ncol = 2,
.interactive = F,
.title = "Future Prediction | Conformal Split Method | Global Confidence"
)
# Local
future_forecast_local_tbl <- refit_tbl %>%
modeltime_forecast(
new_data = new_data_tbl,
actual_data = walmart_sales_tbl,
conf_interval = 0.95,
conf_method = "conformal_split",
conf_by_id = TRUE # LOCAL CONFIDENCE
)
future_forecast_local_tbl
#> # Forecast Results
#>
#> Conf Method: conformal_split | Conf Interval: 0.95 | Conf By ID: TRUE (LOCAL
#> CONFIDENCE)
#> # A tibble: 1,365 x 8
#> .model_id .model_desc .key .index .value .conf_lo .conf_hi id
#> <int> <chr> <fct> <date> <dbl> <dbl> <dbl> <fct>
#> 1 NA ACTUAL actual 2010-02-05 24924. NA NA 1_1
#> 2 NA ACTUAL actual 2010-02-12 46039. NA NA 1_1
#> 3 NA ACTUAL actual 2010-02-19 41596. NA NA 1_1
#> 4 NA ACTUAL actual 2010-02-26 19404. NA NA 1_1
#> 5 NA ACTUAL actual 2010-03-05 21828. NA NA 1_1
#> 6 NA ACTUAL actual 2010-03-12 21043. NA NA 1_1
#> 7 NA ACTUAL actual 2010-03-19 22137. NA NA 1_1
#> 8 NA ACTUAL actual 2010-03-26 26229. NA NA 1_1
#> 9 NA ACTUAL actual 2010-04-02 57258. NA NA 1_1
#> 10 NA ACTUAL actual 2010-04-09 42961. NA NA 1_1
#> # i 1,355 more rows
future_forecast_local_tbl %>%
group_by(id) %>%
plot_modeltime_forecast(
.facet_ncol = 2,
.interactive = F,
.title = "Future Prediction | Conformal Split Method | Local Confidence"
)
Created on 2023-09-04 with reprex v2.0.2
This is interesting. I'm going to look into.