Open luzbarbosa opened 2 years ago
Hi @luzbarbosa ,
Could you provide a reproducible example?
It is very complicated to be able to help you without knowing how you have generated the resamples or how you have reached those results.
In the tests that I have done, I do not see that the variable is lagging and both show that they are identical. Here is the code I used to test it in case it could be of any use to you:
library(tidymodels)
library(modeltime)
library(modeltime.resample)
library(timetk)
library(tidyverse)
library(tidyquant)
full_data_tbl <- walmart_sales_weekly %>%
select(id, Date, Weekly_Sales) %>%
# Apply Group-wise Time Series Manipulations
group_by(id) %>%
future_frame(
.date_var = Date,
.length_out = "3 months",
.bind_data = TRUE
) %>%
ungroup() %>%
# Consolidate IDs
mutate(id = fct_drop(id))
# Training Data
data_prepared_tbl <- full_data_tbl %>%
filter(!is.na(Weekly_Sales))
# Forecast Data
future_tbl <- full_data_tbl %>%
filter(is.na(Weekly_Sales))
walmart_tscv <- data_prepared_tbl %>%
time_series_cv(
date_var = Date,
assess = "3 months",
skip = "3 months",
cumulative = TRUE,
slice_limit = 6
)
walmart_tscv %>%
tk_time_series_cv_plan() %>%
plot_time_series_cv_plan(Date, Weekly_Sales,
.facet_ncol = 2, .interactive = F)
recipe_spec <- recipe(Weekly_Sales ~ .,
data = training(walmart_tscv$splits[[1]])) %>%
step_timeseries_signature(Date) %>%
step_rm(matches("(.iso$)|(.xts$)|(day)|(hour)|(minute)|(second)|(am.pm)")) %>%
step_mutate(Date_week = factor(Date_week, ordered = TRUE)) %>%
step_dummy(all_nominal(), one_hot = TRUE)
wflw_fit_xgboost <- workflow() %>%
add_model(
boost_tree() %>% set_engine("xgboost")
) %>%
add_recipe(recipe_spec %>% step_rm(Date)) %>%
fit(training(walmart_tscv$splits[[1]]))
model_tbl <- modeltime_table(
wflw_fit_xgboost
)
resample_results <- model_tbl %>%
modeltime_fit_resamples(
resamples = walmart_tscv,
control = control_resamples(verbose = FALSE)
)
input <- walmart_tscv %>%
filter(id == "Slice1") %>%
pull(splits) %>%
pluck(1) %>%
training()
input_2<-resample_results$.resample_results[[1]] %>%
filter(id == "Slice1") %>%
pull(splits) %>%
pluck(1) %>%
training()
identical(input, input_2)
In case you need more help, please send a reprex.
Regards,
The out-of-sample predictions generated by modeltime_fit_resamples() incorrectly show the predicted variable as if it was lagged.
When I run the following code it outputs out-of-sample projections, as expected :
However, a closer inspection of the output provided by modeltime_fit_resamples() shows that the predicted variable is not indexed by the same ".row" (id column) as it was inside "cv_resamples"
If we plot those variables, we can clearly see that the test set was lagged by a few days. The first slice inside 'cv_resamples' includes the test and the training set, both depicted in blue. The output, includes the out-of-sample projections for the test set but we ignore those and plot only the realized value of the target variable, in red. I include the code below just for completeness.