business-science / modeltime

Modeltime unlocks time series forecast models and machine learning in one framework
https://business-science.github.io/modeltime/

Error using certain functions (like RSI from the TTR package) in recursive forecasting #170

Open vidarsumo opened 2 years ago

vidarsumo commented 2 years ago
library(modeltime)
library(timetk)
library(tidyquant)
library(tidyverse)
library(tidymodels)

horizon <- 10

# 1.0.0 Get data ----
sp500_tbl <- tq_get(c("^GSPC","^IXIC","^DJI"), get = "stock.prices")

sp500_tbl <- sp500_tbl %>% 
    select(symbol, date, close) %>% 
    set_names("id", "date", "outcome")

# Create feature engineering function
fe_function <- function(data) {
    data %>%
        group_by(id) %>%
        mutate(
            rsi  = TTR::RSI(outcome) # using e.g. mean() from base R works but not SMA() from TTR package
        ) %>% 
        tk_augment_lags(outcome, .lags = 1:horizon) %>% 
        ungroup()

}

# Add new variables
sp500_features_tbl <- sp500_tbl %>% 
    fe_function

# Train and test split
splits <- time_series_cv(
    data        = sp500_features_tbl,
    date_var    = date,
    assess      = horizon,
    skip        = 1,
    cumulative  = TRUE,
    slice_limit = 1
)

train_data <- splits$splits[[1]] %>% training()
test_data  <- splits$splits[[1]] %>% testing()

recipe_spec <- recipe(outcome ~ ., data = train_data) %>% 
    step_dummy(all_nominal_predictors(), one_hot = TRUE) %>% 
    update_role(date, new_role = "indicator")

model_spec <- boost_tree(mode = "regression", engine = "xgboost", learn_rate = 0.003)

xgb_fit <- workflow() %>% 
    add_model(model_spec) %>% 
    add_recipe(recipe_spec) %>% 
    fit(train_data) %>% 
    recursive(
        transform = fe_function,
        train_tail = tail(train_data, horizon)
    ) %>% 
    modeltime_table()

xgb_fc_tbl <- xgb_fit %>% 
    modeltime_forecast(
        new_data = test_data,
        actual_data = bind_rows(train_data, test_data),
        keep_data = TRUE
    )

Error: Problem while computing rsi = TTR::RSI(outcome).
Error in dplyr::filter():
! Problem while computing ..1 = .model_desc == "ACTUAL" | .key == "prediction".
Caused by error:
! object '.key' not found
Run rlang::last_error() to see where the error occurred.
Warning message:
Unknown or uninitialised column: .key.

AlbertoAlmuinha commented 2 years ago

Hi @vidarsumo ,

The problem is that your transformation function groups by "id", while the recipe later converts "id" into dummy variables. When forecasting, the test dataset is joined with train_panel so that the transformation function can be applied. I'm not entirely sure of this, but I think new_data is joined after the recipe has been applied to it, while train_panel is still untransformed.
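A minimal base-R sketch of the effect described above, assuming the hypothesis is right: once "id" is one-hot encoded, the column itself is gone, so any step that still expects "id" fails. The toy `panel` data and the `needs_id()` helper are hypothetical, and `model.matrix()` only mimics what a dummy step does; this is not the code path modeltime actually runs.

```r
# Toy panel with an "id" column, standing in for the data in this thread
panel <- data.frame(
    id      = factor(c("a", "a", "b", "b")),
    outcome = c(1, 2, 3, 4)
)

# One-hot encode "id" with base R; this only mimics the effect of a dummy step
dummies <- model.matrix(~ id - 1, data = panel)
baked   <- cbind(panel["outcome"], as.data.frame(dummies))

names(baked)             # "id" has been replaced by one column per level
"id" %in% names(baked)

# Hypothetical transform that, like fe_function, needs the "id" column
needs_id <- function(data) {
    if (!"id" %in% names(data)) stop("column 'id' not found")
    data
}

# Capture the error message instead of stopping the script
result <- tryCatch(needs_id(baked), error = function(e) conditionMessage(e))
result
```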

As a result, the "id" variable in train_panel has only 3 or 4 records for each value, which is too few to compute the moving average because RSI uses n = 14 by default. Even if you reduce n, it will still fail: once the test dataset (new_data) is transformed and joined, it introduces NA values in the "id" column.
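One possible workaround on the user side, not confirmed anywhere in this thread, is to wrap the indicator so that it returns NA instead of failing when a group has too few observed values or when the rows to be forecast are still NA. The sketch below is base R only; `safe_indicator()` and `roll_mean()` are hypothetical helpers, and `roll_mean()` merely stands in for a TTR-style rolling function. It assumes missing values appear only at the start or end of the series.

```r
# Hypothetical helper (not part of modeltime or TTR): apply a rolling
# indicator `fun` only to the observed values and pad the rest with NA.
safe_indicator <- function(x, n, fun) {
    out <- rep(NA_real_, length(x))
    ok  <- !is.na(x)
    # Too few observed values for the window: return all NA instead of failing
    if (sum(ok) <= n) return(out)
    out[ok] <- as.numeric(fun(x[ok], n))
    out
}

# Base-R stand-in for a TTR-style rolling function, used only for this demo
roll_mean <- function(x, n) {
    as.numeric(stats::filter(x, rep(1 / n, n), sides = 1))
}

short_series  <- c(10, 11, 12)                 # fewer rows than the window
padded_series <- c(1, 2, 3, 4, 5, 6, NA, NA)   # future rows not yet predicted

short_result  <- safe_indicator(short_series, n = 4, fun = roll_mean)
padded_result <- safe_indicator(padded_series, n = 3, fun = roll_mean)

short_result
padded_result
```

Inside fe_function the call would look something like `rsi = safe_indicator(outcome, n = 14, fun = TTR::RSI)`. Whether that is enough to get past the error reported here has not been tested.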

@mdancho84 I think we need to check that new_data and train_panel are joined correctly. I have also seen that there is a fix for the ID issue in the recursive function for panel data. I think we should rethink where in the code the recipe is applied: these problems and fixes might become unnecessary if the recipe were applied after the user-supplied transformation function.

I haven't run many tests, but I think this should be looked into at some point.

Regards,