business-science / modeltime

Modeltime unlocks time series forecast models and machine learning in one framework
https://business-science.github.io/modeltime/

Output mismatch for multiple series (30+) #189

Open jjmarks opened 2 years ago

jjmarks commented 2 years ago

I have been forecasting price time series (15-minute intervals) for 32 different stocks in a game.

The model outputs seem incorrect, as if the plots are mismatched.

Is this an issue with my code, or is it expected with such inputs? The final forecast output is below. Note that deep_ar is clearly off, and prophet/xgboost also seem to be off by a constant shift.

modeltime_issue

The code I have used to produce these plots is below:

# Libraries ---------------------------------------------------------------
library(tidyverse)
library(janitor)
library(tidymodels)
library(modeltime)
library(modeltime.gluonts)
library(timetk)
library(lubridate)
library(feather)
library(catboost)
library(boostime)

# Load stock data ----------------------------------------------------
glimpse(stock_df)
#Rows: 177,824
#Columns: 3
#Groups: stock_ticker [32]
#$ stock_ticker <fct> TSB, TCI, SMFG, LAG, IOU, G, TCHS, Y, TTCT, CC, MI, TC~
#$ date         <dttm> 2022-05-04 22:45:00, 2022-05-04 22:45:00, 2022-05-04 ~
#$ price        <dbl> 926.93, 996.05, 493.07, 340.08, 137.39, 251.32, 323.35~

stock_df %>% 
  plot_time_series(
    .date_var    = date, 
    .value       = price,
    .facet_ncol  = 5,
    .interactive = FALSE
  )

# Test/train splits -------------------------------------------------------
splits <- stock_df %>% 
  ungroup() %>% 
  time_series_split(assess = "5 day", cumulative = TRUE, date_var = date)

# Feature Engineering -----------------------------------------------------

# Modeltime algos (e.g. prophet, arima etc)
rec_obj <- recipe(price ~ ., training(splits)) %>%
  step_timeseries_signature(date) %>%
  step_zv(all_predictors()) %>%
  step_dummy(all_nominal_predictors(), one_hot = TRUE)
rec_obj %>% prep() %>% juice() %>% glimpse()

# Parsnip algos (e.g. xgboost)
rec_obj2 <- rec_obj %>% 
  update_role(date, new_role = "ID")

# Build models ------------------------------------------------------------
xgb <- workflow() %>%
  add_model(
    boost_tree() %>% set_engine("xgboost")
  ) %>%
  add_recipe(rec_obj2) %>%
  fit(training(splits))

prophet_xgb <- workflow() %>% 
  add_model(
    prophet_boost() %>% set_engine("prophet_xgboost")
  ) %>% 
  add_recipe(rec_obj) %>% 
  fit(training(splits))

deep_learning <- deep_ar(
      # User Defined (Required) Parameters
      id = "stock_ticker",
      freq = "15min",
      prediction_length = 480,
      # Hyper Parameters
      epochs = 1,
      num_batches_per_epoch = 4
    ) %>% 
  set_engine("gluonts_deepar") %>% 
  fit(price ~ date + stock_ticker, training(splits))

# Compare models ----------------------------------------------------------
model_tbl <- modeltime_table(
  xgb,
  prophet_xgb,
  deep_learning
)

calib_tbl <- model_tbl %>%
  modeltime_calibrate(
    new_data = testing(splits), 
    id = "stock_ticker"
  )

overall_acc <- calib_tbl %>% 
  modeltime_accuracy(acc_by_id = FALSE) %>% 
  table_modeltime_accuracy(.interactive = FALSE)

stock_acc <- calib_tbl %>% 
  modeltime_accuracy(acc_by_id = TRUE) %>% 
  table_modeltime_accuracy(.interactive = FALSE)

# Forecast ----------------------------------------------------------------
stock_fct <- calib_tbl %>%
  modeltime_forecast(
    new_data    = testing(splits),
    actual_data = bind_rows(training(splits), testing(splits)),
    keep_data   = TRUE
  ) %>%
  group_by(stock_ticker) %>%
  plot_modeltime_forecast(
    .conf_interval_show = FALSE,
    .facet_ncol  = 5,
    .interactive = FALSE
  )
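
One thing that may be worth checking in the setup above: `assess = "5 day"` at 15-minute intervals corresponds to 5 * 96 = 480 steps, which matches `prediction_length = 480`, but only if every ticker has a complete set of rows in the test window. The sketch below is a base R check on made-up data (the tickers `AAA`, `BBB`, `CCC`, the start date, and the injected gap are all hypothetical); it is not taken from the dataset in this issue.

```r
# Forecast horizon implied by a 5-day assessment window at 15-minute bars
steps_per_day <- 24 * 60 / 15          # 96 bars per day
horizon       <- 5 * steps_per_day     # 480, the value passed as prediction_length

# Hypothetical test set: three tickers, one row per 15-minute step
tickers <- c("AAA", "BBB", "CCC")
start   <- as.POSIXct("2022-06-01 00:00:00", tz = "UTC")
test_df <- expand.grid(
  stock_ticker     = tickers,
  step             = seq_len(horizon) - 1,
  stringsAsFactors = FALSE
)
test_df$date <- start + 900 * test_df$step   # 900 seconds = 15 minutes

# Mimic a gap: drop the first 10 bars of one ticker
test_df <- test_df[!(test_df$stock_ticker == "CCC" & test_df$step < 10), ]

# Count rows per ticker and list any ticker that does not cover the full horizon
rows_per_ticker <- table(test_df$stock_ticker)
short <- names(rows_per_ticker)[as.vector(rows_per_ticker) != horizon]
print(short)
```

With the real data, the same count could be run on `testing(splits)`; any ticker listed in `short` would have fewer (or more) test rows than the model's forecast horizon.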
mdancho84 commented 2 years ago

What is your session info, please? Use:

devtools::session_info()
mdancho84 commented 2 years ago

Hardhat has a known issue, so you'll need to change:

rec_obj2 <- rec_obj %>% 
  update_role(date, new_role = "ID")

To:

rec_obj2 <- rec_obj %>% 
  step_rm(date)
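
As a rough illustration of what the suggested change does, the sketch below applies `step_rm()` to a toy data frame and confirms that the `date` column is gone from the processed data. The data frame and its `volume` column are made up for the example, and the sketch does not reproduce the hardhat issue itself.

```r
library(recipes)

# Toy data: ten 15-minute bars with a hypothetical extra predictor
toy <- data.frame(
  date   = as.POSIXct("2022-05-04 22:45:00", tz = "UTC") + 900 * 0:9,
  volume = c(12, 15, 11, 18, 20, 17, 16, 14, 19, 13),
  price  = seq(100, 109)
)

# Drop the date column inside the recipe instead of re-assigning its role
rec <- recipe(price ~ ., data = toy)
rec <- step_rm(rec, date)

# prep() estimates the recipe; bake(new_data = NULL) returns the processed training data
baked <- bake(prep(rec), new_data = NULL)
print(names(baked))
```

After `step_rm(date)`, the model only ever sees `volume` and `price`, so nothing downstream depends on how a non-predictor role is handled.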
jjmarks commented 2 years ago

> What is your session info please. Use:
>
> devtools::session_info()

Sure. It's my first time posting a GitHub issue, please let me know if there's anything else that would be useful.

- Session info ------------------------------------------------------------
 setting  value
 version  R version 4.1.2 (2021-11-01)
 os       Windows 10 x64 (build 19044)
 system   x86_64, mingw32
 ui       RStudio
 language (EN)
 collate  English_United States.1252
 ctype    English_United States.1252
 tz       Europe/London
 date     2022-07-10
 rstudio  2022.02.0+443 Prairie Trillium (desktop)
 pandoc   NA

- Packages ----------------------------------------------------------------
 ! package           * version    date (UTC) lib source
   assertthat          0.2.1      2019-03-21 [1] CRAN (R 4.1.2)
   backports           1.4.1      2021-12-13 [1] CRAN (R 4.1.2)
   boostime          * 0.1.0      2022-07-08 [1] Github (AlbertoAlmuinha/boostime@516997b)
   broom             * 1.0.0      2022-07-01 [1] CRAN (R 4.1.3)
   cachem              1.0.6      2021-08-19 [1] CRAN (R 4.1.3)
   callr               3.7.0      2021-04-20 [1] CRAN (R 4.1.2)
   catboost          * 1.0.6      2022-07-08 [1] url (https://github.com/catboost/catboost/releases/download/v1.0.6/catboost-R-Windows-1.0.6.tgz)
   cellranger          1.1.0      2016-07-27 [1] CRAN (R 4.1.2)
   checkmate           2.1.0      2022-04-21 [1] CRAN (R 4.1.3)
   class               7.3-19     2021-05-03 [2] CRAN (R 4.1.2)
   cli                 3.3.0      2022-04-25 [1] CRAN (R 4.1.3)
   codetools           0.2-18     2020-11-04 [2] CRAN (R 4.1.2)
   colorspace          2.0-3      2022-02-21 [1] CRAN (R 4.1.2)
   crayon              1.5.1      2022-03-26 [1] CRAN (R 4.1.3)
   data.table          1.14.2     2021-09-27 [1] CRAN (R 4.1.2)
   DBI                 1.1.3      2022-06-18 [1] CRAN (R 4.1.3)
   dbplyr              2.2.1      2022-06-27 [1] CRAN (R 4.1.3)
   devtools            2.4.3      2021-11-30 [1] CRAN (R 4.1.3)
   dials             * 1.0.0      2022-06-14 [1] CRAN (R 4.1.3)
   DiceDesign          1.9        2021-02-13 [1] CRAN (R 4.1.3)
   digest              0.6.29     2021-12-01 [1] CRAN (R 4.1.2)
   dplyr             * 1.0.9      2022-04-28 [1] CRAN (R 4.1.3)
   ellipsis            0.3.2      2021-04-29 [1] CRAN (R 4.1.2)
   fansi               1.0.3      2022-03-24 [1] CRAN (R 4.1.3)
   farver              2.1.1      2022-07-06 [1] CRAN (R 4.1.2)
   fastmap             1.1.0      2021-01-25 [1] CRAN (R 4.1.2)
   feather           * 0.3.5      2019-09-15 [1] CRAN (R 4.1.3)
   forcats           * 0.5.1      2021-01-27 [1] CRAN (R 4.1.2)
   foreach             1.5.2      2022-02-02 [1] CRAN (R 4.1.3)
   fs                  1.5.2      2021-12-08 [1] CRAN (R 4.1.2)
   furrr               0.3.0      2022-05-04 [1] CRAN (R 4.1.3)
   future              1.26.1     2022-05-27 [1] CRAN (R 4.1.3)
   future.apply        1.9.0      2022-04-25 [1] CRAN (R 4.1.3)
   generics            0.1.3      2022-07-05 [1] CRAN (R 4.1.2)
   ggplot2           * 3.3.6      2022-05-03 [1] CRAN (R 4.1.3)
   globals             0.15.1     2022-06-24 [1] CRAN (R 4.1.3)
   glue                1.6.2      2022-02-24 [1] CRAN (R 4.1.2)
   gower               1.0.0      2022-02-03 [1] CRAN (R 4.1.2)
   GPfit               1.0-8      2019-02-08 [1] CRAN (R 4.1.3)
   gridExtra           2.3        2017-09-09 [1] CRAN (R 4.1.2)
   gt                  0.6.0      2022-05-24 [1] CRAN (R 4.1.3)
   gtable              0.3.0      2019-03-25 [1] CRAN (R 4.1.2)
   hardhat             1.2.0      2022-06-30 [1] CRAN (R 4.1.3)
   haven               2.5.0      2022-04-15 [1] CRAN (R 4.1.3)
   here                1.0.1      2020-12-13 [1] CRAN (R 4.1.3)
   hms                 1.1.1      2021-09-26 [1] CRAN (R 4.1.2)
   htmltools           0.5.2      2021-08-25 [1] CRAN (R 4.1.2)
   httr                1.4.3      2022-05-04 [1] CRAN (R 4.1.3)
   infer             * 1.0.2      2022-06-10 [1] CRAN (R 4.1.3)
   inline              0.3.19     2021-05-31 [1] CRAN (R 4.1.3)
   ipred               0.9-13     2022-06-02 [1] CRAN (R 4.1.3)
   iterators           1.0.14     2022-02-05 [1] CRAN (R 4.1.3)
   janitor           * 2.1.0      2021-01-05 [1] CRAN (R 4.1.3)
   jsonlite            1.8.0      2022-02-22 [1] CRAN (R 4.1.2)
   labeling            0.4.2      2020-10-20 [1] CRAN (R 4.1.1)
   lattice             0.20-45    2021-09-22 [2] CRAN (R 4.1.2)
   lava                1.6.10     2021-09-02 [1] CRAN (R 4.1.3)
   lhs                 1.1.5      2022-03-22 [1] CRAN (R 4.1.3)
   lifecycle           1.0.1      2021-09-24 [1] CRAN (R 4.1.2)
   listenv             0.8.0      2019-12-05 [1] CRAN (R 4.1.3)
   loo                 2.5.1      2022-03-24 [1] CRAN (R 4.1.3)
   lubridate         * 1.8.0      2021-10-07 [1] CRAN (R 4.1.2)
   magrittr            2.0.3      2022-03-30 [1] CRAN (R 4.1.3)
   MASS                7.3-54     2021-05-03 [2] CRAN (R 4.1.2)
   Matrix              1.3-4      2021-06-01 [2] CRAN (R 4.1.2)
   matrixStats         0.62.0     2022-04-19 [1] CRAN (R 4.1.3)
   memoise             2.0.1      2021-11-26 [1] CRAN (R 4.1.3)
   modeldata         * 1.0.0      2022-07-01 [1] CRAN (R 4.1.3)
   modelr              0.1.8      2020-05-19 [1] CRAN (R 4.1.2)
   modeltime         * 1.2.2      2022-06-07 [1] CRAN (R 4.1.3)
   modeltime.gluonts * 0.3.1      2022-07-09 [1] Github (business-science/modeltime.gluonts@f4eec5f)
   munsell             0.5.0      2018-06-12 [1] CRAN (R 4.1.2)
   nnet                7.3-16     2021-05-03 [2] CRAN (R 4.1.2)
   parallelly          1.32.0     2022-06-07 [1] CRAN (R 4.1.3)
   parsnip           * 1.0.0      2022-06-16 [1] CRAN (R 4.1.3)
   pillar              1.7.0      2022-02-01 [1] CRAN (R 4.1.2)
   pkgbuild            1.3.1      2021-12-20 [1] CRAN (R 4.1.3)
   pkgconfig           2.0.3      2019-09-22 [1] CRAN (R 4.1.2)
   pkgload             1.3.0      2022-06-27 [1] CRAN (R 4.1.3)
   png                 0.1-7      2013-12-03 [1] CRAN (R 4.1.1)
   prettyunits         1.1.1      2020-01-24 [1] CRAN (R 4.1.2)
   processx            3.7.0      2022-07-07 [1] CRAN (R 4.1.2)
   prodlim             2019.11.13 2019-11-17 [1] CRAN (R 4.1.3)
   prophet             1.0        2021-03-30 [1] CRAN (R 4.1.3)
   ps                  1.7.1      2022-06-18 [1] CRAN (R 4.1.3)
   purrr             * 0.3.4      2020-04-17 [1] CRAN (R 4.1.2)
   R6                  2.5.1      2021-08-19 [1] CRAN (R 4.1.2)
   rappdirs            0.3.3      2021-01-31 [1] CRAN (R 4.1.2)
   Rcpp                1.0.8.3    2022-03-17 [1] CRAN (R 4.1.3)
 D RcppParallel        5.1.5      2022-01-05 [1] CRAN (R 4.1.3)
   readr             * 2.1.2      2022-01-30 [1] CRAN (R 4.1.2)
   readxl              1.4.0      2022-03-28 [1] CRAN (R 4.1.3)
   recipes           * 1.0.1      2022-07-07 [1] CRAN (R 4.1.2)
   remotes             2.4.2      2021-11-30 [1] CRAN (R 4.1.3)
   reprex              2.0.1      2021-08-05 [1] CRAN (R 4.1.2)
   reticulate          1.25       2022-05-11 [1] CRAN (R 4.1.3)
   rlang               1.0.3      2022-06-27 [1] CRAN (R 4.1.3)
   rpart               4.1-15     2019-04-12 [2] CRAN (R 4.1.2)
   rprojroot           2.0.3      2022-04-02 [1] CRAN (R 4.1.3)
   rsample           * 1.0.0      2022-06-24 [1] CRAN (R 4.1.3)
   rstan               2.21.5     2022-04-11 [1] CRAN (R 4.1.3)
   rstudioapi          0.13       2020-11-12 [1] CRAN (R 4.1.2)
   rvest               1.0.2      2021-10-16 [1] CRAN (R 4.1.2)
   sass                0.4.1      2022-03-23 [1] CRAN (R 4.1.3)
   scales            * 1.2.0      2022-04-13 [1] CRAN (R 4.1.3)
   sessioninfo         1.2.2      2021-12-06 [1] CRAN (R 4.1.3)
   snakecase           0.11.0     2019-05-25 [1] CRAN (R 4.1.3)
   StanHeaders         2.21.0-7   2020-12-17 [1] CRAN (R 4.1.3)
   stringi             1.7.6      2021-11-29 [1] CRAN (R 4.1.2)
   stringr           * 1.4.0      2019-02-10 [1] CRAN (R 4.1.2)
   survival            3.2-13     2021-08-24 [2] CRAN (R 4.1.2)
   tibble            * 3.1.7      2022-05-03 [1] CRAN (R 4.1.3)
   tidymodels        * 0.2.0      2022-03-19 [1] CRAN (R 4.1.3)
   tidyr             * 1.2.0      2022-02-01 [1] CRAN (R 4.1.2)
   tidyselect          1.1.2      2022-02-21 [1] CRAN (R 4.1.2)
   tidyverse         * 1.3.1      2021-04-15 [1] CRAN (R 4.1.2)
   timeDate            3043.102   2018-02-21 [1] CRAN (R 4.1.2)
   timetk            * 2.8.1      2022-05-31 [1] CRAN (R 4.1.3)
   tune              * 1.0.0      2022-07-07 [1] CRAN (R 4.1.2)
   tzdb                0.3.0      2022-03-28 [1] CRAN (R 4.1.3)
   usethis             2.1.6      2022-05-25 [1] CRAN (R 4.1.3)
   utf8                1.2.2      2021-07-24 [1] CRAN (R 4.1.2)
   vctrs               0.4.1      2022-04-13 [1] CRAN (R 4.1.3)
   withr               2.5.0      2022-03-03 [1] CRAN (R 4.1.3)
   workflows         * 1.0.0      2022-07-05 [1] CRAN (R 4.1.2)
   workflowsets      * 0.2.1      2022-03-15 [1] CRAN (R 4.1.3)
   xgboost             1.6.0.1    2022-04-16 [1] CRAN (R 4.1.3)
   xml2                1.3.3      2021-11-30 [1] CRAN (R 4.1.2)
   xts                 0.12.1     2020-09-09 [1] CRAN (R 4.1.2)
   yardstick         * 1.0.0      2022-06-06 [1] CRAN (R 4.1.3)
   zoo                 1.8-10     2022-04-15 [1] CRAN (R 4.1.3)

 [1] C:/Users/Joe/OneDrive - Newcastle University/Documents/R/win-library/4.1
 [2] C:/Program Files/R/R-4.1.2/library

 D -- DLL MD5 mismatch, broken installation.

- Python configuration ----------------------------------------------------
 python:         C:/Users/Joe/AppData/Local/r-miniconda/envs/r-gluonts/python.exe
 libpython:      C:/Users/Joe/AppData/Local/r-miniconda/envs/r-gluonts/python37.dll
 pythonhome:     C:/Users/Joe/AppData/Local/r-miniconda/envs/r-gluonts
 version:        3.7.1 | packaged by conda-forge | (default, Mar 13 2019, 13:32:59) [MSC v.1900 64 bit (AMD64)]
 Architecture:   64bit
 numpy:          C:/Users/Joe/AppData/Local/r-miniconda/envs/r-gluonts/Lib/site-packages/numpy
 numpy_version:  1.16.6
 numpy:          C:\Users\Joe\AppData\Local\R-MINI~1\envs\R-GLUO~1\lib\site-packages\numpy\__init__.p

 NOTE: Python version was forced by use_python function

> Hardhat has a known issue that you'll need to change
>
> rec_obj2 <- rec_obj %>% 
>   update_role(date, new_role = "ID")
>
> To:
>
> rec_obj2 <- rec_obj %>% 
>   step_rm(date)

I changed this step, but the outputs look the same (I removed the deep learning forecast to make the plot clearer).

modeltime_issue2
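
A possible way to test the "mismatched plots" suspicion numerically, rather than by eye, is to compare each ticker's mean predicted level with the mean actual level of every ticker and flag cases where the closest match is a different ticker. The sketch below uses base R on made-up data that only mimics the `stock_ticker`, `.key`, and `.value` columns of a forecast table; the tickers, price levels, and the deliberately swapped labels are hypothetical.

```r
set.seed(1)
tickers <- c("AAA", "BBB", "CCC")

# Made-up actuals: each ticker trades around a distinct price level
actual <- data.frame(
  stock_ticker = rep(tickers, each = 20),
  .key         = "actual",
  .value       = rep(c(100, 500, 900), each = 20) + rnorm(60)
)

# Made-up predictions with the AAA and BBB labels deliberately swapped
pred <- data.frame(
  stock_ticker = rep(tickers, each = 5),
  .key         = "prediction",
  .value       = rep(c(500, 100, 900), each = 5) + rnorm(15)
)
fct <- rbind(actual, pred)

# Mean level per ticker for a given key ("actual" or "prediction")
level_by_ticker <- function(df, key) {
  vapply(
    tickers,
    function(t) mean(df$.value[df$.key == key & df$stock_ticker == t]),
    numeric(1)
  )
}
actual_mean <- level_by_ticker(fct, "actual")
pred_mean   <- level_by_ticker(fct, "prediction")

# For each ticker, find which ticker's actual level is closest to its prediction
nearest <- vapply(
  tickers,
  function(t) names(actual_mean)[which.min(abs(actual_mean - pred_mean[[t]]))],
  character(1)
)

# Tickers whose predictions sit closer to another ticker's actuals
suspect <- tickers[nearest != tickers]
print(suspect)
```

This heuristic only works when the series have clearly different price levels, and a flagged ticker is a hint to inspect row ordering, not proof of a bug.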
mdancho84 commented 2 years ago

Hey, circling back on this. It's tough to tell whether this is a bug or something else.

Can you provide the dataset so I can attempt to reproduce it?