Residual Diagnostics - Function to visualize residuals

spsanderson commented 4 years ago

I think it would be great to be able to extract the model description and its associated data from a modeltime_table. That would let one look at the residuals, etc., however they want.

mdancho84 commented 4 years ago

Do you mean visualize the calibration residuals?

spsanderson commented 4 years ago

I do

mdancho84 commented 4 years ago

Ok, yes, this is a great idea and something I've been contemplating too (just haven't had time to do it yet).

Implementation

I'm considering developing a modeltime_residuals() function and a plot_modeltime_residuals() function, where the calibration tibble is used to evaluate out-of-sample residuals. It would work similarly to modeltime_forecast(): the data is generated first, then the plotting function makes it easy to visualize.
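The generate-then-plot split described above can be sketched in base R. This is only an illustration of the idea, not modeltime's implementation: the helper names compute_residuals() and plot_residuals(), the column names, and the toy numbers are all hypothetical, and the sign convention (actual minus prediction) is an assumption.

```r
# Hypothetical helper (step 1): build the residual data.
compute_residuals <- function(index, actual, prediction) {
    stopifnot(length(index) == length(actual),
              length(actual) == length(prediction))
    data.frame(
        index      = index,
        actual     = actual,
        prediction = prediction,
        residual   = actual - prediction  # assumed sign convention
    )
}

# Hypothetical helper (step 2): plot the residuals over time.
plot_residuals <- function(residual_df) {
    plot(residual_df$index, residual_df$residual, type = "l",
         xlab = "Date", ylab = "Residual")
    abline(h = 0, lty = 2)  # reference line at zero
    invisible(residual_df)
}

# Toy data, made up for illustration only
dates <- seq(as.Date("2020-01-01"), by = "month", length.out = 4)
res_df <- compute_residuals(
    index      = dates,
    actual     = c(100, 110, 120, 130),
    prediction = c(98, 113, 119, 134)
)
res_df$residual
```

Keeping the data step separate from the plotting step means the residual table can also be inspected or summarized directly, which is what the original request asked for.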

spsanderson commented 4 years ago

Awesome. In the meantime, I'll keep hacking away at it and see if I can get something working to send you.

mdancho84 commented 4 years ago

Improvements - Residuals & Accuracy

modeltime_residuals() - A new function used to extract residual information
plot_modeltime_residuals() - Visualizes the output of modeltime_residuals(). Offers 3 plots:
1. Time Plot - Residuals over time
2. ACF Plot - Residual Autocorrelation vs Lags
3. Seasonality - Residual Seasonality Plot
Fitted vs Test - Models can now use training(splits) to visualize in-sample residuals and accuracy. Modeltime models like ARIMA use "fitted" predictions for this, since sequential models cannot predict data in the past. Prophet and other modeltime models don't strictly need "fitted" predictions, but using them also saves time. You'll see "Fitted" in the .type column when fitted predictions are used.
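The distinction between "fitted" and predicted in-sample values can be illustrated with base R alone. The sketch below uses stats::arima() and lm() on the built-in AirPassengers series as stand-ins; it does not show how modeltime handles this internally, and the variable names are chosen for illustration.

```r
# Sequential model: base R arima() on a built-in series (stand-in example)
y <- log(AirPassengers)
fit_arima <- arima(y, order = c(0, 1, 1),
                   seasonal = list(order = c(0, 1, 1), period = 12))

# predict() on an arima fit only looks ahead from the end of the training data
future <- predict(fit_arima, n.ahead = 12)$pred

# In-sample values come from the stored residuals: fitted = actual - residual
fitted_arima <- y - fit_arima$residuals

# Non-sequential model: lm() can predict at any time point, including past ones
train <- data.frame(t = as.numeric(time(y)), y = as.numeric(y))
fit_lm <- lm(y ~ t, data = train)
pred_lm_train <- predict(fit_lm, newdata = train)

# In-sample mean absolute error for each model
mae_arima <- mean(abs(y - fitted_arima))
mae_lm    <- mean(abs(train$y - pred_lm_train))
```

The ARIMA fit has no way to be asked for a prediction at a past date, so its in-sample values have to be read off the fit, whereas the regression is simply evaluated on the training rows.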


# SETUP ----

library(modeltime)
library(tidymodels)
library(tidyverse)
library(timetk)
library(lubridate)

m750 <- m4_monthly %>%
    filter(id == "M750")

splits <- initial_time_split(m750, prop = 0.9)

# MODELS ----

model_fit_arima <- arima_reg() %>%
    set_engine("auto_arima") %>%
    fit(value ~ date, training(splits))
#> frequency = 12 observations per 1 year

model_fit_prophet <- prophet_reg() %>%
    set_engine("prophet") %>%
    fit(value ~ date, training(splits))
#> Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
#> Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.

model_fit_lm <- linear_reg() %>%
    set_engine("lm") %>%
    fit(value ~ splines::ns(date, df = 5) 
        + month(date, label = TRUE), 
        training(splits))

# CALIBRATION ----

model_tbl <- modeltime_table(
    model_fit_arima,
    model_fit_prophet,
    model_fit_lm
)

calibration_tbl <- model_tbl  %>%
    modeltime_calibrate(testing(splits))

# ACCURACY ----

# Out-of-sample 
calibration_tbl %>% modeltime_accuracy()
#> # A tibble: 3 x 9
#>   .model_id .model_desc             .type   mae  mape  mase smape  rmse   rsq
#>       <int> <chr>                   <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1         1 ARIMA(0,1,1)(0,1,1)[12] Test   151.  1.41 0.516  1.43  198. 0.930
#> 2         2 PROPHET                 Test   178.  1.70 0.609  1.71  235. 0.880
#> 3         3 LM                      Test   156.  1.55 0.534  1.52  236. 0.915

# In-sample
calibration_tbl %>% modeltime_accuracy(training(splits))
#> # A tibble: 3 x 9
#>   .model_id .model_desc             .type    mae  mape  mase smape  rmse   rsq
#>       <int> <chr>                   <chr>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1         1 ARIMA(0,1,1)(0,1,1)[12] Fitted  104.  1.19 0.409  1.19  154. 0.988
#> 2         2 PROPHET                 Fitted  157.  1.80 0.613  1.80  212. 0.977
#> 3         3 LM                      Test    180.  2.05 0.704  2.05  247. 0.969

# RESIDUALS - Time Plot ----

# Out of Sample
calibration_tbl %>%
    modeltime_residuals() %>%
    plot_modeltime_residuals(.type = "timeplot", .interactive = F)


# In Sample
calibration_tbl %>%
    modeltime_residuals(training(splits)) %>%
    plot_modeltime_residuals(.type = "timeplot", .interactive = F)


# RESIDUALS - ACF

# Out of Sample
calibration_tbl %>%
    modeltime_residuals() %>%
    plot_modeltime_residuals(.type = "acf", .interactive = F)
#> Max lag exceeds data available. Using max lag: 30
#> Max lag exceeds data available. Using max lag: 30
#> Max lag exceeds data available. Using max lag: 30


# In Sample
calibration_tbl %>%
    modeltime_residuals(training(splits)) %>%
    plot_modeltime_residuals(.type = "acf", .interactive = F)
#> Max lag exceeds data available. Using max lag: 274
#> Max lag exceeds data available. Using max lag: 274
#> Max lag exceeds data available. Using max lag: 274


# RESIDUALS - Seasonality

# Out of Sample
calibration_tbl %>%
    modeltime_residuals() %>%
    plot_modeltime_residuals(.type = "seasonality", .interactive = F)


# In Sample
calibration_tbl %>%
    modeltime_residuals(training(splits)) %>%
    plot_modeltime_residuals(.type = "seasonality", .interactive = F)

Created on 2020-08-24 by the reprex package (v0.3.0)
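The "Max lag exceeds data available" messages in the ACF output are consistent with the lag being capped at one less than the number of observations (30 and 274 would correspond to 31 and 275 observations). Base R's acf() applies the same kind of limit, which a short sketch can show. The series here is simulated purely for illustration, and whether modeltime computes its cap in exactly this way is an assumption.

```r
set.seed(123)
x <- rnorm(31)  # simulated residuals; 31 points, chosen to mirror a lag cap of 30

# Ask for far more lags than the data can support
acf_out <- acf(x, lag.max = 1000, plot = FALSE)

# acf() limits lag.max to length(x) - 1, so the largest lag returned is 30
max(acf_out$lag)
```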

mdancho84 commented 4 years ago

Closing this. Residuals are taken care of. :)

business-science / modeltime

Residual Diagnostics - Function to visualize residuals #22
