business-science / modeltime

Modeltime unlocks time series forecast models and machine learning in one framework
https://business-science.github.io/modeltime/

Arima parameters #13

Closed mpquast closed 4 years ago

mpquast commented 4 years ago

Hi there! First, I must say I'm loving modeltime! Really great workflow for forecasting! I have one question (more of a philosophical one) and one suggestion (if it makes sense).

  1. When training with auto.arima, I get a model ARIMA(1,2,3), for example. Then, when refitting, these parameters may change. Wouldn't it make more sense to keep the same parameters (in my example, 1,2,3) and recalculate just the coefficients? My rationale is that those are the parameters selected during training, the same way you select the best number of trees when using xgboost (and keep this number when refitting).
  2. As for my suggestion: in my team, we have found that it is interesting to fit multiple ARIMA models using different lengths of the time series as the training set, and select the best length. Using modeltime, I can easily do that with time_series_split(), but when refitting the model, the whole series is used. It would be interesting if modeltime_refit() could use the same length of the time series that was used during training (or maybe it is already possible and I just can't see how; I apologise if that's the case).

Thanks!

mdancho84 commented 4 years ago

Hey @mpquast

Thanks for reaching out. I'm excited to hear you are loving modeltime.

Auto ARIMA Refitting

Regarding Auto ARIMA fitting and refitting with fixed parameters: I would love to include an option to refit with the same model, but that's not how forecast::auto.arima() is set up. It's also probably best to use the model it selects after refitting on the full data set, since an order chosen with more data will usually perform better.

Here's the issue: when you set a parameter such as arima_reg(non_seasonal_ar = 2) with set_engine("auto_arima"), non_seasonal_ar gets mapped to the max.p argument of auto.arima(), which is a maximum value. Then, when you fit, auto.arima() searches over AR orders up to max.p. So there is no guarantee you'll get a p = 2 model.
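A minimal sketch of the mapping described above, assuming modeltime's argument name non_seasonal_ar and the engine name "auto_arima" (the object name model_spec_auto is made up for illustration):

```r
library(modeltime)
library(parsnip)
library(dplyr)

# With the "auto_arima" engine, the ARIMA orders act as upper bounds
# for the search, not as the order of the final model.
model_spec_auto <- arima_reg(
  non_seasonal_ar = 2  # handed to forecast::auto.arima() as max.p = 2
) %>%
  set_engine("auto_arima")

# Fitting this spec lets auto.arima() consider p = 0, 1 or 2, so the
# selected order can differ between the training fit and a later refit.
model_spec_auto
```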

Prophet has the same issue. It's an automated model, so some parameters are selected internally (e.g. changepoints). These adapt to the time series, which means refitting can produce a different set of changepoints.

Proposed Solution - set_engine("arima")

You can always add a second model using arima_reg() with set_engine("arima"). Then specify the model you want using the arima_reg() parameters non_seasonal_ar, non_seasonal_differences, etc. This guarantees that you control the model you get. You can use hyperparameter tuning to tune the model; the refitting process will then reuse the best parameter set found on the training set and re-estimate only the coefficients. Just keep in mind that tuning over 6 ARIMA parameters is non-trivial and will take a while.
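As a rough illustration, the ARIMA(1,2,3) from the original question could be pinned like this. This is a sketch that assumes modeltime's arima_reg() argument names and the "arima" engine; model_spec_fixed is just an illustrative name:

```r
library(modeltime)
library(parsnip)
library(dplyr)

# Fixed-order specification: the orders below are used as given,
# so a refit can only change the estimated coefficients.
model_spec_fixed <- arima_reg(
  non_seasonal_ar          = 1,  # p
  non_seasonal_differences = 2,  # d
  non_seasonal_ma          = 3   # q
) %>%
  set_engine("arima")

model_spec_fixed
```

The spec can then be fitted and added to a modeltime_table() next to the auto ARIMA model, which makes it easy to compare the two after refitting.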

mdancho84 commented 4 years ago

Refit with Shorter Window - Solution

Regarding the second comment, you and your team would like to run modeltime_refit() on a shorter span than the full data.

This is relatively straightforward.

Just select the date range, then pass it to modeltime_refit(data). You'll then refit on a smaller time frame.
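The original screenshot is not available, so here is a hedged sketch of what refitting on a shorter window might look like. The data set (series M750 from timetk's m4_monthly), the cutoff date and all object names are assumptions chosen for illustration:

```r
library(modeltime)
library(parsnip)
library(timetk)
library(rsample)
library(dplyr)

# Example data (assumption): a single monthly series shipped with timetk
data_tbl <- m4_monthly %>% filter(id == "M750")

# Hold out the last 2 years for calibration
splits <- time_series_split(
  data_tbl, date_var = date, assess = "2 years", cumulative = TRUE
)

model_fit <- arima_reg() %>%
  set_engine("auto_arima") %>%
  fit(value ~ date, data = training(splits))

calibration_tbl <- modeltime_table(model_fit) %>%
  modeltime_calibrate(new_data = testing(splits))

# Select the date range to refit on (the cutoff is arbitrary here)
recent_tbl <- data_tbl %>% filter(date >= as.Date("2005-01-01"))

# Refit on the shorter window instead of the full series
refit_tbl <- calibration_tbl %>%
  modeltime_refit(data = recent_tbl)
```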


mpquast commented 4 years ago

Thanks a lot for your remarks, Matt! Regarding the shorter span for refitting: as I understand it, if I'm working with a model table and refitting those models with modeltime_refit(), the "data" argument will be the same for all models, since it is a single tibble, right? So it wouldn't be possible to have models with different spans in the same table. Not a big problem, really, just trying to understand better.

mdancho84 commented 4 years ago

OK, I misunderstood.

When using modeltime_refit(), will the modeltime_table() be refit using the same data for all models? Yes, this is how I set it up. If you want to vary the refitting data, I recommend just saving multiple results from modeltime_refit().
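One way to keep several refits side by side is a small wrapper like the one below. The function refit_from() is hypothetical (it is not part of modeltime), and it assumes the data has a Date column, named "date" by default:

```r
library(modeltime)
library(dplyr)

# Hypothetical helper: refit a calibrated modeltime table using only
# the rows dated on or after `start_date`.
refit_from <- function(calibration_tbl, data, start_date, date_col = "date") {
  window_tbl <- data %>%
    filter(.data[[date_col]] >= as.Date(start_date))
  modeltime_refit(calibration_tbl, data = window_tbl)
}

# Usage sketch (calibration_tbl and data_tbl are assumed to exist already):
# start_dates <- c(full = "1990-01-01", recent = "2005-01-01")
# refits <- lapply(start_dates, function(d) {
#   refit_from(calibration_tbl, data_tbl, d)
# })
# refits$recent  # the table refit on the shorter window
```

Each element of the list is a separate refit table, so every model in a given table still shares one data window, as described above.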

mdancho84 commented 4 years ago

Marking this as completed - I have a more descriptive method for displaying model parameters that get updated after refitting.
