Using best model for future predictions

MislavSag commented 4 years ago

Hi @AdrianAntico,

I have just tried the AutoBanditSarima function on hourly data. Everything works fine. This is my best model:

      DataSetName BoxCox IncludeDrift SeasonalDifferences SeasonalMovingAverages SeasonalLags MaxFourierTerms Differences MovingAverages Lags BiasAdj
1: ModelFrequency   skip        FALSE                   0                      0            1               3           1              4    0   FALSE
                    GridName Train_MSE Train_MAE  Train_MAPE Validate_MSE Validate_MAE Validate_MAPE Blended_MSE Blended_MAE Blended_MAPE
1: StratifyParsimonousGrid_4 0.4708038 0.2543165 0.002108957    0.3804573    0.4921519   0.002360596   0.4256306   0.3732342  0.002234776
   BanditProbs_ParsimonousGrid BanditProbs_RandomGrid BanditProbs_StratifyParsimonousGrid_1 BanditProbs_StratifyParsimonousGrid_2
1:                        0.08                   0.01                                  0.08                                  0.15
   BanditProbs_StratifyParsimonousGrid_3 BanditProbs_StratifyParsimonousGrid_4 BanditProbs_StratifyParsimonousGrid_5 BanditProbs_StratifyParsimonousGrid_6
1:                                  0.15                                  0.08                                  0.08                                  0.08
   BanditProbs_StratifyParsimonousGrid_7 BanditProbs_StratifyParsimonousGrid_8 BanditProbs_StratifyParsimonousGrid_9 BanditProbs_StratifyParsimonousGrid_10
1:                                  0.08                                  0.08                                  0.08                                   0.08
         RunTime ModelRankByDataType ModelRank ModelRunNumber
1: 2.083744 mins

The question is: how can I use this model for future predictions? I have the parameters here, but which package does the main function come from?

AdrianAntico commented 4 years ago

@MislavSag That's a good question. The way I've done forecasting in the past is to rebuild the model whenever new data becomes available. When you run the function, you get the actual forecast along with the winning parameters, which are intended to help you set the function arguments going forward. If you want to look under the hood, check out lines 1296 to 1327 of EconometricFunctions.R.

AdrianAntico commented 4 years ago

@MislavSag I just added this to the help file for the function as well.

DataSetName - "ModelFrequency" means that I used forecast::findfrequency() to define the periodicity of the time series data (versus a user-supplied frequency)
BoxCox - "skip" means no Box-Cox transformation was applied
IncludeDrift - TRUE or FALSE, passed to the include.drift argument of forecast::Arima()
SeasonalDifferences - D in Arima(p,d,q)(P,D,Q): 0, 1, 2, ... Set to 0 by default because values > 0 can make model runs take significantly longer, depending on the size of the data
SeasonalMovingAverages - Q in Arima(p,d,q)(P,D,Q)
SeasonalLags - P in Arima(p,d,q)(P,D,Q)
MaxFourierTerms - maximum number of Fourier terms, passed to Arima through the xreg argument
Differences - d in Arima(p,d,q)(P,D,Q)
MovingAverages - q in Arima(p,d,q)(P,D,Q)
Lags - p in Arima(p,d,q)(P,D,Q)
BiasAdj - TRUE if BoxCox isn't "skip"
GridName - ID for the set of function arguments that are treated like hyperparameters
Train_MSE - MSE of the training data fit
Train_MAE - MAE of the training data fit
Train_MAPE - MAPE of the training data fit
Validate_MSE - MSE of the validation data fit
Validate_MAE - MAE of the validation data fit
Validate_MAPE - MAPE of the validation data fit
Blended_MSE - MSE weighted by the TrainWeighting argument, so that Blended_MSE = TrainWeighting * Train_MSE + (1 - TrainWeighting) * Validate_MSE
Blended_MAE - same blend, applied to Train_MAE and Validate_MAE
Blended_MAPE - same blend, applied to Train_MAPE and Validate_MAPE
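As a quick check of the Blended_* definition, the sketch below applies the weighting formula to the best-model row posted at the top of the thread. A TrainWeighting of 0.5 is an assumption (the actual argument value isn't shown), but it reproduces the posted columns, e.g. (0.4708038 + 0.3804573) / 2 ≈ 0.4256306. The helper name `blended` is hypothetical and not part of AutoQuant.

```python
def blended(train_metric: float, validate_metric: float, train_weighting: float) -> float:
    """Blend a training-fit metric with a validation-fit metric.

    Blended = TrainWeighting * Train + (1 - TrainWeighting) * Validate
    """
    if not 0.0 <= train_weighting <= 1.0:
        raise ValueError("train_weighting must be between 0 and 1")
    return train_weighting * train_metric + (1.0 - train_weighting) * validate_metric


# Train/validate pairs copied from the best-model row posted above.
# TrainWeighting = 0.5 is an assumption that matches the posted Blended_* values.
best_row_metrics = {
    "MSE": (0.4708038, 0.3804573),
    "MAE": (0.2543165, 0.4921519),
    "MAPE": (0.002108957, 0.002360596),
}

for name, (train, validate) in best_row_metrics.items():
    print(f"Blended_{name}: {blended(train, validate, 0.5):.7g}")
```

Since ModelRank is based on these blended measures, TrainWeighting effectively controls how much the ranking rewards in-sample fit versus out-of-sample fit.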

Non-overlapping sets of Arima arguments, in order of increasing sophistication:

BanditProbs_StratifyParsimonousGrid_3
BanditProbs_StratifyParsimonousGrid_4
BanditProbs_StratifyParsimonousGrid_5
BanditProbs_StratifyParsimonousGrid_6
BanditProbs_StratifyParsimonousGrid_7
BanditProbs_StratifyParsimonousGrid_8
BanditProbs_StratifyParsimonousGrid_9
BanditProbs_StratifyParsimonousGrid_10
RunTime - Time taken to build the model using the set of arguments
ModelRankByDataType - rank of the model within its data type. There are 4 data types (2 x 2): user-supplied frequency or not, and forecast::tsclean() applied or not
ModelRank - the rank of the model based on the Blended_xxx measure
ModelRunNumber - The order that the model was run
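Reading the posted best-model row against this list gives an ARIMA(0,1,4)(1,0,0) with Fourier-term regressors (MaxFourierTerms = 3), no drift and no Box-Cox transformation. The Python sketch below only does that column-to-order bookkeeping; `arima_orders` is a hypothetical helper, not an AutoQuant function. In R, the two tuples would correspond to the `order` and `seasonal` arguments of `forecast::Arima()`, with Fourier terms (e.g. from `forecast::fourier()`) supplied via `xreg`. Exactly how AutoBanditSarima builds its xregs and cleans the data internally is not confirmed here, so a manual refit may not match the package's model exactly.

```python
from typing import Any, Mapping, Tuple

# (p, d, q) or (P, D, Q), following the column descriptions above.
Order = Tuple[int, int, int]


def arima_orders(row: Mapping[str, Any]) -> Tuple[Order, Order]:
    """Translate a PerformanceGrid row into non-seasonal and seasonal ARIMA orders."""
    order = (int(row["Lags"]), int(row["Differences"]), int(row["MovingAverages"]))
    seasonal = (
        int(row["SeasonalLags"]),
        int(row["SeasonalDifferences"]),
        int(row["SeasonalMovingAverages"]),
    )
    return order, seasonal


# Model-defining columns copied from the best-model row posted above.
best = {
    "Lags": 0, "Differences": 1, "MovingAverages": 4,
    "SeasonalLags": 1, "SeasonalDifferences": 0, "SeasonalMovingAverages": 0,
    "MaxFourierTerms": 3, "IncludeDrift": False, "BoxCox": "skip",
}

order, seasonal = arima_orders(best)
print(f"Arima{order}{seasonal}")
```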

MislavSag commented 4 years ago

@AdrianAntico ,

So you refit AutoBanditSarima whenever a new observation comes in? That's a good approach if the data frequency is low. But if I have, say, one-minute data and a big table, I don't have time to recalculate it every minute.

It would be a great feature if the final (best) model were part of the output, or if you could provide a function that contains the parameters from the best model. I looked at the source code: I see the auto.arima and Arima functions, but they don't contain all the parameters from PerformanceGrid.
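Until the fitted model itself is part of the output, one workaround in the spirit of this request is to persist the winning PerformanceGrid row once and reuse it, so the expensive bandit search runs only occasionally while each new batch of data gets a refit with fixed arguments. The sketch below shows only the persistence step, in Python with the standard json module; the column subset, file name and helper names are assumptions for illustration, not how AutoQuant stores its results.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict

# Columns that define the model itself, as opposed to its scores (assumed subset).
PARAM_COLUMNS = (
    "DataSetName", "BoxCox", "IncludeDrift", "SeasonalDifferences",
    "SeasonalMovingAverages", "SeasonalLags", "MaxFourierTerms",
    "Differences", "MovingAverages", "Lags", "BiasAdj",
)


def save_best_params(row: Dict[str, Any], path: Path) -> None:
    """Write only the model-defining columns of the winning row to a JSON file."""
    params = {name: row[name] for name in PARAM_COLUMNS}
    path.write_text(json.dumps(params, indent=2))


def load_best_params(path: Path) -> Dict[str, Any]:
    """Read the saved arguments back for the next refit."""
    return json.loads(path.read_text())


if __name__ == "__main__":
    # Values copied from the best-model row posted above (scores partly omitted).
    best_row = {
        "DataSetName": "ModelFrequency", "BoxCox": "skip", "IncludeDrift": False,
        "SeasonalDifferences": 0, "SeasonalMovingAverages": 0, "SeasonalLags": 1,
        "MaxFourierTerms": 3, "Differences": 1, "MovingAverages": 4, "Lags": 0,
        "BiasAdj": False, "GridName": "StratifyParsimonousGrid_4",
        "Validate_MSE": 0.3804573,
    }
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "best_sarima_params.json"  # hypothetical file name
        save_best_params(best_row, path)
        print(load_best_params(path))
```

Dropping the score columns keeps the saved file focused on what a refit needs; GridName or the Validate_* metrics could be kept as well if you want to track when a fresh search is due.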

The explanations of the PerformanceGrid columns are very helpful.

AdrianAntico commented 4 years ago

@MislavSag Thanks for the response. I think this is a solid use case. I'm going to reopen and tag it as a feature enhancement!

AdrianAntico commented 3 years ago

@MislavSag You can now save the model and xregs to file by supplying a path to the FilePath args. Sorry for the delay on this one. I had to do quite a bit of work to get these ones to run smoothly.

AdrianAntico / AutoQuant

Using best model for future predictions #56
