AdrianAntico / AutoQuant

R package for automation of machine learning, forecasting, model evaluation, and model interpretation
GNU Affero General Public License v3.0

Using best model for future predictions #56

Closed MislavSag closed 3 years ago

MislavSag commented 4 years ago

Hi @AdrianAntico,

I have just tried the AutoBanditSarima function on hourly data. Everything works fine. This is my best model:

      DataSetName BoxCox IncludeDrift SeasonalDifferences SeasonalMovingAverages SeasonalLags MaxFourierTerms Differences MovingAverages Lags BiasAdj
1: ModelFrequency   skip        FALSE                   0                      0            1               3           1              4    0   FALSE
                    GridName Train_MSE Train_MAE  Train_MAPE Validate_MSE Validate_MAE Validate_MAPE Blended_MSE Blended_MAE Blended_MAPE
1: StratifyParsimonousGrid_4 0.4708038 0.2543165 0.002108957    0.3804573    0.4921519   0.002360596   0.4256306   0.3732342  0.002234776
   BanditProbs_ParsimonousGrid BanditProbs_RandomGrid BanditProbs_StratifyParsimonousGrid_1 BanditProbs_StratifyParsimonousGrid_2
1:                        0.08                   0.01                                  0.08                                  0.15
   BanditProbs_StratifyParsimonousGrid_3 BanditProbs_StratifyParsimonousGrid_4 BanditProbs_StratifyParsimonousGrid_5 BanditProbs_StratifyParsimonousGrid_6
1:                                  0.15                                  0.08                                  0.08                                  0.08
   BanditProbs_StratifyParsimonousGrid_7 BanditProbs_StratifyParsimonousGrid_8 BanditProbs_StratifyParsimonousGrid_9 BanditProbs_StratifyParsimonousGrid_10
1:                                  0.08                                  0.08                                  0.08                                   0.08
         RunTime ModelRankByDataType ModelRank ModelRunNumber
1: 2.083744 mins

The question is: how can I use this model for future predictions? I have the parameters here, but which package does the underlying model function come from?

AdrianAntico commented 4 years ago

@MislavSag That's a good question. The way I've done forecasting in the past is to rebuild the model whenever new data becomes available. When you run the function, you get the actual forecast along with the winning parameters, which are intended to help you set the function arguments going forward. If you want to look under the hood, open the file EconometricFunctions.R and check out lines 1296 to 1327.

AdrianAntico commented 4 years ago

@MislavSag I just added this to the help file for the function as well.

  1. DataSetName - ModelFrequency means that I used forecast::findfrequency() to define the periodicity of the time series data (versus user-supplied)
  2. BoxCox - "skip" means I didn't use it
  3. IncludeDrift - TRUE or FALSE in forecast::Arima()
  4. SeasonalDifferences - 0, 1, 2, ... Set to 0 by default as values > 0 can cause model runs to take significantly longer depending on the size of the data
  5. SeasonalMovingAverages - Q in Arima(p,d,q)(P,D,Q)
  6. SeasonalLags - P in Arima(p,d,q)(P,D,Q)
  7. MaxFourierTerms - used in xreg argument in Arima
  8. Differences - d in Arima(p,d,q)(P,D,Q)
  9. MovingAverages - q in Arima(p,d,q)(P,D,Q)
  10. Lags - p in Arima(p,d,q)(P,D,Q)
  11. BiasAdj - TRUE if BoxCox isn't "skip"
  12. GridName - ID for set of function arguments that are treated like hyperparameters
  13. Train_MSE - MSE of the training data fit
  14. Train_MAE - MAE of the training data fit
  15. Train_MAPE - MAPE of the training data fit
  16. Validate_MSE - MSE of the validation data fit
  17. Validate_MAE - MAE of the validation data fit
  18. Validate_MAPE - MAPE of the validation data fit
  19. Blended_MSE - MSE weighted by the TrainWeighting argument so that Blended_MSE = TrainWeighting * Train_MSE + (1 - TrainWeighting) * Validate_MSE
  20. Blended_MAE - like above
  21. Blended_MAPE - like above
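
As a quick check of the blended metric described above, here is a minimal sketch in base R using the Train and Validate values from the posted PerformanceGrid row. The TrainWeighting value of 0.5 is an assumption (the thread does not state it), and `blend` is a hypothetical helper name used only for illustration.

```r
# Values copied from the PerformanceGrid row posted earlier in the thread.
train_mse    <- 0.4708038
validate_mse <- 0.3804573
train_mae    <- 0.2543165
validate_mae <- 0.4921519

# Assumption: TrainWeighting = 0.5. The thread does not state the value used.
train_weighting <- 0.5

# Hypothetical helper: weighted average of a train and a validation metric.
blend <- function(train, validate, w) {
  w * train + (1 - w) * validate
}

blended_mse <- blend(train_mse, validate_mse, train_weighting)
blended_mae <- blend(train_mae, validate_mae, train_weighting)

print(c(Blended_MSE = blended_mse, Blended_MAE = blended_mae))
```

With that assumed weight, the results agree with the reported Blended_MSE (0.4256306) and Blended_MAE (0.3732342) to within rounding, which suggests the run used equal weighting.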

Non-overlapping sets of Arima arguments, in order of increasing sophistication:

  1. BanditProbs_StratifyParsimonousGrid_3
  2. BanditProbs_StratifyParsimonousGrid_4
  3. BanditProbs_StratifyParsimonousGrid_5
  4. BanditProbs_StratifyParsimonousGrid_6
  5. BanditProbs_StratifyParsimonousGrid_7
  6. BanditProbs_StratifyParsimonousGrid_8
  7. BanditProbs_StratifyParsimonousGrid_9
  8. BanditProbs_StratifyParsimonousGrid_10
  9. RunTime - Time taken to build the model using the set of arguments
  10. ModelRankByDataType - There are 4 data types: user-supplied frequency or not (2) and forecast::tsclean() or not (2)
  11. ModelRank - the rank of the model based on the Blended_xxx measure
  12. ModelRunNumber - The order in which the model was run
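
For illustration, the column descriptions above could be mapped onto a direct forecast::Arima() call roughly as follows. This is only a sketch and not the package's internal code. The simulated series, the 24-hour frequency, and the use of forecast::fourier() to build the xreg matrix are assumptions; the order values come from the PerformanceGrid row posted at the top of the thread (Lags = 0, Differences = 1, MovingAverages = 4, SeasonalLags = 1, SeasonalDifferences = 0, SeasonalMovingAverages = 0, MaxFourierTerms = 3).

```r
library(forecast)

# Hypothetical hourly series with a daily cycle; replace with the real data.
set.seed(1)
n <- 24 * 20
y <- ts(
  100 + 5 * sin(2 * pi * seq_len(n) / 24) + cumsum(rnorm(n, sd = 0.3)),
  frequency = 24  # assumption: daily seasonality in hourly data
)

# MaxFourierTerms = 3 -> K = 3 pairs of sine/cosine regressors (assumed usage).
k <- 3
xreg_train <- fourier(y, K = k)

fit <- Arima(
  y,
  order         = c(0, 1, 4),  # (Lags, Differences, MovingAverages) = (p, d, q)
  seasonal      = c(1, 0, 0),  # (SeasonalLags, SeasonalDifferences, SeasonalMovingAverages) = (P, D, Q)
  include.drift = FALSE,       # IncludeDrift
  lambda        = NULL,        # BoxCox = "skip": no transformation
  xreg          = xreg_train
)

# Forecast 24 hours ahead; the horizon is taken from the rows of the future xreg.
h  <- 24
fc <- forecast(fit, xreg = fourier(y, K = k, h = h))
print(head(fc$mean))
```

The future Fourier terms have to be generated from the same series and the same K so that the regressor columns line up with those used in fitting.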

MislavSag commented 4 years ago

@AdrianAntico,

So you refit AutoBanditSarima whenever a new observation comes in? That's a good approach when the frequency is low. But if I have, let's say, one-minute data and a big table, I don't have time to recalculate it every minute.

It would be a great feature if the final (best) model were part of the output, or if you could provide a function that takes the parameters of the best model. I looked at the source code and I see the auto.arima and Arima functions, but they don't take all of the parameters listed in PerformanceGrid.

Explanations of PerformanceGrid columns are very helpful.
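
One possible way to avoid re-estimating on every new observation, independent of what AutoBanditSarima returns, is to fit once, store the fitted object, and later pass it to forecast::Arima() through its `model` argument, which applies the existing coefficients to new data without re-estimating them. The sketch below uses simulated data and a plain ARIMA(0,1,4) without regressors; both are assumptions made for illustration.

```r
library(forecast)

# Hypothetical history; replace with the real series.
set.seed(42)
history <- ts(cumsum(rnorm(300)))

# Fit once (the expensive step) and persist the fitted object.
fit  <- Arima(history, order = c(0, 1, 4))
path <- tempfile(fileext = ".rds")
saveRDS(fit, path)

# Later: new observations arrive, and the stored model is loaded again.
new_obs    <- tail(history, 1) + cumsum(rnorm(10))
updated    <- ts(c(history, new_obs))
fit_loaded <- readRDS(path)

# Apply the stored model to the updated series; coefficients are not re-estimated.
refit <- Arima(updated, model = fit_loaded)

# Forecast from the end of the updated series.
fc <- forecast(refit, h = 5)
print(fc$mean)
```

If the original model was fitted with an xreg matrix (for example Fourier terms), the matching regressors for the updated series would also have to be passed in the second Arima() call and in forecast().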

AdrianAntico commented 4 years ago

@MislavSag Thanks for the response. I think this is a solid use case. I'm going to reopen and tag it as a feature enhancement!

AdrianAntico commented 3 years ago

@MislavSag You can now save the model and xregs to file by supplying a path to the FilePath argument. Sorry for the delay on this one. I had to do quite a bit of work to get these to run smoothly.