ellisp / forecastHybrid

Convenient functions for ensemble forecasts in R combining approaches from the {forecast} package

examples of functions and weights specification #62

Closed savvaskef closed 7 years ago

savvaskef commented 7 years ago

Hi, I think the forecastHybrid object is a nice addition to the forecast package. The vignettes are very much to the point and understandable, and they show how to use the library. However, the manual itself is very limited. Can you provide examples, either as a link or by replying to this post?

Specifically, I am intrigued by accuracy and cross validation. How do they work? I also tested fits with equal weights and with insample.errors weights. The insample.errors weights produced a better-fitting result... why do you want to deprecate them? What exactly are insample.errors? Are they calculated for each item or for the whole time series? Couldn't you weight the constituents of the forecastHybrid model with the R squares?

dashaub commented 7 years ago

In-sample errors are calculated for each model by comparing the model's fitted values against the actual time series. The errors, calculated by the specified errorMethod parameter, are then used to give more weight to accurate models and less to inaccurate models. Using the R squared has problems (such as the overfitting discussed below). Using the errors achieves the same goal but allows additional flexibility in the choice of error method: using MSE for the errorMethod would yield R squared, and "RMSE" would be quite similar.
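To make the idea concrete, here is a minimal sketch (not the package's internals) of error-based weighting: fit two component models, compute each one's in-sample RMSE from its residuals, and weight each model by the inverse of its error so that more accurate models count for more.

```r
# Minimal sketch of error-based weighting (illustrative, not the package internals)
library(forecast)

y <- AirPassengers
fit_arima <- auto.arima(y)
fit_ets   <- ets(y)

# In-sample RMSE of each component model
rmse <- c(arima = sqrt(mean(residuals(fit_arima)^2)),
          ets   = sqrt(mean(residuals(fit_ets)^2)))

# Inverse-error weights, normalised to sum to 1
weights <- (1 / rmse) / sum(1 / rmse)
weights
```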

Using in-sample errors will produce a model that minimizes errors in the training set, and therefore results that look good for the data you fit the model to, but the accuracy will be worse in the new period that you are forecasting. Essentially the problem is overfitting. Since equal weights usually beat in-sample errors, there is really very little reason to keep them, since they can be deceiving. Any deprecation will mean that they are still available but not exposed as a default choice.

Since we are usually most interested in producing forecasts with optimal accuracy in the future period, we want to minimize the errors here instead of in the training set. Cross validation is a technique for estimating this error so we can choose an appropriate model. As a downside, it is more computationally expensive. Furthermore, the cv process must be modified for time series data, but Rob Hyndman has this and this explaining the process. As you can see, there are several varieties of cross validation for time series, and the cvts() function allows you to control the necessary parameters as desired.
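As an illustration, here is a hedged sketch of running time series cross validation with cvts(); the window size, horizon, and modelling function are only examples, and accuracy() is assumed to dispatch on the resulting cvts object.

```r
# Sketch of time series cross validation with cvts(); parameter values are illustrative
library(forecastHybrid)

cv_ets <- cvts(AirPassengers,
               FUN = ets,          # modelling function to evaluate
               rolling = FALSE,    # non-rolling (blocked) folds
               windowSize = 120,   # size of each training window
               maxHorizon = 12)    # forecast 12 steps ahead per fold

# Summarise the out-of-sample errors across folds
accuracy(cv_ets)
```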

Since running cv on time series data can be complicated and much slower, using equal weights is convenient and very fast. Empirical results show they perform very well for future forecasts, certainly beating in-sample errors and in some cases possibly even cv errors.
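For comparison, a quick sketch of the fast path: fit a hybrid ensemble with equal weights and forecast it directly (the model letters and horizon here are chosen only for illustration).

```r
library(forecastHybrid)

# Equal-weight ensemble of auto.arima, ets, thetam, nnetar, and tbats
fit <- hybridModel(AirPassengers, models = "aefnt", weights = "equal")
fc  <- forecast(fit, h = 12)
plot(fc)
```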

savvaskef commented 7 years ago

What do you have to say about CIs? Could they be approached by something like a weighted average instead of min(models) and max(models)? Using the latter causes the min not to be symmetrical to the max, and symmetry can be a spec asked of our hybrid model.

dashaub commented 7 years ago

Our approach of taking the min/max of the models' intervals isn't the standard statistical approach since it is more conservative, but on real time series it seems to perform more realistically. Peter has written a few blog posts on the issue and our approach. He also has a pptx presentation on the topic. All of this said, this is a behavior we're looking at addressing and possibly changing (see issues #19 and #37).

For now, if you don't like this behavior, you can compute the prediction intervals yourself manually using whatever weighting methodology you please, since the hybridModel object and all forecasts created from it contain all of the individual component models.
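For example, here is a hedged sketch of rolling your own weighted interval, assuming (as described above) that the component models are stored on the hybridModel object under their model names such as fit$auto.arima and fit$ets; the 50/50 weights are purely illustrative.

```r
library(forecastHybrid)

fit <- hybridModel(AirPassengers, models = "ae")  # auto.arima + ets only
h <- 12

# Forecast each component separately at a single confidence level
fc_arima <- forecast(fit$auto.arima, h = h, level = 95)
fc_ets   <- forecast(fit$ets,        h = h, level = 95)

# Combine the interval bounds with whatever weights you prefer
w <- c(0.5, 0.5)  # illustrative equal weights
lower <- w[1] * fc_arima$lower + w[2] * fc_ets$lower
upper <- w[1] * fc_arima$upper + w[2] * fc_ets$upper
```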