ellisp / forecastHybrid

Convenient functions for ensemble forecasts in R combining approaches from the {forecast} package
GNU General Public License v3.0
79 stars 23 forks source link

add thetaf as part of the ensemble? #50

Closed ellisp closed 7 years ago

ellisp commented 7 years ago

was there a reason we didn't include thetaf in the ensemble, even as an option? If we just didn't get around to it, I'm happy to put some belated work in to add it.

dashaub commented 7 years ago

The main reason I didn't include it is I don't use it frequently. Since thetaf doesn't produce a model object but rather a forecast object directly, we'd have to setup the model estimation in hybridModel and the forecasting in forecast.hybridModel somewhat differently for the theta model, and we would be pushing the model estimation and forecasting off at the time of forecasting as well. Essentially we'd probably be estimating the model twice: once in hybridModel to determine the weights/fitted values/residuals to store in the returned hybridModel object and once in forecast.hybridModel. It might be worth opening an issue over in "forecast" to see if Rob would be open to splitting the modeling/forecasting functionality out like in stlm and stlf.

It's been a while since I've read it, but Hyndman also shows in an IJF article "Unmasking the Theta method" that theta models can be represented as an ETS, so we might be capturing them already with ets in the ensemble. Since ets is a model search procedure, it wouldn't necessarily select the theta model, however, so the ensemble might benefit from the inclusion of thetaf.

ellisp commented 7 years ago

I've made a start at this in https://github.com/ellisp/forecastHybrid/tree/thetadev . I didn't notice your suggestion we ask Rob to split it out, which I've basically done in thetam and forecast.thetam. forecast(thetam(...) seems to get identical results to thetaf(...) (tests have been added).

The difference between Theta and ets() seems to be:

I'd be interested to see which of these makes it perform better than ets()! I suspect it's the drift.

More tests and checks needed before I try to merge into (what's your release strategy for a next version?).

dashaub commented 7 years ago

I haven't had a lot of time to work on this project recently and probably won't for the next few months, so the release question is up to you. The next major feature on my roadmap is parallel support, but I won't have time for that by the next release. That's the biggest remaining missing feature. There are a couple of major changes recently that could support a release (i.e. exporting accuracy.cvts() and the fix of prediction intervals for nnetar models). CRAN likes at least 1-2 months between package releases, so although we could release now with those updates, we might want to wait a little longer if you think you can get the theta model into working order by then.

ellisp commented 7 years ago

I'm nearly ready to merge this in. Do you have a preference for new models like theta about including them in the default models string? My inclination is not to, to preserve backwards compatibility and avoid gradual ballooning of the default,, and leave the default "aestn".

dashaub commented 7 years ago

Good question. Generally I strongly like to avoid breaking backwards compatibility, but I think it would ok in this instance for a couple of reasons.

  1. A call to hybridModel(inputSeries) would still run and not outright fail. The resulting model wouldn't be exactly what they thought they were getting, but a "soft failure" is much less egregious than a hard failure.
  2. Users can easily maintain the old behavior with models = "aenst". If I were running this in production I'd always specify the setting like this anyway.
  3. The package is still fairly young and without a huge userbase. API changes should be expected during development.
  4. Documenting the change in the NEWS should be sufficient, along with how to maintain the old behavior.
  5. Including the theta model will improve forecast robustness, and the model seems to run quickly too.
  6. We're extending the behavior, not restricting it. All of the model components will still be there if somebody interested with them before, e.g. $ets or $nnetar. Including theta would be somewhat analogous to the model search procedure for auto.arima() changing: the selected arima order and coefficients might change, but the user could interact with the Arima object in the same way as before.

Overall either approach wouldn't be horrible, but there are some advantages of including it. I do see your point about ballooning the defaults, and that definitely should be a concern, especially if the added models take a long time to run.

ellisp commented 7 years ago

I think this seems sensible. I think the point about how long it takes to run is an excellent one. For example, if we end up importing forecastxgbts I would be more reluctant to add that to the defaults, because it does take a while to run, at least compared to ets and theta.