Extract rolling forecasts and allow argument pass through

ganesh-krishnan commented 7 years ago

Currently additional parameters cannot be passed to the fitting function without creating custom models. eg. If you want to fit an ets model of type "MMM", you would have to create a custom function. This PR will pass through any additional params to the fitting function.

coveralls commented 7 years ago

Coverage remained the same at 98.067% when pulling 1a63ef6eb0eaa360cbb74d15ce18b756002496ed on ganesh-krishnan:master into 0ab57f7cd14783e129c34271970c0a7db7b0e6e3 on ellisp:master.

coveralls commented 7 years ago

Coverage remained the same at 98.067% when pulling 1a63ef6eb0eaa360cbb74d15ce18b756002496ed on ganesh-krishnan:master into 0ab57f7cd14783e129c34271970c0a7db7b0e6e3 on ellisp:master.

ganesh-krishnan commented 7 years ago

Added another commit with the following:

Added a helper function to combine sequential time series objects into a single ts object This is useful when you want a single TS object for the fit + forecast
Added a function to extract rolling forecasts This extracts the cross validates forecasts for rolling windows and allows them to be plotted/inspected
Added purrr as a dependency

dashaub commented 7 years ago

Customization to existing functions for FUN and FCFUN for cvts() can be achieved easily already with anonymous functions, e.g. cvts(AirPassengers, FUN = function(x) ets(x, model = "MMM")). Still passing additional arguments with ... seems like a good idea to include. Testing that these two approaches return the same results could be a good unit test.
The helper function combine_ts() looks quite useful as a general utility function for other purposes as well.
The unit tests can be simplified quite a bit by using an anonymous function for FUN and rwf() from the "forecast" package instead of defining a custom FUN = naive_forecast and FCFUN = forecast.naive_model, i.e. cv <- cvts(AirPassengers, FUN = function(x) rwf(x, drift = FALSE), rolling = TRUE, windowSize = 1, maxHorizon = 1). This approach seems better since we've used this model and forecast function elsewhere in unit tests. If we want to keep these custom functions, I'd rather add them to the whole package and @export them so they can be used more broadly and don't need to be defined in multiple places. EDIT: It looks like naive() already exists in the "forecast" package too, so we can just use that.
Expanding the cvts class with other generic methods (e.g. plot.cvts(), fitted.cvts(),etc) to make it more useful and a full-fledged useful class is an idea I have in mind, so your addition of extract_rolling_forecasts() is definitely welcomed. What do you think about renaming this to fitted.cvts()? Have any other idea in general for other methods to build out the class? What do you think about including an argument horizon to it to control which forecast horizon from each fold is used for returning the fitted values?
Add yourself as an author to these functions with the roxygen2 @author tag and to the DESCRIPTION file since you're contributing whole functions.
I'd like to avoid adding additional dependencies unless there is a compelling function that we need. It looks like the piping can be easily rewritten, and purr::map() and purrr:reduce() seem like they have decent replacements with base R functions Map() and Reduce(). I haven't reviewed this part too much, so if we really do need "purrr" let me know.
This isn't specific to your PR, but Travis isn't correctly reporting when package builds and tests fail now, so you'll manually need to check the actual Travis log to make sure the build succeeds (it looks like this one failed). I need to figure out why our build script isn't working. Somewhat related to this is unit test times. I've made an effort to speed up unit tests by using quick forecasting functions (e.g. thetam() and stlm()) and short time series, so if possible keep the tests quick. I don't immediately see anything that should be too slow, but the total package build time is starting to creep up again. This isn't a huge deal since we can always clean it up later but it makes development easier since you can iterate on builds quicker and ensure we can pass the CRAN runtime limits when we submit the package.

Thanks for your work on improving cvts().

ganesh-krishnan commented 7 years ago

Customization to existing functions for FUN and FCFUN for cvts() can be achieved easily already with anonymous functions, e.g. cvts(AirPassengers, FUN = function(x) ets(x, model = "MMM")). Still passing additional arguments with ... seems like a good idea to include. Testing that these two approaches return the same results could be a good unit test.

Good point. Didn't think of the anonymous function. Still feel that there is some value to allowing ..., since it might be cleaner for new users.

The helper function combine_ts() looks quite useful as a general utility function for other purposes as well

I'm thinking I'll rename combine_ts to tsCombine. In general I tend to use snake_case, but seems like the package is using camelCase. Let me know if you have any thoughts one way or the other. Otherwise I'll stick to camelCase from now on.

The unit tests can be simplified quite a bit by using an anonymous function for FUN and rwf() from the "forecast" package instead of defining a customFUN = naive_forecast and FCFUN = forecast.naive_model, i.e. cv <- cvts(AirPassengers, FUN = function(x) rwf(x, drift = FALSE), rolling = TRUE, windowSize = 1, maxHorizon = 1). This approach seems better since we've used this model and forecast function elsewhere in unit tests. If we want to keep these custom functions, I'd rather add them to the whole package and @export them so they can be used more broadly and don't need to be defined in multiple places. EDIT: It looks like naive() already exists in the "forecast" package too, so we can just use that.

Good points. I did try to use the forecast::naive() function. But it seemed to have trouble with a horizon of 1, so I just rolled my own. I'll take a look again and try to simplify the unit tests.

Expanding the cvts class with other generic methods (e.g. plot.cvts(),fitted.cvts(),etc) to make it more useful and a full-fledged useful class is an idea I have in mind, so your addition of extract_rolling_forecasts() is definitely welcomed. What do you think about renaming this to fitted.cvts()? Have any other idea in general for other methods to build out the class? What do you think about including an argument horizon to it to control which forecast horizon from each fold is used for returning the fitted values?

I completely agree. The reason I restricted it to rolling forecasts for now is that the cvts object doesn't currently store all the fit attributes. I'll implement a cv_forecasts.cvts() and fitted.cvts() method to extract the fitted values and the forecasts respectively. I'll also implement a plotting method plot.cvts(). For all this to work, we'll have to save the fit attributes, which I can also implement.

Add yourself as an author to these functions with the roxygen2 @author tag and to the DESCRIPTION file since you're contributing whole functions.

Thanks. Will do so.

I'd like to avoid adding additional dependencies unless there is a compelling function that we need. It looks like the piping can be easily rewritten, and purr::map() and purrr:reduce() seem like they have decent replacements with base R functions Map() and Reduce(). I haven't reviewed this part too much, so if we really do need "purrr" let me know.

We should definitely be able to use the base R functions. I generally tend to be disposed towards the tidyverse functions because they tend to be more consistent (eg. mtcars[1,] will return a vector), but I agree that we should try and minimize dependencies. I'll rewrite to use the base R Map() and Reduce() and eliminate the piping.

This isn't specific to your PR, but Travis isn't correctly reporting when package builds and tests fail now, so you'll manually need to check the actual Travis log to make sure the build succeeds (it looks like this one failed). I need to figure out why our build script isn't working. Somewhat related to this is unit test times. I've made an effort to speed up unit tests by using quick forecasting functions (e.g. thetam() and stlm()) and short time series, so if possible keep the tests quick. I don't immediately see anything that should be too slow, but the total package build time is starting to creep up again. This isn't a huge deal since we can always clean it up later but it makes development easier since you can iterate on builds quicker and ensure we can pass the CRAN runtime limits when we submit the package.

Ahhh. Got it. I'll click through and check going forward.

To summarize, I'll do the following:

[ ] Rename new functions to stick to camelCase conventions (1)
[ ] Eliminate pipes and purrr::map() and purrr::reduce() (2)
[ ] Simplify unit tests via anonymous functions (3)
[ ] Save fit characteristics to cvts object (4)
[ ] Create functions cv_forecasts.cvts(), cv_fitted.cvts(), plot.cvts() (5)
[ ] Add @author tag and add myself as author to the DESCRIPTION file (6)

Looks like it'll be a busy week at work. So it'll probably take me a bit to get these items done, especially 5.

dashaub commented 7 years ago

Great.

Feel free to submit PRs for individual parts of this if you like. And don't feel like you have to do 5 if that slows things down a lot. I haven't been able to come up with a satisfying way to do these, so ideas from others are definitely appreciated.

Regarding camelCase; yes that is my preference in keeping with the package style. Not a huge deal though since people follow different conventions in different R packages. Internal consistency is nice, however.

ganesh-krishnan commented 7 years ago

Closing for now. Will submit new PRs for items above as I get through them.

ellisp / forecastHybrid

Extract rolling forecasts and allow argument pass through #64