ellisp / forecastHybrid

Convenient functions for ensemble forecasts in R combining approaches from the {forecast} package
GNU General Public License v3.0

cvts? #63

Closed savvaskef closed 7 years ago

savvaskef commented 7 years ago

Where can I find a tutorial, vignette, or example about cvts()? I know there is an example in the manual, but I did not understand how to use the constituents of cvmod1 (which is fully packed with properties). Can you provide a link or an example? I would also like to suggest a methodology on behalf of a wide range of users: why not build an example that goes from easy to complex, progressively adding parameters for the functions along with comments on the background needed? For illustration, I am not very comfortable with logarithms, and they appear as parameters everywhere. Shouldn't there be a short definition of how they are used, i.e. their properties as they relate to the example?

Thanks again on behalf of my "students"

dashaub commented 7 years ago

A vignette for cvts() would definitely be a good addition. I recently clarified the documentation in the GitHub version of the package and will be releasing an update to CRAN within the next few weeks. I'll add some more clarifying comments to the existing examples as well.

Until this lands on CRAN, hopefully these clarifications will help:

One of the most interesting things to do is run accuracy() on the cvts object. This can be used to compare the accuracy of several forecasting methods on the time series.

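library(forecastHybrid)  # provides cvts() and the accuracy() method for cvts objects
# Cross-validate two forecasting methods on the same series and compare them
accuracy(cvts(AirPassengers, FUN = thetam))
accuracy(cvts(AirPassengers, FUN = stlm))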

Not sure what you mean about logarithms appearing everywhere.

ganesh-krishnan commented 7 years ago

I wonder if we should add the characteristics of the cvts call itself to the cvts object. Was the object a rolling fit? What was the maxHorizon? What was the windowSize? etc.

dashaub commented 7 years ago

Good idea, I'll save those in the object.

savvaskef commented 7 years ago

Is it possible for you to clarify a couple of terms for all those who do not understand what the algorithm does? For example: 1) What are folds? 2) What is a rolling fit (and consequently, what are windowSize and maxHorizon)?

A description of the algorithm would be very helpful. (I bet you could explain all of this in a couple of paragraphs, but it is missing from the manual.)

savvaskef commented 7 years ago

Also related seems to be cv.errors in forecastHybrid. What is the statistic according to which the different models are weighted? Is it computed on a rolling basis or over the whole series?

ganesh-krishnan commented 7 years ago

@savvaskef the documentation appears plenty clear. Regardless, here is some clarification.

Edit: apologies, looks like the documentation was edited recently

Regarding folds, this is not as relevant to time series cross validation and is only meant to serve as an analogy to regular cross validation (non time-series data). In non time-series data, if you perform k-fold cross validation, you will split the dataset into k partitions or folds. For each fold, you will train the model on the other folds and test on the current fold. As an example, let us say you are performing 5-fold cross validation. Then the dataset will be split into folds 1-5. Fold 1 will be held out and folds 2-5 will be used to fit the model. This process will be repeated for folds 2-5. For fold 2, folds 1, 3, 4, 5 will be used for model fitting.
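To make the fold analogy concrete, here is a minimal base R sketch of a 5-fold split on non time-series rows. It is only an illustration and is not part of forecastHybrid; the row count `n` and the names `fold_id` and `splits` are made up for this example.

```r
# Minimal k-fold sketch in base R (illustrative only, not forecastHybrid code)
set.seed(1)   # reproducible fold assignment
n <- 20       # hypothetical number of rows
k <- 5        # number of folds

# Shuffle balanced fold labels so every row belongs to exactly one fold
fold_id <- sample(rep(seq_len(k), length.out = n))

splits <- lapply(seq_len(k), function(i) {
  list(test  = which(fold_id == i),   # the held-out fold
       train = which(fold_id != i))   # the remaining k - 1 folds
})

# Fold 1 is held out and folds 2-5 are used for fitting
length(splits[[1]]$test)    # 4
length(splits[[1]]$train)   # 16
```

Rows are assigned to folds at random here, which is exactly what does not carry over to time series, where the ordering of observations matters.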

In non time-series data, each of the rows or cases is independent of the others and can thus be sampled independently. This is, however, not the case for time series. Model fitting for time series depends on the observations being sequential. In order to get an idea of the generalization performance of a time series model, the sampling has to be in line with the model fitting procedure. Two ways to do this are to use rolling (or sliding) windows and non-rolling windows. You can read up about it here. It also has nice diagrams to help you understand the procedures.
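As a rough illustration of the two window schemes, the base R sketch below generates train/test index ranges under one common convention: a rolling window keeps a fixed training length and slides forward, while a non-rolling window stays anchored at the first observation and grows. The helper `make_windows()` and its arguments are hypothetical, and stepping the origin forward by `horizon` is an assumption of this sketch; cvts() may step and size its windows differently, so check ?cvts for the actual behavior.

```r
# Hypothetical helper (not cvts() itself): index windows for time series CV
make_windows <- function(n, window_size, horizon, rolling = TRUE) {
  stopifnot(window_size + horizon <= n)
  # Last training index of each split; the origin moves forward by `horizon`
  ends <- seq(from = window_size, to = n - horizon, by = horizon)
  lapply(ends, function(end) {
    # Rolling: fixed-length window that slides. Non-rolling: starts at 1 and grows.
    start <- if (rolling) end - window_size + 1 else 1
    list(train = start:end,
         test  = (end + 1):(end + horizon))
  })
}

w_roll <- make_windows(n = 24, window_size = 12, horizon = 3, rolling = TRUE)
w_grow <- make_windows(n = 24, window_size = 12, horizon = 3, rolling = FALSE)

range(w_roll[[2]]$train)   # 4 15
range(w_grow[[2]]$train)   # 1 15
w_roll[[2]]$test           # 16 17 18
```

In both schemes the test observations always come after the training observations, which is the property that ordinary k-fold sampling does not guarantee.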

Regarding horizons, this is standard forecasting terminology. A simple Google search led me to this link. I suggest you spend some time reading up on time series since you seem to have some very basic questions.

dashaub commented 7 years ago

@savvaskef Glad to get feedback on the clarity of this function and how to improve it. When writing the documentation, one of my concerns was that it may not be clear to others. I think a vignette could help a lot here, particularly one that includes the type of graphics in Rob's blog post. I'll add this to the package roadmap. @ganesh-krishnan is right that you should probably read up on the terminology outside of just the cvts documentation. The documentation does assume some existing knowledge and probably wouldn't be comprehensible without it. That said, I chose default values for cvts() that should be sensible for most decently long time series if you just want to use cross validation in hybridModel() with weights = "cv.errors". I can also suggest experimenting with different input time series and different values of the maxHorizon, rolling, and windowSize parameters, then examining the resulting cvts object to see what is going on. Even examining the code for the function could help. The source is all available here after all, so you can see exactly what it is doing.

dashaub commented 7 years ago

Related #66

russellcameronthomas commented 6 years ago

@dashaub If you are writing a vignette for cvts(), I would like you to include examples where you show how to cross validate a hybrid model that has been previously fit using hybridModel(), where model weights have been generated and special arguments are sent to some component models.

It seems obvious that there should be a separate function to cross validate a hybrid model after it is fit. Am I missing something????

I know that there is the cv.errors option in hybridModel(), but I can't get it to work successfully when I include the stlm model because I get the error "series is not periodic or has less than two periods". I've tried everything I can to eliminate this error, with no success. I can successfully run stlm() separately on the same data and the same parameters.


General comment: your package is very good and very useful. Generally the documentation is good (at least better than most). But like nearly all R packages, even better documentation is, by far, the best way to improve usability and popularity.