Validation Strategy and in/out of sample dataframes

stefanosh commented 4 years ago

Thank you for this interesting and promising project. I've been using it lately and I have a couple of questions:

1) How are the forecasts generated in the in-sample and out-of-sample dataframes? As one-step-ahead or multi-step-ahead forecasts?
2) From my experiments, I understand the forecast_in dataframe to be the forecasts generated on the validation set and the forecast_out dataframe to be the forecasts generated on the test set, as in the attached image. Is that right?
3) Is there a way to access the fitted values on the training set?

4) So, to sum up, is the modeling process as follows? Train the initial model on the training set, apply each algorithm's specific hyperparameter optimization on the validation set (producing the forecast_in df), and make the final forecasts on the test set (producing the forecast_out df).

[Attached image: 1_Nv2NNALuokZEcV6hYEHdGA]

firmai commented 4 years ago

Simply put, the training data is never returned to the user; it is used as both the training and the validation set. The training portion is hardcoded at 70%, from what I can recall (in the future, a parameter can be exposed to the user). The in-sample forecast (the remaining 30%) is the test set; the out-of-sample forecast is the extrapolation beyond the data, where there is no test data to compare against. Imagine predicting whether or not bankruptcies would occur without having labels to confirm it. This project needs an immense amount of attention, but thank you for appreciating it in its current form. Once some of my commitments ease up, I would like to turn it into something more accessible and lightweight.
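To make the in-sample/out-of-sample distinction concrete, here is a minimal sketch in plain Python. It is not AtsPy code: `chronological_split` and `naive_forecast` are hypothetical helpers, the naive "repeat the last value" model stands in for the real models, and refitting on the full series before extrapolating is an assumption rather than something confirmed in this thread.

```python
# Hypothetical helpers for illustration only; none of this is AtsPy's API.

def chronological_split(series, train_fraction=0.7):
    """Split an ordered series into a training head and a held-out tail."""
    cut = round(len(series) * train_fraction)
    return series[:cut], series[cut:]


def naive_forecast(history, steps):
    """Stand-in model: repeat the last observed value `steps` times."""
    return [history[-1]] * steps


series = list(range(1, 101))  # 100 ordered observations (toy data)
train, holdout = chronological_split(series)

# "In-sample" forecast in the sense used above: fitted on the first 70%
# and scored against the last 30%, for which actual values exist.
forecast_in = naive_forecast(train, steps=len(holdout))
mae_in = sum(abs(f - a) for f, a in zip(forecast_in, holdout)) / len(holdout)

# "Out-of-sample" forecast: extrapolates past the end of the data, so there
# are no actual values to score it against. Using the full series as the
# history here is an assumption.
forecast_out = naive_forecast(series, steps=10)
```

Only `forecast_in` can be given an error metric such as `mae_in`; `forecast_out` has nothing to be compared with until new observations arrive.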

stefanosh commented 4 years ago

Thanks, it's clear now.

About the model fitting on the training data: isn't there any value in knowing and returning the performance of the models on the training set?

And how is the validation done for the in-sample forecasts? I mean, are the forecasts produced all at once (i.e., multi-step-ahead forecasts), or do you apply something like rolling/walk-forward validation (re-training and forecasting) for each next time step in the test set? I assume the former is applied, but I would like to confirm.

Nice to hear about the future potential of the project; I could make some contributions as well.

firmai commented 4 years ago

> About the model fitting in the training data, Isn't there any value in knowing and returning the performance of the models in the training set?

I guess it all depends on your purpose. Would you mind looking into how one can return the training decompositions for GluonTS and AutoArima, both of which are used in the package? I think I know how to do it for Prophet.

> And how is the validation being made for the in-sample forecasts? I mean are the forecasts produced all at once (i.e., multi-step ahead forecasts) or do you apply something like rolling/walk-forward validation (re-training and forecasting) for each next time step in the test set? I assume, the former is applied but would like to confirm.

One multi-step forecast; the parameter called len is the number of steps.
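For readers comparing the two validation styles mentioned in the question, the following sketch contrasts a single multi-step forecast with walk-forward validation. It is illustrative only: the function names are hypothetical, the naive "repeat the last value" model is a stand-in, and `horizon` merely plays the role of the step-count parameter.

```python
# Illustrative sketch; these functions are hypothetical, not AtsPy's API.

def naive_forecast(history, steps):
    """Stand-in model: repeat the last observed value `steps` times."""
    return [history[-1]] * steps


def multi_step(train, horizon):
    """Fit once on the training data and forecast the whole horizon at once."""
    return naive_forecast(train, horizon)


def walk_forward(train, holdout):
    """Refit after each new observation and forecast one step ahead."""
    history = list(train)
    predictions = []
    for actual in holdout:
        predictions.append(naive_forecast(history, 1)[0])
        history.append(actual)  # reveal the true value before the next step
    return predictions


train = [10, 12, 14]
holdout = [16, 18, 20]
horizon = len(holdout)  # number of steps to forecast

one_shot = multi_step(train, horizon)   # [14, 14, 14]
rolling = walk_forward(train, holdout)  # [14, 16, 18]
```

The one-shot forecast never sees the held-out values, which is the behaviour described in the answer above; the walk-forward variant uses each actual value as soon as it becomes available, so its errors are usually smaller and not directly comparable.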

You're being very helpful in getting me to consider how to approach the future course of this project. Please keep on playing with it and let me know if you come across additional steps that need clarification, or if you have suggestions and so on. I would be happy to have someone on board. Let me know if you want to take some of these development tasks onto your own shoulders. (I have a list of ideas at the end of the package.)

Best, Derek

stefanosh commented 4 years ago

That's good to hear. It's really nice work so far and, as you said, there are of course a lot of ideas and plenty of room for continued improvement and enhancement of the project.

At the moment I'm experimenting with a certain use case that might lead to publishing a research paper, probably also using Atspy, so it could be cited as well.

My work/personal schedule is a bit hectic at the moment, so I will get back to you about contributing when it's more convenient.

Regards, Stefanos

firmai / atspy

Validation Strategy and in/out of sample dataframes #13