Open vspinu opened 11 months ago
Hey @vspinu, thanks for using statsforecast. The forecast method is designed to be more memory efficient by returning only the forecasted values. If you're interested in inspecting the models' attributes, you should use fit + predict.
For CV it's the same: it's designed to just return the forecasts in order to evaluate the models' performance. If you want the attributes, you can also compute the splits manually and run fit + predict for each fold.
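Computing the splits manually can be sketched in plain Python as a rolling-origin scheme. The parameter names below mirror `cross_validation`'s `h`/`n_windows`/`step_size`; the function itself is a hypothetical helper, and the actual fit + predict calls per fold are elided:

```python
# Sketch of computing cross-validation splits by hand (rolling origin),
# so each fold can be run with fit + predict and the fitted models kept.
# Pure Python; the per-fold fit/predict calls are elided.
def rolling_splits(n, h, n_windows, step_size=None):
    """Yield (train_end, test_start, test_end) index triples.

    n          -- length of the series
    h          -- forecast horizon per fold
    n_windows  -- number of folds
    step_size  -- offset between consecutive folds (defaults to h)
    """
    step = step_size if step_size is not None else h
    first_cutoff = n - h - (n_windows - 1) * step
    for i in range(n_windows):
        cutoff = first_cutoff + i * step
        yield cutoff, cutoff, cutoff + h

# Example: 24 observations, horizon 6, 3 folds.
splits = list(rolling_splits(24, h=6, n_windows=3))
# Each fold trains on y[:train_end] and evaluates on y[test_start:test_end].
```

The last fold's test window ends at the final observation, matching the usual rolling-origin evaluation.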
I'll take a look at what we can do to allow you to save the fitting and forecasting times.
Thanks @jmoralez. Fit + predict is certainly an option, but it would require fitting the models twice. There are also some implementation differences between fit + predict and forecast (e.g. the progress bar and the fallback model).
I wonder if some consistent abstraction of parameters is warranted more generally. Is there currently a way to fit, say AutoETS, retrieve and store the parameters without storing the AutoETS object itself, and finally recreate the AutoETS from the parameters?
Why would you need to fit the models twice? In your use case you said you wanted to inspect the parameters of the fitted models, this only requires fitting once.
About restoring a model: the parameters vary a lot between the different models, so we currently don't have a consistent way to save/retrieve them, but I think it's something we could put on the roadmap. Depending on how you're using the library, you currently have a couple of options:
1. Using the `AutoETS` class with the learned model configuration

```python
from statsforecast import StatsForecast
from statsforecast.models import AutoETS
from statsforecast.utils import AirPassengersDF

sf = StatsForecast(models=[AutoETS(season_length=12)], freq='D')
sf.fit(df=AirPassengersDF)

# fitted_ is of shape (n_series, n_models)
learned_model = sf.fitted_[0, 0].model_['components']
single_ets = AutoETS(season_length=12, model=learned_model[:3], damped=learned_model[3] != 'N')
single_ets.fit(AirPassengersDF['y'].values)
forecasts = single_ets.predict(h=12, level=[80])
```
2. Using the ETS functions directly, recomputes only residuals and some statistics
```python
from statsforecast import StatsForecast
from statsforecast.ets import ets_f, forecast_ets
from statsforecast.models import AutoETS
from statsforecast.utils import AirPassengersDF
# first fit finds the best model type
sf = StatsForecast(models=[AutoETS(season_length=12)], freq='D')
sf.fit(df=AirPassengersDF)
# fitted_ is of shape n_series, n_models
fitted_model = sf.fitted_[0, 0].model_
# use the learned params & state
learned_params = {k: v for k, v in fitted_model.items() if k in ('components', 'par', 'm', 'fit', 'n_params')}
single_ets = ets_f(AirPassengersDF['y'].values, m=12, model=learned_params)
forecasts = forecast_ets(single_ets, h=12, level=[80])
```
Please let us know if this helps.
> Why would you need to fit the models twice?

Once for forecasting and once to get the parameters from `fit`, or once for CV and once for the parameters. I'm a bit confused and don't know all the details regarding the redundancy between `fit` + `predict`/`predict_in_sample` vs `forecast` (with or without `fitted=True`). I guess it should be possible to compute the forecasts and even the CV myself from the fitted objects; then yes, one fit would be enough.
> Please let us know if this helps.
It does, but both approaches require dealing with internals to some extent and are not exactly "user-friendly" or generic. Given the huge number of time series in real-life scenarios, one would ideally be able to store the parameters in a database and re-create the models on the fly in prediction or monitoring applications. In any case, not a big deal; feel free to close this one if it's not considered of great importance.
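Storing parameters in a database and rebuilding models from them could look roughly like the following. This is a hedged sketch using the standard library only: the keys (`model`, `damped`, `season_length`, `par`) are illustrative, not statsforecast's actual schema, and non-JSON values such as numpy arrays would need converting to plain lists first.

```python
import json

# Illustrative learned-parameters record; keys are assumptions,
# not statsforecast's real schema.
learned = {
    'model': 'AAA',           # error/trend/season components
    'damped': False,
    'season_length': 12,
    'par': [0.3, 0.1, 0.05],  # smoothing parameters as a plain list
}

serialized = json.dumps(learned)   # store this string in a database
restored = json.loads(serialized)  # later, re-create the model from
                                   # restored['model'], restored['damped'], ...
assert restored == learned
```

A round trip like this is what a generic save/load API would hide behind a stable, documented parameter schema.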
Sorry for the confusion, the overview is:

- `fit`: learns the parameters.
- `predict`: uses the parameters learned during fit to compute future predictions.
- `predict_in_sample`: computes the predictions for the training set. This requires more work and is thus disabled by default; by setting `fitted=True` these are saved during the fit step.
- `forecast`: learns the parameters and computes future predictions, returning only the predictions. This is designed to be more memory efficient, for example in a distributed setting where sending big objects back and forth is expensive.

I agree with you on the second point. We're working towards making deployments easier and more efficient; as a first step we're trying to reduce the dependencies so that the library is smaller (#509, #596, #631). We can address having a way to easily save/load models as a next step.
Description

Currently the models' fit during forecasting and cross-validation is lost. It would be nice to have a way to preserve the optimal parameters of the models.

One way to implement this is to make the `forecast` method return the fitted parameters along with other metadata. For example, it could be a `meta` slot of the results objects, separate from the vector outputs (`cols_m`, `fitted`, `mean`, etc.). The same `meta` slot could be used for internal metadata, for example the time taken for fitting/forecasting per model, a very useful comparison metric which, to the best of my knowledge, is not easy to retrieve in the current setup.

Use case