Nixtla / statsforecast

Lightning ⚡️ fast forecasting with statistical and econometric models.
https://nixtlaverse.nixtla.io/statsforecast
Apache License 2.0

[Models] Return and store models parameters during forecast and CV #639

Open vspinu opened 11 months ago

vspinu commented 11 months ago

Description

Currently the models fitted during forecasting and cross-validation are discarded. It would be nice to have a way to preserve the optimal parameters of each model.

One way to implement this is to make the forecast method return the fitted parameters along with other metadata. For example, the results object could carry a meta slot alongside the vector outputs (cols_m, fitted, mean, etc.).

The same meta slot could also hold internal metadata, for example the time taken for fitting/forecasting per model, a very useful comparison metric which, to the best of my knowledge, is not easy to retrieve in the current setup.
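To make the proposal concrete, the requested result shape might look something like the following sketch. This is purely hypothetical: `ForecastResult` and its fields are not an existing statsforecast API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

@dataclass
class ForecastResult:
    """Hypothetical forecast output with a meta slot for non-vector metadata."""
    mean: List[float]                                   # point forecasts
    fitted: List[float]                                 # in-sample fitted values
    meta: Dict[str, Any] = field(default_factory=dict)  # params, timings, etc.

# illustrative values only
res = ForecastResult(
    mean=[450.1, 432.7],
    fitted=[112.3, 118.9],
    meta={'params': {'alpha': 0.25}, 'fit_seconds': 0.12},
)
```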

Use case

jmoralez commented 11 months ago

Hey @vspinu, thanks for using statsforecast. The forecast method is designed to be more memory efficient by returning only the forecasted values. If you're interested in seeing the models' attributes you should use fit + predict.

For CV it's the same case: it's designed to just return the forecasts in order to evaluate the models' performance. If you want the attributes you can also compute the splits manually and run fit + predict for each fold.
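Computing the splits manually amounts to rolling-origin evaluation. A minimal sketch of generating the fold boundaries for a single series follows (pure pandas; the fold sizes are illustrative, and statsforecast's fit + predict would then be run on each train/valid pair):

```python
import pandas as pd

def rolling_origin_splits(df, h, n_windows, step_size=None):
    """Yield (train, valid) frames for rolling-origin CV on one series."""
    step = step_size or h
    n = len(df)
    for i in range(n_windows):
        cutoff = n - h - (n_windows - 1 - i) * step
        yield df.iloc[:cutoff], df.iloc[cutoff:cutoff + h]

# toy series of 20 observations, 3 folds with horizon 4
df = pd.DataFrame({'y': range(20)})
folds = list(rolling_origin_splits(df, h=4, n_windows=3))
```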

I'll take a look at what we can do to allow you to save the fitting and forecasting times.
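In the meantime, per-model timing can be measured outside the library by wrapping each fit/forecast call with `time.perf_counter`. A minimal sketch, where the callable being timed is a stand-in workload rather than statsforecast code:

```python
import time

def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# stand-in workload; in practice fn would be e.g. a model's forecast call
result, elapsed = timed(sum, range(1_000))
```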

vspinu commented 11 months ago

Thanks @jmoralez. Fit + predict is surely an option, but it would require fitting the models twice. Also, there are some implementation differences between fit + predict and forecast (e.g. progress bar, fallback model).

I wonder if some consistent abstraction of parameters is warranted more generally. Is there currently a way to fit, say, AutoETS, retrieve and store its parameters without storing the AutoETS object itself, and finally recreate the AutoETS from those parameters?

jmoralez commented 11 months ago

Why would you need to fit the models twice? In your use case you said you wanted to inspect the parameters of the fitted models, this only requires fitting once.

About restoring a model, the parameters vary a lot between the different models, we currently don't have a consistent way to save/retrieve these, but I think it's something we could have on the roadmap. Depending on how you're using the library you currently have a couple of options:

1. Extracting the model type to avoid searching for it again.
```python
from statsforecast import StatsForecast
from statsforecast.models import AutoETS
from statsforecast.utils import AirPassengersDF

# first fit finds the best model type
sf = StatsForecast(models=[AutoETS(season_length=12)], freq='D')
sf.fit(df=AirPassengersDF)
# fitted_ is of shape (n_series, n_models)
learned_model = sf.fitted_[0, 0].model_['components']
single_ets = AutoETS(season_length=12, model=learned_model[:3], damped=learned_model[3] != 'N')
single_ets.fit(AirPassengersDF['y'].values)
forecasts = single_ets.predict(h=12, level=[80])
```

2. Using the ETS functions directly, which recomputes only the residuals and some statistics.
```python
from statsforecast import StatsForecast
from statsforecast.ets import ets_f, forecast_ets
from statsforecast.models import AutoETS
from statsforecast.utils import AirPassengersDF

# first fit finds the best model type
sf = StatsForecast(models=[AutoETS(season_length=12)], freq='D')
sf.fit(df=AirPassengersDF)
# fitted_ is of shape n_series, n_models
fitted_model = sf.fitted_[0, 0].model_
# use the learned params & state
learned_params = {k: v for k, v in fitted_model.items() if k in ('components', 'par', 'm', 'fit', 'n_params')}
single_ets = ets_f(AirPassengersDF['y'].values, m=12, model=learned_params)
forecasts = forecast_ets(single_ets, h=12, level=[80])
```

Please let us know if this helps.

vspinu commented 11 months ago

> Why would you need to fit the models twice?

Once for forecasting and once to get the parameters via fit, or once for CV and once for the parameters. I am a bit confused and don't know all the details of the overlap between fit + predict/predict_in_sample and forecast (with or without fitted=True). I guess it should be possible to compute the forecast, and even the CV, myself from the fitted objects; then yes, one fit would be enough.

> Please let us know if this helps.

It does, but both approaches require dealing with internals to some extent and are not exactly "user-friendly" or generic. Given the huge number of time series in real-life scenarios, one would ideally be able to store the parameters in a database and re-create the models on the fly in prediction or monitoring applications. In any case, not a big deal. Feel free to close this one if it's not considered of great importance.
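For reference, until a first-class save/load API exists, one workaround along these lines is to serialize the fitted parameter dict itself and stash the bytes in a database. A minimal sketch with pickle, where the dict contents are a stand-in for a real `model_` dict, not actual statsforecast output:

```python
import pickle

# stand-in for a fitted model's parameter dict (keys are illustrative)
fitted_params = {'components': 'AAN', 'par': [0.25, 0.1], 'm': 12}

blob = pickle.dumps(fitted_params)   # bytes you could store in a database
restored = pickle.loads(blob)        # later: rebuild the model from these
```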

jmoralez commented 11 months ago

Sorry for the confusion, the overview is:

I agree with you on the second point. We're working towards making deployments easier and more efficient; as a first step, we're trying to reduce the dependencies so that the size of the library is smaller (#509, #596, #631). We can address having a way to easily save/load models as a next step.