Nixtla / statsforecast

Lightning ⚡️ fast forecasting with statistical and econometric models.
https://nixtlaverse.nixtla.io/statsforecast
Apache License 2.0

REPL for fitted model #569

Open david-waterworth opened 1 year ago

david-waterworth commented 1 year ago

Description

Is there an API to print the model coefficients, AIC, BIC, etc., like R or statsmodels? I fitted an ARIMA model, but the Nixtla classes don't appear to implement REPL support, so it's hard to identify the model parameters. For example:

from statsforecast import StatsForecast
from statsforecast.models import ARIMA

arima = ARIMA(order=(1,0,0), seasonal_order=(0,1,1), season_length = 24, include_mean=False)
sf = StatsForecast(
    models=[arima],
    freq='H',
)

sf = sf.fit(df)

If I fit this model in statsmodels, it generates the following report:

                                      SARIMAX Results                                       
============================================================================================
Dep. Variable:                          temperature   No. Observations:               227022
Model:             SARIMAX(4, 0, 0)x(0, 1, [1], 24)   Log Likelihood             -177345.435
Date:                              Fri, 23 Jun 2023   AIC                         354702.871
Time:                                      05:11:08   BIC                         354764.866
Sample:                                           0   HQIC                        354721.018
                                           - 227022                                         
Covariance Type:                       Not computed                                         
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
ar.L1          1.1471        nan        nan        nan         nan         nan
ar.L2          0.0048        nan        nan        nan         nan         nan
ar.L3          0.0402        nan        nan        nan         nan         nan
ar.L4         -0.2412        nan        nan        nan         nan         nan
ma.S.L24       0.0564        nan        nan        nan         nan         nan
sigma2         0.2758        nan        nan        nan         nan         nan
===================================================================================
Ljung-Box (L1) (Q):                 205.00   Jarque-Bera (JB):           6612251.83
Prob(Q):                              0.00   Prob(JB):                         0.00
Heteroskedasticity (H):               0.23   Skew:                            -0.01
Prob(H) (two-sided):                  0.00   Kurtosis:                        29.44
===================================================================================

Warnings:
[1] Covariance matrix not calculated.

I've not been able to find similar functionality in statsforecast. After studying the code I found:

sf.fitted_[0][0].model_

But this is a raw dict. I can see the AIC, but I cannot see how to extract the coefficients. I'd assumed they were under coef, but they're all zero?

{'coef': {'ar1': 0.0, 'ar2': 0.0, 'ar3': 0.0, 'ar4': 0.0, 'sma1': 0.0},
 'sigma2': 16.23453487354103,
 'var_coef': array([[1.96326501e-11, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
         0.00000000e+00],
        [0.00000000e+00, 1.96326501e-11, 0.00000000e+00, 0.00000000e+00,
         0.00000000e+00],
        [0.00000000e+00, 0.00000000e+00, 1.96326501e-11, 0.00000000e+00,
         0.00000000e+00],
        [0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 1.96326501e-11,
         0.00000000e+00],
        [0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
         1.96326632e-11]]),
 'mask': array([ True,  True,  True,  True,  True]),
 'loglik': -635037.9613148367,
 'aic': 1270087.9226296735,
 'arma': (4, 0, 0, 1, 24, 0, 1),
...
}
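
As a rough sketch of the kind of report being asked for, the dict above could be formatted by hand. This is not a statsforecast API: `summarize_arima` is a hypothetical helper, and it assumes that `var_coef` is the covariance matrix of the estimates, ordered the same way as `coef`, so that the square roots of its diagonal can be read as standard errors. The example values are copied from the dict above, trimmed to two terms for brevity.

```python
import math


def summarize_arima(model):
    """Format a fitted-model dict (keys as in the dump above) as a small text table."""
    names = list(model["coef"])
    var = model.get("var_coef")
    lines = [f"{'term':<8}{'coef':>14}{'std err':>14}"]
    for i, name in enumerate(names):
        # Assumption: row/column i of var_coef belongs to the i-th entry of coef.
        diag = var[i][i] if var is not None else float("nan")
        se = math.sqrt(diag) if diag >= 0 else float("nan")
        lines.append(f"{name:<8}{model['coef'][name]:>14.4f}{se:>14.3e}")
    # Scalar fit statistics, printed only when present in the dict.
    for key in ("sigma2", "loglik", "aic"):
        if key in model:
            lines.append(f"{key:<8}{model[key]:>14.4f}")
    return "\n".join(lines)


# Values taken from the dict above, reduced to two coefficients.
example_model = {
    "coef": {"ar1": 0.0, "sma1": 0.0},
    "sigma2": 16.23453487354103,
    "var_coef": [[1.96326501e-11, 0.0], [0.0, 1.96326632e-11]],
    "loglik": -635037.9613148367,
    "aic": 1270087.9226296735,
}
summary = summarize_arima(example_model)
print(summary)
```

With a real fit, the same helper could presumably be called as `summarize_arima(sf.fitted_[0][0].model_)`, since indexing with `[i][i]` works for both nested lists and NumPy arrays.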

Use case

I need a standard way of reporting the properties of a fitted model, similar to R and statsmodels.

david-waterworth commented 1 year ago

The reason all the coefficients are zero appears to be that I have NaNs in my dataset. Removing them (with ffill) results in:

{'coef': {'ar1': 0.8099046785691325,
  'ar2': -0.033508152860819373,
  'ar3': 0.22413016829097446,
  'ar4': -0.21262877451379128,
  'sma1': -0.9515628249609445},
 'sigma2': 0.6471134960291787,
 'var_coef': array([[ 2.81907572e-12, -2.02133764e-12,  1.91492792e-12,
          0.00000000e+00,  0.00000000e+00],
        [-2.76103420e-11,  3.49215011e-11, -9.27323490e-12,
          0.00000000e+00,  0.00000000e+00],
        [ 2.13012271e-11, -4.24520930e-11,  2.28853117e-11,
          0.00000000e+00,  0.00000000e+00],
        [-4.64671662e-12,  1.54557668e-11, -2.89979576e-11,
          1.76992626e-11,  0.00000000e+00],
        [ 0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
          0.00000000e+00,  1.94068897e-11]]),
 'mask': array([ True,  True,  True,  True,  True]),
 'loglik': -272723.5607908177,
 'aic': 545459.1215816354,
  ...

So I'd say this is a bug: unlike statsmodels, statsforecast doesn't appear to handle NaNs, but it also doesn't check for them and raise an error.
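
Until the library checks for this itself, a guard can be run before fitting. The following is a minimal sketch, assuming the input frame uses the `unique_id`, `ds`, and `y` columns that statsforecast expects; `prepare_for_fit` is a hypothetical helper, not part of the library. It either raises on missing targets or forward-fills them within each series.

```python
import numpy as np
import pandas as pd


def prepare_for_fit(df, id_col="unique_id", target_col="y", fill=False):
    """Raise on missing targets, or forward-fill them per series when fill=True."""
    n_missing = int(df[target_col].isna().sum())
    if n_missing == 0:
        return df
    if not fill:
        raise ValueError(
            f"{n_missing} missing values in '{target_col}'; fill or drop them before fitting"
        )
    out = df.copy()
    # Forward-fill inside each series so values never leak across unique_id groups.
    out[target_col] = out.groupby(id_col)[target_col].ffill()
    if out[target_col].isna().any():
        # A NaN at the start of a series has no earlier value to copy.
        raise ValueError("a series starts with missing values; forward fill cannot repair it")
    return out


# Tiny illustrative frame (made-up values) with one gap in the target.
example = pd.DataFrame(
    {
        "unique_id": ["a", "a", "a"],
        "ds": pd.date_range("2023-01-01", periods=3),
        "y": [1.0, np.nan, 3.0],
    }
)
clean = prepare_for_fit(example, fill=True)
print(clean)
```

Calling `prepare_for_fit(df)` with the default `fill=False` turns the silent all-zero fit into an explicit error, which is the behaviour being requested here.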

Relatedly, it doesn't appear to have identified that the fit failed and the model is degenerate; statsmodels will report a convergence error in these cases. The AIC is also significantly worse than statsmodels'.
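
A crude way to catch the degenerate case after the fact is to test whether every estimated coefficient is exactly zero, as in the first dict in this thread. This is only a heuristic sketch: `looks_degenerate` is a hypothetical helper, and an all-zero result is a symptom observed here, not a documented failure signal of the library.

```python
def looks_degenerate(model, tol=0.0):
    """Return True when every coefficient in model['coef'] is within tol of zero."""
    coefs = list(model["coef"].values())
    return len(coefs) > 0 and all(abs(c) <= tol for c in coefs)


# Coefficients from the failed fit (NaNs in the data) reported in this thread.
failed = {"coef": {"ar1": 0.0, "ar2": 0.0, "ar3": 0.0, "ar4": 0.0, "sma1": 0.0}}
# Coefficients from the fit after forward-filling, reported in this thread.
refit = {
    "coef": {
        "ar1": 0.8099046785691325,
        "ar2": -0.033508152860819373,
        "ar3": 0.22413016829097446,
        "ar4": -0.21262877451379128,
        "sma1": -0.9515628249609445,
    }
}
print(looks_degenerate(failed), looks_degenerate(refit))
```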