alkaline-ml / pmdarima

A statistical library designed to fill the void in Python's time series analysis capabilities, including the equivalent of R's auto.arima function.
MIT License

Is there any way to get model parameters #244

Closed ThamaluM closed 4 years ago

ThamaluM commented 4 years ago


Is there any way to get the list of parameters from an auto_arima model? (For example p, the number of AR parameters, q, the number of MA parameters, and so on.)

ThamaluM commented 4 years ago

I know there is a params method, but I cannot tell which value in the result belongs to which parameter.

tgsmith61591 commented 4 years ago

There are several ways. Here's an example.

>>> import pmdarima as pm
>>> y = pm.datasets.load_wineind()
>>> fit = pm.auto_arima(y, m=12)

As you know, you can get the params:

>>> fit.params()
array([-1.00733138e+02, -5.12265612e-01, -8.06198388e-02, -4.42994620e-01,
       -4.02451067e-01,  7.66264248e+06])

You can also access the AR and MA params. You will only have AR params if p is > 0, and you'll only have MA params if q is > 0. Here's our model order (1, 1, 2)x(0, 1, 1, 12), as we can see by just looking at the model object:

>>> fit
ARIMA(maxiter=50, method='lbfgs', order=(1, 1, 2), out_of_sample_size=0,
      scoring='mse', scoring_args=None, seasonal_order=(0, 1, 1, 12),
      start_params=None, suppress_warnings=False, trend=None,

To get AR params:

>>> fit.arparams()
array([-0.51226561])

To get MA params:

>>> fit.maparams()
array([-0.08061984, -0.44299462])

(Notice that the AR param is index 1 and the MA params are indices 2 and 3 in the params array; index 0 is the intercept.) If the model is seasonal, the seasonal AR and MA params are also part of the params array. They aren't as easily accessible (it's a known issue, and we hope to make this easier in the future), and the same rule holds as above: there will only be seasonal AR params if P > 0, and seasonal MA params if Q > 0.

In this case P = 0, so as expected, we get an AttributeError:

>>> fit.arima_res_.seasonalarparams
AttributeError: 'SARIMAXResults' object has no attribute '_params_seasonal_ar'

But there is a Q:

>>> fit.arima_res_.seasonalmaparams
array([-0.40245107])
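
Until such accessors exist, one way to avoid the AttributeError is a small wrapper that returns an empty list when a seasonal term is absent. This is only a sketch: params_or_empty is a hypothetical helper, not part of pmdarima or statsmodels, and _FakeResults merely stands in for fit.arima_res_ so the snippet runs without a fitted model.

```python
def params_or_empty(results, attr):
    """Return results.<attr> as a list, or [] if the attribute is unavailable."""
    try:
        return list(getattr(results, attr))
    except AttributeError:
        return []


class _FakeResults:
    """Stand-in for fit.arima_res_ (assumption: it mimics the behavior shown above)."""
    # ma.S.L12 as printed in the model summary (rounded)
    seasonalmaparams = [-0.4025]

    @property
    def seasonalarparams(self):
        # Mirrors the AttributeError raised when P == 0
        raise AttributeError("no attribute '_params_seasonal_ar'")


res = _FakeResults()
seasonal_ar = params_or_empty(res, "seasonalarparams")  # no seasonal AR terms
seasonal_ma = params_or_empty(res, "seasonalmaparams")  # one seasonal MA term
print(seasonal_ar, seasonal_ma)
```

With a real model the call would presumably look like params_or_empty(fit.arima_res_, "seasonalarparams").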

In the future, they'll be accessible via something like fit.seasonalarparams(), etc. Here's where I normally just suggest using the summary() method to understand which params are which:

>>> fit.summary()
<class 'statsmodels.iolib.summary.Summary'>
                                 Statespace Model Results
Dep. Variable:                                  y   No. Observations:                  176
Model:             SARIMAX(1, 1, 2)x(0, 1, 1, 12)   Log Likelihood               -1527.371
Date:                            Tue, 03 Dec 2019   AIC                           3066.742
Time:                                    07:14:06   BIC                           3085.305
Sample:                                         0   HQIC                          3074.278
                                            - 176
Covariance Type:                              opg
                 coef    std err          z      P>|z|      [0.025      0.975]
intercept   -100.7331     72.197     -1.395      0.163    -242.236      40.770
ar.L1         -0.5123      0.390     -1.312      0.189      -1.277       0.253
ma.L1         -0.0806      0.404     -0.200      0.842      -0.872       0.711
ma.L2         -0.4430      0.224     -1.978      0.048      -0.882      -0.004
ma.S.L12      -0.4025      0.054     -7.448      0.000      -0.508      -0.297
sigma2      7.663e+06    7.3e+05     10.495      0.000    6.23e+06    9.09e+06
Ljung-Box (Q):                       48.70   Jarque-Bera (JB):                21.57
Prob(Q):                              0.16   Prob(JB):                         0.00
Heteroskedasticity (H):               1.18   Skew:                            -0.61
Prob(H) (two-sided):                  0.54   Kurtosis:                         4.31

[1] Covariance matrix calculated using the outer product of gradients (complex-step).
[2] Covariance matrix is singular or near-singular, with condition number 8.14e+14. Standard errors may be unstable.

So we can see that, in order, the params are: intercept, AR1, MA1, MA2, MA(seasonal)1, and sigma^2.
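
The same ordering can be derived from the model orders instead of being read off the summary. A minimal sketch, assuming the params array is laid out as intercept (if present), AR, MA, seasonal AR, seasonal MA, sigma2, as in the summary above. label_params is a hypothetical helper, not a pmdarima function, and it does not handle exogenous regressors.

```python
def label_params(params, order, seasonal_order, with_intercept=True):
    """Pair each fitted value with a summary-style name (hypothetical helper).

    Assumed layout: intercept, AR, MA, seasonal AR, seasonal MA, sigma2.
    Exogenous terms are not handled.
    """
    p, _, q = order
    P, _, Q, m = seasonal_order
    names = ["intercept"] if with_intercept else []
    names += ["ar.L%d" % i for i in range(1, p + 1)]
    names += ["ma.L%d" % i for i in range(1, q + 1)]
    names += ["ar.S.L%d" % (i * m) for i in range(1, P + 1)]
    names += ["ma.S.L%d" % (i * m) for i in range(1, Q + 1)]
    names.append("sigma2")
    if len(names) != len(params):
        raise ValueError("expected %d params, got %d" % (len(names), len(params)))
    return dict(zip(names, params))


# Values copied from fit.params() in the example above.
params = [-1.00733138e+02, -5.12265612e-01, -8.06198388e-02,
          -4.42994620e-01, -4.02451067e-01, 7.66264248e+06]
labeled = label_params(params, order=(1, 1, 2), seasonal_order=(0, 1, 1, 12))
for name, value in labeled.items():
    print(name, value)
```

The length check is there so that a mismatch (for example, a model fit without an intercept) fails loudly rather than silently mislabeling values.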

ThamaluM commented 4 years ago

@tgsmith61591 Thank you for the answer. I suggest implementing a function that outputs all the parameters, and another function that takes parameters as input (in the same format as the output of the first function) to create a SARIMA model.

That would make it easy to regenerate models from stored variables rather than serializing them, which takes about 2MB. I am building a scheduled online data mining unit, and storing a set of parameters and regenerating the model takes less memory than storing model dumps.

tgsmith61591 commented 4 years ago

You can always pass the start_params argument, as outlined in the documentation. But there are a lot of instance attributes that are created on fit that are necessary to the functionality of the model, so there will likely never be a simple static function a la ARIMA.from_parameters. In the grand scope of things, deserializing and holding a 2MB model in memory should be a pretty low impact operation, and that's the standard for libraries like scikit-learn (which we try to hold to).
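
For anyone who wants to try the start_params route, here is a rough sketch of storing the orders and params as JSON and refitting from them. The helper names to_spec and refit_from_spec are hypothetical, and the sketch assumes the stored params match the layout the re-created model expects (including the intercept). Note that this still refits on the data; start_params only gives the optimizer a starting point, so it is not a replacement for serializing the fitted model.

```python
import json


def to_spec(order, seasonal_order, params):
    """Collect the orders and fitted params into a small JSON string."""
    return json.dumps({
        "order": list(order),
        "seasonal_order": list(seasonal_order),
        "params": [float(v) for v in params],
    })


def refit_from_spec(spec_json, y):
    """Re-create an ARIMA from the stored spec and refit it on y.

    The optimizer still runs on y; the stored params are only its
    starting point, so results may differ slightly from the original fit.
    """
    # Imported lazily so to_spec works without pmdarima installed.
    import numpy as np
    import pmdarima as pm

    spec = json.loads(spec_json)
    model = pm.ARIMA(order=tuple(spec["order"]),
                     seasonal_order=tuple(spec["seasonal_order"]),
                     start_params=np.asarray(spec["params"]))
    return model.fit(y)


# Orders and params copied from the wineind example earlier in the thread.
spec_json = to_spec((1, 1, 2), (0, 1, 1, 12),
                    [-1.00733138e+02, -5.12265612e-01, -8.06198388e-02,
                     -4.42994620e-01, -4.02451067e-01, 7.66264248e+06])
print(len(spec_json), "characters")
```

With a fitted model, the spec could presumably be built as to_spec(fit.order, fit.seasonal_order, fit.params()).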