alkaline-ml / pmdarima

A statistical library designed to fill the void in Python's time series analysis capabilities, including the equivalent of R's auto.arima function.
https://www.alkaline-ml.com/pmdarima
MIT License

Is there any way to get model parameters #244

Closed ThamaluM closed 4 years ago

ThamaluM commented 4 years ago

Question

Is there any way to get the list of parameters from an auto_arima model? (For example, the p AR parameters, the q MA parameters, ...)

ThamaluM commented 4 years ago

I know there is a params method, but I cannot tell which value in the result corresponds to which parameter.

tgsmith61591 commented 4 years ago

There are several ways. Here's an example.

>>> import pmdarima as pm
>>> y = pm.datasets.load_wineind()
>>> fit = pm.auto_arima(y, m=12)

As you know, you can get the params:

>>> fit.params()
array([-1.00733138e+02, -5.12265612e-01, -8.06198388e-02, -4.42994620e-01,
       -4.02451067e-01,  7.66264248e+06])

You can also access the AR and MA params. You will only have AR params if p > 0, and you'll only have MA params if q > 0. Our model order here is (1, 1, 2)x(0, 1, 1, 12), as we can see by just looking at the model object:

>>> fit
ARIMA(maxiter=50, method='lbfgs', order=(1, 1, 2), out_of_sample_size=0,
      scoring='mse', scoring_args=None, seasonal_order=(0, 1, 1, 12),
      start_params=None, suppress_warnings=False, trend=None,
      with_intercept=True)

To get AR params:

>>> fit.arparams()
array([-0.51226561])

To get MA params:

>>> fit.maparams()
array([-0.08061984, -0.44299462])

(Notice that the AR param is index 1 and the MA params are indices 2 and 3 of the params array; index 0 is the intercept.) If the model is seasonal, the seasonal AR and MA params are also part of the params array. They aren't as easily accessible (it's a known issue, and we hope to make this easier in the future), and the same rule holds as above: there will only be seasonal AR params if P > 0, and seasonal MA params if Q > 0.

In this case P = 0, so, as expected, we get an AttributeError:

>>> fit.arima_res_.seasonalarparams
AttributeError: 'SARIMAXResults' object has no attribute '_params_seasonal_ar'

But there is a Q:

>>> fit.arima_res_.seasonalmaparams
array([-0.40245107])

In the future, they'll be accessible via something like fit.seasonalarparams(), etc. Here's where I normally just suggest using the summary() method to understand which params are which:

>>> fit.summary()
<class 'statsmodels.iolib.summary.Summary'>
"""
                                 Statespace Model Results
==========================================================================================
Dep. Variable:                                  y   No. Observations:                  176
Model:             SARIMAX(1, 1, 2)x(0, 1, 1, 12)   Log Likelihood               -1527.371
Date:                            Tue, 03 Dec 2019   AIC                           3066.742
Time:                                    07:14:06   BIC                           3085.305
Sample:                                         0   HQIC                          3074.278
                                            - 176
Covariance Type:                              opg
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
intercept   -100.7331     72.197     -1.395      0.163    -242.236      40.770
ar.L1         -0.5123      0.390     -1.312      0.189      -1.277       0.253
ma.L1         -0.0806      0.404     -0.200      0.842      -0.872       0.711
ma.L2         -0.4430      0.224     -1.978      0.048      -0.882      -0.004
ma.S.L12      -0.4025      0.054     -7.448      0.000      -0.508      -0.297
sigma2      7.663e+06    7.3e+05     10.495      0.000    6.23e+06    9.09e+06
===================================================================================
Ljung-Box (Q):                       48.70   Jarque-Bera (JB):                21.57
Prob(Q):                              0.16   Prob(JB):                         0.00
Heteroskedasticity (H):               1.18   Skew:                            -0.61
Prob(H) (two-sided):                  0.54   Kurtosis:                         4.31
===================================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
[2] Covariance matrix is singular or near-singular, with condition number 8.14e+14. Standard errors may be unstable.
"""

So we can see that, in order, the params are: intercept, AR1, MA1, MA2, MA(seasonal)1, and sigma^2.
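The ordering read off the summary can also be written down as a small helper. This is only a sketch: `label_params` is a hypothetical function, not part of pmdarima, and it assumes the layout seen in this thread (optional intercept, then AR, MA, seasonal AR, seasonal MA, then sigma2) with no exogenous regressors. The values are the ones printed by fit.params() above.

```python
def label_params(params, order, seasonal_order, with_intercept=True):
    """Pair each value in a flat params sequence with a descriptive name.

    Assumed layout (taken from the summary output in this thread, and
    assuming no exogenous regressors):
    [intercept], AR, MA, seasonal AR, seasonal MA, sigma2.
    """
    p, _, q = order
    P, _, Q, m = seasonal_order

    names = []
    if with_intercept:
        names.append("intercept")
    names += [f"ar.L{i}" for i in range(1, p + 1)]
    names += [f"ma.L{i}" for i in range(1, q + 1)]
    # Seasonal lags are multiples of the seasonal period m.
    names += [f"ar.S.L{i * m}" for i in range(1, P + 1)]
    names += [f"ma.S.L{i * m}" for i in range(1, Q + 1)]
    names.append("sigma2")

    if len(names) != len(params):
        raise ValueError(
            f"expected {len(names)} params for this order, got {len(params)}"
        )
    return dict(zip(names, params))


# Values copied from the fit.params() output in this thread.
params = [-1.00733138e+02, -5.12265612e-01, -8.06198388e-02,
          -4.42994620e-01, -4.02451067e-01, 7.66264248e+06]

labeled = label_params(params, order=(1, 1, 2), seasonal_order=(0, 1, 1, 12))
for name, value in labeled.items():
    print(f"{name:>10}: {value: .6g}")
```

statsmodels models generally expose a `param_names` attribute, so something like `fit.arima_res_.model.param_names` may give the same labels directly; treat that as something to verify against your installed version rather than a guarantee.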

ThamaluM commented 4 years ago

@tgsmith61591 Thank you for the answer. I suggest implementing one function that outputs all the parameters, and another that takes parameters as input (in the same format as the first function's output) to create a SARIMA model.

It would then be easy to regenerate models from stored variables rather than serializing them, which takes about 2MB. I am building a scheduled online data mining unit, and storing a set of parameters and regenerating the model takes less memory than storing model dumps.

tgsmith61591 commented 4 years ago

You can always pass the start_params argument, as outlined in the documentation. But there are a lot of instance attributes created during fit that are necessary for the model to function, so there will likely never be a simple static function a la ARIMA.from_parameters. In the grand scheme of things, deserializing and holding a 2MB model in memory should be a pretty low-impact operation, and that's the standard for libraries like scikit-learn (whose conventions we try to follow).
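For anyone who still wants to keep only the orders and coefficients around, a minimal sketch of a JSON round trip follows. `dump_spec` and `load_spec` are hypothetical helpers, not pmdarima functions, and the values are the ones from this thread. The stored spec does not restore a fitted model: as noted above, the model would still have to be re-fit on the data, with the saved coefficients serving only as start_params.

```python
import json


def dump_spec(order, seasonal_order, params):
    """Serialize the orders and coefficients to a compact JSON string."""
    return json.dumps({
        "order": list(order),
        "seasonal_order": list(seasonal_order),
        "params": [float(v) for v in params],
    })


def load_spec(text):
    """Parse a JSON string produced by dump_spec back into Python objects."""
    spec = json.loads(text)
    spec["order"] = tuple(spec["order"])
    spec["seasonal_order"] = tuple(spec["seasonal_order"])
    return spec


# Values copied from the example earlier in this thread.
text = dump_spec(
    order=(1, 1, 2),
    seasonal_order=(0, 1, 1, 12),
    params=[-1.00733138e+02, -5.12265612e-01, -8.06198388e-02,
            -4.42994620e-01, -4.02451067e-01, 7.66264248e+06],
)
spec = load_spec(text)
print(len(text), "characters stored")

# With pmdarima installed and the original series y available, the spec
# could seed a new model (this still runs a fit; it is not a restore):
# import pmdarima as pm
# model = pm.ARIMA(order=spec["order"],
#                  seasonal_order=spec["seasonal_order"],
#                  start_params=spec["params"])
# model.fit(y)
```

The trade-off is storage versus compute: the JSON string is tiny, but every reload pays for a fresh fit, whereas a serialized model is larger but ready to predict immediately.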