Closed zachlefevre closed 4 years ago
Can you include some code and data so we can reproduce this?
@tgsmith61591 I will defer to you on this one. See #357 for details/data. The gist of it is this:
Using auto_arima
or (statsmodels.tsa.statespace.sarimax.SARIMAX
):
>>> data = [359125698.64, -419934.7, -31623185.37, 19556515.5, 934374.83, 88643847.73, 18983611.85, 638945.66, -4079020.52, -41301474.38, -999558.35, -999558.35, 52100319.7, -784917.62, 48527345.98, -3161926.01, 1757748.53, 86546250.69, 66971847.0, 1386714.85, -13195016.89, -19012979.5, 1152601.3, 11968257.33, 71107580.44, -1156768.56, -27878666.6, 888604.13, -29190832.71, 5499733.88, -23377004.53, -91490448.09, 7413504.15, 19425919.04, 742786.38, -3862882.17]
>>> auto_arima(data, seasonal=False).predict()
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
Using statsmodels.tsa.arima_model.ARIMA
:
>>> data = [359125698.64, -419934.7, -31623185.37, 19556515.5, 934374.83, 88643847.73, 18983611.85, 638945.66, -4079020.52, -41301474.38, -999558.35, -999558.35, 52100319.7, -784917.62, 48527345.98, -3161926.01, 1757748.53, 86546250.69, 66971847.0, 1386714.85, -13195016.89, -19012979.5, 1152601.3, 11968257.33, 71107580.44, -1156768.56, -27878666.6, 888604.13, -29190832.71, 5499733.88, -23377004.53, -91490448.09, 7413504.15, 19425919.04, 742786.38, -3862882.17]
>>> statsmodels.tsa.arima_model.ARIMA(data, order=(0,0,0)).fit().predict()
RUNNING THE L-BFGS-B CODE
* * *
Machine precision = 2.220D-16
N = 1 M = 12
This problem is unconstrained.
At X0 0 variables are exactly at the bounds
At iterate 0 f= 1.94498D+01 |proj g|= 0.00000D+00
* * *
Tit = total number of iterations
Tnf = total number of function evaluations
Tnint = total number of segments explored during Cauchy searches
Skip = number of BFGS updates skipped
Nact = number of active bounds at final generalized Cauchy point
Projg = norm of the final projected gradient
F = final function value
* * *
N Tit Tnf Tnint Skip Nact Projg F
1 0 1 0 0 0 0.000D+00 1.945D+01
F = 19.449750884608154
CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL
array([15856612.03500003, 15856612.035 , 15856612.035 ,
15856612.035 , 15856612.035 , 15856612.03500001,
15856612.035 , 15856612.035 , 15856612.035 ,
15856612.035 , 15856612.03500001, 15856612.03500001,
15856612.035 , 15856612.035 , 15856612.035 ,
15856612.035 , 15856612.035 , 15856612.035 ,
15856612.035 , 15856612.035 , 15856612.035 ,
15856612.035 , 15856612.035 , 15856612.035 ,
15856612.035 , 15856612.035 , 15856612.035 ,
15856612.035 , 15856612.035 , 15856612.035 ,
15856612.035 , 15856612.035 , 15856612.035 ,
15856612.035 , 15856612.035 , 15856612.03500001])
Since mean(data)
is 15856612.035
, I think he is asking if we could produce what ARIMA
does as opposed to what SARIMAX
does
@zachlefevre will you share your code?
from statsmodels.tsa.arima_model import ARIMA
from pmdarima import auto_arima
data = [359125698.64, -419934.7, -31623185.37, 19556515.5, 934374.83, 88643847.73, 18983611.85, 638945.66, -4079020.52, -41301474.38, -999558.35, -999558.35, 52100319.7, -784917.62, 48527345.98, -3161926.01, 1757748.53, 86546250.69, 66971847.0, 1386714.85, -13195016.89, -19012979.5, 1152601.3, 11968257.33, 71107580.44, -1156768.56, -27878666.6, 888604.13, -29190832.71, 5499733.88, -23377004.53, -91490448.09, 7413504.15, 19425919.04, 742786.38, -3862882.17]
auto_arima(data).predict() # array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
ARIMA(data, order=(0,0,0)).fit().predict()# [15856612.035000026, 15856612.035000004, 15856612.035, 15856612.035000004, 15856612.035000004, 15856612.035000011, 15856612.035000004, 15856612.035000004, 15856612.035000004, 15856612.035000004, 15856612.035000006, 15856612.035000006, 15856612.035000004, 15856612.035000004, 15856612.035000004, 15856612.035000002, 15856612.035000004, 15856612.034999996, 15856612.035000004, 15856612.035000004, 15856612.035000004, 15856612.035000004, 15856612.035000004, 15856612.035000004, 15856612.035000004, 15856612.035000002, 15856612.035000004, 15856612.035000004, 15856612.035000004, 15856612.035000004, 15856612.035000004, 15856612.034999996, 15856612.035000004, 15856612.035000004, 15856612.035000004, 15856612.035000006]
ARIMA(data, order=(0,0,0)).fit().predict().mean # 15856612.035000004
auto_arima chooses order=(0,0,0) so these two should be equivalent.
No, auto_arima
fits a number of models. If you set trace=2
you'll see all of the logging:
>>> auto_arima(data, seasonal=False, trace=2)
Performing stepwise search to minimize aic
/anaconda3/envs/pmdenv/lib/python3.6/site-packages/statsmodels/base/model.py:568: ConvergenceWarning: Maximum Likelihood optimization failed to converge. Check mle_retvals
"Check mle_retvals", ConvergenceWarning)
Fit ARIMA(2,0,2)x(0,0,0,0) [intercept=True]; AIC=1445.241, BIC=1454.742, Time=1.472 seconds
Near non-invertible roots for order (2, 0, 2)(0, 0, 0, 0); setting score to inf (at least one inverse root too close to the border of the unit circle: 1.000)
Fit ARIMA(0,0,0)x(0,0,0,0) [intercept=True]; AIC=1404.382, BIC=1407.549, Time=0.033 seconds
First viable model found (1404.382)
Fit ARIMA(1,0,0)x(0,0,0,0) [intercept=True]; AIC=1457.824, BIC=1462.575, Time=0.066 seconds
Fit ARIMA(0,0,1)x(0,0,0,0) [intercept=True]; AIC=1453.060, BIC=1457.810, Time=0.103 seconds
Fit ARIMA(0,0,0)x(0,0,0,0) [intercept=False]; AIC=1404.304, BIC=1405.887, Time=0.007 seconds
New best model found (1404.304 < 1404.382)
Fit ARIMA(1,0,1)x(0,0,0,0) [intercept=True]; AIC=1450.501, BIC=1456.835, Time=0.100 seconds
Final model order: (0, 0, 0)x(0, 0, 0, 0) (constant=False)
Total fit time: 1.812 seconds
You'll notice it selects a model without an intercept, because it minimizes the AIC. The default for ARIMA
's with_intercept
is True. When you instantiate it with the same parameters, you get the expected result, which is equal:
>>> ARIMA(order=(0,0,0), with_intercept=False).fit(data).predict()
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
I see. Thank you!
Question
AIMRA(0,0,0).predict give a horizontal line y = mean(train_data) AutoArima(0,0,0).predict gives a horizontal line through the x axis.
Is there a way to make AutoArima(0,0,0).predict() == ARIMA(0,0,0).predict()?