claudia-hm opened this issue 2 years ago (status: Open)
Hi @claudia-hm sorry for the late reply. I'm looking into this. First thing I notice is a peculiar statsmodels warning when I enable the trace on the fit:
In [15]: model = pm.auto_arima(y2_train, trace=5)
Performing stepwise search to minimize aic
/opt/miniconda3/envs/ml/lib/python3.7/site-packages/statsmodels/tsa/statespace/sarimax.py:1890: RuntimeWarning: divide by zero encountered in reciprocal
return np.roots(self.polynomial_reduced_ar)**-1
ARIMA(2,1,2)(0,0,0)[0] intercept : AIC=-1614.762, Time=0.07 sec
First viable model found (-1614.762)
ARIMA(0,1,0)(0,0,0)[0] intercept : AIC=-1670.667, Time=0.06 sec
New best model found (-1670.667 < -1614.762)
ARIMA(1,1,0)(0,0,0)[0] intercept : AIC=-1665.984, Time=0.03 sec
ARIMA(0,1,1)(0,0,0)[0] intercept : AIC=15212.220, Time=0.05 sec
ARIMA(0,1,0)(0,0,0)[0] : AIC=-1839.861, Time=0.02 sec
New best model found (-1839.861 < -1670.667)
ARIMA(1,1,1)(0,0,0)[0] intercept : AIC=-1663.997, Time=0.04 sec
Best model: ARIMA(0,1,0)(0,0,0)[0]
Total fit time: 0.276 seconds
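For context on that warning: statsmodels computes the inverse roots of the reduced AR polynomial via `np.roots(...)**-1`, so a (numerically) zero root makes the reciprocal overflow to `inf` and emits exactly this RuntimeWarning. A minimal numpy sketch of the same mechanism (the root values here are illustrative, not taken from the actual fit):

```python
import numpy as np
import warnings

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    roots = np.array([0.0, 2.0])  # illustrative polynomial roots, one of them zero
    inv_roots = roots ** -1       # numpy dispatches **-1 to reciprocal -> [inf, 0.5]

print(inv_roots)
```

So the warning itself is a symptom of a degenerate polynomial during the fit, not the crash site.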
I'll dig more into this.
Is there an update, @tgsmith61591?
Looks like the error still occurs even with the latest statsmodels (0.14.0) when using the default optimizer ('lbfgs'):
File "statsmodels/tsa/statespace/_representation.pyx", line 1373, in statsmodels.tsa.statespace._representation.dStatespace.initialize
File "statsmodels/tsa/statespace/_representation.pyx", line 1362, in statsmodels.tsa.statespace._representation.dStatespace.initialize
File "statsmodels/tsa/statespace/_initialization.pyx", line 288, in statsmodels.tsa.statespace._initialization.dInitialization.initialize
File "statsmodels/tsa/statespace/_initialization.pyx", line 406, in statsmodels.tsa.statespace._initialization.dInitialization.initialize_stationary_stationary_cov
File "statsmodels/tsa/statespace/_tools.pyx", line 1525, in statsmodels.tsa.statespace._tools._dsolve_discrete_lyapunov
numpy.linalg.LinAlgError: LU decomposition error.
Seems like changing the optimizer method makes a difference:
In [23]: model = pm.auto_arima(y2_train, method='nm')
In [24]: forecasts = model.predict(y2_test.shape[0])
In [25]: forecasts
Out[25]:
80 0.000280
81 0.000281
82 0.000281
83 0.000282
84 0.000283
85 0.000284
86 0.000285
87 0.000286
88 0.000287
89 0.000288
90 0.000289
91 0.000290
92 0.000291
93 0.000292
94 0.000293
95 0.000294
96 0.000295
97 0.000296
98 0.000297
99 0.000298
dtype: float64
I also tried the following optimizer methods, and they appeared to work without issue:
powell
bfgs
Describe the bug
auto_arima returns constant predictions when the data values are too small, i.e., close to zero.
Initially, I generated a time series with a linear trend (slope 0.5, intercept 100) plus some noise. Then I wanted to change the units of my data, so I divided the values of the time series by $10^6$. I expected to obtain a similar prediction. However, auto_arima returned a repeated constant value that poorly predicts my data.
To Reproduce
Here is sample code that shows the bug.
The output is:
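(The original snippet and its output are not preserved in this thread. A minimal sketch consistent with the description above; the sample size, noise scale, and seed are assumptions, and the pmdarima calls are commented out so the data-generation part runs standalone:)

```python
import numpy as np

rng = np.random.default_rng(42)  # seed chosen for reproducibility (assumption)
n = 100
# Linear trend with slope 0.5 and intercept 100, plus some noise (as described)
y = 100 + 0.5 * np.arange(n) + rng.normal(scale=1.0, size=n)
y_small = y / 1e6  # the unit change that triggers the flat forecast

# Fitting (requires pmdarima, imported as pm elsewhere in the thread):
# import pmdarima as pm
# model = pm.auto_arima(y)        # forecasts follow the trend
# model = pm.auto_arima(y_small)  # forecasts collapse to a constant
print(y_small.min(), y_small.max())
```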
Versions
Expected Behavior
I would expect the forecast to have a similar shape to the one produced in the first forecast.
Actual Behavior
auto_arima produces the following forecast:
Additional Context
If this is due to some numerical issue, I would like to understand what is happening and if there is some tolerance value that I can change to bypass this problem.
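I'm not aware of a tolerance parameter in auto_arima that directly addresses this, but a common workaround for tiny-valued series is to rescale the data to order 1 before fitting and then undo the scaling on the forecasts. A sketch of that round-trip (the helper name and the `fit_and_forecast` callback are hypothetical, not pmdarima API):

```python
import numpy as np

def forecast_with_rescaling(y, fit_and_forecast):
    """Hypothetical helper: scale a tiny-valued series up to O(1),
    run the caller-supplied fit_and_forecast(y) -> forecast array,
    then undo the scaling on the returned forecasts."""
    scale = np.abs(y).max()
    forecasts = fit_and_forecast(np.asarray(y) / scale)
    return forecasts * scale

# Demo with a trivial "model" (repeat the last value) just to show the
# round-trip; with pmdarima you would instead pass something like
#   lambda y: pm.auto_arima(y).predict(20)
naive = lambda y: np.repeat(y[-1], 3)
y_small = np.array([1.0e-4, 1.1e-4, 1.2e-4])
print(forecast_with_rescaling(y_small, naive))  # all values equal 1.2e-4
```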