alkaline-ml / pmdarima

A statistical library designed to fill the void in Python's time series analysis capabilities, including the equivalent of R's auto.arima function.
https://www.alkaline-ml.com/pmdarima
MIT License

ARIMA.arima_res_ doesn't store pd.Series name but statsmodels does #535

Open JavierEscobarOrtiz opened 1 year ago

JavierEscobarOrtiz commented 1 year ago

Describe the question you have

Hello!

We are creating a wrapper in Skforecast for forecasting using ARIMA models and we are using pmdarima as a dependency.

We are trying to apply the `append` method from statsmodels to `ARIMA().arima_res_`, and we are finding different behavior between pmdarima and statsmodels.

Inside `ARIMA.arima_res_` there is an attribute that stores the original endogenous data (`ARIMA().arima_res_.model.data.orig_endog`). When statsmodels is used directly, this attribute holds the pd.Series together with its name, but when pmdarima is used the name is removed.

As a result, when we try to apply the `append()` method we get the following error:

ValueError: Columns must match to concatenate along rows.

Reproducible example:

```python
import numpy as np
import pandas as pd

# Training series with a name and a yearly DatetimeIndex
np.random.seed(123)
y_datetime = pd.Series(data=np.random.rand(50))
y_datetime.name = 'y'
y_datetime.index = pd.date_range(start='2000', periods=50, freq='A')
print(y_datetime.head(5))

# New observations that continue right after the training series
last_window_datetime = pd.Series(data=np.random.rand(50))
last_window_datetime.name = 'y'
last_window_datetime.index = pd.date_range(start='2050', periods=50, freq='A')
```

```
2000-12-31    0.696469
2001-12-31    0.286139
2002-12-31    0.226851
2003-12-31    0.551315
2004-12-31    0.719469
Freq: A-DEC, Name: y, dtype: float64
```

+ statsmodels: (Here `append()` works)

```python
from statsmodels.tsa.statespace.sarimax import SARIMAX

mod = SARIMAX(endog=y_datetime, order=(1,1,1))
res = mod.fit()
# The stored endog keeps the Series name 'y'
print(res.model.data.orig_endog.head(5))

new_res = res.append(last_window_datetime, refit=False)
```

```
2000-12-31    0.696469
2001-12-31    0.286139
2002-12-31    0.226851
2003-12-31    0.551315
2004-12-31    0.719469
Freq: A-DEC, Name: y, dtype: float64
```

+ pmdarima: (Here `append()` raises the `ValueError` shown above)

```python
from pmdarima.arima import ARIMA

mod = ARIMA(order=(1,1,1))
mod.fit(y_datetime)
# The stored endog has lost the Series name
print(mod.arima_res_.model.data.orig_endog.head(5))

mod.arima_res_ = mod.arima_res_.append(last_window_datetime, refit=False)
```

```
2000-12-31    0.696469
2001-12-31    0.286139
2002-12-31    0.226851
2003-12-31    0.551315
2004-12-31    0.719469
Freq: A-DEC, dtype: float64
```
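Since the error appears to come from the name mismatch between the stored endog (no name) and the new data (named `y`), one possible workaround is to give the new data the same name as the stored series before calling `append()`. The sketch below only illustrates that renaming step with plain pandas. `match_series_name` is a hypothetical helper, not part of pmdarima or statsmodels, and the `stored` series is a stand-in for `mod.arima_res_.model.data.orig_endog`. We have not verified that this is enough to make `append()` succeed in pmdarima 2.0.2.

```python
import numpy as np
import pandas as pd


def match_series_name(new_data: pd.Series, reference: pd.Series) -> pd.Series:
    """Return a copy of `new_data` whose name equals `reference.name`.

    Hypothetical helper for illustration only.
    """
    out = new_data.copy()
    out.name = reference.name
    return out


# Stand-in for `mod.arima_res_.model.data.orig_endog` (the name was dropped).
stored = pd.Series(
    np.arange(3, dtype=float),
    index=pd.date_range("2000-01-01", periods=3, freq="MS"),
)

# New observations that still carry the original name 'y'.
new_obs = pd.Series(
    np.arange(3, 6, dtype=float),
    index=pd.date_range("2000-04-01", periods=3, freq="MS"),
    name="y",
)

# After alignment both series share the same name (here: None).
aligned = match_series_name(new_obs, stored)
print(repr(stored.name), repr(new_obs.name), repr(aligned.name))
```

With the fitted model from the example above, the call would then presumably look like `mod.arima_res_.append(match_series_name(last_window_datetime, mod.arima_res_.model.data.orig_endog), refit=False)`.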

Versions (if necessary)

Session info:

-----
numpy               1.23.5
pandas              1.4.0
pmdarima            2.0.2
pytest              7.1.2
session_info        1.0.0
skforecast          0.7.dev
sklearn             1.1.0
statsmodels         0.13.5
-----
IPython             8.5.0
jupyter_client      7.3.5
jupyter_core        4.11.1
notebook            6.4.12
-----
Python 3.9.13 (main, Aug 25 2022, 23:51:50) [MSC v.1916 64 bit (AMD64)]
Windows-10-10.0.19042-SP0
-----
Session information updated at 2023-01-09 12:08