Nixtla / statsforecast

Lightning ⚡️ fast forecasting with statistical and econometric models.
https://nixtlaverse.nixtla.io/statsforecast
Apache License 2.0

auto arima "xreg is rank deficient" test is incorrect #902

Open andrewscottm opened 1 month ago

andrewscottm commented 1 month ago

What happened + What you expected to happen

Auto ARIMA is meant to check whether adding a constant to the model would make the exogenous variables collinear, but it instead checks whether adding a trend would do so.

https://github.com/Nixtla/statsforecast/blob/12f6654dcf6e776af0f3cc461072e8cad6e2f684/python/statsforecast/arima.py#L1449

The line should be X = np.hstack([np.repeat(1, xregg.shape[0]).reshape(-1, 1), xregg]), i.e. a column of ones with one entry per row of xregg, as in the R version.

If one of the exogenous variables is a trend starting from 1, as produced by utilsforecast.feature_engineering.trend, it duplicates the trend column used in the check, so the model fit fails with ValueError: xreg is rank deficient even though the variable is not collinear with a constant. Shifting the trend circumvents the bug.
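To make the difference concrete, here is a minimal NumPy-only sketch of the two checks. It does not call statsforecast. The helper is_rank_deficient is a hypothetical name, and its singular-value ratio test is my assumption about how the check is done, so treat it as an illustration rather than the library's code.

```python
import numpy as np


def is_rank_deficient(X: np.ndarray) -> bool:
    """Hypothetical helper: flag X when its smallest singular value is
    negligible relative to the sum of all singular values."""
    X = X[~np.isnan(X).any(axis=1)]  # drop rows containing NaN
    _, sv, _ = np.linalg.svd(X)
    return bool(sv.min() / sv.sum() < np.finfo(np.float64).eps)


n = 10
# Exogenous trend regressor starting from 1, like the one in the report.
xreg = np.arange(1, n + 1, dtype=float).reshape(-1, 1)

# Check as described in the report: a 1..n trend column is prepended,
# which duplicates the regressor exactly.
with_trend = np.hstack([np.arange(1, n + 1).reshape(-1, 1), xreg])

# Check as proposed: a column of ones (a constant) is prepended.
with_const = np.hstack([np.ones((n, 1)), xreg])

# Workaround from the report: shift the regressor by 1 before the trend check.
shifted = np.hstack([np.arange(1, n + 1).reshape(-1, 1), xreg + 1])

print("trend column prepended:   ", is_rank_deficient(with_trend))
print("constant column prepended:", is_rank_deficient(with_const))
print("shifted regressor:        ", is_rank_deficient(shifted))
```

The first matrix has two identical columns, so it is the case expected to be flagged. The constant-column and shifted variants both have two linearly independent columns.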

Versions / Dependencies

statsforecast 1.7.6
pandas 2.2.2
Python 3.11.9 (main, Apr 19 2024, 11:44:45) [Clang 14.0.6 ]
macOS 14.6.1

Reproducible example

import pandas as pd
from statsforecast import StatsForecast
from utilsforecast.feature_engineering import trend
from statsforecast.models import AutoARIMA

df = pd.DataFrame({
    'unique_id': '0',
    'ds': pd.date_range(start='2000-01-01', periods=10, freq='1D'),
    'y': [1., 4., 5., 5., 1., 4., 5., 5., 7., 3.]
})

df, _ = trend(df, freq='1D', h=1)

m = StatsForecast(models=[AutoARIMA(seasonal=False)], freq='1D', n_jobs=1, verbose=True)

m.fit(df=df)

# Workaround:
#
# df['trend'] += 1
#
# m.fit(df=df)

Traceback (most recent call last):

  Cell In[39], line 16
    m.fit(df=df)

  File /opt/anaconda3/envs/fnixtla1/lib/python3.11/site-packages/statsforecast/core.py:721 in fit
    self.fitted_ = self.ga.fit(

  File /opt/anaconda3/envs/fnixtla1/lib/python3.11/site-packages/statsforecast/core.py:71 in fit
    raise error

  File /opt/anaconda3/envs/fnixtla1/lib/python3.11/site-packages/statsforecast/core.py:64 in fit
    fm[i, i_model] = new_model.fit(y=y, X=X)

  File /opt/anaconda3/envs/fnixtla1/lib/python3.11/site-packages/statsforecast/models.py:356 in fit
    self.model_ = auto_arima_f(

  File /opt/anaconda3/envs/fnixtla1/lib/python3.11/site-packages/statsforecast/arima.py:1915 in auto_arima_f
    raise ValueError("xreg is rank deficient")

ValueError: xreg is rank deficient

Issue Severity

Low: It annoys or frustrates me.

jmoralez commented 1 month ago

Hey @andrewscottm, thanks for the report. Are you interested in contributing the fix?

andrewscottm commented 1 month ago

> Hey @andrewscottm, thanks for the report. Are you interested in contributing the fix?

Yes, I can do that