If an ARIMA model is fit with exogenous variables, its in-sample predictions do not appear to depend on the X values passed to `predict_in_sample`. In fact, even arrays with missing columns and rows are accepted. Are the in-sample X values saved with the model, so they don't need to be provided? Or am I missing something here?
To Reproduce
```python
import numpy as np
import pandas as pd
import pmdarima as pm
from pmdarima import model_selection

np.random.seed(42)
y = pm.datasets.load_wineind()
df = pd.DataFrame(
    {
        "x1": y * np.random.uniform(0, 0.5, len(y)) + np.random.randint(1, 1000, len(y)),
        "x2": y * np.random.uniform(0.5, 0.7, len(y)) + np.random.randint(1, 10000, len(y)),
    }
)
df["y"] = y
train, test = model_selection.train_test_split(df, train_size=150)

arima = pm.auto_arima(
    train["y"],
    X=train.drop(columns="y"),
    error_action="ignore",
    trace=True,
    suppress_warnings=True,
    maxiter=5,
    seasonal=True,
    m=12,
)

# preds1 receives the training X that the model was fit on
preds1 = arima.predict_in_sample(X=train.drop(columns="y"))
# preds2 receives an X with the correct shape but different values
preds2 = arima.predict_in_sample(X=train.drop(columns="y") + 1000)
# preds3 receives only x2 (x1 dropped), restricted to the first 10 rows
preds3 = arima.predict_in_sample(X=train[:10].drop(columns=["y", "x1"]))

print(len(preds1))  # 150
print(len(preds2))  # 150
print(len(preds3))  # 150
print(all(preds1 == preds2))  # True
print(all(preds2 == preds3))  # True
print(arima.summary())  # confirms that x1 and x2 are in the model
```
Versions
Expected Behavior
I expect `predict_in_sample` to raise an error when X has the wrong dimensions, and its predictions to depend on the values of a correctly shaped X.
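The expected check could look roughly like the following. `validate_in_sample_X` is a hypothetical helper, not a pmdarima function: it compares X against the shape of the regressors used at fit time, which would have rejected the 10-row, single-column X passed for `preds3`. Passing `n_obs` and `n_features` explicitly is a simplification; a real implementation would read them from the fitted model.

```python
import numpy as np


def validate_in_sample_X(X, n_obs, n_features):
    """Hypothetical guard (not part of pmdarima): reject an X whose shape
    differs from the (n_obs, n_features) regressors used at fit time."""
    X = np.asarray(X, dtype=float)
    if X.ndim != 2:
        raise ValueError(f"X must be 2-D, got {X.ndim} dimension(s)")
    if X.shape != (n_obs, n_features):
        raise ValueError(
            f"X has shape {X.shape}, expected {(n_obs, n_features)} "
            "to match the training regressors"
        )
    return X


# Shapes mirroring the report: 150 training rows, regressors x1 and x2.
ok = validate_in_sample_X(np.zeros((150, 2)), n_obs=150, n_features=2)

# An X shaped like the one used for preds3 (10 rows, only x2) is rejected.
try:
    validate_in_sample_X(np.zeros((10, 1)), n_obs=150, n_features=2)
    rejected = False
except ValueError:
    rejected = True
```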
Actual Behavior
No error is raised, and all three calls return identical predictions.
Additional Context
No response