Open clumdee opened 2 years ago
Hi @clumdee, thanks for the issue. This is an interesting one that I think comes down to the user's intent when calling .fit_predict. If the intention is to predict in-sample values, then your approach is correct and the existing implementation is wrong. However, if the user's intention is to fit a model and forecast n_periods ahead, then the existing implementation is correct, and withholding n_periods from training could create some confusion.
For instance:
import pmdarima as pm
y = pm.datasets.load_wineind()
next_10 = pm.AutoARIMA(seasonal=True, m=12).fit_predict(y)
print(next_10)
# array([21833.71615189, 26239.84853621, 30813.84738283, 35970.36202699,
# 13683.27930437, 20482.58814877, 22439.71347295, 24738.3241369 ,
# 22838.44936401, 25000.73827201])
Given that the .predict function is used to forecast future values, my instinct here would be to clear up the confusion with a better docstring explaining the intended behavior of the function.
Thanks @tgsmith61591 for taking a look.
Let me try to restate our discussion a bit.
- The goal is to fit a model and forecast n_periods ahead.
- This works with no exogenous variables, as in your example.
- It does not work with exogenous variables. This was actually my intention in the example to reproduce.
Please take a look at an adapted and expanded version of your example below, and kindly share your thoughts.
import pmdarima as pm
y = pm.datasets.load_wineind()
# Ex1. this works -- basically the same as your example, I adjusted their placement to help us compare with other setups
m = pm.AutoARIMA(seasonal=True, m=12)
next_10 = m.fit_predict(y, n_periods=10)
print(next_10)
# Ex2. this does not work -- I added dummy exogenous variables to make the case
m = pm.AutoARIMA(seasonal=True, m=12)
next_10 = m.fit_predict(y, X=y.reshape(-1, 1), n_periods=10)
print(next_10)
# ValueError: X array dims (n_rows) != n_periods
# Ex3. this does not work -- it is the same as Ex2, broken down into the steps that fit_predict performs in base.py
m = pm.AutoARIMA(seasonal=True, m=12)
m.fit(y, X=y.reshape(-1, 1))
next_10 = m.predict(n_periods=10, X=y.reshape(-1, 1))
print(next_10)
# ValueError: X array dims (n_rows) != n_periods
# Ex4. this works because we supply exactly n_periods rows of exogenous variables to predict
m = pm.AutoARIMA(seasonal=True, m=12)
m.fit(y, X=y.reshape(-1, 1))
next_10 = m.predict(n_periods=10, X=y.reshape(-1, 1)[:10])
print(next_10)
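The difference between Ex3 and Ex4 is only the number of exogenous rows handed to predict. A minimal sketch of one way to make that mismatch impossible is to pass the historical and future exogenous rows separately and derive n_periods from the future rows. The helper name fit_and_forecast and its arguments X_hist and X_future are hypothetical and not part of pmdarima; the only assumption about the model is that it exposes fit(y, X=...) and predict(n_periods=..., X=...), as used in the examples above.

```python
def fit_and_forecast(model, y, X_hist, X_future):
    """Fit on (y, X_hist), then forecast len(X_future) steps using X_future.

    Hypothetical helper, not part of pmdarima. `model` is any object with
    fit(y, X=...) and predict(n_periods=..., X=...).
    """
    # The training rows must line up one-to-one with the observations.
    if len(X_hist) != len(y):
        raise ValueError(
            f"X_hist has {len(X_hist)} rows but y has {len(y)} observations"
        )
    # Deriving n_periods from X_future means the two can never disagree,
    # which is the mismatch that Ex2 and Ex3 run into.
    n_periods = len(X_future)
    model.fit(y, X=X_hist)
    return model.predict(n_periods=n_periods, X=X_future)
```

With the dummy data from Ex4, this would be called as fit_and_forecast(m, y, y.reshape(-1, 1), y.reshape(-1, 1)[:10]). In real use, X_future would hold the exogenous values for the periods being forecast.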
Describe the bug
Please correct me if I have misunderstood the concept.
Should this part https://github.com/alkaline-ml/pmdarima/blob/4869c43796a37f7a83ea56525803c797be3693d9/pmdarima/base.py#L47 be adjusted to this?
I can create a PR if this makes sense.
Thank you.
To Reproduce
Versions
Expected Behavior
The method separates the exogenous features used for fit from those used for predict, so that it executes as described.
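One possible reading of this expected behavior, sketched under an assumption the issue does not state explicitly: the caller passes a single X whose first len(y) rows belong to the training data and whose last n_periods rows belong to the forecast horizon. The helper split_exog below is hypothetical and not part of pmdarima. It only shows the slicing and the row-count check, and works on any sliceable sequence, including a NumPy array.

```python
def split_exog(X, n_obs, n_periods):
    """Split X into (training rows, forecast rows).

    Hypothetical helper, not part of pmdarima. Assumes X stacks n_obs
    historical rows followed by n_periods future rows.
    """
    expected = n_obs + n_periods
    if len(X) != expected:
        raise ValueError(
            f"X has {len(X)} rows, expected {expected} "
            f"({n_obs} for fit + {n_periods} for predict)"
        )
    # The first n_obs rows go to fit; the remainder goes to predict.
    return X[:n_obs], X[n_obs:]
```

The two pieces would then go to m.fit(y, X=X_fit) and m.predict(n_periods=n_periods, X=X_future), which is the pattern Ex4 uses.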
Actual Behavior
The method feeds all exogenous features to the fit method, producing
ValueError: Found input variables with inconsistent numbers of samples.
Additional Context
No response