alkaline-ml / pmdarima

A statistical library designed to fill the void in Python's time series analysis capabilities, including the equivalent of R's auto.arima function.
https://www.alkaline-ml.com/pmdarima
MIT License

FourierFeaturizer and DateFeaturizer pipeline: "transform() got multiple values for argument 'y' result" #539

Closed: wouterbles closed this issue 1 year ago

wouterbles commented 1 year ago

Describe the bug

When using the FourierFeaturizer or DateFeaturizer in a pipeline and calling either of the predict methods (predict_in_sample or predict), the following error is thrown: transform() got multiple values for argument 'y'

To Reproduce

It can be reproduced by running any of the pipeline examples.

Versions

System:
    python: 3.9.16 (main, Mar  1 2023, 18:22:10)  [GCC 11.2.0]
executable: /home/whbles/miniconda3/bin/python
   machine: Linux-5.15.90.1-microsoft-standard-WSL2-x86_64-with-glibc2.35

Python dependencies:
        pip: 22.3.1
 setuptools: 65.6.3
    sklearn: 1.2.1
statsmodels: 0.13.5
      numpy: 1.23.5
      scipy: 1.10.0
     Cython: 0.29.33
     pandas: 1.5.3
     joblib: 1.2.0
   pmdarima: 2.0.2
Linux-5.15.90.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
Python 3.9.16 (main, Mar  1 2023, 18:22:10) 
[GCC 11.2.0]
pmdarima 2.0.2
NumPy 1.23.5
SciPy 1.10.0
Scikit-Learn 1.2.1
Statsmodels 0.13.5

Expected Behavior

The pipeline should output the model results.

Actual Behavior

Error thrown:

TypeError                                 Traceback (most recent call last)
Cell In[4], line 34
     31 print(pipe)
     33 # We can compute predictions the same way we would on a normal ARIMA object:
---> 34 preds, conf_int = pipe.predict(n_periods=10, return_conf_int=True)
     35 print("\nForecasts:")
     36 print(preds)

File ~/miniconda3/lib/python3.9/site-packages/pmdarima/pipeline.py:443, in Pipeline.predict(self, n_periods, X, return_conf_int, alpha, inverse_transform, **kwargs)
    441 n_periods = self._check_n_periods(n_periods, X)
    442 kwargs = _warn_for_deprecated(**kwargs)
--> 443 Xt, est, predict_kwargs = self._pre_predict(
    444     n_periods, X, **kwargs)
    446 return_vals = est.predict(
    447     n_periods=n_periods,
    448     X=Xt,
    449     return_conf_int=return_conf_int,
    450     alpha=alpha,
    451     **predict_kwargs)
    453 return self._post_predict(
    454     Xt, return_vals, return_conf_int, inverse_transform)

File ~/miniconda3/lib/python3.9/site-packages/pmdarima/pipeline.py:250, in Pipeline._pre_predict(self, n_periods, X, **kwargs)
    246             kw["n_periods"] = n_periods
...
    146             _wrap_data_with_container(method, data_to_wrap[0], X, self),
    147             *data_to_wrap[1:],
    148         )

TypeError: transform() got multiple values for argument 'y'

Additional Context

No response

benman1 commented 1 year ago

I've run into the same problem, but I found that downgrading to scikit-learn 1.1.3 (from 1.2.0) makes it go away.

msat59 commented 1 year ago

I have the same issue. I created a new environment and updated all libraries, so I'm not sure which one causes it.

Here is the code to replicate the issue:

import numpy as np
from pmdarima import pipeline
from pmdarima import arima
from pmdarima import preprocessing as ppc

max_p=5
max_q=5
ff_m=12
ff_k=4
n_jobs=4
trend = 'c'

train = np.array([ 53.49732848,  55.67194689,  58.38817983,  60.15814887,
        60.78495554,  60.92771421,  61.30123253,  62.37336819,
        64.31094699,  66.95783357,  69.91670478,  72.54269204,
        73.76937463,  72.23523373,  67.0330952 ,  58.5990769 ,
        49.0156232 ,  41.45220597,  38.91781587,  43.00469642,
        53.33866198,  67.77987974,  83.06470788,  95.81475843,
       103.74286951, 106.54764586, 105.90211977, 104.27736115,
       103.47655163, 104.35387802, 107.59139252, 113.98787597,
       123.92153058, 136.93004483, 151.58183531, 165.31885435,
       175.02600621, 178.97489817, 178.31169515, 175.85739037,
       173.50084961, 171.48318201, 169.87503182, 169.55511273,
       171.45912256, 175.46712864])

pipe = pipeline.Pipeline([
            ("fourier", ppc.FourierFeaturizer(m=ff_m, k=ff_k)),
            ("arima", arima.AutoARIMA(stepwise=False, trace=5, error_action="ignore",
                                      seasonal=False,  # because we use Fourier
                                      trend=trend, n_jobs=n_jobs,
                                      max_p=max_p, max_q=max_q,
                                      suppress_warnings=True))
        ])

pipe.fit(train)

pipe.predict(2)  # raises the TypeError below on pmdarima 2.0.2 with scikit-learn 1.2.x

Output:

TypeError: transform() got multiple values for argument 'y'

aaronreidsmith commented 1 year ago

I think this should be fixed by #532. We'll get a fixed version released so y'all aren't stuck pinning scikit-learn to <1.2.0.

msat59 commented 1 year ago

I think it's because of a new scikit-learn feature:

Major Feature The set_output API has been adopted by all transformers. Meta-estimators that contain transformers such as pipeline.Pipeline or compose.ColumnTransformer also define a set_output.
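To see why a wrapper like that can produce exactly this message, here is a simplified, self-contained model. It assumes two things: that the output-wrapping decorator names its first parameter X and re-passes it positionally (loosely modelled on the wrapper visible in the traceback's last frame, not scikit-learn's actual code), and that the featurizer's transform takes y as its first parameter, which the error message suggests. The class and function names are hypothetical. When the pipeline calls transform(y=None, X=...) by keyword, the wrapper captures X, then forwards it positionally into the y slot while y=None is still in **kwargs, so Python sees y twice.

```python
import functools


def wrap_output(method):
    # Hypothetical stand-in for an output-wrapping decorator (NOT sklearn's code):
    # its first parameter is named X and it re-passes X positionally.
    @functools.wraps(method)
    def wrapped(self, X, *args, **kwargs):
        return method(self, X, *args, **kwargs)
    return wrapped


def forward_unchanged(method):
    # Contrast: a wrapper that forwards arguments exactly as received.
    @functools.wraps(method)
    def wrapped(self, *args, **kwargs):
        return method(self, *args, **kwargs)
    return wrapped


class FeaturizerLike:
    # Hypothetical featurizer whose transform puts y first, then X.
    @wrap_output
    def transform(self, y, X=None, n_periods=0):
        return y, X


class ForwardingFeaturizerLike:
    @forward_unchanged
    def transform(self, y, X=None, n_periods=0):
        return y, X


def call_like_pipeline(transformer):
    # Mimic a pipeline that passes y and X by keyword.
    try:
        transformer.transform(y=None, X=[[1.0], [2.0]], n_periods=2)
    except TypeError as exc:
        return str(exc)
    return "no error"


# Message ends with: got multiple values for argument 'y'
print(call_like_pipeline(FeaturizerLike()))
print(call_like_pipeline(ForwardingFeaturizerLike()))  # no error
```

The forwarding variant is only there to show that the clash comes from the positional re-pass. It is not necessarily how #532 resolved the issue.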

aaronreidsmith commented 1 year ago

I just deployed version 2.0.3 (to PyPI; conda builds are maintained separately). Give that a shot and let me know if it works.
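For anyone checking whether an environment still needs the workaround, here is a hypothetical helper. It treats pmdarima releases older than 2.0.3 combined with scikit-learn 1.2 or newer as affected. That cutoff is an assumption drawn from this thread, which only confirms 2.0.2 as broken, 2.0.3 as fixed, and scikit-learn 1.1.3 as a working downgrade. In practice you would pass pmdarima.__version__ and sklearn.__version__.

```python
def parse_version(text):
    """Turn '1.2.1' into (1, 2, 1); stops at the first non-digit in each part."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)  # pad '1.2' to (1, 2, 0)
    return tuple(parts)


def hits_transform_y_bug(pmdarima_version, sklearn_version):
    """Hypothetical check: True for pmdarima < 2.0.3 with scikit-learn >= 1.2.0."""
    return (parse_version(pmdarima_version) < (2, 0, 3)
            and parse_version(sklearn_version) >= (1, 2, 0))


print(hits_transform_y_bug("2.0.2", "1.2.1"))  # True: the combination in this issue
print(hits_transform_y_bug("2.0.2", "1.1.3"))  # False: the downgrade workaround
print(hits_transform_y_bug("2.0.3", "1.2.1"))  # False: the fixed release
```

For real projects, packaging.version.Version handles pre-release tags more robustly than this hand-rolled parser.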

msat59 commented 1 year ago

Thank you, @aaronreidsmith.

The bug is fixed and my code runs smoothly.