alkaline-ml / pmdarima

A statistical library designed to fill the void in Python's time series analysis capabilities, including the equivalent of R's auto.arima function.
https://www.alkaline-ml.com/pmdarima
MIT License

The expected model is not found for the sunspots example in Section 10.2.5 of http://alkaline-ml.com/pmdarima/usecases/sun-spots.html #441

Open Awled opened 3 years ago

Awled commented 3 years ago

Describe the bug

When I run the example code in Section 10.2.5 of the documentation at http://alkaline-ml.com/pmdarima/usecases/sun-spots.html, the best-fitting model I obtain is ARIMA(0,1,2)(0,0,0)[12], which is not the model shown in the documentation. The results differ from the documentation in other ways too: for example, the forecast values are constant after the first step.

To Reproduce
Steps to reproduce the behavior:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import pmdarima as pm
from pmdarima.datasets import load_sunspots
from pmdarima.model_selection import train_test_split
from pmdarima.preprocessing import BoxCoxEndogTransformer

y = load_sunspots(True)
train_len = 2750
y_train, y_test = train_test_split(y, train_size=train_len)
y_train.head()

from pmdarima.pipeline import Pipeline
fit2 = Pipeline([
    ('boxcox', BoxCoxEndogTransformer(lmbda2=1e-6)),
    ('arima', pm.AutoARIMA(trace=True,
                           suppress_warnings=True,
                           m=12))
])

# Fit the model
fit2.fit(y_train)

# Predict next 70 values
fit2.predict(70)

Versions

System:
    python: 3.6.5   [GCC 7.2.0]
executable: /pyenvs/PyARIMA_Test/bin/python
   machine: Linux-3.10.0-1160.11.1.el7.x86_64-x86_64-with-centos-7.9.2009-Core

Python dependencies:
        pip: 21.1.3
 setuptools: 57.2.0
    sklearn: 0.24.2
statsmodels: 0.12.2
      numpy: 1.19.5
      scipy: 1.5.4
     Cython: 0.29.24
     pandas: 1.1.5
     joblib: 1.0.1
   pmdarima: 1.8.2

Expected behavior

I'd expect the model identified to be consistent with that in the documentation, namely SARIMAX(3, 1, 2), and for the predicted values to be non-constant.

Actual behavior

Here is the output showing the search path:

Performing stepwise search to minimize aic
 ARIMA(2,1,2)(1,0,1)[12] intercept   : AIC=inf, Time=8.46 sec
 ARIMA(0,1,0)(0,0,0)[12] intercept   : AIC=10560.255, Time=0.05 sec
 ARIMA(1,1,0)(1,0,0)[12] intercept   : AIC=10190.842, Time=0.59 sec
 ARIMA(0,1,1)(0,0,1)[12] intercept   : AIC=10000.189, Time=1.08 sec
 ARIMA(0,1,0)(0,0,0)[12]             : AIC=10558.255, Time=0.05 sec
 ARIMA(0,1,1)(0,0,0)[12] intercept   : AIC=9998.955, Time=0.25 sec
 ARIMA(0,1,1)(1,0,0)[12] intercept   : AIC=10000.226, Time=0.67 sec
 ARIMA(0,1,1)(1,0,1)[12] intercept   : AIC=10000.401, Time=2.12 sec
 ARIMA(1,1,1)(0,0,0)[12] intercept   : AIC=9985.703, Time=0.45 sec
 ARIMA(1,1,1)(1,0,0)[12] intercept   : AIC=9985.956, Time=1.21 sec
 ARIMA(1,1,1)(0,0,1)[12] intercept   : AIC=9985.855, Time=2.20 sec
 ARIMA(1,1,1)(1,0,1)[12] intercept   : AIC=9985.704, Time=3.28 sec
 ARIMA(1,1,0)(0,0,0)[12] intercept   : AIC=10189.020, Time=0.18 sec
 ARIMA(2,1,1)(0,0,0)[12] intercept   : AIC=9986.300, Time=0.80 sec
 ARIMA(1,1,2)(0,0,0)[12] intercept   : AIC=9987.000, Time=0.79 sec
 ARIMA(0,1,2)(0,0,0)[12] intercept   : AIC=9985.001, Time=0.42 sec
 ARIMA(0,1,2)(1,0,0)[12] intercept   : AIC=9985.241, Time=1.00 sec
 ARIMA(0,1,2)(0,0,1)[12] intercept   : AIC=9985.139, Time=1.12 sec
 ARIMA(0,1,2)(1,0,1)[12] intercept   : AIC=9985.004, Time=3.01 sec
 ARIMA(0,1,3)(0,0,0)[12] intercept   : AIC=9986.999, Time=0.50 sec
 ARIMA(1,1,3)(0,0,0)[12] intercept   : AIC=9988.412, Time=2.18 sec
 ARIMA(0,1,2)(0,0,0)[12]             : AIC=9983.001, Time=0.17 sec
 ARIMA(0,1,2)(1,0,0)[12]             : AIC=9983.241, Time=0.43 sec
 ARIMA(0,1,2)(0,0,1)[12]             : AIC=9983.139, Time=0.44 sec
 ARIMA(0,1,2)(1,0,1)[12]             : AIC=9983.004, Time=1.12 sec
 ARIMA(0,1,1)(0,0,0)[12]             : AIC=9996.955, Time=0.10 sec
 ARIMA(1,1,2)(0,0,0)[12]             : AIC=9985.000, Time=0.51 sec
 ARIMA(0,1,3)(0,0,0)[12]             : AIC=9984.999, Time=0.34 sec
 ARIMA(1,1,1)(0,0,0)[12]             : AIC=9983.703, Time=0.18 sec
 ARIMA(1,1,3)(0,0,0)[12]             : AIC=9986.412, Time=1.01 sec

Best model:  ARIMA(0,1,2)(0,0,0)[12]          
Total fit time: 34.787 seconds

And here are the predicted values:

array([65.76997589, 62.30105759, 62.30105759, 62.30105759, 62.30105759,
       62.30105759, 62.30105759, 62.30105759, 62.30105759, 62.30105759,
       62.30105759, 62.30105759, 62.30105759, 62.30105759, 62.30105759,
       62.30105759, 62.30105759, 62.30105759, 62.30105759, 62.30105759,
       62.30105759, 62.30105759, 62.30105759, 62.30105759, 62.30105759,
       62.30105759, 62.30105759, 62.30105759, 62.30105759, 62.30105759,
       62.30105759, 62.30105759, 62.30105759, 62.30105759, 62.30105759,
       62.30105759, 62.30105759, 62.30105759, 62.30105759, 62.30105759,
       62.30105759, 62.30105759, 62.30105759, 62.30105759, 62.30105759,
       62.30105759, 62.30105759, 62.30105759, 62.30105759, 62.30105759,
       62.30105759, 62.30105759, 62.30105759, 62.30105759, 62.30105759,
       62.30105759, 62.30105759, 62.30105759, 62.30105759, 62.30105759,
       62.30105759, 62.30105759, 62.30105759, 62.30105759, 62.30105759,
       62.30105759, 62.30105759, 62.30105759, 62.30105759, 62.30105759])
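
A note on the flat forecast: it looks consistent with the model that was selected, rather than being a separate problem. In an ARIMA(0,1,2) model with no intercept and no seasonal terms, the forecast of the differenced series depends only on the last two residuals, so it should be zero from the third step onward and the level forecast stops changing. The sketch below illustrates this with plain NumPy. It is not pmdarima's implementation; the function name, coefficients, and residuals are hypothetical values chosen for illustration, and the sign convention for the MA terms is an assumption.

```python
import numpy as np


def arima_0_1_q_forecast(last_level, last_residuals, thetas, horizon):
    """Point forecasts for an ARIMA(0,1,q) model without an intercept.

    Hypothetical helper for illustration only (not part of pmdarima).
    last_residuals: the q most recent in-sample residuals, oldest first.
    thetas: MA coefficients theta_1..theta_q, using the convention
            diff_t = e_t + theta_1 * e_{t-1} + ... + theta_q * e_{t-q}.
    """
    q = len(thetas)
    resid = list(last_residuals)  # e_{T-q+1}, ..., e_T
    level = last_level
    forecasts = []
    for h in range(1, horizon + 1):
        # Future shocks have expectation zero, so only residuals that
        # were already observed contribute to the forecast difference.
        diff = 0.0
        for j in range(1, q + 1):
            offset = h - j  # the shock at time T + offset
            if offset <= 0:
                diff += thetas[j - 1] * resid[q - 1 + offset]
        level += diff  # undo the first difference
        forecasts.append(level)
    return np.array(forecasts)


# Made-up inputs: two MA coefficients and the last two residuals.
fc = arima_0_1_q_forecast(
    last_level=4.0,
    last_residuals=[0.3, -0.2],
    thetas=[-0.5, -0.1],
    horizon=6,
)
print(fc)
```

With q = 2 the first two forecast steps can differ, and every later step repeats the second value, which matches the shape of the array above (one value, then a constant). A Box-Cox inverse transform is monotone, so a flat forecast on the transformed scale would remain flat after it is inverted.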

bsbk1 commented 2 years ago

same