alkaline-ml / pmdarima

A statistical library designed to fill the void in Python's time series analysis capabilities, including the equivalent of R's auto.arima function.
https://www.alkaline-ml.com/pmdarima
MIT License

auto_arima gives errors when time series has numbers of different orders #510

Open VasudevTA opened 2 years ago

VasudevTA commented 2 years ago

### Describe the bug

If the input series contains values of different orders of magnitude (hundreds and thousands), auto_arima raises an error.

### To Reproduce

```python
import numpy as np
import pmdarima as pm

# original data
data = np.array([2166, 200, 902, 1585, 227, 2275, 3826, 5559, 2837, 4098, 1292,
                 5300, 1581, 1751, 1746, 1489, 2357, 7957, 6321, 3210, 464, 2928,
                 605, 605, 1127, 331, 329])

fit = pm.auto_arima(data,
                    start_p=0, start_q=0, max_p=3, max_q=3, m=12,
                    start_P=0, start_Q=0, max_P=2, max_Q=2, seasonal=True, max_D=1)
```

Gives this error:

```
/usr/local/lib/python3.7/dist-packages/numpy/core/_methods.py in _ptp(a, axis, out, keepdims)
    274 def _ptp(a, axis=None, out=None, keepdims=False):
    275     return um.subtract(
--> 276         umr_maximum(a, axis, None, out, keepdims),
    277         umr_minimum(a, axis, None, None, keepdims),
    278         out

ValueError: zero-size array to reduction operation maximum which has no identity
```

Then we noticed that the only obvious difference between the built-in wine data and our data is that the wine values are all of the same order of magnitude (1e4), while ours range from hundreds to thousands (1e2 to 1e3). So, as an experiment, we converted the three-digit numbers into four-digit numbers by appending a 0 or a 9:

```python
# pad 3-digit numbers with a trailing 0 or 9 so that every value has 4 digits
data = np.array([2166, 2000, 9020, 1585, 2270, 2275, 3826, 5559, 2837, 4098, 1292, 5300,
                 1581, 1751, 1746, 1489, 2357, 7957, 6321, 3210, 4640, 2928, 6059, 6059,
                 1127, 3310, 3290])
fit = pm.auto_arima(data,
                    start_p=0, start_q=0, max_p=3, max_q=3, m=12,
                    start_P=0, start_Q=0, max_P=2, max_Q=2, seasonal=True, max_D=1,
                    trace=100)
```

And voila:

```
Performing stepwise search to minimize aic
 ARIMA(0,0,0)(0,0,0)[12] intercept   : AIC=493.430, Time=0.04 sec
First viable model found (493.430)
 ARIMA(1,0,0)(1,0,0)[12] intercept   : AIC=497.363, Time=0.23 sec
 ARIMA(0,0,1)(0,0,1)[12] intercept   : AIC=497.335, Time=0.85 sec
 ARIMA(0,0,0)(0,0,0)[12]             : AIC=528.172, Time=0.02 sec
 ARIMA(0,0,0)(1,0,0)[12] intercept   : AIC=495.430, Time=0.13 sec
 ARIMA(0,0,0)(0,0,1)[12] intercept   : AIC=495.427, Time=0.08 sec
 ARIMA(0,0,0)(1,0,1)[12] intercept   : AIC=497.427, Time=0.15 sec
 ARIMA(1,0,0)(0,0,0)[12] intercept   : AIC=495.372, Time=0.03 sec
 ARIMA(0,0,1)(0,0,0)[12] intercept   : AIC=495.370, Time=0.29 sec
 ARIMA(1,0,1)(0,0,0)[12] intercept   : AIC=497.389, Time=0.48 sec

Best model:  ARIMA(0,0,0)(0,0,0)[12] intercept
Total fit time: 2.375 seconds
```

(The best model looks suspect, but at least it runs!)
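A quick check of the magnitude observation, in plain NumPy: counting the digits of each value confirms that the original series mixes 3- and 4-digit values while the padded one is uniformly 4-digit. This only restates the observation; it does not show that magnitude is the cause. Note also that padding changes the values themselves (200 becomes 2000, 605 becomes 6059), so the padded run fits a different series, not a rescaled copy of the original. `digit_counts` is a hypothetical helper written just for this illustration.

```python
from collections import Counter

import numpy as np

original = np.array([2166, 200, 902, 1585, 227, 2275, 3826, 5559, 2837, 4098, 1292,
                     5300, 1581, 1751, 1746, 1489, 2357, 7957, 6321, 3210, 464, 2928,
                     605, 605, 1127, 331, 329])
padded = np.array([2166, 2000, 9020, 1585, 2270, 2275, 3826, 5559, 2837, 4098, 1292, 5300,
                   1581, 1751, 1746, 1489, 2357, 7957, 6321, 3210, 4640, 2928, 6059, 6059,
                   1127, 3310, 3290])


def digit_counts(x):
    """Hypothetical helper: count how many values have each number of decimal digits.

    Assumes the values are positive integers.
    """
    return Counter(len(str(int(v))) for v in x)


print("original:", digit_counts(original))  # original: Counter({4: 19, 3: 8})
print("padded:  ", digit_counts(padded))    # padded:   Counter({4: 27})
```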

Other things we have tested:

  1. The length of the data is not an issue: the built-in wine data runs with only 19 data points.
  2. The original data runs and gives results with `auto.arima` from R's forecast package:

```r
# R code
library(forecast)

sbd = c(2166, 200, 902, 1585, 227, 2275, 3826, 5559, 2837, 4098, 1292, 5300, 1581,
        1751, 1746, 1489, 2357, 7957, 6321, 3210, 464, 2928, 605, 605, 1127, 331, 329)

sbdts = ts(sbd, start = c(2020, 1), frequency = 12)
plot(sbdts)

auto.arima(sbdts)
```
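Point 1 suggests length alone is not the problem, but it is still worth seeing how much data is left once lag-12 seasonal differencing is applied to these 27 observations. If, as in R's `nsdiffs`, the seasonal test is re-run after each seasonal difference while D is being chosen, it has very few points to work with. The sketch below is plain NumPy lag-m differencing (y_t = x_t - x_{t-m}), not pmdarima's internal code; `seasonal_diff` is a hypothetical helper, and whether this shrinkage is what triggers the zero-size-array error is an assumption, not something confirmed in this thread.

```python
import numpy as np

data = np.array([2166, 200, 902, 1585, 227, 2275, 3826, 5559, 2837, 4098, 1292,
                 5300, 1581, 1751, 1746, 1489, 2357, 7957, 6321, 3210, 464, 2928,
                 605, 605, 1127, 331, 329])


def seasonal_diff(x, m, D=1):
    """Hypothetical helper: apply D rounds of lag-m differencing, y_t = x_t - x_{t-m}.

    Each round drops the first m observations; once fewer than m remain,
    the next round returns an empty array.
    """
    y = np.asarray(x, dtype=float)
    for _ in range(D):
        y = y[m:] - y[:-m]
    return y


for d in range(3):
    print(f"D={d}: {len(seasonal_diff(data, m=12, D=d))} observations left")
# D=0: 27 observations left
# D=1: 15 observations left
# D=2: 3 observations left
```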


This looks like a bug in auto_arima.
Could you please help us out here? Many thanks in advance!

### Versions

```shell
1.8.0, 1.8.5
```

### Expected Behavior

auto_arima should fit a model to this data, or fail gracefully with an informative error.

### Actual Behavior

auto_arima fails with `ValueError: zero-size array to reduction operation maximum which has no identity`. See 'To Reproduce' for a reproducing example.

### Additional Context

No response

tgsmith61591 commented 2 years ago

Hey, thanks for the well-written issue. We'll take a look at this.

imShaswata commented 1 year ago

Hi, I just hit the same issue. Is there a fix yet? I don't think the problem is the data or its range of values; rather, auto_arima fails to find a valid value for the seasonal differencing order D and throws the error (at least in my case). If I explicitly set D=0 or D=1, it runs smoothly. So I am not sure why the automatic selection of the number of seasonal differences (D), which relies on a seasonality test, is failing here.
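The workaround described above (fixing D instead of letting auto_arima estimate it) can be wrapped in a small fallback. `fit_with_fixed_d_fallback` below is a hypothetical helper, not part of pmdarima: it first calls the fitting function with D left unset, and only if that raises `ValueError` does it retry once with D fixed to `fallback_D`. The fitting function is passed in (for example `pm.auto_arima`) so the helper itself does not depend on pmdarima. Catching every `ValueError` is deliberately broad, and whether D=0 or D=1 is appropriate for a given series is a modelling decision the helper cannot make for you.

```python
import warnings
from typing import Any, Callable


def fit_with_fixed_d_fallback(
    fit: Callable[..., Any],
    y: Any,
    fallback_D: int = 1,
    **kwargs: Any,
) -> Any:
    """Hypothetical helper, not part of pmdarima.

    First lets `fit` (e.g. pm.auto_arima) choose the seasonal differencing
    order D itself. If that raises ValueError, retries once with D fixed to
    `fallback_D`, the workaround reported in this thread.
    """
    if kwargs.get("D") is not None:
        # D was fixed by the caller, so there is no automatic step to bypass.
        return fit(y, **kwargs)
    try:
        return fit(y, **kwargs)
    except ValueError as exc:
        warnings.warn(
            f"automatic selection of D failed ({exc!r}); retrying with D={fallback_D}",
            RuntimeWarning,
        )
        return fit(y, **{**kwargs, "D": fallback_D})


# Usage sketch (requires pmdarima; `data` is the series from the report):
# import pmdarima as pm
# model = fit_with_fixed_d_fallback(
#     pm.auto_arima, data, fallback_D=1,
#     start_p=0, start_q=0, max_p=3, max_q=3, m=12,
#     start_P=0, start_Q=0, max_P=2, max_Q=2, seasonal=True,
# )
```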

Cici-J-github commented 5 months ago

Hey, I ran into the same issue. I agree with what imShaswata said above. Are there any updates on a fix? Thank you!