alkaline-ml / pmdarima

A statistical library designed to fill the void in Python's time series analysis capabilities, including the equivalent of R's auto.arima function.
https://www.alkaline-ml.com/pmdarima
MIT License
1.58k stars 234 forks

Error: Input contains NaN, infinity or a value too large for dtype('float64'): pmdarima.predict() #404

Open joshi-abhishek opened 3 years ago

joshi-abhishek commented 3 years ago

Describe the bug The method exits abruptly with the error below... ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

But the data is clean, with no sign of the behavior reported above.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-56-cb145de96983> in <module>
      4 model_arima = auto_arima(data_tra, start_p = 0, start_q = 0, max_p = 12, max_q = 12, m = 12, start_P = 0, start_Q = 0, seasonal = False, error_action = 'ignore', suppress_warnings = True, stepwise = True)
----> 6 forecast_arima = model_arima.predict(n_periods = 18, return_conf_int = True, alpha = 0.05)

/opt/anaconda/envs/shared/lib/python3.7/site-packages/pmdarima/arima/arima.py in predict(self, n_periods, exogenous, return_conf_int, alpha)
    651             end=end,
    652             exog=exogenous,
--> 653             alpha=alpha)
    654 
    655         if return_conf_int:

/opt/anaconda/envs/shared/lib/python3.7/site-packages/pmdarima/arima/arima.py in _seasonal_prediction_with_confidence(arima_res, start, end, exog, alpha, **kwargs)
     81     conf_int = results.conf_int(alpha=alpha)
     82     return check_endog(f, dtype=None, copy=False), \
---> 83         check_array(conf_int, copy=False, dtype=None)
     84 
     85 

/opt/anaconda/envs/shared/lib/python3.7/site-packages/sklearn/utils/validation.py in inner_f(*args, **kwargs)
     70                           FutureWarning)
     71         kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 72         return f(**kwargs)
     73     return inner_f
     74 

/opt/anaconda/envs/shared/lib/python3.7/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
    643         if force_all_finite:
    644             _assert_all_finite(array,
--> 645                                allow_nan=force_all_finite == 'allow-nan')
    646 
    647     if ensure_min_samples > 0:

/opt/anaconda/envs/shared/lib/python3.7/site-packages/sklearn/utils/validation.py in _assert_all_finite(X, allow_nan, msg_dtype)
     97                     msg_err.format
     98                     (type_err,
---> 99                      msg_dtype if msg_dtype is not None else X.dtype)
    100             )
    101     # for object dtype data, we only check for NaNs (GH-13254)

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
-----------------------------------------------------------------------------------------------------

To Reproduce
Steps to reproduce the behavior:

data:

[1872.0, 1452.0, 1476.0, 1404.0, 3048.0, 1788.0, 1080.0, 888.0, 2184.0, 2220.0, 1680.0,
612.0, 2124.0, 486.0, 1968.0, 924.0, 888.0, 1756.0, 1104.0, 876.0, 888.0, 1608.0, 1896.0,
648.0, 1524.0, 804.0, 816.0, 1944.0, 1512.0, 900.0, 1464.0, 876.0, 1464.0, 2136.0, 732.0, 
1764.0, 840.0, 1860.0, 792.0, 1728.0, 768.0, 1080.0, 876.0, 1716.0, 900.0, 1740.0, 888.0, 
2172.0, 486.0]

Code:

from pmdarima.arima import auto_arima

model_arima = auto_arima(data, start_p = 0, start_q = 0, max_p = 12, max_q = 12, m = 12, start_P = 0, start_Q = 0, seasonal = False, error_action = 'ignore', suppress_warnings = True, stepwise = True)
forecast_arima = model_arima.predict(n_periods = 18, return_conf_int = False, alpha = 0.05)
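Since the error complains about NaN/inf, a quick first step is to confirm the input series itself is finite — a numpy-only sketch (values are the first few points from above), independent of pmdarima:

```python
import numpy as np

# sanity check: verify the series is finite before suspecting the data
data = [1872.0, 1452.0, 1476.0, 1404.0, 3048.0, 1788.0]
arr = np.asarray(data, dtype="float64")
print(np.isfinite(arr).all())  # True
```

If this prints True, the non-finite values are being produced inside the predict step, not supplied by the caller.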

Versions

import pmdarima; pmdarima.show_versions()

System:
    python: 3.7.9 (default, Aug 31 2020, 12:42:55)  [GCC 7.3.0]
executable: /opt/anaconda/envs/shared/bin/python
   machine: Linux-4.4.0-1114-aws-x86_64-with-debian-stretch-sid

Python dependencies:
        pip: 20.2.3
 setuptools: 49.6.0.post20200917
    sklearn: 0.23.2
statsmodels: 0.12.0
      numpy: 1.19.1
      scipy: 1.5.2
     Cython: 0.29.21
     pandas: 0.25.3
     joblib: 0.16.0
   pmdarima: 1.7.1

Expected behavior The predict call should return forecasts without raising an error.

tgsmith61591 commented 3 years ago

Can you try updating your version? This works on 1.8.0:

In [1]: data = [1872.0, 1452.0, 1476.0, 1404.0, 3048.0, 1788.0, 1080.0, 888.0, 2184.0, 2220.0, 1680.0,
   ...: 612.0, 2124.0, 486.0, 1968.0, 924.0, 888.0, 1756.0, 1104.0, 876.0, 888.0, 1608.0, 1896.0,
   ...: 648.0, 1524.0, 804.0, 816.0, 1944.0, 1512.0, 900.0, 1464.0, 876.0, 1464.0, 2136.0, 732.0,
   ...: 1764.0, 840.0, 1860.0, 792.0, 1728.0, 768.0, 1080.0, 876.0, 1716.0, 900.0, 1740.0, 888.0,
   ...: 2172.0, 486.0]

In [2]: from pmdarima.arima import auto_arima
   ...:
   ...: model_arima = auto_arima(data, start_p = 0, start_q = 0, max_p = 12, max_q = 12, m = 12, start_P = 0, start_Q = 0, seasonal = False, error_action = 'ignore', suppress_warnings = True, stepwise = True)
   ...: forecast_arima = model_arima.predict(n_periods = 18, return_conf_int = False, alpha = 0.05)

In [3]: forecast_arima
Out[3]:
array([1742.03281905, 1038.44297599, 1677.5632002 , 1122.01177781,
       1504.58931217, 1021.85945799, 1588.49173444, 1202.38369947,
       1480.27656245, 1170.41755339, 1407.33114539, 1250.95355177,
       1452.51653705, 1248.930108  , 1375.22988857, 1258.86745029,
       1391.23966826, 1303.01297922])

pip install --upgrade pmdarima

joshi-abhishek commented 3 years ago

It's the same error even after upgrading pmdarima..

import pmdarima; pmdarima.show_versions()

System:
    python: 3.7.9 (default, Aug 31 2020, 12:42:55)  [GCC 7.3.0]
executable: /opt/anaconda/envs/shared/bin/python
   machine: Linux-4.4.0-1114-aws-x86_64-with-debian-stretch-sid

Python dependencies:
        pip: 20.2.3
 setuptools: 49.6.0.post20200917
    sklearn: 0.23.2
statsmodels: 0.12.1
      numpy: 1.19.1
      scipy: 1.5.2
     Cython: 0.29.17
     pandas: 0.25.3
     joblib: 0.16.0
   pmdarima: 1.8.0

Error

    380     model_arima = auto_arima(data_tra, start_p = 0, start_q = 0, max_p = 12, max_q = 12, m = 12, start_P = 0, start_Q = 0, seasonal = False, error_action = 'ignore', suppress_warnings = True)
--> 381     forecast_arima = model_arima.predict(n_periods = len(tes), return_conf_int = False, alpha = ci_alpha)

/opt/anaconda/envs/shared/lib/python3.7/site-packages/sklearn/utils/validation.py in _assert_all_finite(X, allow_nan, msg_dtype)
     97                     msg_err.format
     98                     (type_err,
---> 99                      msg_dtype if msg_dtype is not None else X.dtype)
    100             )
    101     # for object dtype data, we only check for NaNs (GH-13254)

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

Could you share the versions of your supporting libraries as well (pmdarima.show_versions())? I read somewhere that the pandas and statsmodels versions also matter.

tgsmith61591 commented 3 years ago
In [3]: pm.show_versions()

System:
    python: 3.7.9 (default, Nov 18 2020, 14:10:47)  [GCC 8.3.0]
executable: /usr/local/bin/python
   machine: Linux-5.4.39-linuxkit-x86_64-with-debian-10.6

Python dependencies:
        pip: 20.3.1
 setuptools: 50.3.2
    sklearn: 0.23.2
statsmodels: 0.12.1
      numpy: 1.19.4
      scipy: 1.5.4
     Cython: 0.29.17
     pandas: 1.1.5
     joblib: 0.17.0
   pmdarima: 1.8.0

Keep in mind if you're having environmental issues, you can always use the docker image, and mount a volume wherever you want to save your model:

$ docker run --rm -it alkalineml/pmdarima:latest
tgsmith61591 commented 3 years ago

Is this still an issue, @joshi-abhishek?

joshi-abhishek commented 3 years ago

Yes.. I am trying it on different machines to check whether this is actually machine-dependent, which should help identify a root cause.

Shuvo-saha commented 3 years ago

I'm facing a similar issue with data that looks like this

test = [53930.25, 16575.5, 15593.1, 6751.15, 5408.95, 3853.0, 5119.9, 6761.55, 20449.1, 20458.05, 24501.8, 33300.4, 34285.9] 
arima_model = auto_arima(test)
arima_model.predict(n_periods=1)

The error message:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-264-f624bf4d9f84> in <module>
      1 test = [53930.25, 16575.5, 15593.1, 6751.15, 5408.95, 3853.0, 5119.9, 6761.55, 20449.1, 20458.05, 24501.8, 33300.4, 34285.9]
      2 arima_model = auto_arima(test)
----> 3 arima_model.predict(n_periods=1)

~\miniconda3\envs\arima\lib\site-packages\pmdarima\arima\arima.py in predict(self, n_periods, X, return_conf_int, alpha, **kwargs)
    674         end = arima.nobs + n_periods - 1
    675 
--> 676         f, conf_int = _seasonal_prediction_with_confidence(
    677             arima_res=arima,
    678             start=arima.nobs,

~\miniconda3\envs\arima\lib\site-packages\pmdarima\arima\arima.py in _seasonal_prediction_with_confidence(arima_res, start, end, X, alpha, **kwargs)
     86     conf_int = results.conf_int(alpha=alpha)
     87     return check_endog(f, dtype=None, copy=False), \
---> 88         check_array(conf_int, copy=False, dtype=None)
     89 
     90 

~\miniconda3\envs\arima\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
     61             extra_args = len(args) - len(all_args)
     62             if extra_args <= 0:
---> 63                 return f(*args, **kwargs)
     64 
     65             # extra_args > 0

~\miniconda3\envs\arima\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
    718 
    719         if force_all_finite:
--> 720             _assert_all_finite(array,
    721                                allow_nan=force_all_finite == 'allow-nan')
    722 

~\miniconda3\envs\arima\lib\site-packages\sklearn\utils\validation.py in _assert_all_finite(X, allow_nan, msg_dtype)
    101                 not allow_nan and not np.isfinite(X).all()):
    102             type_err = 'infinity' if allow_nan else 'NaN, infinity'
--> 103             raise ValueError(
    104                     msg_err.format
    105                     (type_err,

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

Versions used:

System:
    python: 3.8.10 (default, May 19 2021, 13:12:57) [MSC v.1916 64 bit (AMD64)]
executable: C:\Users\shuvo\miniconda3\envs\arima\python.exe
   machine: Windows-10-10.0.19042-SP0

Python dependencies:
        pip: 21.1.1
 setuptools: 52.0.0.post20210125
    sklearn: 0.24.2
statsmodels: 0.12.2
      numpy: 1.19.5
      scipy: 1.6.3
     Cython: 0.29.23
     pandas: 1.2.4
     joblib: 1.0.1
   pmdarima: 1.8.2
Shuvo-saha commented 3 years ago

The trace looks like this:

Performing stepwise search to minimize aic
 ARIMA(2,0,2)(0,0,0)[0] intercept   : AIC=inf, Time=0.13 sec
 ARIMA(0,0,0)(0,0,0)[0] intercept   : AIC=289.456, Time=0.01 sec
 ARIMA(1,0,0)(0,0,0)[0] intercept   : AIC=286.625, Time=0.03 sec
 ARIMA(0,0,1)(0,0,0)[0] intercept   : AIC=292.002, Time=0.01 sec
 ARIMA(0,0,0)(0,0,0)[0]             : AIC=300.810, Time=0.00 sec
 ARIMA(2,0,0)(0,0,0)[0] intercept   : AIC=289.358, Time=0.06 sec
 ARIMA(1,0,1)(0,0,0)[0] intercept   : AIC=289.564, Time=0.06 sec
 ARIMA(2,0,1)(0,0,0)[0] intercept   : AIC=193.418, Time=0.21 sec
 ARIMA(3,0,1)(0,0,0)[0] intercept   : AIC=inf, Time=0.15 sec
 ARIMA(1,0,2)(0,0,0)[0] intercept   : AIC=inf, Time=0.10 sec
 ARIMA(3,0,0)(0,0,0)[0] intercept   : AIC=288.315, Time=0.11 sec
 ARIMA(3,0,2)(0,0,0)[0] intercept   : AIC=inf, Time=0.15 sec
 ARIMA(2,0,1)(0,0,0)[0]             : AIC=inf, Time=0.06 sec

Best model:  ARIMA(2,0,1)(0,0,0)[0] intercept
Total fit time: 1.097 seconds
tgsmith61591 commented 3 years ago

@Shuvo-saha I get a different model with your data, and cannot reproduce the error:

In [5]: test = [53930.25, 16575.5, 15593.1, 6751.15, 5408.95, 3853.0, 5119.9, 6761.55, 20449.1, 20458.05, 24501.8, 33300.4, 34285.9]
   ...: arima_model = auto_arima(test, trace=True)
   ...: arima_model.predict(n_periods=1)
Performing stepwise search to minimize aic
 ARIMA(2,0,2)(0,0,0)[0] intercept   : AIC=inf, Time=0.11 sec
 ARIMA(0,0,0)(0,0,0)[0] intercept   : AIC=289.456, Time=0.00 sec
 ARIMA(1,0,0)(0,0,0)[0] intercept   : AIC=286.625, Time=0.03 sec
 ARIMA(0,0,1)(0,0,0)[0] intercept   : AIC=292.002, Time=0.01 sec
 ARIMA(0,0,0)(0,0,0)[0]             : AIC=300.810, Time=0.00 sec
 ARIMA(2,0,0)(0,0,0)[0] intercept   : AIC=289.358, Time=0.05 sec
 ARIMA(1,0,1)(0,0,0)[0] intercept   : AIC=289.564, Time=0.05 sec
 ARIMA(2,0,1)(0,0,0)[0] intercept   : AIC=inf, Time=0.06 sec
 ARIMA(1,0,0)(0,0,0)[0]             : AIC=287.546, Time=0.02 sec

Best model:  ARIMA(1,0,0)(0,0,0)[0] intercept
Total fit time: 0.345 seconds

Out[5]: array([31676.81437161])
aakashparsi commented 3 years ago

Hi @joshi-abhishek, Is your issue resolved?

shanemcquillan1994 commented 3 years ago

I am currently facing this exact issue. Did you ever manage to resolve this? @joshi-abhishek @aakashparsi

Shuvo-saha commented 3 years ago

The problem happens due to extremely large errors when auto_arima can't find a good solution.
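That explanation is consistent with the tracebacks above: they all die in sklearn's finiteness check on the confidence-interval array, and when the fit diverges (AIC=inf in the trace) the interval bounds can overflow. The check itself reduces to a numpy one-liner — a minimal sketch with made-up numbers:

```python
import numpy as np

# a confidence-interval array with overflowed bounds, as can happen
# when the optimizer diverges; sklearn's check_array rejects exactly this
conf_int = np.array([[1000.0, 2000.0],
                     [-np.inf, np.inf]])
has_bad = not np.isfinite(conf_int).all()
print(has_bad)  # True
```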

zenoprod commented 3 years ago

The problem happens due to extremely large errors when auto_arima can't find a good solution.

Yes, maybe you are right. The series that fails for me looks like this (weekly data; I can't paste a picture): [41.0, 65.0, 80.0, 67.0, 49.0, 53.0, 54.0, 61.0, 36.0, 40.0, 37.0, 48.0, 40.0, 37.0, 32.0, 40.0, 41.0, 28.0, 37.0, 37.0, 29.0, 25.0, 46.0, 28.0, 41.0, 42.0, 87.0, 106.0, 64.0, 0, 17.0, 28.0, 31.0, 44.0, 38.0, 29.0, 42.0, 16.0, 34.0, 69.0, 64.0, 29.0, 55.0, 62.0, 68.0, 52.0, 42.0, 41.0, 40.0, 42.0, 37.0, 43.0, 62.0, 55.0, 62.0, 66.0, 94.0, 82.0, 88.0, 50.0, 2.0]

vinson2233 commented 2 years ago

So is the solution not to use auto_arima if the series is difficult to forecast?

AlexanderLavelle commented 2 years ago

I believe this may also be caused by a predicted NaN or Inf. I have had some relief from the issue by applying a scaling technique before modeling. However, scaling should not be a requirement, because I want to test on unscaled data before looking at the effects of scaling.

As this is still an issue (apparently in R's auto.arima as well), it would be great to have some try/except handling inside the function itself -- otherwise, when pipelined, there is the potential for a breaking failure during cross-validation.

Even in a software-engineered pipeline, a try/except block around the call has often failed for me... but perhaps I wasn't catching ValueError specifically?

Algrasso commented 2 years ago

My case is peculiar. auto_arima was iterated after a group-by on different IDs. Each ID had between 25 and 28 dates, and the prediction was for a single day. It was working fine until one day it threw the ominous error in the subject. After deep research it turned out that the ID causing the failure had 25 dates, 2 of which were non-consecutive. Removing those non-consecutive dates fixed the issue. What I still do not understand is why that happened, since auto_arima is run on numerical arrays with no reference to dates...
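One way to guard against such gaps before fitting is to reindex each ID's series onto a complete date range and look for holes — a pandas sketch (the dates and values here are made up for illustration):

```python
import pandas as pd

s = pd.Series([10.0, 12.0, 11.0, 15.0],
              index=pd.to_datetime(["2023-01-01", "2023-01-02",
                                    "2023-01-05", "2023-01-06"]))

# reindex onto a full daily range; non-consecutive dates show up as NaN
full = s.reindex(pd.date_range(s.index.min(), s.index.max(), freq="D"))
print(int(full.isna().sum()))  # 2 missing days
```

Any ID with missing days can then be filled (e.g. interpolated) or skipped before the model ever sees it.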

tgsmith61591 commented 2 years ago

Seems possibly related to #492 (caused by potential statsmodels bug). We have an open bug with them we're watching

Algrasso commented 2 years ago

Hi, does anyone know if there is any update anywhere about the NaN issue? Thanks!

JainShubham23 commented 1 year ago

I am having the same issue as well (versions were attached as an image). Interestingly, I get this error whenever I break my time series into train and test frames. If I take the whole series and run it, there is no such issue.

thiel-ph commented 1 year ago

One workaround is to multiply the target series by any factor other than 1.

For example:

data = np.array([1872.0, 1452.0, 1476.0, 1404.0, 3048.0, 1788.0, 1080.0, 888.0, 2184.0, 2220.0, 1680.0,
612.0, 2124.0, 486.0, 1968.0, 924.0, 888.0, 1756.0, 1104.0, 876.0, 888.0, 1608.0, 1896.0,
648.0, 1524.0, 804.0, 816.0, 1944.0, 1512.0, 900.0, 1464.0, 876.0, 1464.0, 2136.0, 732.0, 
1764.0, 840.0, 1860.0, 792.0, 1728.0, 768.0, 1080.0, 876.0, 1716.0, 900.0, 1740.0, 888.0, 
2172.0, 486.0])

data *= 0.1
from pmdarima.arima import auto_arima

model_arima = auto_arima(data, start_p = 0, start_q = 0, max_p = 12, max_q = 12, m = 12, start_P = 0, start_Q = 0, seasonal = False, error_action = 'ignore', suppress_warnings = True, stepwise = True)
forecast_arima = model_arima.predict(n_periods = 18, return_conf_int = False, alpha = 0.05)

forecast_arima /= 0.1
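Wrapped as a helper, that round trip looks like this — an illustrative sketch, where `fit_predict` stands in for any fit-and-forecast callable (e.g. one wrapping auto_arima) and `scale` is any factor other than 1:

```python
import numpy as np

def forecast_scaled(fit_predict, y, n_periods, scale=0.1):
    """Fit/predict on a rescaled copy of y, then undo the scaling."""
    y = np.asarray(y, dtype="float64")
    fc = fit_predict(y * scale, n_periods)
    return np.asarray(fc) / scale

# usage with a trivial stand-in 'model' that forecasts the series mean
naive = lambda y, n: np.full(n, y.mean())
print(forecast_scaled(naive, [10.0, 20.0, 30.0], 2))  # [20. 20.]
```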
arzaan789 commented 10 months ago

One workaround is to multiply the target series by any factor other than 1.

This seems to work perfectly :D

bloukanov commented 3 months ago

I am getting this issue, but it seems to depend on which machine I am running on, even though the application is Dockerized. The same code runs in the container on my Mac, but in Google Cloud the error arises. Same code, same data, etc.

bloukanov commented 3 months ago

Updating the optimization method worked for me -- from the default lbfgs to bfgs. It's slower and requires more memory, but luckily I don't have those constraints.

By the way, for me the issue was, I believe, only occurring during cross_val_score, when it called estimator.predict() under the hood for each CV split. I tried both RollingForecastCV and SlidingWindowForecastCV.

Then I noticed it was giving:

ModelFitWarning: Estimator fit failed. The score on this train-test partition will be set to nan. Details: 
    numpy.linalg.LinAlgError: LU decomposition error.

So updating the optimization algorithm worked.
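For reference, the "LU decomposition error" in that warning is a `numpy.linalg.LinAlgError`, raised when a matrix inside the fit is numerically singular. The failure mode can be reproduced with numpy alone — a sketch with a deliberately rank-deficient matrix:

```python
import numpy as np

# a singular matrix, analogous to an ill-conditioned system
# arising inside the optimizer
A = np.array([[1.0, 2.0],
              [2.0, 4.0]])  # second row = 2 * first row

try:
    np.linalg.inv(A)
    singular = False
except np.linalg.LinAlgError:
    singular = True
print(singular)  # True
```

Switching solvers (as above) changes how that system is approached numerically, which is presumably why bfgs sidesteps the error.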