joshi-abhishek opened this issue 3 years ago
Can you try updating your version? This works on 1.8.0:
In [1]: data = [1872.0, 1452.0, 1476.0, 1404.0, 3048.0, 1788.0, 1080.0, 888.0, 2184.0, 2220.0, 1680.0,
...: 612.0, 2124.0, 486.0, 1968.0, 924.0, 888.0, 1756.0, 1104.0, 876.0, 888.0, 1608.0, 1896.0,
...: 648.0, 1524.0, 804.0, 816.0, 1944.0, 1512.0, 900.0, 1464.0, 876.0, 1464.0, 2136.0, 732.0,
...: 1764.0, 840.0, 1860.0, 792.0, 1728.0, 768.0, 1080.0, 876.0, 1716.0, 900.0, 1740.0, 888.0,
...: 2172.0, 486.0]
In [2]: from pmdarima.arima import auto_arima
...:
...: model_arima = auto_arima(data, start_p = 0, start_q = 0, max_p = 12, max_q = 12, m = 12, start_P = 0, start_Q = 0, seasonal = False, error_action = 'ignore', suppress_warnings = True, stepwise = True)
...: forecast_arima = model_arima.predict(n_periods = 18, return_conf_int = False, alpha = 0.05)
In [3]: forecast_arima
Out[3]:
array([1742.03281905, 1038.44297599, 1677.5632002 , 1122.01177781,
1504.58931217, 1021.85945799, 1588.49173444, 1202.38369947,
1480.27656245, 1170.41755339, 1407.33114539, 1250.95355177,
1452.51653705, 1248.930108 , 1375.22988857, 1258.86745029,
1391.23966826, 1303.01297922])
pip install --upgrade pmdarima
It's the same error even after the pmdarima upgrade.
import pmdarima; pmdarima.show_versions()
System:
python: 3.7.9 (default, Aug 31 2020, 12:42:55) [GCC 7.3.0]
executable: /opt/anaconda/envs/shared/bin/python
machine: Linux-4.4.0-1114-aws-x86_64-with-debian-stretch-sid
Python dependencies:
pip: 20.2.3
setuptools: 49.6.0.post20200917
sklearn: 0.23.2
statsmodels: 0.12.1
numpy: 1.19.1
scipy: 1.5.2
Cython: 0.29.17
pandas: 0.25.3
joblib: 0.16.0
pmdarima: 1.8.0
Error
380 model_arima = auto_arima(data_tra, start_p = 0, start_q = 0, max_p = 12, max_q = 12, m = 12, start_P = 0, start_Q = 0, seasonal = False, error_action = 'ignore', suppress_warnings = True)
--> 381 forecast_arima = model_arima.predict(n_periods = len(tes), return_conf_int = False, alpha = ci_alpha)
/opt/anaconda/envs/shared/lib/python3.7/site-packages/sklearn/utils/validation.py in _assert_all_finite(X, allow_nan, msg_dtype)
97 msg_err.format
98 (type_err,
---> 99 msg_dtype if msg_dtype is not None else X.dtype)
100 )
101 # for object dtype data, we only check for NaNs (GH-13254)
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
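For what it's worth, a quick way to confirm the raw series itself is finite before suspecting the data -- a minimal helper sketched here (the helper name and the short example list are mine; the full series above passes the same check):

import numpy as np

def assert_finite(y):
    # Raise early, before modeling, if the series contains NaN or +/-Inf
    arr = np.asarray(y, dtype=np.float64)
    if not np.isfinite(arr).all():
        raise ValueError('series contains NaN or infinite values')
    return arr

assert_finite([1872.0, 1452.0, 1476.0])  # passes, as does the full series above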
Could you let me know your supporting library versions as well, via pmdarima.show_versions()? I read somewhere that the pandas and statsmodels versions also matter.
In [3]: pm.show_versions()
System:
python: 3.7.9 (default, Nov 18 2020, 14:10:47) [GCC 8.3.0]
executable: /usr/local/bin/python
machine: Linux-5.4.39-linuxkit-x86_64-with-debian-10.6
Python dependencies:
pip: 20.3.1
setuptools: 50.3.2
sklearn: 0.23.2
statsmodels: 0.12.1
numpy: 1.19.4
scipy: 1.5.4
Cython: 0.29.17
pandas: 1.1.5
joblib: 0.17.0
pmdarima: 1.8.0
Keep in mind if you're having environmental issues, you can always use the docker image, and mount a volume wherever you want to save your model:
$ docker run --rm -it alkalineml/pmdarima:latest
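For example, to make a local ./models directory visible inside the container for saving the fitted model (the host path is illustrative; -v is standard Docker volume syntax):

$ docker run --rm -it -v "$(pwd)/models:/models" alkalineml/pmdarima:latest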
Is this still an issue, @joshi-abhishek?
Yes. I am trying it out on different machines to check whether this is actually an issue, so that we can identify a root cause.
I'm facing a similar issue with data that looks like this:
test = [53930.25, 16575.5, 15593.1, 6751.15, 5408.95, 3853.0, 5119.9, 6761.55, 20449.1, 20458.05, 24501.8, 33300.4, 34285.9]
arima_model = auto_arima(test)
arima_model.predict(n_periods=1)
The error message:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-264-f624bf4d9f84> in <module>
1 test = [53930.25, 16575.5, 15593.1, 6751.15, 5408.95, 3853.0, 5119.9, 6761.55, 20449.1, 20458.05, 24501.8, 33300.4, 34285.9]
2 arima_model = auto_arima(test)
----> 3 arima_model.predict(n_periods=1)
~\miniconda3\envs\arima\lib\site-packages\pmdarima\arima\arima.py in predict(self, n_periods, X, return_conf_int, alpha, **kwargs)
674 end = arima.nobs + n_periods - 1
675
--> 676 f, conf_int = _seasonal_prediction_with_confidence(
677 arima_res=arima,
678 start=arima.nobs,
~\miniconda3\envs\arima\lib\site-packages\pmdarima\arima\arima.py in _seasonal_prediction_with_confidence(arima_res, start, end, X, alpha, **kwargs)
86 conf_int = results.conf_int(alpha=alpha)
87 return check_endog(f, dtype=None, copy=False), \
---> 88 check_array(conf_int, copy=False, dtype=None)
89
90
~\miniconda3\envs\arima\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
61 extra_args = len(args) - len(all_args)
62 if extra_args <= 0:
---> 63 return f(*args, **kwargs)
64
65 # extra_args > 0
~\miniconda3\envs\arima\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
718
719 if force_all_finite:
--> 720 _assert_all_finite(array,
721 allow_nan=force_all_finite == 'allow-nan')
722
~\miniconda3\envs\arima\lib\site-packages\sklearn\utils\validation.py in _assert_all_finite(X, allow_nan, msg_dtype)
101 not allow_nan and not np.isfinite(X).all()):
102 type_err = 'infinity' if allow_nan else 'NaN, infinity'
--> 103 raise ValueError(
104 msg_err.format
105 (type_err,
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
Versions used:
System:
python: 3.8.10 (default, May 19 2021, 13:12:57) [MSC v.1916 64 bit (AMD64)]
executable: C:\Users\shuvo\miniconda3\envs\arima\python.exe
machine: Windows-10-10.0.19042-SP0
Python dependencies:
pip: 21.1.1
setuptools: 52.0.0.post20210125
sklearn: 0.24.2
statsmodels: 0.12.2
numpy: 1.19.5
scipy: 1.6.3
Cython: 0.29.23
pandas: 1.2.4
joblib: 1.0.1
pmdarima: 1.8.2
The trace looks like this:
Performing stepwise search to minimize aic
ARIMA(2,0,2)(0,0,0)[0] intercept : AIC=inf, Time=0.13 sec
ARIMA(0,0,0)(0,0,0)[0] intercept : AIC=289.456, Time=0.01 sec
ARIMA(1,0,0)(0,0,0)[0] intercept : AIC=286.625, Time=0.03 sec
ARIMA(0,0,1)(0,0,0)[0] intercept : AIC=292.002, Time=0.01 sec
ARIMA(0,0,0)(0,0,0)[0] : AIC=300.810, Time=0.00 sec
ARIMA(2,0,0)(0,0,0)[0] intercept : AIC=289.358, Time=0.06 sec
ARIMA(1,0,1)(0,0,0)[0] intercept : AIC=289.564, Time=0.06 sec
ARIMA(2,0,1)(0,0,0)[0] intercept : AIC=193.418, Time=0.21 sec
ARIMA(3,0,1)(0,0,0)[0] intercept : AIC=inf, Time=0.15 sec
ARIMA(1,0,2)(0,0,0)[0] intercept : AIC=inf, Time=0.10 sec
ARIMA(3,0,0)(0,0,0)[0] intercept : AIC=288.315, Time=0.11 sec
ARIMA(3,0,2)(0,0,0)[0] intercept : AIC=inf, Time=0.15 sec
ARIMA(2,0,1)(0,0,0)[0] : AIC=inf, Time=0.06 sec
Best model: ARIMA(2,0,1)(0,0,0)[0] intercept
Total fit time: 1.097 seconds
@Shuvo-saha I get a different model with your data, and cannot reproduce the error:
In [5]: test = [53930.25, 16575.5, 15593.1, 6751.15, 5408.95, 3853.0, 5119.9, 6761.55, 20449.1, 20458.05, 24501.8, 33300.4, 34285.9]
...: arima_model = auto_arima(test, trace=True)
...: arima_model.predict(n_periods=1)
Performing stepwise search to minimize aic
ARIMA(2,0,2)(0,0,0)[0] intercept : AIC=inf, Time=0.11 sec
ARIMA(0,0,0)(0,0,0)[0] intercept : AIC=289.456, Time=0.00 sec
ARIMA(1,0,0)(0,0,0)[0] intercept : AIC=286.625, Time=0.03 sec
ARIMA(0,0,1)(0,0,0)[0] intercept : AIC=292.002, Time=0.01 sec
ARIMA(0,0,0)(0,0,0)[0] : AIC=300.810, Time=0.00 sec
ARIMA(2,0,0)(0,0,0)[0] intercept : AIC=289.358, Time=0.05 sec
ARIMA(1,0,1)(0,0,0)[0] intercept : AIC=289.564, Time=0.05 sec
ARIMA(2,0,1)(0,0,0)[0] intercept : AIC=inf, Time=0.06 sec
ARIMA(1,0,0)(0,0,0)[0] : AIC=287.546, Time=0.02 sec
Best model: ARIMA(1,0,0)(0,0,0)[0] intercept
Total fit time: 0.345 seconds
Out[5]: array([31676.81437161])
Hi @joshi-abhishek, is your issue resolved?
I am currently facing this exact issue. Did you ever manage to resolve this? @joshi-abhishek @aakashparsi
The problem happens due to extremely large errors when auto_arima can't find a good solution.
Yes, maybe you are right. The series I get wrong looks like this (weekly data; I can't paste a picture): [41.0, 65.0, 80.0, 67.0, 49.0, 53.0, 54.0, 61.0, 36.0, 40.0, 37.0, 48.0, 40.0, 37.0, 32.0, 40.0, 41.0, 28.0, 37.0, 37.0, 29.0, 25.0, 46.0, 28.0, 41.0, 42.0, 87.0, 106.0, 64.0, 0, 17.0, 28.0, 31.0, 44.0, 38.0, 29.0, 42.0, 16.0, 34.0, 69.0, 64.0, 29.0, 55.0, 62.0, 68.0, 52.0, 42.0, 41.0, 40.0, 42.0, 37.0, 43.0, 62.0, 55.0, 62.0, 66.0, 94.0, 82.0, 88.0, 50.0, 2.0]
So the solution is not to use auto_arima if the series is difficult to forecast?
I believe this may also be caused by the model predicting a NaN or Inf... The issue has let up somewhat for me when I use a scaling technique before modeling. However, I believe scaling should not be a requirement, because I want/need to test on unscaled data before moving on to studying the effects of scaling. A sketch of the workaround follows.
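A minimal sketch of that scaling idea, assuming a plain z-score transform around auto_arima (the series is the 13-point one posted earlier in this thread):

import numpy as np
from pmdarima.arima import auto_arima

y = np.array([53930.25, 16575.5, 15593.1, 6751.15, 5408.95, 3853.0, 5119.9,
              6761.55, 20449.1, 20458.05, 24501.8, 33300.4, 34285.9])
mu, sigma = y.mean(), y.std()

# Fit on the standardized series so the optimizer works with small, well-conditioned numbers
model = auto_arima((y - mu) / sigma, error_action='ignore', suppress_warnings=True)

# Undo the transform so the forecast comes back on the original scale
forecast = model.predict(n_periods=1) * sigma + mu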
As this is still an issue (apparently in R's auto.arima as well), it would be great to have some ability to try/except within the function itself -- otherwise, when pipelined, there is the potential for a breaking failure during cross-validation.
Even in a software-engineered pipeline, a try/except block often fails to catch this; the program seems to carry on as if it were separate from the try/except block... but perhaps I wasn't excepting ValueError specifically? (See the sketch below.)
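A minimal sketch of that defensive pattern -- catching the ValueError per series so one bad fit doesn't break the whole pipeline (the two random series here are hypothetical stand-ins):

import numpy as np
from pmdarima.arima import auto_arima

series_by_id = {
    'a': np.random.default_rng(0).normal(100, 10, 48),
    'b': np.random.default_rng(1).normal(100, 10, 48),
}

forecasts = {}
for key, y in series_by_id.items():
    try:
        model = auto_arima(y, error_action='ignore', suppress_warnings=True)
        forecasts[key] = model.predict(n_periods=1)
    except ValueError as err:  # the NaN/infinity error discussed in this thread
        print(f'{key}: fit/predict failed, skipping ({err})')
        forecasts[key] = None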
My case is peculiar. auto_arima was run in a loop after a group-by over different IDs. Each ID had between 25 and 28 dates, and the prediction was for a single day. It was working fine until one day it threw the ominous error in the subject line. After deep research, it turned out that the ID causing the failure had 25 dates, of which 2 were non-consecutive. Removing those non-consecutive dates fixed the issue. What I still do not understand is why that happened, since auto_arima runs on numerical arrays with no reference to dates... (A detection sketch follows.)
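For anyone hitting the same pattern, a hedged sketch for flagging IDs whose dates are not consecutive before fitting; the DataFrame and its id/date column names are illustrative:

import pandas as pd

# One row per (id, date); 'b' has a gap between Jan 2 and Jan 5
df = pd.DataFrame({
    'id':   ['a'] * 3 + ['b'] * 3,
    'date': pd.to_datetime(['2021-01-01', '2021-01-02', '2021-01-03',
                            '2021-01-01', '2021-01-02', '2021-01-05']),
})

# An ID is suspect if any gap between its sorted dates exceeds one day
gaps = (df.sort_values('date')
          .groupby('id')['date']
          .apply(lambda s: s.diff().max() > pd.Timedelta(days=1)))
print(gaps[gaps].index.tolist())  # ['b']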
Seems possibly related to #492 (caused by a potential statsmodels bug). We have an open bug with them that we're watching.
Hi, does anyone know if there is any update anywhere about the NaN issue? Thanks!
I am having the same issue as well; these are the versions I am using. Interestingly, I get this same error whenever I break my time series into train and test frames. If I take the whole series and run on it, there is no such issue. (The split pattern is sketched below.)
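For reference, the split-then-fit pattern described above, sketched with pmdarima's own helper (the series is the head of the one from the first comment; any series will do):

import numpy as np
from pmdarima.arima import auto_arima
from pmdarima.model_selection import train_test_split

y = np.array([1872.0, 1452.0, 1476.0, 1404.0, 3048.0, 1788.0, 1080.0, 888.0,
              2184.0, 2220.0, 1680.0, 612.0, 2124.0, 486.0, 1968.0, 924.0])

# Hold out the tail, fit on the head, then forecast the held-out horizon
train, test = train_test_split(y, train_size=12)
model = auto_arima(train, error_action='ignore', suppress_warnings=True)
forecast = model.predict(n_periods=len(test))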
One workaround is to multiply the target series by any factor other than 1.
For example:
import numpy as np

data = np.array([1872.0, 1452.0, 1476.0, 1404.0, 3048.0, 1788.0, 1080.0, 888.0, 2184.0, 2220.0, 1680.0,
612.0, 2124.0, 486.0, 1968.0, 924.0, 888.0, 1756.0, 1104.0, 876.0, 888.0, 1608.0, 1896.0,
648.0, 1524.0, 804.0, 816.0, 1944.0, 1512.0, 900.0, 1464.0, 876.0, 1464.0, 2136.0, 732.0,
1764.0, 840.0, 1860.0, 792.0, 1728.0, 768.0, 1080.0, 876.0, 1716.0, 900.0, 1740.0, 888.0,
2172.0, 486.0])
data *= 0.1
from pmdarima.arima import auto_arima
model_arima = auto_arima(data, start_p = 0, start_q = 0, max_p = 12, max_q = 12, m = 12, start_P = 0, start_Q = 0, seasonal = False, error_action = 'ignore', suppress_warnings = True, stepwise = True)
forecast_arima = model_arima.predict(n_periods = 18, return_conf_int = False, alpha = 0.05)
forecast_arima /= 0.1
This seems to work perfectly :D
I am getting this issue but it seems to be dependent on what machine I am running on, even if the application is Dockerized. The same code runs in the container on my Mac, but in Google Cloud, the error arises. Same code, same data, etc.
Updating the optimization method worked for me -- from the default lbfgs to bfgs. It's slower and requires more memory, but luckily I don't have those constraints.
btw, for me the issue was, I believe, only occurring during cross_val_score, and would occur when it called estimator.predict() under the hood for each CV split. I tried both RollingForecastCV and SlidingWindowForecastCV.
Then I noticed it was giving:
ModelFitWarning: Estimator fit failed. The score on this train-test partition will be set to nan. Details:
numpy.linalg.LinAlgError: LU decomposition error.
So updating the optimization algorithm worked. (A sketch combining the solver change with the CV setup is below.)
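Putting the two observations together, a hedged sketch: pass the alternative solver through auto_arima's method argument and score with the rolling CV mentioned above (the series and the initial window are illustrative):

import numpy as np
from pmdarima.arima import auto_arima
from pmdarima.model_selection import cross_val_score, RollingForecastCV

y = np.array([53930.25, 16575.5, 15593.1, 6751.15, 5408.95, 3853.0, 5119.9,
              6761.55, 20449.1, 20458.05, 24501.8, 33300.4, 34285.9])

# 'bfgs' instead of the default 'lbfgs' solver, per the workaround above
model = auto_arima(y, method='bfgs', error_action='ignore', suppress_warnings=True)

# Rolling one-step-ahead CV starting from an 8-point training window
cv = RollingForecastCV(h=1, step=1, initial=8)
scores = cross_val_score(model, y, scoring='smape', cv=cv)
print(scores)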
Describe the bug
The method abruptly exits with the below error:
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
But the data is clean, with no sign of any of the behavior reported above.
To Reproduce
Steps to reproduce the behavior:
data:
Code:
Versions
Expected behavior There should be no error.
Actual behavior
Additional context