alkaline-ml / pmdarima

A statistical library designed to fill the void in Python's time series analysis capabilities, including the equivalent of R's auto.arima function.
https://www.alkaline-ml.com/pmdarima
MIT License

ValueError when predicting #462

Closed pmoriano closed 2 years ago

pmoriano commented 2 years ago

Describe the bug

Hello, I am trying to predict a few values from a fitted model but get the error below. The code to reproduce it follows the traceback.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/tmp/ipykernel_234836/3650015805.py in <module>
      5 
      6 arima_model = pm.auto_arima(y=sample, error_action="ignore", supress_warnings=True)
----> 7 predictions = arima_model.predict(3)

~/.conda/envs/basic38/lib/python3.8/site-packages/pmdarima/arima/arima.py in predict(self, n_periods, X, return_conf_int, alpha, **kwargs)
    674         end = arima.nobs + n_periods - 1
    675 
--> 676         f, conf_int = _seasonal_prediction_with_confidence(
    677             arima_res=arima,
    678             start=arima.nobs,

~/.conda/envs/basic38/lib/python3.8/site-packages/pmdarima/arima/arima.py in _seasonal_prediction_with_confidence(arima_res, start, end, X, alpha, **kwargs)
     86     conf_int = results.conf_int(alpha=alpha)
     87     return check_endog(f, dtype=None, copy=False), \
---> 88         check_array(conf_int, copy=False, dtype=None)
     89 
     90 

~/.conda/envs/basic38/lib/python3.8/site-packages/sklearn/utils/validation.py in inner_f(*args, **kwargs)
     61             extra_args = len(args) - len(all_args)
     62             if extra_args <= 0:
---> 63                 return f(*args, **kwargs)
     64 
     65             # extra_args > 0

~/.conda/envs/basic38/lib/python3.8/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
    718 
    719         if force_all_finite:
--> 720             _assert_all_finite(array,
    721                                allow_nan=force_all_finite == 'allow-nan')
    722 

~/.conda/envs/basic38/lib/python3.8/site-packages/sklearn/utils/validation.py in _assert_all_finite(X, allow_nan, msg_dtype)
    101                 not allow_nan and not np.isfinite(X).all()):
    102             type_err = 'infinity' if allow_nan else 'NaN, infinity'
--> 103             raise ValueError(
    104                     msg_err.format
    105                     (type_err,

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

To Reproduce

import numpy as np
import pmdarima as pm

sample = np.array([63.75, 63.875, 64.44444444, 65., 64., 63., 62., 59.88888889, 53.33333333, 51.25, 48.875, 37.25])
print(np.isnan(sample)) # To see if there are NaNs. Do not see any. 

arima_model = pm.auto_arima(y=sample, error_action="ignore", supress_warnings=True)
predictions = arima_model.predict(3)
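
The error message mentions infinity as well as NaN, so the `np.isnan` check above does not rule out every case. Below is a minimal standard-library sketch of a stricter input check; the helper name `non_finite_positions` is made up for illustration. It only validates the input series, whereas the traceback shows the failing check running on the confidence intervals computed by the model, so a clean result here does not exclude the error.

```python
import math

# Same values as the reproducer above, as a plain list.
sample = [63.75, 63.875, 64.44444444, 65., 64., 63., 62.,
          59.88888889, 53.33333333, 51.25, 48.875, 37.25]


def non_finite_positions(values):
    """Return the indices of entries that are NaN or +/-infinity."""
    return [i for i, v in enumerate(values) if not math.isfinite(v)]


print(non_finite_positions(sample))  # [] (every input value is finite)
```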

Versions

python=3.8.12
scikit-learn=0.24.2
statsmodels=0.13.0
pmdarima=1.8.2

Expected Behavior

An array with three predictions.

Actual Behavior

The error described above.

Additional Context

No response

aaronreidsmith commented 2 years ago

Can you post the full output of pm.show_versions()? I am unable to reproduce the error with the versions you provided:

$ docker run --rm -it continuumio/miniconda3:4.10.3 /bin/bash
(base) root@eb34ce2b008d:/# conda create --name debug python=3.8.12
...
(base) root@eb34ce2b008d:/# conda activate debug
(debug) root@eb34ce2b008d:/# conda config --add channels conda-forge
(debug) root@eb34ce2b008d:/# conda config --set channel_priority strict
(debug) root@eb34ce2b008d:/# conda install pmdarima
...
(debug) root@eb34ce2b008d:/# conda install -c conda-forge scikit-learn=0.24.2 # Fix sklearn issue
(debug) root@eb34ce2b008d:/# python
Python 3.8.12 | packaged by conda-forge | (default, Oct 12 2021, 21:59:51) 
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> import pmdarima as pm
>>>
>>> sample = np.array([63.75, 63.875, 64.44444444, 65., 64., 63., 62., 59.88888889, 53.33333333, 51.25, 48.875, 37.25])
>>> print(np.isnan(sample)) # To see if there are NaNs. Do not see any. 
[False False False False False False False False False False False False]
>>>
>>> arima_model = pm.auto_arima(y=sample, error_action="ignore", supress_warnings=True)
>>> predictions = arima_model.predict(3)
>>> predictions
array([30.29305912, 26.95415383, 26.27618976])
>>> pm.show_versions()

System:
    python: 3.8.12 | packaged by conda-forge | (default, Oct 12 2021, 21:59:51)  [GCC 9.4.0]
executable: /opt/conda/envs/debug/bin/python
   machine: Linux-5.10.47-linuxkit-x86_64-with-glibc2.10

Python dependencies:
        pip: 21.3
 setuptools: 49.6.0.post20210108
    sklearn: 0.24.2
statsmodels: 0.13.0
      numpy: 1.19.5
      scipy: 1.7.1
     Cython: 0.29.24
     pandas: 1.3.3
     joblib: 1.1.0
   pmdarima: 1.8.2

pmoriano commented 2 years ago

@aaronreidsmith Thank you for your response. Below is the pm.show_versions() output.

System:
    python: 3.8.12 | packaged by conda-forge | (default, Oct 12 2021, 21:59:51)  [GCC 9.4.0]
executable: /home/8a6/.conda/envs/basic38/bin/python
   machine: Linux-3.10.0-1160.41.1.el7.x86_64-x86_64-with-glibc2.10

Python dependencies:
        pip: 21.3
 setuptools: 49.6.0.post20210108
    sklearn: 0.24.2
statsmodels: 0.13.0
      numpy: 1.19.5
      scipy: 1.7.1
     Cython: 0.29.24
     pandas: 1.3.3
     joblib: 1.1.0
   pmdarima: 1.8.2

pmoriano commented 2 years ago

@aaronreidsmith Now it works. I deleted my old environment and created a new conda environment from scratch. pm.show_versions() shows the same as above, so I am not sure why it failed before. Thanks for the help.
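
One way to confirm that two reports really match is to diff them mechanically rather than by eye. The sketch below is only an illustration: `parse_versions` and `diff_versions` are hypothetical helpers, and the two strings are the "Python dependencies" sections posted earlier in this thread. Keep in mind that pm.show_versions() lists only a handful of packages, so an empty diff does not prove the environments are identical; a fuller comparison would need the complete package list from each environment (for example from `conda list`).

```python
def parse_versions(report):
    """Parse the 'name: version' lines under 'Python dependencies:'."""
    versions = {}
    in_deps = False
    for line in report.splitlines():
        stripped = line.strip()
        if stripped == "Python dependencies:":
            in_deps = True
            continue
        if in_deps and ":" in stripped:
            name, _, version = stripped.partition(":")
            versions[name.strip()] = version.strip()
    return versions


def diff_versions(old, new):
    """Return {package: (old_version, new_version)} for every mismatch."""
    names = sorted(set(old) | set(new))
    return {n: (old.get(n), new.get(n)) for n in names if old.get(n) != new.get(n)}


# Dependency sections copied from the two reports in this thread.
maintainer_report = """
Python dependencies:
        pip: 21.3
 setuptools: 49.6.0.post20210108
    sklearn: 0.24.2
statsmodels: 0.13.0
      numpy: 1.19.5
      scipy: 1.7.1
     Cython: 0.29.24
     pandas: 1.3.3
     joblib: 1.1.0
   pmdarima: 1.8.2
"""
reporter_report = maintainer_report  # the posted dependency lists are identical

print(diff_versions(parse_versions(maintainer_report),
                    parse_versions(reporter_report)))  # {}
```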

aaronreidsmith commented 2 years ago

Glad you were able to get it figured out!

tgsmith61591 commented 2 years ago

I've still been unable to replicate this. Both examples provided (in this issue and in #464) produce predictions:

# this issue
>>> arima_model.predict(3)
# array([30.29305912, 26.95415383, 26.27618976])

# issue 464
>>> arima_model.predict(3)
# array([59.50482616, 59.74060519, 59.95876076])

Can you provide any other context or system info?

tgsmith61591 commented 2 years ago

I created a fresh conda env:

$ conda create python=3.8 -n pm-tmp

And was able to run the sample successfully:

In [1]: import numpy as np
   ...: import pmdarima as pm
   ...:
   ...: sample = np.array([63.75, 63.875, 64.44444444, 65., 64., 63., 62., 59.88888889, 53.33333333, 51.25, 48.875, 37.25])
   ...: print(np.isnan(sample)) # To see if there are NaNs. Do not see any.
   ...:
   ...: arima_model = pm.auto_arima(y=sample, error_action="ignore", supress_warnings=True)
   ...: predictions = arima_model.predict(3)

[False False False False False False False False False False False False]

In [2]:

In [2]: predictions
Out[2]: array([30.29305912, 26.95415383, 26.27618976])

Could you please install this exact env and try the example again? Trying to determine if this is an OS level issue.

  1. Copy this into environment.yml
  2. Install the env: conda env create -f environment.yml

name: pm-tmp
channels:
  - defaults
dependencies:
  - appnope=0.1.2=py38hecd8cb5_1001
  - backcall=0.2.0=pyhd3eb1b0_0
  - ca-certificates=2021.10.26=hecd8cb5_2
  - certifi=2021.10.8=py38hecd8cb5_0
  - decorator=5.1.0=pyhd3eb1b0_0
  - ipython=7.27.0=py38h01d92e1_0
  - jedi=0.18.0=py38hecd8cb5_1
  - libcxx=12.0.0=h2f01273_0
  - libffi=3.3=hb1e8313_2
  - matplotlib-inline=0.1.2=pyhd3eb1b0_2
  - ncurses=6.2=h0a44026_1
  - openssl=1.1.1l=h9ed2024_0
  - parso=0.8.2=pyhd3eb1b0_0
  - pexpect=4.8.0=pyhd3eb1b0_3
  - pickleshare=0.7.5=pyhd3eb1b0_1003
  - pip=21.2.4=py38hecd8cb5_0
  - prompt-toolkit=3.0.20=pyhd3eb1b0_0
  - ptyprocess=0.7.0=pyhd3eb1b0_2
  - pygments=2.10.0=pyhd3eb1b0_0
  - python=3.8.12=h88f2d9e_0
  - readline=8.1=h9ed2024_0
  - setuptools=58.0.4=py38hecd8cb5_0
  - sqlite=3.36.0=hce871da_0
  - tk=8.6.11=h7bc2e8c_0
  - traitlets=5.1.0=pyhd3eb1b0_0
  - wcwidth=0.2.5=pyhd3eb1b0_0
  - wheel=0.37.0=pyhd3eb1b0_1
  - xz=5.2.5=h1de35cc_0
  - zlib=1.2.11=h1de35cc_3
  - pip:
    - cython==0.29.24
    - joblib==1.1.0
    - numpy==1.21.3
    - pandas==1.3.4
    - patsy==0.5.2
    - pmdarima==1.8.3
    - python-dateutil==2.8.2
    - pytz==2021.3
    - scikit-learn==1.0.1
    - scipy==1.7.1
    - six==1.16.0
    - statsmodels==0.13.0
    - threadpoolctl==3.0.0
    - urllib3==1.26.7
prefix: /opt/miniconda3/envs/pm-tmp

tgsmith61591 commented 2 years ago

@pmoriano were you able to try this with the above env? ^

pmoriano commented 2 years ago

@tgsmith61591. Sorry for the late reply. Please look at #464 to see the data for which this is not working. I am putting that data here again. Thanks for the help.

import numpy as np
import pmdarima as pm

sample = np.array([65.375, 65.75, 66.11111111, 65.375, 66., 66.22222222, 66., 63.44444444, 62.375, 63.125, 60., 59.25])

arima_model = pm.auto_arima(y=sample, error_action="ignore", supress_warnings=True)
predictions = arima_model.predict(3)
print(predictions)

tgsmith61591 commented 2 years ago

Yep, as mentioned here: https://github.com/alkaline-ml/pmdarima/issues/462#issuecomment-953764293, I was not able to replicate your error with that dataset. Can you please try the environment provided?

import numpy as np
import pmdarima as pm

sample = np.array([65.375, 65.75, 66.11111111, 65.375, 66., 66.22222222, 66., 63.44444444, 62.375, 63.125, 60., 59.25])

arima_model = pm.auto_arima(y=sample, error_action="ignore", supress_warnings=True)
predictions = arima_model.predict(3)

Out[3]: array([59.50482616, 59.74060519, 59.95876076])

pmoriano commented 2 years ago

@tgsmith61591 Thanks for your reply. I tried what you suggested but got the following. Any idea?

Collecting package metadata: done
Solving environment: failed

ResolvePackageNotFound: 
  - certifi==2021.10.8=py38hecd8cb5_0
  - appnope==0.1.2=py38hecd8cb5_1001
  - tk==8.6.11=h7bc2e8c_0
  - ipython==7.27.0=py38h01d92e1_0
  - setuptools==58.0.4=py38hecd8cb5_0
  - zlib==1.2.11=h1de35cc_3
  - pip==21.2.4=py38hecd8cb5_0
  - readline==8.1=h9ed2024_0
  - jedi==0.18.0=py38hecd8cb5_1
  - sqlite==3.36.0=hce871da_0
  - libcxx==12.0.0=h2f01273_0
  - ncurses==6.2=h0a44026_1
  - python==3.8.12=h88f2d9e_0
  - ca-certificates==2021.10.26=hecd8cb5_2
  - xz==5.2.5=h1de35cc_0
  - libffi==3.3=hb1e8313_2
  - openssl==1.1.1l=h9ed2024_0

This is my OS info:

NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
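
Every package in the ResolvePackageNotFound list above carries a platform-specific build string, while the entries with the `pyhd3eb1b0` build string resolved. That pattern suggests the environment file was exported on a different platform than this CentOS machine, although that is an inference from the output rather than something confirmed in the thread. A possible workaround, sketched below under that assumption, is to drop the build strings so conda can pick builds for the local platform. The function name `strip_build_strings` is made up for illustration; conda itself offers `conda env export --no-builds` at export time.

```python
import re

# Matches conda specs of the form "  - name=version=build".
# pip-style "name==version" lines do not match and are left untouched.
_PINNED = re.compile(r"^(\s*-\s*[^=\s]+=[^=\s]+)=[^=\s]+\s*$")


def strip_build_strings(yaml_text):
    """Return the environment file text with conda build strings removed."""
    cleaned = []
    for line in yaml_text.splitlines():
        match = _PINNED.match(line)
        cleaned.append(match.group(1) if match else line)
    return "\n".join(cleaned)


example = """dependencies:
  - python=3.8.12=h88f2d9e_0
  - openssl=1.1.1l=h9ed2024_0
  - pip:
    - pmdarima==1.8.3"""

print(strip_build_strings(example))
```

Packages that exist only on the exporting platform may still fail to resolve after this step and would need to be removed from the file by hand.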