alkaline-ml / pmdarima

A statistical library designed to fill the void in Python's time series analysis capabilities, including the equivalent of R's auto.arima function.
https://www.alkaline-ml.com/pmdarima
MIT License

different results across machines #52

Closed germayneng closed 5 years ago

germayneng commented 5 years ago

When creating different environments / running across machines, I noticed that auto_arima returns different results despite having the same statsmodels and pmdarima versions. Are there any other modules I should take note of?

In particular, I am on the latest pmdarima 1.0.0 and statsmodels 0.9.0. I also noticed that between the base environment and a venv, again with the same pmdarima and statsmodels versions, the auto_arima results seem to differ.

charlesdrotar commented 5 years ago

@germayneng If possible, could you provide screenshots, code snippets, and pip freezes of your environments (to debug version differences), as well as any other information you think will help us debug?

At a high level, the nature of the problem seems quite simple (which does not necessarily mean the root cause is simple to explain), but understanding more of the particular nuances of your environment will help us focus our effort on addressing the issue.

In short: help me help you. :smile:
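Along the lines of the pip freezes requested above, a small script can print only the versions most likely to matter for auto_arima, so two environments can be compared side by side. A minimal sketch, assuming Python 3.8+ for `importlib.metadata` (the thread's environments run 3.6, where `pip freeze` or `conda list` does the same job); the package list and the `environment_report` name are illustrative choices, not something pmdarima provides.

```python
import platform
from importlib import metadata  # Python 3.8+

# Packages whose versions plausibly affect auto_arima results (illustrative list).
PACKAGES = ("numpy", "scipy", "pandas", "statsmodels", "scikit-learn", "pmdarima", "Cython")


def environment_report(packages=PACKAGES):
    """Return interpreter, OS, and package versions as a dict."""
    report = {
        "python": platform.python_version(),
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Packages that share a version number can still be different builds (for example, linked against different BLAS libraries); `numpy.show_config()` prints which libraries numpy was built with.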

germayneng commented 5 years ago

:) Thanks for the prompt reply.

This is from the (base) environment:

#
# Name Version Build Channel
_ipyw_jlab_nb_ext_conf 0.1.0 py36he6757f0_0
alabaster 0.7.10 py36hcd07829_0
altgraph 0.15 py_0 conda-forge
anaconda-client 1.6.14 py36_0
anaconda-navigator 1.9.2 py36_0
anaconda-project 0.8.2 py36hfad2e28_0
asn1crypto 0.24.0 py36_0
astroid 1.6.3 py36_0
astropy 3.0.2 py36h452e1ab_1
attrs 18.1.0 py36_0
babel 2.5.3 py36_0
backcall 0.1.0 py36_0
backports 1.0 py36h81696a8_1
backports.shutil_get_terminal_size 1.0.0 py36h79ab834_2
beautifulsoup4 4.6.0 py36hd4cc5e8_1
bitarray 0.8.1 py36hfa6e2cd_1
bkcharts 0.2 py36h7e685f7_0
blas 1.0 mkl
blaze 0.11.3 py36h8a29ca5_0
bleach 2.1.3 py36_0
blosc 1.14.3 he51fdeb_0
bokeh 0.12.16 py36_0
boto 2.48.0 py36h1a776d2_1
boto3 1.7.79
botocore 1.10.79
bottleneck 1.2.1 py36hd119dfa_0
bzip2 1.0.6 hfa6e2cd_5
ca-certificates 2018.03.07 0
certifi 2018.10.15 py36_0
cffi 1.11.5 py36h945400d_0
chardet 3.0.4 py36h420ce6e_1
click 6.7 py36hec8c647_0
cloudpickle 0.5.3 py36_0
clyent 1.2.2 py36hb10d595_1
colorama 0.3.9 py36h029ae33_0
comtypes 1.1.4 py36_0
conda 4.5.11 py36_0
conda-build 3.10.5 py36_0
conda-env 2.6.0 h36134e3_1
conda-verify 2.0.0 py36h065de53_0
console_shortcut 0.1.1 h6bb2dd7_3
contextlib2 0.5.5 py36he5d52c0_0
cryptography 2.2.2 py36hfa6e2cd_0
curl 7.60.0 h7602738_0
cycler 0.10.0 py36h009560c_0
cython 0.28.2 py36hfa6e2cd_0
cytoolz 0.9.0.1 py36hfa6e2cd_0
dash 0.21.0
dash-core-components 0.22.1
dash-html-components 0.10.0
dash-renderer 0.12.1
dash-table-experiments 0.6.0
dask 0.17.5 py36_0
dask-core 0.17.5 py36_0
datashape 0.5.4 py36h5770b85_0
decorator 4.3.0 py36_0
dill 0.2.8.2 py36_0
distributed 1.21.8 py36_0
docutils 0.14 py36h6012d8f_0
entrypoints 0.2.3 py36hfd66bb0_2
et_xmlfile 1.0.1 py36h3d2d736_0
fast-histogram 0.5 py36h452e1ab_1
fastcache 1.0.2 py36hfa6e2cd_2
feather-format 0.4.0
filelock 3.0.4 py36_0
flask 1.0.2 py36_1
Flask-Compress 1.4.0
flask-cors 3.0.4 py36_0
freetype 2.8 h51f8f2c_1
future 0.17.0 py36_1000 conda-forge
get_terminal_size 1.0.0 h38e98db_0
gevent 1.3.0 py36hfa6e2cd_0
glob2 0.6 py36hdf76b57_0
glue-core 0.13.4 py36_0
glue-vispy-viewers 0.10 py36_0
glueviz 0.13.3 0
graphviz 0.10.1
graphviz 2.38 hfd603c8_2
greenlet 0.4.13 py36hfa6e2cd_0
h5py 2.7.1 py36h3bdd7fb_2
hdf5 1.10.2 hac2f561_1
heapdict 1.0.0 py36_2
html5lib 1.0.1 py36h047fa9f_0
icc_rt 2017.0.4 h97af966_0
icu 58.2 ha66f8fd_1
idna 2.6 py36h148d497_1
imageio 2.3.0 py36_0
imagesize 1.0.0 py36_0
intel-openmp 2018.0.0 8
ipykernel 4.8.2 py36_0
ipython 6.4.0 py36_0
ipython_genutils 0.2.0 py36h3c5d0ee_0
ipywidgets 7.2.1 py36_0
isort 4.3.4 py36_0
itsdangerous 0.24 py36hb6c5a24_1
jdcal 1.4 py36_0
jedi 0.12.0 py36_1
jinja2 2.10 py36h292fed1_0
jmespath 0.9.3
jpeg 9b hb83a4c4_2
jsonschema 2.6.0 py36h7636477_0
jupyter 1.0.0 py36_4
jupyter_client 5.2.3 py36_0
jupyter_console 5.2.0 py36h6d89b47_1
jupyter_contrib_core 0.3.3 py_2 conda-forge
jupyter_contrib_nbextensions 0.5.0 py36_0 conda-forge
jupyter_core 4.4.0 py36h56e9d50_0
jupyter_highlight_selected_word 0.2.0 py36_0 conda-forge
jupyter_latex_envs 1.4.4 py36_0 conda-forge
jupyter_nbextensions_configurator 0.4.0 py36_0 conda-forge
jupyterlab 0.32.1 py36_0 conda-forge
jupyterlab_launcher 0.10.5 py36_0
keyring 16.1.0 py36_0
kiwisolver 1.0.1 py36h12c3424_0
lazy-object-proxy 1.3.1 py36hd1c21d2_0
libcurl 7.60.0 hc4dcbb0_0
libiconv 1.15 h1df5818_7
libpng 1.6.34 h79bbb47_0
libsodium 1.0.16 h9d3ae62_0
libssh2 1.8.0 hd619d38_4
libtiff 4.0.9 hb8ad9f9_1
libxml2 2.9.8 hadb2253_1
libxslt 1.1.32 hf6f1972_0
lightgbm 2.2.2
lime 0.1.1.32
llvmlite 0.23.1 py36hcacf6c6_0
locket 0.2.0 py36hfed976d_1
lxml 4.2.1 py36heafd4d3_0
lzo 2.10 h6df0209_2
m2w64-gcc-libgfortran 5.3.0 6
m2w64-gcc-libs 5.3.0 7
m2w64-gcc-libs-core 5.3.0 7
m2w64-gmp 6.1.0 2
m2w64-libwinpthread-git 5.0.0.4634.697f757 2
macholib 1.11 py_0 conda-forge
markupsafe 1.0 py36h0e26971_1
matplotlib 2.2.2 py36h153e9ff_1
mccabe 0.6.1 py36hb41005a_1
menuinst 1.4.14 py36hfa6e2cd_0
mistune 0.8.3 py36hfa6e2cd_1
mkl 2018.0.2 1
mkl-service 1.1.2 py36h57e144c_4
mkl_fft 1.0.1 py36h452e1ab_0
mkl_random 1.0.1 py36h9258bd6_0
mlcrate 0.1.0
more-itertools 4.1.0 py36_0
mpl-scatter-density 0.4 py36_0
mpmath 1.0.0 py36hacc8adf_2
msgpack-python 0.5.6 py36he980bc4_0
msys2-conda-epoch 20160418 1
multipledispatch 0.5.0 py36_0
multiprocess 0.70.6.1
navigator-updater 0.2.1 py36_0
nbconvert 5.3.1 py36h8dc0fde_0
nbformat 4.4.0 py36h3a5bc1b_0
networkx 2.1 py36_0
nltk 3.3.0 py36_0
nose 1.3.7 py36h1c3779e_2
notebook 5.5.0 py36_0
numba 0.38.0 py36h830ac7b_0
numexpr 2.6.5 py36hcd2f87e_0
numpy 1.14.3 py36h9fa60d3_1
numpy-base 1.14.3 py36h555522e_1
numpydoc 0.8.0 py36_0
odo 0.5.1 py36h7560279_0
olefile 0.45.1 py36_0
openpyxl 2.5.3 py36_0
openssl 1.0.2p hfa6e2cd_0
packaging 17.1 py36_0
pandas 0.23.0 py36h830ac7b_0
pandas-datareader 0.6.0
pandoc 1.19.2.1 hb2460c7_1
pandocfilters 1.4.2 py36h3ef6317_1
parso 0.2.0 py36_0
partd 0.3.8 py36hc8e763b_0
path.py 11.0.1 py36_0
pathlib2 2.3.2 py36_0
pathos 0.2.2.1
patsy 0.5.0 py36_0
pefile 2018.8.8 py_0 conda-forge
pep8 1.7.1 py36_0
pickleshare 0.7.4 py36h9de030f_0
pillow 5.1.0 py36h0738816_0
pip 10.0.1 py36_0
pip 18.0
pkginfo 1.4.2 py36_1
plotly 3.4.1 py36h28b3542_0
plotly 2.5.1
pluggy 0.6.0 py36hc7daf1e_0
ply 3.11 py36_0
pmdarima 1.0.0
pox 0.2.4
ppft 1.6.4.8
prompt_toolkit 1.0.15 py36h60b8f86_0
psutil 5.4.5 py36hfa6e2cd_0
py 1.5.3 py36_0
pyarrow 0.10.0
pycodestyle 2.4.0 py36_0
pycosat 0.6.3 py36h413d8a4_0
pycparser 2.18 py36hd053e01_1
pycrypto 2.6.1 py36hfa6e2cd_8
pycurl 7.43.0.1 py36h74b6da3_0
pyflakes 1.6.0 py36h0b975d6_0
pygments 2.2.0 py36hb010967_0
pyinstaller 3.4 py36h7602738_0 conda-forge
pylint 1.8.4 py36_0
pyodbc 4.0.23 py36h6538335_0
pyopengl 3.1.1a1 py36_0
pyopenssl 18.0.0 py36_0
pyparsing 2.2.0 py36h785a196_1
pyplotz 0.24
pyqt 5.9.2 py36h1aa27d4_0
PyQt5 5.11.3
PyQt5_sip 4.19.13
pyreadline 2.1 py36_1
pyreadline 2.1
pysocks 1.6.8 py36_0
pytables 3.4.3 py36he6f6034_1
pytest 3.5.1 py36_0
pytest-arraydiff 0.2 py36_0
pytest-astropy 0.3.0 py36_0
pytest-doctestplus 0.1.3 py36_0
pytest-openfiles 0.3.0 py36_0
pytest-remotedata 0.2.1 py36_0
python 3.6.5 h0c2934d_0
python-dateutil 2.7.3 py36_0
python-dateutil 2.7.2
pytz 2018.4 py36_0
pywavelets 0.5.2 py36hc649158_0
pywin32 223 py36hfa6e2cd_1 anaconda
pywin32-ctypes 0.2.0 py36_1000 conda-forge
pywinpty 0.5.1 py36_0
pyyaml 3.12 py36h1d1928f_1
pyzmq 17.0.0 py36hfa6e2cd_1
qt 5.9.5 vc14he4a7d60_0 [vc14] anaconda
qtawesome 0.4.4 py36h5aa48f6_0
qtconsole 4.3.1 py36h99a29a9_0
qtpy 1.4.1 py36_0
requests 2.18.4 py36h4371aae_1
requests-file 1.4.3
requests-ftp 0.3.1
retrying 1.3.3 py36_2
rope 0.10.7 py36had63a69_0
ruamel_yaml 0.15.35 py36hfa6e2cd_1
s3transfer 0.1.13
scikit-image 0.13.1 py36hfa6e2cd_1
scikit-learn 0.19.1 py36h53aea1b_0
scipy 1.1.0 py36h672f292_0
seaborn 0.8.1 py36h9b69545_0
send2trash 1.5.0 py36_0
setuptools 39.1.0 py36_0
shap 0.23.1
simplegeneric 0.8.1 py36_2
singledispatch 3.4.0.3 py36h17d0c80_0
sip 4.19.8 py36h6538335_0
six 1.11.0 py36h4db2310_1
snappy 1.1.7 h777316e_3
snowballstemmer 1.2.1 py36h763602f_0
sortedcollections 0.6.1 py36_0
sortedcontainers 1.5.10 py36_0
sphinx 1.7.4 py36_0
sphinxcontrib 1.0 py36hbbac3d2_1
sphinxcontrib-websupport 1.0.1 py36hb5e5916_1
spyder 3.3.1 py36_1
spyder-kernels 0.2.6 py36_0
sqlalchemy 1.2.7 py36ha85dd04_0
sqlite 3.23.1 h35aae40_0
statsmodels 0.9.0 py36h452e1ab_0
sympy 1.1.1 py36h96708e0_0
tblib 1.3.2 py36h30f5020_0
terminado 0.8.1 py36_1
testpath 0.3.1 py36h2698cfe_0
tk 8.6.7 hcb92d03_3
toolz 0.9.0 py36_0
tornado 5.0.2 py36_0
tqdm 4.23.4
traitlets 4.3.2 py36h096827d_0
typing 3.6.4 py36_0
unicodecsv 0.14.1 py36h6450c06_0
urllib3 1.22 py36h276f60a_0
vc 14 h0510ff6_3
vs2015_runtime 14.0.25123 3
wcwidth 0.1.7 py36h3d5aa90_0
webencodings 0.5.1 py36h67c50ae_1
werkzeug 0.14.1 py36_0
wheel 0.31.1 py36_0
widgetsnbextension 3.2.1 py36_0
win_inet_pton 1.0.1 py36he67d7fd_1
win_unicode_console 0.5 py36hcdbd4b5_0
wincertstore 0.2 py36h7fe50ca_0
winpty 0.4.3 4
wrapt 1.10.11 py36he5f5981_0
xlrd 1.1.0 py36h1cb58dc_1
xlsxwriter 1.0.4 py36_0
xlwings 0.11.8 py36_0
xlwt 1.3.0 py36h1a4751e_0
yaml 0.1.7 hc54c509_2
zeromq 4.2.5 hc6251cf_0
zict 0.1.3 py36h2d8e73e_0
zlib 1.2.11 h8395fce_2

For the venv:

#
# Name Version Build Channel
certifi 2018.10.15 py36_0
Cython 0.29.1
lightgbm 2.2.2
numpy 1.14.3
pandas 0.23.0
patsy 0.5.0
pip 18.1 py36_0
pmdarima 1.0.0
python 3.6.5 h0c2934d_0
python-dateutil 2.7.5
pytz 2018.7
scikit-learn 0.19.1
scipy 1.1.0
setuptools 40.6.2 py36_0
six 1.11.0
statsmodels 0.9.0
tqdm 4.23.4
vc 14.1 h0510ff6_4
vs2015_runtime 14.15.26706 h3a45250_0
wheel 0.32.3 py36_0
wincertstore 0.2 py36h7fe50ca_0
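With both listings in hand, the interesting part is the packages whose versions differ. A minimal sketch that parses `conda list`-style text (name, version, then optional build and channel columns) and reports packages present in both environments at different versions; names are lower-cased so `cython` and `Cython` match. `parse_conda_list` and `diff_environments` are hypothetical helpers, and the sample input is a small excerpt of the two listings above.

```python
def parse_conda_list(text):
    """Map lower-cased package name -> set of versions in `conda list` output."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the comment/header lines
        fields = line.split()
        if len(fields) < 2:
            continue  # not a "name version ..." row
        name, version = fields[0].lower(), fields[1]
        # A package can appear twice (e.g. installed by both conda and pip)
        packages.setdefault(name, set()).add(version)
    return packages


def diff_environments(text_a, text_b):
    """Return {name: (versions_a, versions_b)} for shared packages whose versions differ."""
    a, b = parse_conda_list(text_a), parse_conda_list(text_b)
    return {
        name: (sorted(a[name]), sorted(b[name]))
        for name in sorted(a.keys() & b.keys())
        if a[name] != b[name]
    }


base = """
# Name Version Build Channel
numpy 1.14.3 py36h9fa60d3_1
scipy 1.1.0 py36h672f292_0
statsmodels 0.9.0 py36h452e1ab_0
cython 0.28.2 py36hfa6e2cd_0
python-dateutil 2.7.3 py36_0
python-dateutil 2.7.2
"""
venv = """
# Name Version Build Channel
numpy 1.14.3
scipy 1.1.0
statsmodels 0.9.0
Cython 0.29.1
python-dateutil 2.7.5
"""

for name, (in_base, in_venv) in diff_environments(base, venv).items():
    print(f"{name}: base={in_base} venv={in_venv}")
```

On that excerpt it flags `cython` (0.28.2 vs 0.29.1) and `python-dateutil`, while numpy, scipy, and statsmodels match. It compares version strings only; the Build column is ignored, so the same version from a different build (conda vs pip) is not flagged.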

The ARIMA code I used:

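import pandas as pd
import pmdarima as pmd


def arima_group(df):
    try:
        output2 = pd.DataFrame()
        for i in range(12, len(df)):
            output = pd.DataFrame()
            # Expanding window: the first i rows of this group
            temp = df.iloc[:i, ]

            if i == 12:
                # Select the (seasonal) ARIMA order once, on the first 12 months
                stepwise_model = pmd.auto_arima(temp['Units'], start_p=0, start_q=0,
                                                start_P=0, start_Q=0,
                                                m=12,
                                                seasonal=True, trace=False,
                                                error_action='ignore',
                                                suppress_warnings=True,
                                                stepwise=True)
            # Re-fit the selected order on the expanded window
            # (redundant when i == 12: auto_arima already returns a fitted model)
            model = stepwise_model.fit(temp['Units'])
            # One-step-ahead forecast for row i
            result = model.predict(n_periods=1)
            output['arima'] = result
            output['Date'] = df.Date.iloc[i]
            output2 = pd.concat([output2, output], axis=0)
            output2.reset_index(inplace=True, drop=True)
        return output2
    except Exception:
        # Any failure (e.g. a fit error) silently yields None for this group
        return None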

I am not sure whether the code matters. I applied the function with apply on a grouping of the data to generate a one-step ARIMA forecast at each step. Between the base environment and the venv, and even on AWS (where I created the same venv), the ARIMA output differs.

tgsmith61591 commented 5 years ago

Initially I wondered whether you had stepwise=False and it was a seeding issue, but it doesn't appear that way... Can you share the output and how it differs? It would also be nice to have some data to try to replicate this with. You don't need to send an entire file; just copy 100 or so samples (that cause this behavior) into the issue.

As a side note and completely unrelated, you don't need to re-fit the result of auto_arima, as it is already fit :)
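For the sample request, pandas can turn the first rows of a DataFrame into CSV text that pastes directly into an issue comment. A minimal sketch; `sample_for_issue` is a hypothetical helper, and the DataFrame below is synthetic, reusing only the thread's `Date` and `Units` column names.

```python
import pandas as pd


def sample_for_issue(df, n=100):
    """Return the first n rows of df as CSV text (header included, no index)."""
    return df.head(n).to_csv(index=False)


# Synthetic monthly series standing in for real data.
df = pd.DataFrame({
    "Date": pd.date_range("2016-01-01", periods=150, freq="MS"),
    "Units": range(150),
})
text = sample_for_issue(df)
print(text)
```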

germayneng commented 5 years ago

@tgsmith61591 let me try to get the sample so we can replicate.

Regarding refitting: I have 34 months of data (2016-2018). I start by training the ARIMA on the first 12 months and forecasting one step ahead. Then I update the model by refitting it on the first 13 months and forecasting one step ahead again, repeating until the last month.
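The expanding-window scheme described above, applied per group as in the earlier apply call, can be sketched independently of pmdarima. A minimal sketch, assuming a hypothetical `Store` grouping column alongside the thread's `Date` and `Units`; `naive_forecast` is a hypothetical stand-in (last observed value) for the ARIMA step, which in the thread's setting would refit on `history` and call `predict(n_periods=1)`. With 34 months and a 12-month initial window, this yields 22 one-step forecasts per group (months 13 through 34).

```python
import pandas as pd


def naive_forecast(history):
    """Hypothetical stand-in for the ARIMA step: predict the last observed value."""
    return float(history.iloc[-1])


def walk_forward(group, forecast_fn=naive_forecast, initial=12):
    """One-step-ahead forecasts on an expanding window for one group.

    Trains on the first `initial` rows, forecasts the next row, then grows the
    window by one row and repeats until the last row has been forecast.
    """
    rows = []
    for i in range(initial, len(group)):
        history = group["Units"].iloc[:i]  # observations 0 .. i-1
        rows.append({"Date": group["Date"].iloc[i],  # date being forecast
                     "forecast": forecast_fn(history)})
    return pd.DataFrame(rows, columns=["Date", "forecast"])


# Two hypothetical groups of monthly data: 14 and 13 months long.
df = pd.DataFrame({
    "Store": ["A"] * 14 + ["B"] * 13,
    "Date": list(pd.date_range("2016-01-01", periods=14, freq="MS"))
    + list(pd.date_range("2016-01-01", periods=13, freq="MS")),
    "Units": list(range(14)) + list(range(100, 113)),
})
out = df.groupby("Store")[["Date", "Units"]].apply(walk_forward)
print(out)
```

Keeping the forecaster as a parameter lets the windowing be checked separately from the model, e.g. confirming each environment produces the same rows per group before comparing the ARIMA values themselves.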

tgsmith61591 commented 5 years ago

@germayneng any update on the sample for us to try to replicate?

tgsmith61591 commented 5 years ago

Closing due to inactivity

thanasions commented 5 years ago

We have the same issue in our software.

newini commented 2 years ago

I have the same issue too

some-guy1 commented 1 year ago

Hi, I had the same issue as well, but it was fixed by increasing maxiter from 50 to 150. Now I have identical results on Windows, Mac, and Linux.
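The comment does not include code. A minimal sketch of the same change applied to the settings from the `arima_group` snippet earlier in the thread, assuming a pmdarima release whose `auto_arima` accepts the `maxiter` keyword (the cap on optimizer iterations, 50 by default there); `auto_arima_kwargs` and `fit_auto_arima` are hypothetical helper names. pmdarima is imported inside `fit_auto_arima` only so the settings can be inspected without it installed.

```python
def auto_arima_kwargs(maxiter=150):
    """Settings from the arima_group snippet, plus a higher maxiter."""
    return dict(start_p=0, start_q=0, start_P=0, start_Q=0,
                m=12, seasonal=True, trace=False,
                error_action="ignore", suppress_warnings=True,
                stepwise=True,
                maxiter=maxiter)  # optimizer iteration cap, raised from 50


def fit_auto_arima(y, maxiter=150):
    """Fit auto_arima with the settings above (pmdarima imported lazily)."""
    import pmdarima as pmd
    return pmd.auto_arima(y, **auto_arima_kwargs(maxiter))
```

In `arima_group`, the `pmd.auto_arima(...)` call would become `fit_auto_arima(temp['Units'])`. The thread does not say why the higher cap helped; one plausible reading (an assumption, not confirmed here) is that at 50 iterations some fits stopped before converging, leaving them sensitive to small numerical differences between platforms.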