alkaline-ml / pmdarima

A statistical library designed to fill the void in Python's time series analysis capabilities, including the equivalent of R's auto.arima function.
https://www.alkaline-ml.com/pmdarima
MIT License

Version 1.5.0 seems to have introduced StepwiseContext issue #271

Closed: NoahLiot closed this issue 4 years ago

NoahLiot commented 4 years ago

Describe the bug
Running pmdarima >= 1.5.0 with Cython >= 0.29.13 raises an AttributeError when the auto_arima function is called.

To Reproduce
Steps to reproduce the behaviour:

model = auto_arima(**params)
preds, conf = model.predict(n_periods=forecast, return_conf_int=True, alpha=0.05)

Versions

Python 3.7.4 (default, Aug 13 2019, 15:17:50) 
[Clang 4.0.1 (tags/RELEASE_401/final)]
NumPy 1.17.2
SciPy 1.3.1
Scikit-Learn 0.21.3
Statsmodels 0.10.1

Expected behaviour
The call should complete and print a trace such as:

Fit ARIMA: order=(3, 0, 5) seasonal_order=(0, 0, 0, 1); AIC=1348.491, BIC=1378.597, Fit time=0.330 seconds
Fit ARIMA: order=(0, 0, 0) seasonal_order=(0, 0, 0, 1); AIC=1608.562, BIC=1614.583, Fit time=0.009 seconds
Fit ARIMA: order=(1, 0, 0) seasonal_order=(0, 0, 0, 1); AIC=1528.212, BIC=1537.244, Fit time=0.012 seconds
Fit ARIMA: order=(0, 0, 1) seasonal_order=(0, 0, 0, 1); AIC=1513.502, BIC=1522.534, Fit time=0.043 seconds
Fit ARIMA: order=(2, 0, 5) seasonal_order=(0, 0, 0, 1); AIC=1372.478, BIC=1399.574, Fit time=0.317 seconds
Fit ARIMA: order=(3, 0, 4) seasonal_order=(0, 0, 0, 1); AIC=1350.320, BIC=1377.416, Fit time=0.305 seconds
Fit ARIMA: order=(2, 0, 4) seasonal_order=(0, 0, 0, 1); AIC=1350.115, BIC=1374.200, Fit time=0.289 seconds
Total fit time: 1.307 seconds

Actual behaviour
The following stack trace occurs:

  File "/usr/local/lib/python3.7/site-packages/pmdarima/arima/auto.py", line 582, in auto_arima
    with_intercept=with_intercept, **sarimax_kwargs)
  File "/usr/local/lib/python3.7/site-packages/pmdarima/arima/_auto_solvers.py", line 110, in __init__
    self.exec_context = ContextStore.get_or_empty(ContextType.STEPWISE)
  File "/usr/local/lib/python3.7/site-packages/pmdarima/arima/_context.py", line 137, in get_or_empty
    return ContextStore.get_or_default(context_type, _emptyContext())
  File "/usr/local/lib/python3.7/site-packages/pmdarima/arima/_context.py", line 127, in get_or_default
    ctx = ContextStore.get_context(context_type)
  File "/usr/local/lib/python3.7/site-packages/pmdarima/arima/_context.py", line 116, in get_context
    if context_type in _ctx.store and len(_ctx.store[context_type]) > 0:
AttributeError: '_thread._local' object has no attribute 'store'

Additional context
Works perfectly with pmdarima==1.4.0

tgsmith61591 commented 4 years ago

It's not a Cython issue; it looks like an issue with the StepwiseContext that was introduced in 1.5.0. That said, I can't replicate it without knowing a bit more about what you have in **params. Is there any code that precedes this, and can you share what options you're passing to the auto_arima function? Also, can you replicate this with any of the built-in datasets so you don't have to share data?

NoahLiot commented 4 years ago

Ah, OK. I was having problems in a parallel Cython process and wrongly assumed the two were related, since the Cython process works fine when this bug does not occur. Anyway, I do not use stored data; I generate it using the following:

    @staticmethod
    def ma_order2(mean, T, m):
        np.random.seed(0)
        # create a MA series of order 2
        xma = np.random.normal(0, 20, T)
        ma = mean + xma + 0.8 * np.roll(xma, -1) + 0.6 * np.roll(xma, -2)
        ma = ma + (50 * np.sin(2 * np.pi / m * np.array(range(T))))
        return ma

The call to generate the time series is then:

self.randomgen.ma_order2(500, 150, 7)

The parameters I pass are:

p = 3
d = 0
q = 5
m = 1
P = 0
D = 0
Q = 0
n_periods = 5

The parameters do not make much sense because this comes from a unit test, which is how I found the issue.

tgsmith61591 commented 4 years ago

When I run your example through auto_arima:

y = cls.ma_order2(500, 150, 7)
fit = pm.auto_arima(y, d=0, m=1, D=0)

I get a valid fit:

ARIMA(maxiter=50, method='lbfgs', order=(3, 0, 5), out_of_sample_size=0,
      scoring='mse', scoring_args=None, seasonal_order=(0, 0, 0, 1),
      start_params=None, suppress_warnings=False, trend=None,
      with_intercept=True)

Even if I feed in params that don't necessarily make sense (e.g., setting P or Q in auto_arima a priori), I get the same fit:

>>> fit = pm.auto_arima(y, p=3, d=0, q=5, m=1, P=0, D=0, Q=0, n_periods=5)
ARIMA(maxiter=50, method='lbfgs', order=(3, 0, 5), out_of_sample_size=0,
      scoring='mse', scoring_args=None, seasonal_order=(0, 0, 0, 1),
      start_params=None, suppress_warnings=False, trend=None,
      with_intercept=True)

🤔 Is it possible any of your code modifies anything in pmdarima.arima._context? Specifically this line

For reference, here are my relevant versions:

Cython==0.29.14
joblib==0.14.1
numpy==1.17.2
pmdarima==1.5.1
scikit-learn==0.21.3
scipy==1.3.1
statsmodels==0.10.1

dmitry-danilov commented 4 years ago

I can confirm I'm also getting this error after upgrading from pmdarima 1.2.1 to 1.5.1. My model training routines are implemented as scheduled jobs that are managed by APScheduler. One interesting observation: if I run those jobs standalone via main(), there are no problems, but if they are triggered by the scheduler, I get the error immediately. Reproduced on both a Windows desktop and a Linux server, Python 3.6.8.

P.S. My code is not trying to mess around with pmdarima.arima._context.

Stack trace:

Traceback (most recent call last):
  File "C:\Users\dimson\Anaconda3\lib\site-packages\apscheduler\executors\base.py", line 125, in run_job
    retval = job.func(*job.args, **job.kwargs)
  File "C:\QTS\power-prediction\training_tool\model_builder\daily_cpy_kw_job.py", line 120, in main
    build_model(el['site'], el['cpy'], el['timezone'])
  File "C:\QTS\power-prediction\training_tool\model_builder\daily_cpy_kw_job.py", line 46, in build_model
    aa_model = pm.auto_arima(daily_df, error_action='ignore', trace=0, seasonal=True, m=7, suppress_warnings=True)
  File "C:\Users\dimson\Anaconda3\lib\site-packages\pmdarima\arima\auto.py", line 582, in auto_arima
    with_intercept=with_intercept, **sarimax_kwargs)
  File "C:\Users\dimson\Anaconda3\lib\site-packages\pmdarima\arima\_auto_solvers.py", line 110, in __init__
    self.exec_context = ContextStore.get_or_empty(ContextType.STEPWISE)
  File "C:\Users\dimson\Anaconda3\lib\site-packages\pmdarima\arima\_context.py", line 137, in get_or_empty
    return ContextStore.get_or_default(context_type, _emptyContext())
  File "C:\Users\dimson\Anaconda3\lib\site-packages\pmdarima\arima\_context.py", line 127, in get_or_default
    ctx = ContextStore.get_context(context_type)
  File "C:\Users\dimson\Anaconda3\lib\site-packages\pmdarima\arima\_context.py", line 116, in get_context
    if context_type in _ctx.store and len(_ctx.store[context_type]) > 0:
AttributeError: '_thread._local' object has no attribute 'store'

tgsmith61591 commented 4 years ago

Ah, thanks for the context, @dmitry-danilov! That definitely helps, and I'm fairly certain I understand what is happening now. I'll get a fix in ASAP.
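
The thread does not spell out the root cause, but the traceback and the scheduler observation are consistent with a common pitfall: an attribute set on a threading.local object at import time exists only in the thread that ran the import. Below is a minimal sketch of that behaviour using only the standard library. The names _ctx, read_store, and run_in_worker are illustrative and are not pmdarima's actual code.

```python
import threading

# Hypothetical stand-in for a module-level thread-local set up at import time.
_ctx = threading.local()
_ctx.store = {}  # this assignment is visible only in the importing (main) thread


def read_store():
    # Same access pattern as the failing line in the traceback.
    return _ctx.store


def run_in_worker(fn):
    """Run fn in a new thread and return (result, exception)."""
    outcome = {}

    def target():
        try:
            outcome["result"] = fn()
        except Exception as exc:  # capture the error so the caller can inspect it
            outcome["error"] = exc

    worker = threading.Thread(target=target)
    worker.start()
    worker.join()
    return outcome.get("result"), outcome.get("error")


# Works in the main thread, where the attribute was assigned.
main_result = read_store()

# In a worker thread the attribute was never assigned, so the access fails.
worker_result, worker_error = run_in_worker(read_store)
print(type(worker_error).__name__, worker_error)
```

If this is indeed the cause, it would explain why running the job via main() works while running it from a scheduler's worker thread fails.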

tgsmith61591 commented 4 years ago

@dmitry-danilov @NoahLiot I just merged a fix for this. Would either of you be willing or able to build master from source and attempt to run your code again? If this is solved, we'll get a patch release out this week.

dmitry-danilov commented 4 years ago

@tgsmith61591 Happy to confirm I wasn't able to reproduce the issue in my local environment using master built from source. Many thanks!

tgsmith61591 commented 4 years ago

Just released 1.5.2, which should have this patched. Thanks for the help tracking this down, guys.