facebook / prophet

Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.
https://facebook.github.io/prophet
MIT License

Model optimization hangs with CPU at 100% and does not throw an error #248

Closed stefanproell closed 7 years ago

stefanproell commented 7 years ago

I am iterating over several hundred quite small time series data sets and trying to forecast a value for each of them. I am aware that Facebook Prophet is built for large data sets, but I am also getting good results for smaller data sets, except for a few outliers for which no useful forecast can be made. One of these outliers causes my script to run forever, without throwing an error or aborting.

Minimal example to reproduce the problem

The following example causes the model optimization for the data set to run forever, freezing the script that executes the forecasts.

import pandas as pd
import numpy as np
from fbprophet import Prophet

data_dict = [{'ds': '2014-11-30', 'y': '9'},
             {'ds': '2015-01-31', 'y': '50'},
             {'ds': '2015-02-28', 'y': '193'},
             {'ds': '2015-03-31', 'y': '645'},
             {'ds': '2015-04-30', 'y': '544'},
             {'ds': '2015-05-31', 'y': '658'},
             {'ds': '2015-06-30', 'y': '694'}]
df_test = pd.DataFrame(data_dict)
df_test['y'] = df_test['y'].astype(float)
df_test['y'] = np.log(df_test['y'])  # log-transform the values before fitting
print('Test data set')
print(df_test)
m = Prophet(weekly_seasonality='auto', yearly_seasonality='auto')
m.fit(df_test)  # the script hangs here
future = m.make_future_dataframe(periods=12, freq='M', include_history=True)
future.tail()
forecast = m.predict(future)
print(forecast)

This is the output; the script freezes at the fit method:

Test data set
           ds         y
0  2014-11-30  2.197225
1  2015-01-31  3.912023
2  2015-02-28  5.262690
3  2015-03-31  6.469250
4  2015-04-30  6.298949
5  2015-05-31  6.489205
6  2015-06-30  6.542472
Disabling yearly seasonality. Run prophet with yearly_seasonality=True to override this.
Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this.
Initial log joint probability = -2.08382
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes 
      24       12.6189   1.70452e-05       89.5855   1.981e-07       0.001       73  LS failed, Hessian reset 
      30         12.62   8.50625e-06       99.8521   8.007e-08       0.001      121  LS failed, Hessian reset 

Expected result

The problem is related to the data set. When I change a single value, e.g. the first one, the little program runs through:

Test data set
           ds         y
0  2014-11-30  2.079442
1  2015-01-31  3.912023
2  2015-02-28  5.262690
3  2015-03-31  6.469250
4  2015-04-30  6.298949
5  2015-05-31  6.489205
6  2015-06-30  6.542472
Disabling yearly seasonality. Run prophet with yearly_seasonality=True to override this.
Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this.
Initial log joint probability = -2.09018
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes 
      19       12.4997   0.000218154       83.8255   1.995e-06       0.001       69  LS failed, Hessian reset 
      69        12.799   1.60779e-06       100.093   1.456e-08       0.001      194  LS failed, Hessian reset 
      84       12.7992   2.23223e-09       90.7073     0.02835           1      220   
Optimization terminated normally: 
  Convergence detected: absolute parameter change was below tolerance
           ds         t      trend  seasonal_lower  seasonal_upper  \
0  2014-11-30  0.000000   2.690022             0.0             0.0   
1  2015-01-31  0.292453   4.068725             0.0             0.0   
2  2015-02-28  0.424528   4.691364             0.0             0.0   
3  2015-03-31  0.570755   5.380715             0.0             0.0   
4  2015-04-30  0.712264   6.047829             0.0             0.0   
5  2015-05-31  0.858491   6.737180             0.0             0.0   
6  2015-06-30  1.000000   7.404293             0.0             0.0   
7  2015-07-31  1.146226   8.093644             0.0             0.0   
8  2015-08-31  1.292453   8.782995             0.0             0.0   
9  2015-09-30  1.433962   9.450109             0.0             0.0   
10 2015-10-31  1.580189  10.139460             0.0             0.0   
11 2015-11-30  1.721698  10.806573             0.0             0.0   
12 2015-12-31  1.867925  11.495924             0.0             0.0   
13 2016-01-31  2.014151  12.185275             0.0             0.0   
14 2016-02-29  2.150943  12.830152             0.0             0.0   
15 2016-03-31  2.297170  13.519503             0.0             0.0   
16 2016-04-30  2.438679  14.186616             0.0             0.0   
17 2016-05-31  2.584906  14.875967             0.0             0.0   
18 2016-06-30  2.726415  15.543081             0.0             0.0   

    trend_lower  trend_upper  yhat_lower  yhat_upper  seasonal       yhat  
0      2.690022     2.690022    1.738759    3.543975       0.0   2.690022  
1      4.068725     4.068725    3.279621    4.912263       0.0   4.068725  
2      4.691364     4.691364    3.779862    5.587587       0.0   4.691364  
3      5.380715     5.380715    4.495832    6.234057       0.0   5.380715  
4      6.047829     6.047829    5.156757    6.865072       0.0   6.047829  
5      6.737180     6.737180    5.824482    7.614358       0.0   6.737180  
6      7.404293     7.404293    6.472549    8.300300       0.0   7.404293  
7      8.093644     8.093644    7.251213    8.979675       0.0   8.093644  
8      8.782995     8.782995    7.979853    9.645504       0.0   8.782995  
9      9.450109     9.450109    8.634570   10.290984       0.0   9.450109  
10    10.139460    10.139460    9.235909   11.100990       0.0  10.139460  
11    10.806573    10.806574    9.913225   11.741765       0.0  10.806573  
12    11.495924    11.495925   10.654597   12.356846       0.0  11.495924  
13    12.185275    12.185276   11.287038   13.031297       0.0  12.185275  
14    12.830151    12.830152   11.976170   13.784151       0.0  12.830152  
15    13.519502    13.519503   12.651313   14.316010       0.0  13.519503  
16    14.186616    14.186617   13.313926   15.090459       0.0  14.186616  
17    14.875966    14.875968   14.024588   15.745605       0.0  14.875967  
18    15.543080    15.543082   14.737588   16.418176       0.0  15.543081  

Process finished with exit code 0

So that works as expected, just by changing one of the values.

Investigation

When debugging the script with IntelliJ IDEA Ultimate, I noticed that the problem is caused by line 538 in forecaster.py:

params = model.optimizing(dat, init=stan_init, iter=1e4, **kwargs)

In cases where my data cannot be processed by Prophet, I would like an error to be thrown that I can react to. I found a similar issue, which is however not identical to mine, as I do not receive any error. That issue links to a commit with a suggested fix: https://github.com/facebookincubator/prophet/commit/f7becb0942cd0a005d72ae307aefee431fa962d7. I tried to replace the code snippet in a similar way, but catching generic Exceptions instead of only RuntimeErrors. This is my example code:

try:
    params = model.optimizing(dat, init=stan_init, iter=1e4, **kwargs)
except Exception:
    # Fall back to the Newton optimizer if L-BFGS fails.
    print('error')
    params = model.optimizing(dat, init=stan_init, iter=1e4, algorithm='Newton', **kwargs)
for par in params:
    self.params[par] = params[par].reshape((1, -1))

This approach does not work, however, as the optimizing method does not throw any error but simply hangs. Investigating the problem further, I noticed that the assignment of the StanFit4Model to the fit_class attribute fails in model.py, line 324. In line 472 of model.py, I can see the following error in the debugger:

Unable to get repr for <class 'stanfit4anon_model_7a788197ac3493030b72020b9ffdbe7d_5923200972637382724.StanFit4Model'>

Most properties of the fit object are then undefined (None) and cause the subsequent execution to fail. The following versions are used:

Used Python Packages

conda==4.3.14
Cython==0.25.2
fbprophet==0.1.1
h5py==2.6.0
numpy==1.13.1
pandas==0.20.3
pystan==2.16.0.0
statsmodels==0.6.1

I use Python 3.6.0 :: Anaconda 4.3.1 (64-bit) on an Ubuntu 14.04 LTS machine. I also tested the same setup in a Debian Docker container and ran into the same problem.

Quick fix?

What would be a quick fix to catch errors and proceed with the tool execution? Thank you!
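
Update: as a stop-gap I am experimenting with running each fit in a separate process and abandoning it after a timeout, since the hang cannot be caught in-process. This is just a sketch (forecast_with_timeout is my own helper, not part of Prophet):

import multiprocessing

def _fit_and_forecast(df, periods, queue):
    # Runs in a child process so that a hanging optimization cannot
    # freeze the main script.
    from fbprophet import Prophet
    m = Prophet(weekly_seasonality='auto', yearly_seasonality='auto')
    m.fit(df)
    future = m.make_future_dataframe(periods=periods, freq='M')
    queue.put(m.predict(future))

def forecast_with_timeout(df, periods=12, timeout_seconds=120):
    queue = multiprocessing.Queue()
    p = multiprocessing.Process(target=_fit_and_forecast, args=(df, periods, queue))
    p.start()
    p.join(timeout_seconds)
    if p.is_alive():
        p.terminate()  # give up on this data set
        p.join()
        return None    # caller can skip it and continue with the next series
    return queue.get()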

bletham commented 7 years ago

Thanks for the really awesome description and analysis of the issue. As you said, the LBFGS is hanging, and doing so without raising an error that can be caught. Interestingly, this does not happen in R with the same data; the issue seems to be specific to PyStan.

I don't know exactly what is causing the optimization to hang with these data, but I think there are things we can do in Prophet to make the optimization problem nicer for small datasets like this. Prophet by default inserts 25 potential changepoints into the time series, which in this case is more than the number of data points in the history. This means there are a bunch of unidentifiable parameters in the model, which isn't necessarily a problem for the optimization, but I think it could be.

In the case of this dataset, using

m = Prophet(n_changepoints=2)

makes it work. I'm not sure, though, whether this actually fixes the issue generally or just fixes it for this one dataset. If you have a whole bunch of datasets like this, could you try all of them with this setting?

Generally we need to adapt the default number of changepoints based on the number of data points.

Ramblurr commented 7 years ago

Is there a hard rule or a rule of thumb for the number of changepoints needed for x data points?

stefanproell commented 7 years ago

Thank you very much for the quick reply and help. I can confirm that the example works when the number of changepoints is reduced. For my current application, I set the number of changepoints equal to the number of data points for data sets with 25 records or fewer. For data sets with more than 25 historical records, I use the default of 25. This works for my project, where I forecast smaller data sets.

n_data_points = len(df.index)
potential_change_points = min(25, n_data_points)

m = Prophet(weekly_seasonality='auto', yearly_seasonality='auto',
            n_changepoints=potential_change_points)

The documentation states that the potential changepoints are selected from the first 80 percent of the historical data:


n_changepoints: Number of potential changepoints to include. Not used
    if input `changepoints` is supplied. If `changepoints` is not supplied,
    then n.changepoints potential changepoints are selected uniformly from
    the first 80 percent of the history.
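
To get a feeling for what that means, here is my own little sketch (not Prophet's actual selection code) of indices spread uniformly over the first 80 percent of a 100-row history:

import numpy as np

n_history = 100       # rows in the fit dataframe
n_changepoints = 25
hist_size = int(np.floor(0.8 * n_history))
# Spread the potential changepoints evenly over the first 80% of the history.
cp_indices = np.linspace(0, hist_size - 1, n_changepoints).round().astype(int)
print(cp_indices)     # row indices of the potential changepoints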

I suppose I could also select n = 0.8 * len(df.index) and thereby use all potential changepoints, but this would probably not scale for larger data sets. Is there a generic way of finding a proper value for this setting?

Thanks again, Cheers Stefan

bletham commented 7 years ago

If you look at the dashed lines in the first figure on this page https://facebookincubator.github.io/prophet/docs/trend_changepoints.html you can see what you get with 25 changepoints. You would want to increase past that when the time series experiences trend changes that you need to capture and that happen at a faster rate than those changepoints can track. However, if the trend really is changing every few observations (or very frequently relative to the forecast horizon), the hope for a useful forecast may be a bit slim.

I think around 25 is a good default choice for longer time series (more than a few hundred observations), and in most cases there won't be much value in increasing past that. It would give a quite small change in the quality of the trend fit, and an even smaller change in the quality of the seasonality estimation (which is the most important reason to have a good trend fit). Of course this might not apply if the time series is experiencing very frequent, large trend shifts.

For short time series, having more changepoints than observations introduces variables into the model that are never used and so is entirely unnecessary. Having a changepoint for each observation gives the model the potential to change the trend at every data point, which also seems unlikely to be a good strategy. I don't see much reason to have more than one changepoint per three or so observations. A changepoint per three observations would mean that the location of a trend change would be off by at most one observation, which seems pretty reasonable.
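
As a sketch of that rule of thumb (illustrative only, not a Prophet API):

def suggested_n_changepoints(n_obs, default=25):
    # At most one changepoint per ~3 observations for short series,
    # capped at the usual default for longer ones.
    return max(1, min(default, n_obs // 3))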

In these low-data settings I would also strongly recommend using MCMC sampling (e.g. the argument mcmc_samples=500), which will incorporate the uncertainty in the trend changepoint parameters into the forecast and will still be relatively fast.
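
For example:

m = Prophet(mcmc_samples=500)
m.fit(df)  # runs full Bayesian sampling instead of MAP optimization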

bletham commented 7 years ago

https://github.com/facebookincubator/prophet/commit/0b4ec4a9b3c6d91e2ae70dbc02dd37d25b9758e7 makes it so we don't use more changepoints than we have datapoints, which should prevent this in the future.

DBCerigo commented 7 years ago

I ran into this same problem but with a different series, specifically one that is 490 observations long, so longer than the number of changepoints.

Notebook that reproduces issue

This example differs from the OP's in that it is a significantly longer series (~490 observations), which should mean it is not the case that there are more changepoints than data points. (It also includes some NaNs.) The Stan output also differs from the above in that I have never seen it get as far as reporting on a single Iter; so far it has always hung with the same output as in the notebook.

So far, making changes of +1 or -1 to any of the values solves the issue (similarly to the OP). Changes to the ds series also solve it.

This is my first exposure to Prophet and Stan, and I think I've reached the soft limit of my debugging capabilities on this; of course, I'm happy to answer any clarifications/questions about the contents of the notebook.

bletham commented 7 years ago

Thanks for the notebook. It's unfortunate to see this issue in a longer time series; I was really hoping it was just a matter of very short time series.

Things seem to be working on the current v0.2 branch: the notebook runs successfully in both Py 2 and 3. There have been a lot of changes, but I suspect it may now work for this time series yet still fail for some other. We'll push v0.2 out in a few days, and then if the problem persists or a time series reproduces it, I'll debug further. In the meantime, you can try changing up the optimizer by passing algorithm='Newton' along to fit(). That might tell us whether it's something specific to the L-BFGS.
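
For example:

m = Prophet()
m.fit(df, algorithm='Newton')  # extra kwargs to fit() are forwarded to Stan's optimizer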

DBCerigo commented 7 years ago

Thanks for the speedy reply. Looking forward to v0.2, as it also solves #258 and #251, which I've been running into.

Do you think it's worth trying v0.2 right now (i.e. cloning the branch and installing with setup.py)?

bletham commented 7 years ago

That'd be great! Right now everything is done for Py; I just have a couple of R features to wrap up and then the documentation update.

bletham commented 7 years ago

This fix has been pushed to CRAN and pypi in v0.2.

FarukBuldur commented 2 years ago

Hello @bletham, I have run into the same issue for a series of 252 data points. It reaches the fit method and hangs there as if in an infinite loop. I am not able to catch it with a Python timeout since it occurs in the Stan backend.

sneha-kathuria commented 2 years ago

Hi @bletham ,

I have run into the same issue. In my case, I use growth='flat'. It hangs forever without any error, but when I change to growth='linear', it works fine.

waliahmed24 commented 11 months ago

Hey @bletham, I have also been stuck with a similar problem. Most time series in my dataset are of length 400. When the default n_changepoints (i.e. 25) is kept and growth is set to 'linear', it works fine. But when growth is set to 'logistic', it hangs indefinitely.

Also note that I have been iterating over the individual time series not with a Python for loop but with the applyInPandas utility, executing it on a Spark cluster. Reducing n_changepoints solves the issue, but I'm not sure why, since the default value works well with growth='linear'.
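
For reference, here is a minimal sketch of my setup (the column names series_id/ds/y, the input frame spark_df, and the reduced n_changepoints value are placeholders, not the exact values from my job):

from pyspark.sql.types import StructType, StructField, StringType, TimestampType, DoubleType
from prophet import Prophet  # 'fbprophet' in older releases

result_schema = StructType([
    StructField('series_id', StringType()),
    StructField('ds', TimestampType()),
    StructField('yhat', DoubleType()),
])

def forecast_one_series(pdf):
    # pdf holds all rows for one series_id; reducing n_changepoints avoids the hang.
    m = Prophet(growth='linear', n_changepoints=10)
    m.fit(pdf[['ds', 'y']])
    future = m.make_future_dataframe(periods=12, freq='M')
    forecast = m.predict(future)
    forecast['series_id'] = pdf['series_id'].iloc[0]
    return forecast[['series_id', 'ds', 'yhat']]

forecasts = spark_df.groupBy('series_id').applyInPandas(forecast_one_series, schema=result_schema)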