facebook / prophet

Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.
https://facebook.github.io/prophet
MIT License
18.26k stars 4.51k forks

Prophet Uncertainty intervals #1124

Closed · ruksarsultana closed this issue 4 years ago

ruksarsultana commented 5 years ago

@bletham I'm seeing different forecast values for the uncertainty intervals in each run, so different users are unable to reproduce the same uncertainty interval forecast (even with the same variables, changepoint.prior.scale and interval.width). Could you please help me understand the reason for this and how it can be tackled? Please let me know if the issue is unclear to you.

APramov commented 4 years ago

@ruksarsultana, short answer: I think there is randomness in the optimization procedure, but the good news is that you can control for it so that every user gets the same result by setting the seed. Just put set.seed(1234) at the top of your code and execute the whole script (if you are using R), or

import random
random.seed(1234)

if you are using Python. You don't have to use 1234; you can use any integer there.

Hope this helps!

long answer: I think it must be down to some randomness in the inferential procedure (Prophet uses Stan's penalized maximum likelihood estimation with optimization by default, and MCMC if you ask it to). I have to read up a bit on Stan's documentation for the penalized ML estimation method before I can narrow down the answer (it has been a while since I used any Bayesian analysis), and I will update my answer accordingly. If you were using the full posterior MCMC then it is clear that you would have randomness. EDIT: See my update below.

Now to the point of "how to fix this": it is very easy. R (and Python) allow the random number generator to be initialized from a fixed, known state. That way you always get the same random numbers and results are reproducible. In fact, I would always recommend doing this in any type of research work or analysis that involves random number generation. Just before you run the Prophet analysis, use set.seed(1234) in R, or set the seed in Python with random.seed(1234) from the random module, and run the whole code. Make sure that all users do the same (they have to use the same seed).

Check out this topic on stackoverflow for a bit more info: https://stackoverflow.com/questions/13605271/reasons-for-using-the-set-seed-function

EDIT/UPDATE: OK, here is what I think happens. When you run Prophet, by default you do inference using penalized ML, which in turn uses an L-BFGS optimization algorithm. At some point in the L-BFGS optimization procedure, Prophet calls the stan function, and in stan one of the arguments you have is a choice of how to initialize the starting parameters for the optimization (the argument is called init). I presume that Prophet takes the default value, which is to initialize the starting parameters randomly, and this is where the randomness in the output comes from. You can take a look here for more details: https://mc-stan.org/docs/2_18/stan-users-guide/efficiency-for-probabilistic-models-and-algorithms.html Not sure if I am right; @bletham might confirm or correct me.

bletham commented 4 years ago

@APramov thanks for the tips! I'll add a few things.

It is correct that by default the fitting uses a MAP estimate. Stan by default uses random initialization, but Prophet actually does not - it uses a fixed initialization for the fitting: https://github.com/facebook/prophet/blob/190d3239fd2172c9bfcedd57b7cdefde56108bf8/python/fbprophet/forecaster.py#L1092
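
For illustration, here is a rough sketch of what such a fixed initialization could look like, in the spirit of the linked line; the helper name and column names are simplifications rather than Prophet's exact internals:

import numpy as np

# Sketch: deterministic starting values for the MAP optimization.
# `history` is assumed to be Prophet's scaled history frame, with columns
# 't' (scaled time) and 'y_scaled' (scaled target).
def fixed_stan_init(history, n_changepoints, n_seasonal_features):
    y, t = history['y_scaled'].values, history['t'].values
    k = (y[-1] - y[0]) / (t[-1] - t[0])  # growth rate guess from the endpoints
    m = y[0] - k * t[0]                  # offset guess
    return {
        'k': k,
        'm': m,
        'delta': np.zeros(n_changepoints),      # no trend changes to start
        'beta': np.zeros(n_seasonal_features),  # no seasonal effects to start
        'sigma_obs': 1.0,
    }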

So the optimization is in principle deterministic. I say in principle because L-BFGS optimization is not actually totally deterministic, it depends on things like machine precision and the linear algebra packages installed on the system. My experience is that if I do the fitting on a single computer multiple times it will get the same result, but there are several issues on the github here that report examples where fitting the same time series on different machines has produced slightly different results due purely to differences in the L-BFGS optimization. These differences are always very small, but they can be present and unfortunately I do not think there is anything that can be done about them.

Though it sounds like the question here is mostly about uncertainty intervals, and for those there is a much larger source of variability across runs. Trend uncertainty is estimated using Monte Carlo sampling from the trend change generative model. By default this is done with 1000 samples, and then, e.g., 80% intervals are computed by taking the 10th and 90th quantiles of these samples. I would expect differences in the uncertainty intervals across runs to primarily be MC variance from this sampling. You can tell if this is the case by calling predict twice on the same fitted model; any difference in predictions is 100% due to the MC estimate of trend uncertainty.

There are two things that can be done. When instantiating the Prophet object, there is an argument uncertainty_samples that specifies how many MC samples to use. Increasing it will reduce the MC variance, at the cost of making predict slower because it has to run more simulations, and you might have to increase it very high to get things to be totally repeatable. The other approach would be to set a random seed, as @APramov suggests. In Python, all of the random sampling is done in numpy so you'll have to set the seed in numpy:

import numpy as np
np.random.seed(1234)
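
For completeness, here is a minimal sketch of the other option mentioned above, increasing the number of MC samples when constructing the model; uncertainty_samples and interval_width are existing Prophet constructor arguments, and the value 4000 is just an example:

from prophet import Prophet  # older versions: from fbprophet import Prophet

# More MC samples -> lower variance in the interval estimates, but a slower predict()
m = Prophet(uncertainty_samples=4000, interval_width=0.80)
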
sammo commented 4 years ago

> So the optimization is in principle deterministic. I say in principle because L-BFGS optimization is not actually totally deterministic, it depends on things like machine precision and the linear algebra packages installed on the system. My experience is that if I do the fitting on a single computer multiple times it will get the same result, but there are several issues on the github here that report examples where fitting the same time series on different machines has produced slightly different results due purely to differences in the L-BFGS optimization. These differences are always very small, but they can be present and unfortunately I do not think there is anything that can be done about them.

I've seen differences of up to 12% in the MAP estimates between different OSes, which was causing functional tests to fail, so I dug into it and here is what I got to.

Where it gets interesting is that conda's Python on Linux is compiled with GCC, while conda's Python on Mac is compiled with Clang (info below).

My best guess at this point is that having two different compilers generating the libraries is causing the L-BFGS optimization to take different routes. Any other thoughts, or did I miss anything here @bletham?

OS 1: Debian GNU/Linux 9 (stretch), conda 4.7.10, python 3.7.4 [GCC 7.3.0] :: Anaconda, Inc. on linux, fbprophet 0.5, pystan 2.19.0.0, mkl 2019.4, mkl_random 1.1.0

OS 2: Darwin Kernel Version 18.7.0, conda 4.7.12, python 3.7.4 [Clang 4.0.1 (tags/RELEASE_401/final)] :: Anaconda, Inc. on darwin, fbprophet 0.5, pystan 2.19.0.0, mkl 2019.4, mkl_random 1.1.0

bletham commented 4 years ago

@sammo that is a very thorough analysis! And consistent with my expectation.

One thing I didn't note above is that this seems to be a particular issue on problems where the likelihood surface is very flat near the optimum, which can particularly be the case for short time series where model identifiability is poor. Stan's Newton optimizer does seem to be more robust in these situations, and in past issues where this has come up people have found more reproducible results using it (e.g. #253, which I think was the first report of this issue). You can set the optimizer like this:

m.fit(df, algorithm='Newton')

sammo commented 4 years ago

Thanks for the response @bletham . I'll try it out with Newton then.

ravise5 commented 3 years ago

@bletham

I'm trying to make forecasts for multiple time series using the past 2 years of data. I'm mostly interested in the yhat_upper of the forecasts, but as discussed here I'm unable to get reproducible results.

  1. I tried using np.random.seed(value), but it doesn't seem to resolve the problem.

  2. As suggested by you, I tried increasing uncertainty_samples from 1000 to 2000, and this seems to work for a major chunk of the time series I'm forecasting. On the downside, the time taken has increased considerably, and changing the optimizer to 'Newton' took even more time.

  3. Is the Monte Carlo sampling used for generating the predictive samples deterministic? Is there a way to get the same predictive samples every time we run the model, rather than increasing uncertainty_samples to reduce the variance (which is what the seed should have done)? Or does the variance arise from some random initialization in Stan that we cannot control with the seed?

  4. I read in the docs that assigning a non-zero value to mcmc_samples would also account for uncertainty in seasonality. How would this added uncertainty lead to a more stable confidence interval?

bletham commented 3 years ago

If doing MAP fitting (default, mcmc_samples=0), the fitting is totally deterministic, including a deterministic initialization to Stan. You can look at the fitted parameters in m.params to verify that they are consistent.
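
A quick way to check this (a sketch using the same example file that appears later in this comment):

import numpy as np
import pandas as pd
from prophet import Prophet

df = pd.read_csv('../examples/example_wp_log_peyton_manning.csv')

# Two MAP fits on the same data should give (essentially) identical parameters.
m1 = Prophet().fit(df)
m2 = Prophet().fit(df)
for name in m1.params:
    print(name, np.allclose(m1.params[name], m2.params[name]))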

Stochasticity comes in the prediction stage, and derives entirely from the trend uncertainty estimation. Increasing uncertainty_samples reduces variance in yhat_upper but it is still stochastic.

The actual randomness is in:

- selecting the number of future trend changes: https://github.com/facebook/prophet/blob/c72ed7abcd0321c51bbf82ad504ddffac9a4c18f/python/prophet/forecaster.py#L1507
- the locations of those changes: https://github.com/facebook/prophet/blob/c72ed7abcd0321c51bbf82ad504ddffac9a4c18f/python/prophet/forecaster.py#L1511
- the magnitude of the trend change at each of these: https://github.com/facebook/prophet/blob/c72ed7abcd0321c51bbf82ad504ddffac9a4c18f/python/prophet/forecaster.py#L1520
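
Paraphrased, the sampling at those lines looks roughly like this (a simplified sketch, not the exact implementation; T is the forecast horizon in scaled time, S the number of historical changepoints, and deltas the fitted changepoint magnitudes):

import numpy as np

def sample_future_trend_changes(T, S, deltas):
    # How many new changepoints fall in the forecast period
    n_changes = np.random.poisson(S * (T - 1))
    # Where they occur, uniformly over the forecast period
    changepoint_ts_new = 1 + np.random.rand(n_changes) * (T - 1)
    # How large each change is, drawn from a Laplace with the empirical scale
    lambda_ = np.mean(np.abs(deltas)) + 1e-8
    deltas_new = np.random.laplace(0, lambda_, n_changes)
    return np.sort(changepoint_ts_new), deltas_new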

As you can see all of these are using np.random, so setting the seed via np.random before calling predict makes this reproducible. When I run this code, I get exactly the same yhat_upper in the two different calls:

from prophet import Prophet  # v1.0 new package name
import pandas as pd
import numpy as np

df = pd.read_csv('../examples/example_wp_log_peyton_manning.csv')

m = Prophet()
m.fit(df)
future = m.make_future_dataframe(periods=365)
np.random.seed(1000)
forecast = m.predict(future)
print(forecast['yhat_upper'].tail())

# Fit and predict again, with same seed
m = Prophet()
m.fit(df)
np.random.seed(1000)
forecast = m.predict(future)
print(forecast['yhat_upper'].tail())  # Same as above

This is only true with MAP fitting. If you use MCMC (with mcmc_samples>0) then the fitting itself becomes stochastic. For a fixed model you can get deterministic predictions by setting the numpy seed as above, but if you re-fit you'll get different parameters with MCMC, so you'd get different results. You'd have to give Stan the seed to use for fitting; the pystan sampling docs describe how to do that.
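
If you do want reproducible MCMC fits, one possibility (a sketch, assuming fit() forwards extra keyword arguments to the Stan backend the same way algorithm='Newton' is forwarded above, and that the backend's sampling call accepts a seed argument) would be:

m = Prophet(mcmc_samples=300)
m.fit(df, seed=1234)  # `seed` would be passed through to Stan's sampler, not interpreted by Prophet itself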

ravise5 commented 3 years ago

@bletham Thanks for responding. I had the code running in parallel using a pool, and the issue was that the child processes were not getting the same random seed, hence the results were varying. Just wanted to put it out there in case anyone else runs into the same issue. Thanks again!

gabrieldaiha commented 2 years ago

> @bletham Thanks for responding. I had the code running in parallel using a pool, and the issue was that the child processes were not getting the same random seed, hence the results were varying. Just wanted to put it out there in case anyone else runs into the same issue. Thanks again!

@ravise5 Man, how did you solve this issue when running in parallel with a pool?

ravise5 commented 2 years ago

@gabrieldaiha You should set the random seed inside the function that you want to parallelize; that way all child processes get the same seed. If you are using a Jupyter notebook, setting a random seed inside one cell doesn't actually guarantee that it applies globally across all the cells of your notebook; this is a Jupyter notebook quirk. Hope this helps!😊
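
A sketch of that workaround for a multiprocessing Pool (run_forecast and the input file names here are illustrative, not Prophet API):

import numpy as np
import pandas as pd
from multiprocessing import Pool
from prophet import Prophet

def run_forecast(df):
    np.random.seed(1234)  # seed inside the worker, so every child process uses the same one
    m = Prophet()
    m.fit(df)
    future = m.make_future_dataframe(periods=365)
    return m.predict(future)[['ds', 'yhat', 'yhat_lower', 'yhat_upper']]

if __name__ == '__main__':
    dfs = [pd.read_csv(p) for p in ['series_a.csv', 'series_b.csv']]  # hypothetical inputs
    with Pool(processes=2) as pool:
        forecasts = pool.map(run_forecast, dfs)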

dipranjan commented 1 year ago

> @bletham Thanks for responding. I had the code running in parallel using a pool, and the issue was that the child processes were not getting the same random seed, hence the results were varying. Just wanted to put it out there in case anyone else runs into the same issue. Thanks again!

Not related to the issue mentioned above, but we are also trying to parallelize Prophet. Is there any guide on how best to do that?