facebook / prophet

Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.
https://facebook.github.io/prophet

Understanding sources of error in uncertainty #1371

Closed ryankarlos closed 4 years ago

ryankarlos commented 4 years ago

I was trying to better understand the source of the uncertainty in the main forecast. I initially assumed it was a combination of trend uncertainty and random noise (the sigma_obs parameter). However, after doing some analysis on quick_start.ipynb, I'm not sure I fully understand. I've attached the plots and analysis below (apologies for the long post).

I've re-run the model with a smaller training history (4 years) and a future forecast period of 99 days instead of 365.
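For reference, the setup is roughly the following (a sketch based on the quick start notebook; assumes df is the dataframe used there, truncated to the last 4 years):

import pandas as pd
from fbprophet import Prophet

# df: the quick start dataframe (columns ds, y), assumed already loaded and cut to ~4 years of history
m = Prophet()
m.fit(df)

future = m.make_future_dataframe(periods=99)  # 99 days ahead instead of 365
forecast = m.predict(future)
fig1 = m.plot(forecast)             # main forecast with uncertainty intervals
fig2 = m.plot_components(forecast)  # trend (and seasonality) components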

[Screenshot: main forecast with uncertainty intervals]

The associated trend component - I've zoomed into the forecast period so the uncertainty band can be seen more clearly.

[Screenshot: trend component, zoomed into the forecast period]

Now, I re-run the prediction with predictive_samples and plot every 3rd sample of the trend projected forward for the duration of the forecast period. I'm assuming the main trend component lies somewhere in the high-density region (MAP) of the plot below, with the shaded uncertainty region's bounds computed to include 95% of the samples around the mode.

[Screenshot: every 3rd sampled trend path over the forecast period]

Similarly, I do the same for the main yhat forecast - I've only plotted a few samples here so they can be visualised.

[Screenshot: a few sampled yhat paths over the forecast period]
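The sample plots above come from something like this (a sketch; assumes m and future from the setup above):

import matplotlib.pyplot as plt

# dict with 'trend' and 'yhat', each of shape (n_rows_of_future, n_samples)
samples = m.predictive_samples(future)

fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True)
ax1.plot(future['ds'], samples['trend'][:, ::3], color='C0', alpha=0.1)  # every 3rd trend sample
ax2.plot(future['ds'], samples['yhat'][:, :10], color='C1', alpha=0.3)   # a few yhat samples
plt.show()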

Plot histograms of the samples at a random point, for both the trend and yhat.

[Screenshots: histograms of trend and yhat samples at a single forecast point]

Compute the width of the intervals from the median in both directions, by computing the 2.5, 50 and 97.5 percentiles and taking the upper and lower differences.

[Screenshot: 2.5/50/97.5 percentiles and the resulting interval widths]
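Roughly, those widths were computed like this (a sketch; i is just one index into the forecast dates):

import numpy as np

i = -1  # e.g. the last forecast date

lo, med, up = np.percentile(samples['yhat'][i, :], [2.5, 50, 97.5])
print(med - lo, up - med)   # lower / upper width of the 95% interval for yhat

lo, med, up = np.percentile(samples['trend'][i, :], [2.5, 50, 97.5])
print(med - lo, up - med)   # lower / upper width of the 95% interval for the trend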

I've used @bletham's suggestion here https://github.com/facebook/prophet/issues/1007#issuecomment-502205616 to compute the uncertainty from the sigma_obs parameter (assuming this is just the mode of the probability density over sigma_obs if not doing full MCMC?). This is similar to the yhat 0.95 intervals computed from the yhat predictive samples in the results above, but not exactly the same. The trend contributes 0.03 - but is this already accounted for as part of sigma_obs, or is the trend uncertainty added to it (and if so, why does the total not sum to the yhat uncertainty)?
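Concretely, that computation was along these lines (a sketch, assuming a normal noise term so the 95% half-width from noise alone is 1.96 * sigma_obs * y_scale on the data scale):

# without MCMC, m.params['sigma_obs'] has shape (1, 1): a single point estimate
sigma = m.params['sigma_obs'][0, 0] * m.y_scale  # noise stdev on the original data scale
print(1.96 * sigma)                              # noise-only 95% half-width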

Also, the trend component's uncertainty gets wider the further into the future we go, but the same is not apparent in the main forecast (the uncertainty seems fairly consistent, between 0.96 and 1 or just over, for all the forecasted points)?

From section 3.1.4 in the paper, I'm not fully clear what the rate scale parameter is. Is it a replacement for tau in the Laplace distribution for delta, from which new samples are drawn when projecting simulated trend models forward?

Any help would be greatly appreciated :)

bletham commented 4 years ago

Trend vs. noise uncertainty

Without MCMC, there are only two sources of uncertainty: future trend uncertainty and observation noise. sigma_obs is the stdev of the observation noise, so it does not include the trend uncertainty. The samples of trend include the future trend uncertainty. The samples of yhat include both; they are actually computed as trend_sample + seasonality + noise_sample. You can see that in the code here:
https://github.com/facebook/prophet/blob/952b5449285abbd1ecf655dd8ef3bb782609b077/python/fbprophet/forecaster.py#L1399
and the particular lines to note are these:
https://github.com/facebook/prophet/blob/952b5449285abbd1ecf655dd8ef3bb782609b077/python/fbprophet/forecaster.py#L1422
https://github.com/facebook/prophet/blob/952b5449285abbd1ecf655dd8ef3bb782609b077/python/fbprophet/forecaster.py#L1425

So given that

yhat = trend_sample + seasonality + noise_sample

the relationship that we expect is

var(yhat) = var(trend) + var(noise)

(since seasonality is constant without MCMC). This relationship holds, with some Monte Carlo noise in the later digits.

import pandas as pd
import numpy as np
from fbprophet import Prophet

# df: the example dataframe from the quick start (assumed already loaded)
m = Prophet(uncertainty_samples=10000)
m.fit(df)

predict_date = pd.DataFrame({'ds': ['2016-09-01']})
samples = m.predictive_samples(predict_date)

print(np.var(samples['yhat']))
# 0.25820409106348313
print(np.var(samples['trend']) + (m.params['sigma_obs'][0, 0] * m.y_scale) ** 2)
# 0.2575438788452501

It is the variances that sum, not the percentiles (the distribution of the trend is not normal).

Does yhat uncertainty get larger?

Yes, the yhat interval will expand along with the trend according to the relationship above (var(yhat) = var(trend) + var(noise)). I agree that it looks pretty constant, but this is just because the trend variance is very low in this example relative to the observation noise. On the date I chose above, the trend variance is 0.023 and the noise variance is 0.233; 10x higher. https://facebook.github.io/prophet/docs/non-daily_data.html#sub-daily-data has an example where the trend variance is much higher and can be seen in the variance of yhat.
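You can check this directly from the predictive samples (a rough sketch, continuing from the snippet above):

future = m.make_future_dataframe(periods=365)
samples = m.predictive_samples(future)
noise_var = (m.params['sigma_obs'][0, 0] * m.y_scale) ** 2

# the trend variance grows with the horizon, but in this example it stays
# small relative to the (constant) noise variance
print(np.var(samples['trend'][0, :]), noise_var)   # start of the dataframe
print(np.var(samples['trend'][-1, :]), noise_var)  # last forecast date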

Scale parameter for trend uncertainty

If I understand correctly, then yes. In 3.1.4, tau is the prior on the magnitude of slope changes at each changepoint. In the code this is changepoint_prior_scale, and it shows up here in the Stan model:
https://github.com/facebook/prophet/blob/952b5449285abbd1ecf655dd8ef3bb782609b077/python/stan/unix/prophet.stan#L111
When simulating future trends, the changepoint magnitudes (delta) are not drawn from this prior Laplace(tau); rather, in order for them to be conditioned on the data, they are drawn from a Laplace(lambda), where lambda is the average fitted trend change in the history. That happens here:
https://github.com/facebook/prophet/blob/952b5449285abbd1ecf655dd8ef3bb782609b077/python/fbprophet/forecaster.py#L1460-L1461
As noted in 3.1.4, this is the maximum likelihood estimate for the distribution given the fitted trend changes from the history.
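In other words, the future-trend simulation does something like this (a simplified sketch of that logic, not the exact code):

import numpy as np

deltas = m.params['delta'][0]        # fitted rate changes at the historical changepoints (MAP fit)
lambda_ = np.mean(np.abs(deltas))    # empirical scale of those changes

# future changepoint magnitudes are drawn from Laplace(0, lambda_),
# not from the Laplace(0, tau) prior
n_future_changepoints = 10           # illustrative only; the real code also samples how many occur
deltas_new = np.random.laplace(0, lambda_, n_future_changepoints)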

Does that answer all of your questions?

ryankarlos commented 4 years ago

Thanks for the detailed response, yes it makes a lot more sense now. I think I was searching in the Stan code for the scale parameter - I didn't think to search the forecaster.py file. I can now see how the uncertainties add up and make sense. So basically, is the scale parameter for trend uncertainty drawn from a hierarchical/hyper prior on the original delta prior? Since in the code we have delta_new ~ Laplace(0, lambda_), and as you said, lambda_ is computed from delta.

One more thing regarding the error assumption - is the parametric assumption for the error term specific to a single forecast? In section 4.2 of the paper, equation 8 makes a non-parametric assumption - is that specifically only when making forecasts across different horizons using cross-validation? Is there somewhere in the code where this assumption is made?

Thanks again for the response.

bletham commented 4 years ago

So basically, is the scale parameter for trend uncertainty drawn from a hierarchical/hyper prior on the original delta prior?

Not really - that would be the fully Bayesian way to do it, but it would mean we could only get trend uncertainty if we did MCMC. Instead, we just compute it directly from the fitted delta, which is more like an empirical Bayes approach.

non-parametric error estimation

Eq. 8 is basically about combining error estimates that were made at different cutoffs and horizons (the output of cross_validation) to get a single estimate for the error as a function of horizon. As the paper notes, we generally want this estimate of error(h) to be smooth, for example, and it mentions some ways this could be done (local regression, isotonic regression). What actually ships in the package is the simplest possible "model": a rolling average. This is what the performance_metrics function uses to combine error estimates into an error curve error(h), as in the figure at the bottom of https://facebook.github.io/prophet/docs/diagnostics.html. This rolling average approach is described in a similar way as in the paper in https://github.com/facebook/prophet/issues/839#issuecomment-462524968 (where "what we really want to do" is what we are currently doing there).
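In code, that error curve comes from cross_validation followed by performance_metrics (a sketch; rolling_window controls the width of the rolling average):

from fbprophet.diagnostics import cross_validation, performance_metrics
from fbprophet.plot import plot_cross_validation_metric

# errors at many (cutoff, horizon) pairs
df_cv = cross_validation(m, initial='730 days', period='180 days', horizon='365 days')

# rolling average over 10% of the points at each horizon -> a smoothed error(h) curve
df_p = performance_metrics(df_cv, rolling_window=0.1)

fig = plot_cross_validation_metric(df_cv, metric='mape')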

ryankarlos commented 4 years ago

@bletham Great, thanks for the explanation - it answers all my questions now. I'm closing the issue.