facebook / prophet

Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.
https://facebook.github.io/prophet
MIT License

How is yhat uncertainty estimated in the historical period #727

Closed by tommylees112 5 years ago

tommylees112 commented 5 years ago

Data: evi_mean.txt Code:

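import pandas as pd
from fbprophet import Prophet  # the package is named "prophet" from version 1.0 onward

# Load the space-delimited data; Prophet expects columns named "ds" and "y"
df = pd.read_table("evi_mean.txt", delimiter=" ")

# Yearly seasonality with Fourier order 4, applied multiplicatively to the trend
m = Prophet(yearly_seasonality=4,
            daily_seasonality=False,
            weekly_seasonality=False,
            seasonality_mode="multiplicative",
            changepoint_prior_scale=5e-2)

m.fit(df)

# Extend the history by three years of daily steps and predict over both
future = m.make_future_dataframe(periods=365*3)
forecast = m.predict(future)
fig1 = m.plot(forecast);
fig2 = m.plot_components(forecast);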

[Image: evi_prophet1]

[Image: evi_prophet2]

I understand that the parameters have been fit using the Stan optimisation and so there are no parameter uncertainties estimated. But I have been looking through the code and I don't understand how yhat_upper and yhat_lower are estimated in the forecast dataframe above. All of the component parts (trend_upper, trend_lower, yearly_upper ...) are constant throughout, so where does this uncertainty come from?

Could you let me know how it is calculated and also point me to the function in which it is calculated?

Thanks very much

Tommy

tommylees112 commented 5 years ago

I understand the uncertainties for FUTURE are based on the following process:

i) trend has a constant rate (the solid line)
ii) there are S changepoints in the history (T timesteps)
iii) therefore, a rate of S/T changepoints with an average magnitude of $\delta$
iv) given the historical average we simulate this many changes to the trend into the future
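
The four steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not Prophet's actual code: the function name `simulate_future_trends`, its parameters, and the example numbers are all hypothetical. It assumes a changepoint occurs independently at each future step with probability S/T, and that change sizes follow a Laplace distribution whose scale is the mean absolute historical change.

```python
import numpy as np

def simulate_future_trends(k, hist_deltas, n_hist, n_future, n_sims=1000, seed=0):
    """Simulate future trend offsets relative to the last fitted trend value.

    k           : trend slope at the end of the history (per time step)
    hist_deltas : fitted rate changes at the S historical changepoints
    n_hist      : number of historical time steps T
    n_future    : number of future time steps to simulate
    """
    rng = np.random.default_rng(seed)
    hist_deltas = np.asarray(hist_deltas, dtype=float)
    p_change = len(hist_deltas) / n_hist      # S / T changepoints per step
    scale = np.mean(np.abs(hist_deltas))      # average magnitude of a change

    # For every simulation and future step, decide whether a changepoint occurs
    occurs = rng.random((n_sims, n_future)) < p_change
    # Assumed change-size distribution: Laplace with the historical scale
    deltas = rng.laplace(0.0, scale, size=(n_sims, n_future)) * occurs
    # The slope at each step is the last slope plus all changes so far
    slopes = k + np.cumsum(deltas, axis=1)
    # Accumulate the slopes to get the trend offset at each future step
    return np.cumsum(slopes, axis=1)

# Hypothetical inputs: 3 changepoints in 300 historical steps, 100 future steps
paths = simulate_future_trends(k=0.01, hist_deltas=[0.002, -0.001, 0.003],
                               n_hist=300, n_future=100)
# An 80% band is the 10th and 90th percentile across simulations at each step
lower, upper = np.percentile(paths, [10, 90], axis=0)
```

The gap between `lower` and `upper` grows with the horizon because simulated rate changes accumulate over time.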

How is the uncertainty estimated for the data that already exists, i.e. the data that we have fit our model to?

Thanks for your help guys!

bletham commented 5 years ago

@tommylees112 There are three types of uncertainty in the model. The first is future trend uncertainty, which you correctly described. The second is parameter uncertainty, which you also correctly observed is not present because MCMC was not done. The third type, and the source of the uncertainty in the past here, is that there is a noise term in the model. Specifically, the model is

y(t) = trend(t) + seasonality(t) + regressors(t) + noise

where noise ~ Normal(0, sigma) and the noise standard deviation sigma is fit to the data. In the history, yhat_lower and yhat_upper are just the quantiles of that noise distribution.
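
To make that last point concrete, here is a small sketch of how such bounds follow from the noise term alone. It assumes `sigma` is the standard deviation of the Gaussian noise and uses the closed-form normal quantile; `historical_interval` and the example values are hypothetical. The default width of 0.8 matches Prophet's default `interval_width`. Prophet itself appears to estimate the quantiles from simulated draws (see `predict_uncertainty` in forecaster.py), so its bounds will not match a closed-form calculation exactly.

```python
from statistics import NormalDist

def historical_interval(yhat, sigma, interval_width=0.8):
    """Return (lower, upper) bounds for each fitted value under Gaussian noise."""
    # Two-sided interval: (1 - interval_width) / 2 of the mass sits in each tail
    z = NormalDist().inv_cdf(0.5 + interval_width / 2)
    lower = [y - z * sigma for y in yhat]
    upper = [y + z * sigma for y in yhat]
    return lower, upper

# Hypothetical fitted values and noise scale
lower, upper = historical_interval(yhat=[0.30, 0.35, 0.32], sigma=0.05)
```

Because sigma is a single fitted number, the band has the same width at every historical point, and it is centred on yhat.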