facebook / prophet

Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.
https://facebook.github.io/prophet
MIT License

How can we investigate the error / noise term. #1886

Open sarah2397 opened 3 years ago

sarah2397 commented 3 years ago

Dear all,

I noticed that Prophet uses a decomposable time series model, which includes a trend, seasonality, holiday effects, and any additional regressors you want to use. But there is (of course) also a noise term. Is this noise term stochastic or deterministic? How is the error term calculated? Can I visualize or calculate the error term in my model myself?

I couldn't find any information about this part of the model and hope you can help me with it.

Best regards and thank you!

tcuongd commented 3 years ago

Heya, I'm not 100% confident in the modelling details (Ben Letham can probably provide a much better answer here), but the model structure is described here: https://github.com/facebook/prophet/blob/0616bfb5daa6888e9665bba1f95d9d67e91fed66/python/stan/unix/prophet.stan#L127-L142

Basically we have y, the observed data, and we're fitting it to a Normal distribution with mu = trend + seasonality + regressors and sd = sigma_obs. sigma_obs is fitted to the data and I think that's what you're referring to as the noise term. For MAP estimation (i.e. mcmc_samples = 0), there will just be a single value for sigma_obs (which you can access in model.params['sigma_obs']), and this represents the typical variability of a datapoint around mu.
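Written out, the structure described above looks roughly like the following for the purely additive case. The symbols g, s and h (trend, seasonality, holiday/regressor effects) are notation chosen for this sketch, not names taken from the linked Stan file:

```latex
y_t = g(t) + s(t) + h(t) + \varepsilon_t,
\qquad \varepsilon_t \sim \mathcal{N}\!\left(0,\ \sigma_{\text{obs}}^{2}\right)
```

In other words, the noise term itself is stochastic; the only thing fitted for it is its standard deviation, sigma_obs.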

For the predicted value yhat, we don't actually use this sigma term, because yhat represents the expected value of each future data point. i.e. a future observation is distributed as Normal(trend + seasonality + regressors, sigma_obs), but the expected value of this is just trend + seasonality + regressors.
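Since yhat is the mean, one way to look at the error term yourself is to compute the in-sample residuals y - yhat. Below is a minimal sketch in plain Python. The numbers are made up and only stand in for the y column of the training data and the yhat column of the forecast dataframe; with real output you would subtract the two columns after aligning them on ds.

```python
import statistics

# Hypothetical values standing in for the observed y and the fitted yhat
# (same dates, same order). These are NOT real Prophet output.
y = [10.2, 11.1, 9.8, 12.4, 11.9, 10.7]
yhat = [10.0, 11.0, 10.1, 12.0, 12.1, 10.9]

# Residual = observed value minus fitted mean; this is the empirical noise term.
residuals = [obs - fit for obs, fit in zip(y, yhat)]

# Summary statistics of the residuals, in the original units of y.
resid_mean = statistics.fmean(residuals)
resid_sd = statistics.pstdev(residuals)

print("residuals:", [round(r, 3) for r in residuals])
print("mean:", round(resid_mean, 3), "sd:", round(resid_sd, 3))
```

The residuals can then be plotted against time or as a histogram. Their spread is in the units of y, so it is not expected to match model.params['sigma_obs'] directly if Prophet rescales y before fitting (an assumption here, based on the y_scale factor in the linked code).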

sigma_obs does get used when we do uncertainty estimation. We would sample from the normal distribution described above, so sigma_obs affects how widely those samples can range. You can see the code for this here: https://github.com/facebook/prophet/blob/0616bfb5daa6888e9665bba1f95d9d67e91fed66/python/prophet/forecaster.py#L1477-L1483
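The sampling idea can be sketched without Prophet. This is a simplified illustration, not the library's implementation: it only adds observation noise (the real intervals also reflect trend uncertainty), all numbers are hypothetical, and it assumes that sigma_obs lives on a rescaled y, so the noise is multiplied by a y_scale factor to get back to the original units.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# All values below are hypothetical, chosen only for illustration.
mu = [100.0, 102.0, 104.0]  # trend + seasonality + regressors, original units
sigma_obs = 0.05            # noise sd, assumed to be on the scaled y
y_scale = 120.0             # assumed factor that maps scaled units back to y
n_samples = 1000

lower, upper = [], []
for m in mu:
    # Draw simulated observations around the mean for this time step.
    draws = [m + rng.gauss(0.0, sigma_obs) * y_scale for _ in range(n_samples)]
    # quantiles(n=10) returns the 9 cut points at 10%, 20%, ..., 90%.
    cuts = statistics.quantiles(draws, n=10)
    lower.append(cuts[0])   # 10th percentile
    upper.append(cuts[-1])  # 90th percentile

for m, lo, hi in zip(mu, lower, upper):
    print(f"mean={m:.1f}  80% interval=({lo:.1f}, {hi:.1f})")
```

The 10th and 90th percentiles correspond to an 80% interval, which is Prophet's default interval_width. A larger sigma_obs widens the band but leaves the mean unchanged.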

bletham commented 3 years ago

@tcuongd awesome explanation, I was just going to add that the noise term is included in the yhat_upper and yhat_lower columns in the forecast dataframe, and by extension it is part of the shaded uncertainty region that you see in the m.plot() visualization.

sarah2397 commented 3 years ago

Thank you very much, this was very helpful!

So in the forecast dataframe I have yhat_upper and yhat_lower, but I cannot see the exact value of sigma_obs that is used to calculate the bounds of the confidence intervals? We just sample from the normal distribution and we will include these sigma_obs values in the uncertainty intervals, right?

If I understand it right, for yhat itself, we just use one fixed sigma_obs. So in this case, the exact value isn't relevant for me.

tcuongd commented 3 years ago

> We just sample from the normal distribution and we will include these sigma_obs values in the uncertainty intervals, right?

Yep that's correct for yhat_lower and yhat_upper :)

> for yhat itself, we just use one fixed sigma_obs

yhat doesn't actually rely on sigma_obs at all, since yhat is just the mean of the distribution (so yhat = trend + seasonality + holidays).

skannan-maf commented 1 year ago

What is "y_scale" in the snippet above? Is there a way to tell Prophet not to standardize 'y'?