MCMC sampling in Prophet

facebook / prophet

Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.

https://facebook.github.io/prophet

MIT License


MCMC sampling in Prophet #669

Closed: dsvrsec closed this issue 5 years ago

dsvrsec commented 6 years ago

Can anyone briefly explain (perhaps mathematically) how mcmc.samples affects forecast accuracy? In my experience, when I used mcmc.samples = 1 the accuracy decreased, and when I increased it further the accuracy increased substantially.

Thanks

bletham commented 6 years ago

MCMC estimates posterior distributions for each of the parameters, such as trend slopes and seasonality parameters. There is a lot that could be said about MCMC (too much for GitHub comments); this looks like a decent reference: http://twiecki.github.io/blog/2015/11/10/mcmc-sampling/

When making a prediction, we use the posterior mean as the estimate instead of the maximum a posteriori parameters found by just optimizing. In order for this to be a good estimate, you need enough MCMC samples so that the chain has reached the stationary distribution and is actually drawing posterior samples. With very few MCMC samples, the chains have likely not reached the posterior and I would expect bad parameter estimates (and so bad forecasts) as a result.

In the example in the documentation we used 300 samples so that it would run quickly; I would probably recommend more like 1000.
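To illustrate why too few samples give a poor posterior-mean estimate, here is a minimal sketch of a random-walk Metropolis sampler on a toy one-dimensional posterior. It is not Prophet's sampler (Prophet fits its model with Stan), and the target distribution, starting point, step size, and function names are all assumptions chosen for illustration.

```python
import math
import random


def log_target(x, mu=2.0, sigma=1.0):
    """Unnormalized log density of a toy posterior: Normal(mu, sigma)."""
    return -0.5 * ((x - mu) / sigma) ** 2


def metropolis(n_samples, x0, step=0.5, seed=0):
    """Random-walk Metropolis chain of length n_samples starting at x0."""
    rng = random.Random(seed)
    x = x0
    chain = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        log_accept = log_target(proposal) - log_target(x)
        # Accept with probability min(1, p(proposal) / p(x)).
        if rng.random() < math.exp(min(0.0, log_accept)):
            x = proposal
        chain.append(x)
    return chain


def posterior_mean(chain):
    """Average the second half of the chain; the first half is warmup."""
    kept = chain[len(chain) // 2:]
    return sum(kept) / len(kept)


# Start deliberately far from the posterior mass (true mean is 2.0).
short_mean = posterior_mean(metropolis(5, x0=50.0))
long_mean = posterior_mean(metropolis(20000, x0=50.0))

print("posterior mean from 5 samples:    ", short_mean)
print("posterior mean from 20000 samples:", long_mean)
```

The short chain never leaves the neighborhood of its starting point, so its average says nothing about the posterior. The long chain has time to reach the stationary distribution, and its average after warmup lands close to the true mean.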

dsvrsec commented 6 years ago

Where exactly are optimization and MCMC sampling each used, given that both seem to estimate the unknown parameters?

bletham commented 6 years ago

They're just two different approaches for estimating the unknown parameters. Optimization finds the MAP estimate (https://en.wikipedia.org/wiki/Maximum_a_posteriori_estimation), whereas with MCMC the posterior mean is used. See the Wikipedia article for some description of both of these estimates.
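The difference between the two estimates is easiest to see on a skewed posterior, where the mode and the mean do not coincide. The sketch below evaluates a toy unnormalized posterior on a grid; the Gamma-shaped density and the grid bounds are assumptions for illustration and have nothing to do with Prophet's actual model.

```python
import math


def unnormalized_posterior(x):
    """Toy skewed posterior, proportional to a Gamma(shape=3, rate=1) density."""
    return x ** 2 * math.exp(-x)


# Evaluate the density on a fine grid over (0, 40]; the tail beyond 40 is negligible.
grid = [i * 0.001 for i in range(1, 40001)]
density = [unnormalized_posterior(x) for x in grid]

# MAP estimate: the grid point where the posterior density is highest.
map_estimate = grid[density.index(max(density))]

# Posterior mean: the density-weighted average of the grid points.
posterior_mean = sum(x * d for x, d in zip(grid, density)) / sum(density)

print("MAP estimate:  ", map_estimate)
print("posterior mean:", posterior_mean)
```

For this density the mode is at 2 and the mean is at 3, so an optimizer and a sampler that were both working correctly would still report different point estimates. For a symmetric, unimodal posterior the two would coincide.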

allthingssecurity commented 1 year ago

Thanks @bletham for the explanation. I just wanted to check one more thing. Are the samples used for MCMC sampling during prediction generated for different timestamps in the future? My understanding is that, for each future timestamp, we take the Laplace distribution (say, for the trend parameter) learned on historical data and sample from it to produce a number of samples (default 1000) per time step, and then use MCMC to get the posterior mean, which is then used to predict the data point value.