facebook / prophet

Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.
https://facebook.github.io/prophet
MIT License

Compute confidence intervals and posteriors #1145

Closed wjshi closed 4 years ago

wjshi commented 5 years ago

Hi. I am running an additive model with weekly and monthly seasonality. Based on the documentation, my understanding is that there are three contributors to uncertainty: (a) trend, (b) seasonality, (c) noise. Uncertainty (a) can be estimated through a Monte Carlo sampling process, and the samples can be retrieved via m.predictive_samples(future) [Py] or predictive_samples(m, future) [R]; uncertainty (b) can be estimated via MCMC sampling.

Here are my questions:

  1. How can we obtain the MCMC samples?
  2. In the forecast output, we get the bounds for all trend and seasonality, as well as for yhat. How is the bound for yhat computed? If it is also sampling based, does it mean that all the trend/seasonality samples get plugged into the equation y = trend(t) + seasonality(t) + beta*X(t)?
  3. How is Uncertainty (c) estimated and accounted in final yhat_lower and yhat_upper?

Thanks in advance!

APramov commented 5 years ago

Hey,

I will try to answer what I can (good questions!) based on my understanding. Just one thing - I assume that there is no external regressor (so I don't take the last term in your equation from point 2).

On to what I think is the answer: ad 1) I think the seasonality coefficients (see eq. 6 in the documentation paper) are in myfit$params$beta. From there you should be able to construct what you need (again, as per equation 6) for the seasonality component itself (see also point 2 below).

ad 2) My guess would be that, yes. I was looking at the STAN code but couldn't find the generated quantities there, but if you look at the source code here: https://github.com/facebook/prophet/blob/master/R/R/prophet.R

you have that

sample_model <- function(m, df, seasonal.features, iteration, s_a, s_m) {
  trend <- sample_predictive_trend(m, df, iteration)

  beta <- m$params$beta[iteration,]
  Xb_a = as.matrix(seasonal.features) %*% (beta * s_a) * m$y.scale
  Xb_m = as.matrix(seasonal.features) %*% (beta * s_m)

  sigma <- m$params$sigma_obs[iteration]
  noise <- stats::rnorm(nrow(df), mean = 0, sd = sigma) * m$y.scale

  return(list("yhat" = trend * (1 + Xb_m) + Xb_a + noise,
              "trend" = trend))
}

you see that the function returns the trend (separately) and yhat. The betas needed for the seasonality come from the chain and are used to compute the seasonality component, as per equation 6 in the paper.

This sample_model function is called whenever any prediction is made. So yes, basically the same equation that you wrote.

One more thing - I think the y.scale part rescales the quantities back to the original scale of y (because the fit is done on a scaled version of y).

ad 3) The noise term is normally distributed with mu = 0 and a sigma that has a noninformative prior (I believe, because they do not set it explicitly in the STAN formulation, and I think it defaults to noninformative in this case: https://github.com/facebook/prophet/blob/master/R/inst/stan/prophet.stan#L104).

It is accounted for in the final yhat in the way described above - just sampling from the normal with the corresponding sigma.

One last point - regarding the external regressors, I cannot quickly find it in the source code, but I am pretty sure the corresponding betas are also added into Xb_a above. It might be a bit confusing that they use beta both for seasonality and for external regressors, which is why I wrote my answer under the assumption of no external regressors.


Hope that this helps - I am still studying the code and the model class myself :)

bletham commented 5 years ago

@APramov thanks for the thorough description! I can add a couple more details, in case they are useful.

Seasonality is fit by constructing a big matrix of Fourier terms X, and then taking

seasonality = X @ beta

where @ is matrix multiplication. That's the convenient thing about the Fourier representation: it makes learning seasonality equivalent to a linear regression. Extra regressors are handled by just adding extra columns to X, so for the purposes of model inference and prediction there is no difference between an extra regressor and a seasonality component. In the code, this happens here: seasonal_features is the X matrix, and the extra regressors are just appended in: https://github.com/facebook/prophet/blob/190d3239fd2172c9bfcedd57b7cdefde56108bf8/python/fbprophet/forecaster.py#L754 https://github.com/facebook/prophet/blob/190d3239fd2172c9bfcedd57b7cdefde56108bf8/R/R/prophet.R#L885
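
As an illustration of what such an X matrix looks like, here is a minimal numpy sketch of the Fourier design matrix for a single weekly seasonality (a schematic only, not Prophet's exact implementation; the real code scales time, stacks multiple seasonalities, and appends the extra-regressor columns to the same matrix):

import numpy as np

def fourier_terms(t, period, order):
    # columns are sin/cos pairs at harmonics 1..order of the given period
    cols = []
    for k in range(1, order + 1):
        x = 2.0 * np.pi * k * t / period
        cols.append(np.sin(x))
        cols.append(np.cos(x))
    return np.column_stack(cols)

t = np.arange(90.0)                        # days since the start of history
X = fourier_terms(t, period=7.0, order=3)  # weekly seasonality -> 6 columns
beta = np.random.randn(X.shape[1])         # stand-in for one posterior draw of beta
seasonality = X @ beta                     # seasonal component at each time point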

The parameters are all stored in m.params (Py) or m$params (R). If you do MCMC with n_samp samples, then m.params['beta'] will be an n_samp x n_param matrix, where n_param is the number of seasonal components (that is, the number of columns of X in the X @ beta part of the model).

As you describe, uncertainty in yhat is computed with MC sampling from the generative model using uncertainty_samples samples (which is a specifiable parameter). Each of these is a simulation of the future, in which a future trend is sampled, and then seasonality and regressors are added to it, followed by randn noise. The simulation happens in the sample_model function identified by @APramov. If MCMC was done for the parameters, then these future simulations will be done with different samples from the parameter posteriors. For instance, if uncertainty_samples is 1000 and there are 200 MCMC samples for the parameters, then each MCMC sample of the parameters will be used for 5 of the future simulations. This happens here: https://github.com/facebook/prophet/blob/190d3239fd2172c9bfcedd57b7cdefde56108bf8/R/R/prophet.R#L1499

So yes, the trend and seasonality samples are plugged into the model to produce a sample of yhat, and then bounds come from taking quantiles across those samples.
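
In code, that last step is roughly the following (a sketch assuming a fitted model m, a future dataframe future, and the default interval_width of 0.8):

import numpy as np

samples = m.predictive_samples(future)  # dict with 'yhat' and 'trend' sample matrices
yhat_samples = samples['yhat']          # shape: (n_dates, uncertainty_samples)

# with interval_width=0.8, the bounds are the 10th and 90th percentiles across samples
yhat_lower = np.percentile(yhat_samples, 10, axis=1)
yhat_upper = np.percentile(yhat_samples, 90, axis=1)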

BrianMiner commented 5 years ago

@bletham Really helpful info! I have a situation where I am looking at using prophet less as a pure forecasting tool and more as a means to decompose a time series, drawing inference about the impact of interventions on sales data - specifically the impacts of promotions.

If I train on daily data (1/1/2017 - 09/30/2019, for example) and, say, for a week in Sept 2019 I have (to make it simple) an indicator set to 1 (and zero at all other times), I want to accumulate a credible interval over the impact of this event during that week. Can this be done? I was thinking I would need to do the following:

This is just a linear regression, correct? So the mean value of the beta of interest is the average effect for each day that the promotion was active (denoted by a '1' in that row of the training data)?

I am confused about which rows of model.params['beta'] I should use - all of them, or only some based on the uncertainty and MCMC draws? I am unsure of the distinction. If I set both uncertainty_samples and mcmc_samples to 1000, model.params['beta'] has 2000 rows.

As an aside, I saw in another issue that it was suggested the easiest way to get access to the coefficients was to divide x*beta in the forecasted data frame by the regressor (including history in my case, since it is the training data period I am interested in). If I do this I get a different value for each record - shouldn't the coefficient be fixed?

bletham commented 5 years ago

Let me clarify something about mcmc_samples and the shape of model.params['beta']. Stan by default does MCMC with four chains and discards the first half of each chain as warm-up. mcmc_samples is the total number of iterations per chain, so after discarding the warm-up there are 4 * (mcmc_samples / 2) = 2 * mcmc_samples total samples in model.params['beta'] (with mcmc_samples=1000, that is the 2000 rows you see). uncertainty_samples is used only at prediction time and so changing that won't affect beta.

So each column of model.params['beta'] is the coefficient for a particular component of the model. #361 describes how you can determine which column corresponds to which component. For any particular coefficient, the 2000 rows are 2000 posterior samples of that coefficient, so if you wanted to see whether it is significantly different from 0, you would look at that distribution.

You would probably be most interested in getting the distribution of the effect size (that is, the amount of y attributed to the regressor) - there isn't a direct way to do this, and things are complicated a bit by the normalization/scaling that happens for both y and the extra regressor.

When you call predict, the forecast dataframe has a column for the extra regressor (let's call it z) that has the amount of y that is being explained with z. Throughout the documentation and issues this is described as beta * z, but in reality y is scaled prior to fitting and z is by default standardized prior to fitting unless it is binary. So the actual way to compute this is

forecast['z'] = beta * (z_scaled) * model.y_scale

where z_scaled = (z - mean(z)) / std(z) if z is standardized (which it is by default unless z is binary), and just z otherwise. If you do MCMC, this will be computed with the posterior mean of beta (the average across rows of model.params['beta']).

If you take the mean across rows of the correct column and compute forecast['z'] as above, then it should match what is produced in the forecast dataframe by predict. If you compute this separately for each sample of beta (each row), then that will give you a distribution for the effect size of z which is probably what would be most useful.
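
In code, that last step might look something like this (a sketch only: col is a hypothetical column index for z, which you would find via m.train_component_cols as described in #361, and z_scaled is the scaled regressor value of interest as defined above):

import numpy as np

col = 6                                  # hypothetical column index of z in X
betas = m.params['beta'][:, col]         # one posterior sample of beta per row

# effect of z on y (in the original units of y), one value per posterior sample
effect_samples = betas * z_scaled * m.y_scale

effect_samples.mean()                    # should match forecast['z'] from predict
np.percentile(effect_samples, [5, 95])   # e.g. a 90% credible interval for the effect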

wjshi commented 5 years ago

Thank you for the helpful info, @APramov and @bletham!!

BrianMiner commented 5 years ago

@APramov and @bletham Appreciate the comments - here is a follow-up that is useful to me and, judging by the many questions, to others as well... I wanted to check my understanding, but it doesn't seem to match...

If I set up a model like this, without standardizing the regressors:

mod = Prophet(growth='linear', seasonality_mode='additive', n_changepoints=25,
              daily_seasonality=False, weekly_seasonality=False, yearly_seasonality=5,
              holidays_prior_scale=3, mcmc_samples=2000)

mod.add_regressor('APP_3_LAG',standardize=False)
mod.add_regressor('Q_MOS_3_LAG',standardize=False)

mod.fit(X)

#to forecast 3 months 
future_dates = mod.make_future_dataframe(include_history=True,periods=3, freq='MS')

#add to future_dates the known regressor values and the next 3 for forecasting
future_dates['APP_3_LAG']=np.append(X.APP_3_LAG.values,np.array([4889,5205,4016]))
future_dates['Q_MOS_3_LAG']=np.append(X.Q_MOS_3_LAG.values,np.array([141,149,131]))

#predictions (111 of them - past values of the series and 3 to forecast)
preds=mod.predict(future_dates)
preds[['ds','yhat','yhat_lower','yhat_upper']].tail()

(screenshot of output)

#shows that the regressor APP_3_LAG is at column index 10 and Q_MOS_3_LAG is at 11
mod.train_component_cols

#here are 2000 simulated trend and yhat for each of the dates in future_dates (111 dates)
ps=mod.predictive_samples(future_dates)

Question 1: If I average the yhat for each date will it match the original yhat from preds above? NO

pd.Series(np.mean(ps['yhat'],axis=1)[-5:])

(screenshot of output)

Question 2: Here I add the original values of 'APP_3_LAG' to the prediction df and divide by them to see the coefficient for 'APP_3_LAG'

preds['APP_3_LAG_INPUT']=future_dates['APP_3_LAG']
preds['Is_This_the_COEFFICIENT']=preds['APP_3_LAG']/preds['APP_3_LAG_INPUT']
preds[['ds','APP_3_LAG','APP_3_LAG_INPUT','Is_This_the_COEFFICIENT']].tail()

(screenshot of output)

The coeff for APP_3_LAG appears to be 4495.360 (each unit increase in APP_3_LAG adds 4495.360 to the outcome y)

but it doesn't match:

np.mean(mod.params['beta'][:, 10] * mod.y_scale)
# yields 4205.19654601779

This should be the method to get the coefficient for a regressor though, correct (GIVEN standardized regressor variables)?

beta_x = mod.params['beta'][:, 10]
beta_unstandardized_x = (beta_x / standard_dev_of_x) * mod.y_scale

Question 3:

Should we be able to reconstruct exactly the predictions from predict(future_dates) using the trend from mod.predictive_samples(future_dates) and mod.params['beta'], and somehow get the seasonal values? It would be incredibly useful to essentially get the entire posterior distribution, to be able to summarize and explore the model.

bletham commented 5 years ago

1: yhat in output of predict vs output of predictive_samples

There are two differences between yhat as computed in predict and in predictive_samples that make it so these will not necessarily be equal.

The first difference is that in predict, the trend is computed using the posterior mean point estimate, not as the posterior mean of the trend samples. So if you compare forecast['trend'] with samples['trend'].mean(axis=1) you will see that they are very slightly different (the difference will be larger for the logistic trend, where there is a nonlinear relationship between parameter and trend).

The second source of difference is that in predict, the noise term is left out when computing yhat (since it is mean 0). In predictive_samples, on the other hand, each yhat sample has the noise term added to it. If you compute samples['yhat'].mean(axis=1) you will see that the result is different each time - this is because of averaging over the noise. This random noise is the most significant source of difference between forecast['yhat'] and samples['yhat'].mean(axis=1). If you increase the number of samples it will reduce the variance of samples['yhat'].mean(axis=1) and they will start to be more similar, though some small difference will remain from the first point above.
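
As a quick numerical check of both points (a sketch, reusing a fitted model m and a future dataframe future):

import numpy as np

forecast = m.predict(future)
samples = m.predictive_samples(future)

# point 1: trend from the posterior-mean parameters vs. the mean of sampled trends
# -> very close, but not identical
np.abs(forecast['trend'].values - samples['trend'].mean(axis=1)).max()

# point 2: the mean of the yhat samples carries Monte Carlo noise from the randn term,
# so it fluctuates around forecast['yhat'] and changes on every call
np.abs(forecast['yhat'].values - samples['yhat'].mean(axis=1)).max()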

2: Relationship between beta and the value in the forecast dataframe

That is the correct relationship. I'm not sure what is happening for you to see a different result. When I run this code using the example csv (https://github.com/facebook/prophet/blob/master/examples/example_wp_log_peyton_manning.csv), they match exactly:

from fbprophet import Prophet
import pandas as pd
import numpy as np

df = pd.read_csv('../examples/example_wp_log_peyton_manning.csv')
df = df.loc[:60,]  # Limit to first two months
z = np.linspace(0, 4, 61) ** 2  # Make up a nonlinear extra regressor
df['z'] = z
df['y'] = df['y'] + z

m = Prophet(mcmc_samples=1000).add_regressor('z', standardize=False)
m.fit(df)

future = m.make_future_dataframe(periods=5)
future['z'] = np.hstack((z, 16. * np.ones(5)))  # Give z=16 on the 5 future values
forecast = m.predict(future)

m.train_component_cols  # z is column 6

np.mean(m.params['beta'][:, 6] * m.y_scale)
# yields 0.9653219145804971

forecast['z'].values[-5:] / 16.  # 16 is the input value
# yields array([0.96532191, 0.96532191, 0.96532191, 0.96532191, 0.96532191])

3: Reconstructing the output of predict

The actual predict method is fairly lightweight and calls out to other methods to do most of the work, so you can see how to reconstruct the output of predict by looking at the code there: https://github.com/facebook/prophet/blob/e1a2b9c2977a0817eb245c704d2e478211585886/python/fbprophet/forecaster.py#L1154

You will not be able to reconstruct exactly the output of predict using predictive_samples because of the differences described in (1) above. But you can see in the code where each of the components is predicted. In particular seasonality comes from m.predict_seasonal_components(df) (important note: the df here is not the raw future dataframe, you can see in line 1175 that you have to call df = m.setup_dataframe(future.copy()) which does the normalizing). That function is itself pretty lightweight: https://github.com/facebook/prophet/blob/e1a2b9c2977a0817eb245c704d2e478211585886/python/fbprophet/forecaster.py#L1284 and you can see that all it is doing is computing beta * X for each sample of beta, and then taking percentiles to compute _lower and _upper.
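
For concreteness, a sketch of that reconstruction could look like the following (assuming a fitted Python model m and a future dataframe future, and that the helper methods keep the names they have in the version linked above):

df = m.setup_dataframe(future.copy())         # normalizes inputs and adds the scaled time column
df['trend'] = m.predict_trend(df)             # trend from the posterior-mean parameters
seasonal = m.predict_seasonal_components(df)  # per-component values plus _lower/_upper columns

# additive components shift the trend; multiplicative ones scale it
yhat = df['trend'] * (1 + seasonal['multiplicative_terms']) + seasonal['additive_terms']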

I agree it would be great to have predictive_samples return more model components, it's just a matter of plumbing.