WillianFuks / tfcausalimpact

Python Causal Impact Implementation Based on Google's R Package. Built using TensorFlow Probability.
Apache License 2.0
600 stars · 72 forks

On confidence intervals and uncertainty intervals #11

Closed IvanUkhov closed 3 years ago

IvanUkhov commented 3 years ago

Given that this is a Bayesian method, it is strange that the uncertainty is summarized using confidence intervals as opposed to Bayesian uncertainty/probability/credibility intervals via, for instance, HDIs and also taking the mean instead of the median as the point estimate. @WillianFuks, I am curious to hear your thoughts on what is in compile_posterior_inferences:

https://github.com/WillianFuks/tfcausalimpact/blob/master/causalimpact/inferences.py#L52

Having computed samples of the target time series, it should be straightforward to summarize them pointwise via, say, hdi in arviz. Or do I miss something?

WillianFuks commented 3 years ago

Hi @IvanUkhov ,

The intervals computed in this package, as in the original R package, are already credible intervals. In fact, the original paper even compares confidence intervals with its credible ones on page 265 (or page 19 of this PDF). Just before the last paragraph, it says: "The latter is expected, given that our intervals represent prediction intervals, not confidence intervals...".

What probably happened is that throughout the code I wrote "confidence" where I meant "credible", which is really a typo. The tensorflow_probability functions used in this package also return credible intervals.

One point to consider, though, is that in this package I computed the lower and upper pointwise predictions using an approach similar to the one described in the tfp comment referenced above, i.e., using z-scores with means and standard deviations, whereas the original R package does so by selecting the appropriate percentiles from the posterior trajectory samples.
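The difference between the two approaches can be sketched like this (a toy example with synthetic draws, not the package's actual code):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy posterior trajectory samples: 10_000 draws of a single time point.
draws = rng.normal(loc=2.0, scale=0.5, size=10_000)
alpha = 0.05

# z-score interval: assumes the draws are Gaussian.
z = 1.959963984540054  # ~97.5th percentile of the standard normal
mean, std = draws.mean(), draws.std()
z_lower, z_upper = mean - z * std, mean + z * std

# Percentile interval: read the quantiles off the samples directly.
q_lower, q_upper = np.quantile(draws, [alpha / 2, 1 - alpha / 2])

# For Gaussian draws the two nearly coincide; for skewed draws they diverge.
print(z_lower, z_upper)
print(q_lower, q_upper)
```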

I'll see what happens when I do the same (I don't expect the results to change much, but it's probably worth trying).

As for mean vs. median: since the model is built upon Kalman filters, the posterior at each point t follows a normal distribution, so both should give the same results, IIUC.
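This part is easy to check numerically (a toy sketch with synthetic draws, nothing from the package): for a symmetric posterior such as a Gaussian, the mean and the median agree.

```python
import numpy as np

rng = np.random.default_rng(42)
# Synthetic Gaussian "posterior" draws at a single time point.
draws = rng.normal(loc=3.0, scale=1.0, size=100_000)

# For a symmetric distribution, mean and median estimate the same center.
print(draws.mean(), np.median(draws))
```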

IvanUkhov commented 3 years ago

Thank you for the extended answer! Yes, I was referring to the logic based on a critical value and a standard deviation, which gives the right interval only if the underlying distribution is Gaussian. There is a lot of Gaussianity everywhere, but the priors are non-Gaussian, which should lead to non-Gaussian posteriors in general. If fitted via variational inference, the resulting posterior is indeed a product of independent Gaussians, but with MCMC, that should not be the case. It seems to be an unnecessary assumption.

Please do not get me wrong. I am by no means criticizing. It is a great package, and I simply wanted to discuss to see if there is anything that can be made even better.

WillianFuks commented 3 years ago

Hi @IvanUkhov ,

Please feel free to ask as many questions as you like :)!

I think I better understand now what you mean. Notice that regardless of which distribution we choose for the initial state model (in fact, the default model of this package uses an inverse gamma for the local level component), the probability of ŷ (the prediction of the observed time series) will always follow a normal distribution.

This follows from the implementation of the LinearGaussian, which holds for both the Python and R packages. You can see that, for TFP, the probability of the state at time t is already a normal built from samples taken from the chosen priors.

So extracting credible intervals either with z-scores or with percentiles could be considered a safe operation, as it's guaranteed they'll be applied to Gaussians, regardless of the posterior of the state model we choose.

WillianFuks commented 3 years ago

Hi @IvanUkhov ,

The newest version of tfcausalimpact is now publicly available. If you hit any issues, please let me know :)

IvanUkhov commented 3 years ago

Hello, thank you for the response! I was hoping to get more answers before sharing this, but my reasoning is as follows:

https://github.com/tensorflow/probability/issues/1252

What are your thoughts? Would you agree that the predictive distribution is non-Gaussian?

WillianFuks commented 3 years ago

That's an interesting question; I'm not quite sure now what the answer is.

AFAIK, the predictive posterior follows something like this (taking the local level model as an example):

[image: equations for the predictive posterior of the local level model]

(these equations were taken from Wei Yi's medium post).

So the predictive posterior should follow a normal distribution derived from the posterior of the computed states. But when we look at the tfp.sts.forecast code, as you mentioned, it indeed returns a MixtureSameFamily, which takes as input a previous state space model already fitted with the posterior samples of the parameters. In the comments they state something like "build a batch over state space models (...)", so I'm not sure whether this is some technique to improve precision (just guessing).

I'm also waiting (and hoping) for the TFP team to give us some help here :)

IvanUkhov commented 3 years ago

The above formula is for one particular assignment of the model parameters, and it is simply the Kalman filter. But from the inference, we get a full posterior for the parameters in the form of samples. For each sample, we get the above expression, and the final distribution is a mixture of these Gaussian ones. Recall what distinguishes structural time series from Bayesian structural time series. It is one extra component: Bayesian model averaging.
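The effect of this averaging is easy to see numerically: mixing Gaussians whose parameters vary across posterior draws yields a non-Gaussian predictive, visible here as excess kurtosis (a toy sketch of the idea, not the tfp.sts.forecast internals):

```python
import numpy as np

rng = np.random.default_rng(7)

# Pretend these are posterior draws of the observation scale (non-degenerate).
scales = rng.gamma(shape=2.0, scale=1.0, size=100_000)

# One predictive draw per parameter draw: a scale mixture of Gaussians.
mixture = rng.normal(loc=0.0, scale=scales)

# A single Gaussian has excess kurtosis 0; a scale mixture has heavier tails.
m = mixture.mean()
excess_kurtosis = ((mixture - m) ** 4).mean() / (mixture.var() ** 2) - 3.0
print(excess_kurtosis)
```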

But again, this is just my interpretation. Yes, let us hope someone will answer in https://github.com/tensorflow/probability/issues/1252 🙂

WillianFuks commented 3 years ago

Maybe steve-the-bayesian can also help with this. He's the original creator of the R bsts package.

IvanUkhov commented 3 years ago

I have also asked on Stack Exchange:

https://stats.stackexchange.com/questions/512666/on-the-predictive-distribution-of-a-bayesian-structural-time-series-model

IvanUkhov commented 3 years ago

Here we have got an answer from the author of forecast:

https://github.com/tensorflow/probability/issues/1252

The conclusion is that the predictive distribution is non-Gaussian in general.

WillianFuks commented 3 years ago

Nice! Now it makes sense why they have the MixtureSameFamily.

I don't quite understand why they create the state space model twice, but this is probably related to the inner workings of their implementation.

IvanUkhov commented 3 years ago

I will close this, then. I think switching to quantile intervals was a good move. Next time around, one could consider HDIs, but it probably would not make much of a difference unless very skewed distributions are expected.

Thank you, and sorry for this much noise here 🙂
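For reference, the difference between an HDI and an equal-tailed quantile interval only shows up for skewed distributions; a minimal HDI is just the shortest window covering 1 − alpha of the sorted draws (a sketch of the idea, not arviz's implementation):

```python
import numpy as np


def hdi(draws: np.ndarray, prob: float = 0.95) -> tuple:
    """Shortest interval containing `prob` of the sorted draws."""
    sorted_draws = np.sort(draws)
    n = len(sorted_draws)
    window = int(np.ceil(prob * n))
    # Width of every candidate interval spanning `window` consecutive draws.
    widths = sorted_draws[window - 1:] - sorted_draws[: n - window + 1]
    start = int(np.argmin(widths))
    return sorted_draws[start], sorted_draws[start + window - 1]


rng = np.random.default_rng(0)
# For a skewed distribution the HDI is shifted relative to the quantile interval.
skewed = rng.exponential(scale=1.0, size=50_000)
print(hdi(skewed, 0.95))
print(np.quantile(skewed, [0.025, 0.975]))
```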

IvanUkhov commented 3 years ago

I ended up writing a custom function for summarizing inferences, as I wanted medians and HDIs instead of means and quantile intervals. I will leave it here in case it is helpful at some point in the future:

from typing import Tuple

import numpy as np
import pandas as pd
import tensorflow_probability as tp


def _summarize_original(
    data_before: pd.DataFrame,
    data_after: pd.DataFrame,
    posterior: tp.distributions.Distribution,
    predictive: tp.distributions.Distribution,
    standardization: Tuple[float, float],
    alpha: float = 0.05,
    draw_count: int = 1000,
    random_state: int = 42,
) -> pd.DataFrame:
    from causalimpact.inferences import build_cum_index
    from causalimpact.misc import maybe_unstandardize

    def _hdi(samples: np.ndarray) -> np.ndarray:
        # Pointwise HDIs: one (lower, upper) pair per time step (column).
        from arviz.stats import hdi
        return np.array([hdi(column, hdi_prob=1 - alpha) for column in samples.T]).T

    # Sample trajectories for both periods and undo any standardization.
    y_before = predictive.sample(draw_count, seed=random_state)
    y_before = maybe_unstandardize(np.squeeze(y_before.numpy()), standardization)
    y_after = posterior.sample(draw_count, seed=random_state)
    y_after = maybe_unstandardize(np.squeeze(y_after.numpy()), standardization)

    # Medians are stored under the *_means names to keep the package's column schema.
    pre_preds_means = np.median(y_before, axis=0)
    pre_preds_lower, pre_preds_upper = _hdi(y_before)
    pre_preds_means = pd.Series(pre_preds_means, index=data_before.index)
    pre_preds_lower = pd.Series(pre_preds_lower, index=data_before.index)
    pre_preds_upper = pd.Series(pre_preds_upper, index=data_before.index)

    post_preds_means = np.median(y_after, axis=0)
    post_preds_lower, post_preds_upper = _hdi(y_after)
    post_preds_means = pd.Series(post_preds_means, index=data_after.index)
    post_preds_lower = pd.Series(post_preds_lower, index=data_after.index)
    post_preds_upper = pd.Series(post_preds_upper, index=data_after.index)

    complete_preds_means = pd.concat([pre_preds_means, post_preds_means])
    complete_preds_lower = pd.concat([pre_preds_lower, post_preds_lower])
    complete_preds_upper = pd.concat([pre_preds_upper, post_preds_upper])

    data = pd.concat([data_before, data_after])
    point_effects_means = data.iloc[:, 0] - complete_preds_means
    # Subtracting the lower prediction bound yields the upper effect bound, and vice versa.
    point_effects_upper = data.iloc[:, 0] - complete_preds_lower
    point_effects_lower = data.iloc[:, 0] - complete_preds_upper

    # Cumulative effect over the post period: cumsum over time (axis=1) per draw.
    z_after = np.cumsum(data_after.iloc[:, 0].values - y_after, axis=1)
    post_cum_effects_means = np.median(z_after, axis=0)
    post_cum_effects_lower, post_cum_effects_upper = _hdi(z_after)
    # The index starts one step before the post period so the series can begin at zero.
    index = build_cum_index(data_before.index, data_after.index)
    post_cum_effects_lower = pd.Series(
        np.concatenate([[0], post_cum_effects_lower]).ravel(),
        index=index,
    )
    post_cum_effects_means = pd.Series(
        np.concatenate([[0], post_cum_effects_means]).ravel(),
        index=index,
    )
    post_cum_effects_upper = pd.Series(
        np.concatenate([[0], post_cum_effects_upper]).ravel(),
        index=index,
    )

    data = dict(
        complete_preds_means=complete_preds_means,
        complete_preds_lower=complete_preds_lower,
        complete_preds_upper=complete_preds_upper,
        point_effects_means=point_effects_means,
        point_effects_lower=point_effects_lower,
        point_effects_upper=point_effects_upper,
        post_cum_effects_means=post_cum_effects_means,
        post_cum_effects_lower=post_cum_effects_lower,
        post_cum_effects_upper=post_cum_effects_upper,
    )
    return pd.DataFrame(data)