WillianFuks / tfcausalimpact

Python Causal Impact Implementation Based on Google's R Package. Built using TensorFlow Probability.
Apache License 2.0
620 stars, 73 forks

Power analysis using UC or pyBats dynamic linear models #103

Open MilaimKas opened 3 months ago

MilaimKas commented 3 months ago

Hello there. First, many thanks for providing a stable, Bayesian-based Python version of the original R package.

I was previously using the deprecated pycausalimpact package to perform intervention analysis. Sure, it was not Bayesian, but it nevertheless provided some useful insights. In addition, I used to perform a "power analysis" (insofar as this term can be used in the context of time series) to assess the "power" of my model, i.e. to answer the question "what effect size would I be able to detect at a given level of confidence?". In order to do that, I would:

I would then obtain a plot similar to this: [image], where the blue filled region corresponds to the 95% interval and the gray one to the 80% interval. I would then interpret the results as follows: at a 95% level of confidence I would be able to "detect" an effect of ~6% or larger; at an 80% level of confidence I would be able to detect a very small effect (~1%).

Of course, this is a typical frequentist approach, but I still found it very useful for assessing the "power" of a model. In addition, I often encountered cases where it was possible to "tune" the model (varying the pre- and post-period lengths, adding or removing certain control variables) to get significant or non-significant results. So I also used the approach described above but, in addition to looping over effect sizes, I would loop over pre- and post-period lengths. This would give me an idea of how stable the results are with respect to these parameters.
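A minimal sketch of such an effect-size loop, assuming (this is not spelled out above) that the synthetic effect is injected by scaling the post-period response, that the counterfactual comes from a plain least-squares fit on one control series, and that an effect counts as "detected" when the normal-approximation interval of the mean effect excludes zero. All names (`fit_ols`, `effect_interval`, `smallest_detectable`) and the simulated data are hypothetical, and the interval ignores parameter uncertainty, so it is narrower than a real causal impact interval would be.

```python
import math
import random
from statistics import NormalDist, mean


def fit_ols(x, y):
    """Closed-form simple linear regression y ~ a + b * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b


def effect_interval(x_pre, y_pre, x_post, y_post, level):
    """Relative mean effect and its interval at the given level."""
    a, b = fit_ols(x_pre, y_pre)
    resid = [yi - (a + b * xi) for xi, yi in zip(x_pre, y_pre)]
    sigma = math.sqrt(sum(r * r for r in resid) / (len(resid) - 2))
    pred = [a + b * xi for xi in x_post]  # counterfactual prediction
    abs_effect = mean(yi - pi for yi, pi in zip(y_post, pred))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half = z * sigma / math.sqrt(len(y_post))
    base = mean(pred)
    return abs_effect / base, (abs_effect - half) / base, (abs_effect + half) / base


def smallest_detectable(x, y, n_pre, effects, level):
    """Smallest injected effect whose interval excludes zero, else None."""
    for eff in sorted(effects):
        y_post = [yi * (1 + eff) for yi in y[n_pre:]]  # inject the effect
        _, lo, hi = effect_interval(x[:n_pre], y[:n_pre], x[n_pre:], y_post, level)
        if lo > 0 or hi < 0:
            return eff
    return None


# Simulated data with no real intervention (assumption for illustration).
rng = random.Random(0)
x = [100 + rng.gauss(0, 5) for _ in range(120)]
y = [20 + 1.2 * xi + rng.gauss(0, 4) for xi in x]
effects = [i / 100 for i in range(0, 11)]  # 0% .. 10%

detectable = {
    level: smallest_detectable(x, y, 90, effects, level) for level in (0.80, 0.95)
}
for level, eff in detectable.items():
    print(f"level={level:.2f}: smallest detectable effect = {eff}")
```

Looping over pre- and post-period lengths would amount to calling `smallest_detectable` with different `n_pre` values and different truncations of `x` and `y`.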

This approach needs many causal impact calculations, which is not possible with the current TensorFlow-based model. I was therefore wondering how difficult it would be to allow the user to switch from a TensorFlow BSTS model to, for example, a UC model? The idea would be to perform these "power analyses" with the simple model, decide which parameters to use, and assess the final impact using the more accurate TensorFlow model. Alternatively, to stay within the Bayesian framework, one could also rely on Normal approximations and use conjugate priors to get the posteriors. I believe this is what the Python package pyBats does.

WillianFuks commented 3 months ago

Hi @MilaimKas ,

I like this idea. I have been considering a refactoring to implement the concept of Engines, which would add to the API the option to choose which algorithm to use in the fitting process.

I haven't been able to work on it yet, though; I suspect it'd take around an entire month to implement.

I'll leave this one open; hopefully I can start working on it soon. If the community wants to contribute ideas and PRs, they are most welcome as well.

As for your approach to power analysis, I couldn't entirely understand it, but it looks like you have the necessary confirmation that it's working. One idea that came to mind was to fit an sts model on the posterior, sample various series from it, and then count how many fall beyond the 95% band of the counterfactual. The proportion of such series among all samples would yield, I suppose, something that resembles a power analysis as well.
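One possible reading of this counting idea, as a rough sketch: the simulator below is only a stand-in for draws from a fitted sts model (a local level plus observation noise), and each sampled series is summarized by its post-period mean before being compared with the 95% band of the counterfactual samples. The function names, the `shift` parameter, and all numeric values are assumptions made for illustration.

```python
import random
from statistics import mean


def simulate_paths(rng, n_paths, n_steps, start, shift, level_sd, obs_sd):
    """Stand-in for model draws: random-walk level plus observation noise."""
    paths = []
    for _ in range(n_paths):
        level, path = start + shift, []
        for _ in range(n_steps):
            level += rng.gauss(0, level_sd)
            path.append(level + rng.gauss(0, obs_sd))
        paths.append(path)
    return paths


def band(values, coverage=0.95):
    """Empirical central interval of the sampled values."""
    s = sorted(values)
    lo_idx = int((1 - coverage) / 2 * (len(s) - 1))
    hi_idx = int((1 + coverage) / 2 * (len(s) - 1))
    return s[lo_idx], s[hi_idx]


def exceed_fraction(counterfactual, alternative, coverage=0.95):
    """Share of alternative series whose mean falls outside the band."""
    lo, hi = band([mean(p) for p in counterfactual], coverage)
    outside = sum(1 for p in alternative if not lo <= mean(p) <= hi)
    return outside / len(alternative)


rng = random.Random(1)
# Counterfactual draws: no intervention effect (shift = 0).
counterfactual = simulate_paths(rng, 1000, 30, 100.0, 0.0, 0.2, 2.0)

fractions = {}
for shift in (0.0, 1.0, 5.0):
    # Draws from the hypothetical post-intervention model.
    alternative = simulate_paths(rng, 1000, 30, 100.0, shift, 0.2, 2.0)
    fractions[shift] = exceed_fraction(counterfactual, alternative)
    print(f"shift={shift}: fraction outside 95% band = {fractions[shift]:.3f}")
```

With a shift of zero the fraction should sit near the nominal 5% false-positive rate, and it should grow towards 1 as the shift increases, which is what makes it resemble a power curve.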

MilaimKas commented 3 months ago

Hi @WillianFuks, many thanks for your answer. Indeed, your suggestion of using samples from the posterior to get the "power" should lead to similar results. The only difference is that, with my approach, I can add some additional effects such as noise and lag to simulate the intervention but, to be honest, it's quite difficult to justify actual values for both effects. I am not sure how to interpret this mixed Bayesian-frequentist approach but, as you pointed out, I had the confirmation that it's working :) Would this approach to power analysis be possible with the current version of the package?

WillianFuks commented 3 months ago

To run the analysis I mentioned, you'd probably need to use tfp directly to fit a model on the posterior; i.e., the current tfci package won't offer this analysis by default.