WillianFuks / tfcausalimpact

Python Causal Impact Implementation Based on Google's R Package. Built using TensorFlow Probability.
Apache License 2.0

Question: How can I extract the betas like in the R MarketMatching package? #27

Open sarisboo opened 2 years ago

sarisboo commented 2 years ago

Hi Will,

Thanks for converting Causal Impact to Python! At my company, we are currently trying to convert a causal impact analysis written in R (it used the MarketMatching package) to Python.

We hit a small roadblock: how to extract the values that would be the equivalent of the analyze_betas() function in R. This is the function in R I am talking about. Do you think there is a way to extract these values using the _fit_model() method or any other method in the tfcausalimpact package?

Kind Regards,

Sara

WillianFuks commented 2 years ago

Hi Sara,

Unfortunately I'm not well acquainted with this R package (although I have seen other questions about it before), nor with how it interprets the beta evaluations.

But looking at the function you referenced, it seems feasible to do the same with the Python package.

The first two lines could be replaced with a pandas DataFrame:

betas <- data.frame(matrix(nrow=11, ncol=4))
names(betas) <- c("SD", "SumBeta", "DW", "MAPE")
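
As a rough sketch of that replacement (assuming NaN is an acceptable stand-in for the empty cells that R's matrix() produces), the pandas version could look like this:

```python
import numpy as np
import pandas as pd

# Same four columns as the R snippet, with 11 rows pre-filled with NaN
# (the counterpart of the NA values created by matrix(nrow=11, ncol=4)).
betas = pd.DataFrame(
    np.nan,
    index=range(11),
    columns=["SD", "SumBeta", "DW", "MAPE"],
)

print(betas.shape)
```

Pre-allocating is optional in pandas; building a list of records and converting it to a DataFrame at the end is often simpler.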

The loop from 0 to 20 would then be more cumbersome in Python, because the package is slower than its R counterpart (the two rely on different algorithm implementations). It would probably have to be parallelized with multiprocessing, for example with ProcessPoolExecutor or multiprocessing.Pool.

A function that receives an integer would be required, something like:

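from causalimpact import CausalImpact

def run_ci(i):
    # Step i (0..20) of an evenly spaced sweep over the prior level SD.
    step = (max(0.1, prior_level_sd) - min(0.001, prior_level_sd)) / 20
    sd = min(0.001, prior_level_sd) + step * i
    # Pass the swept value to the model; `args` holds the other model arguments.
    m = CausalImpact(ts, pre_period, post_period, alpha=alpha,
                     model_args={**args, 'prior_level_sd': sd})
    ...
    # (rest of calculations)
    ...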
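
To make the arithmetic of the sweep concrete, here is a small standalone sketch of the SD values visited for i = 0..20. The helper name sd_grid and the input 0.01 are assumptions used only for illustration; they do not come from either package.

```python
def sd_grid(prior_level_sd, n_intervals=20):
    """Return the n_intervals + 1 SD values visited for i = 0..n_intervals."""
    low = min(0.001, prior_level_sd)   # never start above 0.001
    high = max(0.1, prior_level_sd)    # never stop below 0.1
    step = (high - low) / n_intervals
    return [low + step * i for i in range(n_intervals + 1)]


# 0.01 is an assumed prior_level_sd, chosen only to show the shape of the grid.
grid = sd_grid(0.01)
print(len(grid), grid[0], grid[-1])
```

With that input the grid runs from 0.001 up to (approximately, because of floating point) 0.1 in 21 evenly spaced values.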

And then execute the whole thing in parallel:

from concurrent.futures import ProcessPoolExecutor

with ProcessPoolExecutor() as executor:
    results = list(executor.map(run_ci, range(21)))  # i = 0..20
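
A slightly fuller sketch of the parallel part, showing how the per-step results could be gathered into a DataFrame. The worker here is a stand-in that only computes the SD for its step; a real one would fit CausalImpact and add the SumBeta, DW and MAPE values. PRIOR_LEVEL_SD, run_sweep and the executor_cls parameter are hypothetical names introduced for this example.

```python
from concurrent.futures import ProcessPoolExecutor

import pandas as pd

PRIOR_LEVEL_SD = 0.01  # assumed value, for illustration only


def run_ci(i):
    """Stand-in worker: compute the SD for step i and return it as a record."""
    low, high = min(0.001, PRIOR_LEVEL_SD), max(0.1, PRIOR_LEVEL_SD)
    sd = low + (high - low) / 20 * i
    # A real worker would fit the model here and add SumBeta, DW and MAPE.
    return {"step": i, "SD": sd}


def run_sweep(executor_cls=ProcessPoolExecutor, n_steps=21):
    """Run every step through an executor and collect the records."""
    with executor_cls() as executor:
        # executor.map yields results in input order.
        records = list(executor.map(run_ci, range(n_steps)))
    return pd.DataFrame(records)
```

When using processes, call run_sweep() from under an `if __name__ == "__main__":` guard so that worker processes can import the module safely. Passing ThreadPoolExecutor as executor_cls gives a quick way to check the plumbing without spawning processes.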

Note that even so, the Python version will consume far more resources (CPU and memory) than the R one. The TFP team is implementing new algorithms, so faster ones may arrive soon and make this approach much lighter.

Unfortunately I don't have free time right now to port this package to Python as well; if I get the opportunity I might give it a try :).

Hope this helps,

Best,

Will