WillianFuks / tfcausalimpact

Python Causal Impact Implementation Based on Google's R Package. Built using TensorFlow Probability.
Apache License 2.0

Multiple controls supported? #42

Closed: ocesp98 closed this issue 2 years ago

ocesp98 commented 2 years ago

Hey, I was wondering if multiple controls were supported?

For instance, imagine you have one treated country and several untreated countries, and you want to estimate the effect of marketing in the treated country (on revenue, for example) relative to the untreated control countries. Is it possible to indicate which country is the treated one and which are the controls? Otherwise, if you feed the daily revenue for each country to the causal model, how does it know that country 1 is treated and the others are untreated (and should hence be used to create the synthetic control)?

Thanks in advance!

WillianFuks commented 2 years ago

Hi @ocesp98 ,

You sure can! Notice in the code itself that the very first column is always considered to be the response variable y (or in your case, the country that received the marketing campaign).

The other columns are, therefore, the controls that enter a linear regression later on to explain y (these would be your untreated countries).

So your data can be something like:

pd.DataFrame([[1, 2, 3], [2, 3, 4], [5, 6, 7]])

The first column, [1, 2, 5], would be your y, and the remaining columns, [2, 3, 6] and [3, 4, 7], would be your controls X. The getting started notebook has a few more examples that might help you out.
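To make the column convention concrete, here is a minimal sketch of arranging a frame so the treated country comes first. The country codes, the revenue numbers, and the helpers `treated_first` and `run_analysis` are made up for illustration and are not part of the package. The only library call assumed is `CausalImpact(data, pre_period, post_period)`, imported lazily so the reordering can be tried without the package installed.

```python
import pandas as pd


def treated_first(df: pd.DataFrame, treated: str) -> pd.DataFrame:
    """Return df with the treated column moved to position 0 (hypothetical helper)."""
    controls = [c for c in df.columns if c != treated]
    return df[[treated] + controls]


# Hypothetical daily revenue per country; "BR" is the one that received the campaign.
revenue = pd.DataFrame(
    {
        "AR": [10.0, 11.0, 12.0, 13.0],
        "BR": [20.0, 21.0, 23.0, 30.0],
        "CL": [5.0, 5.5, 6.0, 6.5],
    },
    index=pd.date_range("2021-01-01", periods=4, freq="D"),
)

# After reordering, column 0 is the response y and the rest are controls X.
data = treated_first(revenue, "BR")


def run_analysis(data, pre_period, post_period):
    """Fit the model; the import is deferred so this file loads without the package."""
    from causalimpact import CausalImpact

    return CausalImpact(data, pre_period, post_period)
```

With real data (far more than four rows), something like `ci = run_analysis(data, pre_period, post_period)` followed by `print(ci.summary())` would then treat "BR" as the response.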

Hope that helps,

Will

ocesp98 commented 2 years ago

Yes, I see, this helps. Thank you very much! One final question: the case described above only considers the outcome variable revenue, but what if you wanted to consider multiple covariates as well as multiple countries? At the moment, you have the revenue for each country, with the first column being the target country. What if, for instance, you also want to control for market size and similar variables?

WillianFuks commented 2 years ago

You can add as many control variables as you see fit, as long as each one is a time series. But you can only have one response variable, so a single run can investigate, for instance, just the impact on revenue and nothing else.

If you also want to analyze the impact on market size, you'd have to run a separate analysis in which the first column contains the market size you want to investigate.
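A rough sketch of the "one response per run" idea: build one frame per response, each with its own response in the first column and whichever controls you pick after it. All column names and values below are invented, and `build_frame` is a hypothetical helper rather than anything the package provides.

```python
import pandas as pd


def build_frame(df: pd.DataFrame, response: str, controls: list) -> pd.DataFrame:
    """Put the chosen response first, followed by the chosen controls (hypothetical helper)."""
    return df[[response] + list(controls)]


# Hypothetical observations: two metrics for the treated country and two control countries.
observations = pd.DataFrame(
    {
        "revenue_treated": [20.0, 21.0, 23.0, 30.0],
        "market_size_treated": [100.0, 101.0, 102.0, 110.0],
        "revenue_control_a": [10.0, 11.0, 12.0, 13.0],
        "revenue_control_b": [5.0, 5.5, 6.0, 6.5],
        "market_size_control_a": [50.0, 50.5, 51.0, 51.5],
    }
)

# Run 1: impact on revenue, with control-country revenue and market size as covariates.
revenue_frame = build_frame(
    observations,
    "revenue_treated",
    ["revenue_control_a", "revenue_control_b", "market_size_control_a"],
)

# Run 2: impact on market size, analyzed separately with its own first column.
market_frame = build_frame(
    observations,
    "market_size_treated",
    ["market_size_control_a", "revenue_control_a", "revenue_control_b"],
)
```

Each frame would then be passed to its own model fit. The controls here are deliberately limited to series from untreated countries, since covariates that the campaign itself moved would distort the counterfactual.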

Hope that helps,

Will

ocesp98 commented 2 years ago

Okay, thanks a lot! Is there any way to see the weights assigned to each of the covariates? I tried ci.model_samples[2].numpy().mean(axis=0), but it only returns one value. I would like to know the importance of each control country in estimating the target country :)

WillianFuks commented 2 years ago

Hi @ocesp98 ,

I thought I had already answered your question; only now that you closed the issue did I realize I hadn't. Sorry for that.

As for your question, I'd recommend reading the getting_started notebook. You'll find some examples there that might help you find the impact each covariate has on the observed variable.

Note, though, that this is not a rigorous statistical hypothesis test; it's simply an average over the samples of the posterior distribution. It can be helpful, but it shouldn't be read as a rigorous analysis.
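The averaging itself can be sketched with plain NumPy. The arrays below are synthetic stand-ins, not output from a fitted model, and the control names are made up. How to pull the actual weight samples out of the fitted model is covered in the getting_started notebook and is not reproduced here. The sketch assumes the weight samples come as an array of shape (n_samples, n_controls).

```python
import numpy as np

# Synthetic stand-in for posterior draws of regression weights,
# one row per posterior sample and one column per control (assumed layout).
weight_samples = np.array(
    [
        [0.10, 0.50, 0.00],
        [0.30, 0.70, 0.00],
        [0.20, 0.60, 0.00],
        [0.20, 0.60, 0.00],
    ]
)
control_names = ["AR", "CL", "UY"]  # hypothetical control countries

# Averaging over the sample axis leaves one mean weight per control.
mean_weights = weight_samples.mean(axis=0)
per_control = dict(zip(control_names, mean_weights.tolist()))

# By contrast, samples of a scalar parameter have shape (n_samples,),
# so the same averaging collapses to a single number.
scalar_samples = np.array([1.0, 2.0, 3.0])
scalar_mean = scalar_samples.mean(axis=0)
```

This may be why averaging a single entry of the model samples returned only one value: if that entry holds draws of a scalar parameter rather than the weight vector, there is nothing per-control left after the mean.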

Hope that helps!

Best,

Will