WillianFuks / tfcausalimpact

Python Causal Impact Implementation Based on Google's R Package. Built using TensorFlow Probability.
Apache License 2.0

Credible Interval size #45

Closed DavidJ005 closed 2 years ago

DavidJ005 commented 2 years ago

Hi everybody,

I am currently trying to quantify the effect of a marketing campaign on sales data. Although I am fairly sure the model can tell me whether the campaign has an effect or not, the quantification of that effect is not very precise (e.g., the 95% CI of the relative effect is more than 20% wide).

So I am wondering if anyone has managed to get something smaller with real-world data. In the example notebook the CI is also pretty wide, except for the simulated data.

In other words, is there any leverage to reduce the size of the CI?

Edit: Instinctively, from what I understand of Bayesian credible intervals, I would answer "no", since the interval is "fixed" by the posterior given the model and the data, but maybe I am wrong.

WillianFuks commented 2 years ago

Hi @DavidJ005 ,

Usually wide intervals are an expression of a local level component with a large standard deviation. Local levels are essentially random walks, which means the more they are relied upon to predict the data, the less the covariates, and the model as a whole, were able to explain it.

In other words, to decrease the interval range, the best approach is to find better covariates and better models that explain your observed data. This is heuristic based and you'll have to try several hypotheses to find what works best. Maybe adding seasonal components will help; maybe new covariates or new model components will reduce this interval. Seasonal components, for instance, can be added straight through model_args, as sketched below.
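A minimal, self-contained sketch of that idea on toy data (the `nseasons` and `season_duration` keys follow the package's model_args options; the synthetic data and the variable names are just for illustration):

```python
import numpy as np
import pandas as pd
from causalimpact import CausalImpact

# Toy daily data: covariate X drives sales y, plus a weekly cycle.
rng = np.random.default_rng(0)
n = 100
X = 100 + rng.normal(0, 5, size=n).cumsum()
season = 10 * np.sin(2 * np.pi * np.arange(n) / 7)
y = 1.2 * X + season + rng.normal(0, 2, size=n)
y[70:] += 15  # simulated campaign lift in the post-period
data = pd.DataFrame({'y': y, 'X': X})

# A 7-step seasonal component so the local level does not have to
# absorb the weekly cycle on its own.
ci = CausalImpact(data, [0, 69], [70, 99],
                  model_args={'nseasons': 7, 'season_duration': 1})
print(ci.summary())
```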

Another thing you can try is setting the prior for the local level's standard deviation to 0.01. This only makes sense if you are confident that the covariates you used are a good fit and that a local level component shouldn't be doing much work in explaining the observed data.

You can do so by instantiating the ci object with something like:

ci = CausalImpact(data, pre_period, post_period, model_args={'prior_level_sd': 0.01})

It may help in this case, but it should only be done if you are confident that the covariates are good explanatory variables for the observed data.
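If you want to go down the custom model route, the package also accepts your own TensorFlow Probability structural time series model through the `model` argument. A rough sketch, assuming the same data layout as above (first column is the response, remaining columns are covariates); check the repository README for the exact recipe, as details such as data standardization matter:

```python
import numpy as np
import tensorflow_probability as tfp
from causalimpact import CausalImpact

# Components are built on the pre-period response; the regression
# design matrix spans the whole series so the model can forecast
# into the post-period.
obs = data.iloc[:70, 0].astype(np.float32)
design_matrix = data.iloc[:, 1:].values.astype(np.float32)

level = tfp.sts.LocalLevel(observed_time_series=obs)
weekly = tfp.sts.Seasonal(num_seasons=7, observed_time_series=obs)
regression = tfp.sts.LinearRegression(design_matrix=design_matrix)
model = tfp.sts.Sum([level, weekly, regression],
                    observed_time_series=obs)

ci = CausalImpact(data, [0, 69], [70, 99], model=model)
```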

Hope that helps,

Best,

Will

DavidJ005 commented 2 years ago

Thanks for the help! To give some feedback: I did try lowering 'prior_level_sd', although I had to push it to 0.001. The results are quite good; it tends to reduce the CI size to under 6% (i.e., >20% => <6%). In general it pushes the analyses sitting at the edge of the middle area toward the positive or negative, while keeping some in the middle.

In my case I think it's acceptable to work under the hypothesis that the covariates are highly explanatory. For example, several shops in the same city are good covariates for each other in terms of sales, under the hypothesis that they are not close enough to steal each other's customers. Plus, in retail there are many one-off events that can create local spikes (e.g., a very sunny day, holidays, etc.) without any clear link to "Y", so ignoring these spikes is a very good thing to do, since they are difficult to replicate.
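For reference, the change itself is one argument, and the resulting interval width can be read off the fitted object, roughly like this (`data`, `pre_period` and `post_period` are placeholders, and the `summary_data` row labels are assumed from the package's summary output):

```python
from causalimpact import CausalImpact

ci = CausalImpact(data, pre_period, post_period,
                  model_args={'prior_level_sd': 0.001})

# Width of the 95% credible interval on the relative effect.
sd = ci.summary_data
width = (sd.loc['rel_effect_upper', 'average']
         - sd.loc['rel_effect_lower', 'average'])
print(f'Relative-effect CI width: {width:.1%}')
```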

Anyway, it was very helpful.

Cheers!

David