facebookincubator / GeoLift

GeoLift is an end-to-end geo-experimental methodology based on Synthetic Control Methods, used to measure the true incremental effect (Lift) of ad campaigns.
https://facebookincubator.github.io/GeoLift/
MIT License

CI and P-values Contradict #122

Closed: BrianMiner closed this issue 1 year ago

BrianMiner commented 1 year ago

It seems there is an issue with the confidence intervals and p-values reported by GeoLift.

Running the walkthrough code from https://facebookincubator.github.io/GeoLift/docs/GettingStarted/Walkthrough with confidence intervals enabled:

library(GeoLift)

# GeoTestData_Test is the test dataset prepared earlier in the walkthrough
GeoTest <- GeoLift(Y_id = "Y",
                   data = GeoTestData_Test,
                   locations = c("chicago", "portland"),
                   treatment_start_time = 91,
                   treatment_end_time = 105,
                   ConfidenceIntervals = TRUE)

summary(GeoTest)

This produces the output below: the cumulative treatment effect is highly significant, yet the 90% CI for the lift is not (it includes zero). The CI is also very wide.

[Screenshot: output of summary(GeoTest)]

How should these two results be reconciled?

ArturoEsquerra commented 1 year ago

Hi @BrianMiner, thanks for pointing this out!

This is a known issue with conformal inference when calculating the aggregate confidence interval across all treatment time periods. Because the conformal confidence interval depends on a large number of factors, such as the grid of candidate values we use, there is a small chance of seeing conflicting results between the p-value and the CI.

With this in mind, we're working on improving the conformal inference procedure for aggregate CIs and on providing additional information, such as plots, to help users better understand these values. We hope to release this in a new version soon. In the meantime, we recommend using the p-value as the foundation of aggregate analyses and experimenting with the grid_size parameter of the GeoLift() function whenever you encounter these issues.

Thanks once again for your feedback!
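To illustrate why a conformal CI can depend on the grid of candidate values, here is a minimal base-R sketch of conformal test inversion. It is not GeoLift's implementation. It assumes simulated data, a single control series with OLS standing in for the synthetic control fit, a constant per-period effect, the mean absolute post-period residual as the test statistic, and cyclic shifts of the residuals as the permutation scheme. `conformal_pvalue` and `conformal_ci` are hypothetical helpers: the CI is the set of grid values whose p-value exceeds alpha, summarized by its range.

```r
# Minimal sketch of conformal test inversion (NOT GeoLift's implementation).
# Assumptions: simulated data, one control series with OLS standing in for the
# synthetic control fit, a constant per-period effect, and cyclic shifts of the
# residuals as the permutation scheme.
set.seed(1)
T0 <- 90                                 # pre-treatment periods (1..90)
n  <- 105                                # periods 91..105 are "treated"
control <- 100 + cumsum(rnorm(n))
treated <- control + rnorm(n)
treated[(T0 + 1):n] <- treated[(T0 + 1):n] + 3   # simulated lift of 3 per period

# p-value for H0: per-period effect == delta
conformal_pvalue <- function(delta, y, x, T0) {
  n <- length(y)
  post <- (T0 + 1):n
  y_null <- y
  y_null[post] <- y_null[post] - delta   # impose the null on treated periods
  u <- residuals(lm(y_null ~ x))         # fit on all periods under the null
  stat <- function(r) mean(abs(r[post])) # size of post-period residuals
  s_obs <- stat(u)
  # Recompute the statistic on every cyclic shift of the residual series
  s_perm <- vapply(0:(n - 1),
                   function(j) stat(u[((seq_len(n) - 1 + j) %% n) + 1]),
                   numeric(1))
  mean(s_perm >= s_obs)
}

# CI = range of grid values whose p-value exceeds alpha (assumes an interval)
conformal_ci <- function(grid, y, x, T0, alpha = 0.10) {
  p <- vapply(grid, conformal_pvalue, numeric(1), y = y, x = x, T0 = T0)
  kept <- grid[p > alpha]
  if (length(kept) == 0) return(c(NA_real_, NA_real_))
  range(kept)
}

ci_coarse <- conformal_ci(seq(-10, 10, by = 5), treated, control, T0)
ci_fine   <- conformal_ci(seq(-10, 10, by = 0.1), treated, control, T0)
p_zero    <- conformal_pvalue(0, treated, control, T0)
```

With the coarse grid the endpoints can only land on -10, -5, 0, 5 or 10 (or the accepted set can be empty). Because `range()` spans any gap in the accepted set, a coarse grid can also produce an interval that covers 0 even when `p_zero` is below alpha. In this sketch `p_zero` comes from a single test and does not depend on the grid at all, which is one way the two summaries can drift apart. Refining the grid here plays a role analogous to adjusting grid_size.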

BrianMiner commented 1 year ago

Hmm, yes, that's definitely an issue. I haven't dug in enough to see whether gsynth is an option instead. I know it exposes all the bootstrap samples, and both a p-value of sorts and any CI can of course be generated from those.
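The bootstrap route mentioned above can be sketched as follows. When the p-value and the percentile CI are both computed from the same vector of bootstrap draws, they agree by construction, up to ties and quantile interpolation. The draws below are simulated, `boot_summary` is a hypothetical helper, and no particular gsynth output field is assumed; in practice you would pass whatever numeric vector of bootstrap estimates of the average effect your fit provides.

```r
# Minimal sketch: p-value and percentile CI from the SAME bootstrap draws.
# `att_draws` is simulated here (hypothetical); in practice it would be a
# numeric vector of bootstrap estimates of the average treatment effect.
set.seed(42)
att_draws <- rnorm(2000, mean = 3, sd = 1)

boot_summary <- function(draws, null_value = 0, alpha = 0.10) {
  # Percentile confidence interval at level 1 - alpha
  ci <- unname(quantile(draws, probs = c(alpha / 2, 1 - alpha / 2)))
  # Two-sided bootstrap p-value for H0: effect == null_value
  tail_prob <- min(mean(draws <= null_value), mean(draws >= null_value))
  p_value <- min(1, 2 * tail_prob)
  # TRUE when "p < alpha" matches "null value lies outside the CI"
  consistent <- (p_value < alpha) == (ci[1] > null_value || ci[2] < null_value)
  list(ci = ci, p_value = p_value, consistent = consistent)
}

res <- boot_summary(att_draws)
print(res)
```

Deriving both numbers from one set of draws is what rules out the kind of contradiction reported in this issue, since the CI excludes the null value exactly when the tail probability is below alpha / 2.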