facebookincubator / GeoLift

GeoLift is an end-to-end geo-experimental methodology based on Synthetic Control Methods used to measure the true incremental effect (Lift) of ad campaigns.
https://facebookincubator.github.io/GeoLift/
MIT License

GeoMarketSelection and GeoLiftPower give different results for power #146

Closed Snowcatcat closed 1 year ago

Snowcatcat commented 1 year ago

@ArturoEsquerra Hi Arturo,

I'm trying to follow the example provided in the walkthrough and found that the results for effect size are different for the same set of inputs under GeoMarketSelection and GeoLiftPower, specifically for market_id =1, when the test markets are: chicago, cincinnati, houston, portland.

When we run the sample code for GeoMarketSelection, we get chicago, cincinnati, houston, portland (market_id = 1) as one of the top 2 ranked test markets. The EffectSize is 0.05 and the Average_MDE is 0.04829. So the minimum lift required to have a well-powered test for this particular market selection is 5%, if I'm not mistaken.

[Screenshot: GeoMarketSelection]

I then tried to run a similar test using the sample code for GeoLiftPower. Instead of market_id = 2, as shown in the example, I used market_id = 1. I also changed the lookback_window to 1, consistent with what we had in the market selection step (GeoMarketSelection). However, now the EffectSize becomes 0.09. ScaledL2Imbalance, Investment and ATT are all different from what we saw in the previous market selection step. Would you know why? Thanks

[Screenshots: GeoPowerLift1, GeoPowerLift2, GeoPowerLift3]

ArturoEsquerra commented 1 year ago

Hi @Snowcatcat and thanks for opening this GH issue!

There shouldn't be any differences between these two results. Could you share the code you used to run the MarketSelection?

Thanks! Arturo

Snowcatcat commented 1 year ago

Thanks for your reply! @ArturoEsquerra It's the second screenshot in my question. Let me paste it here as well:

market_id = 1
market_row <- MarketSelections$BestMarkets %>% dplyr::filter(ID == market_id)
treatment_locations <- stringr::str_split(market_row$location, ",")[[1]]
treatment_duration <- market_row$duration
lookback_window <- 1 ### not 7 days but 7 possible tests (7 * 10 or 15 days)

######## Use GeoLiftPower (Power Calculation for GeoLift for known test locations) to conduct more power analysis ######

power_data <- GeoLiftPower(
  data = GeoTestData_PreTest,
  locations = treatment_locations,
  effect_size = seq(0, 0.5, 0.01),
  lookback_window = lookback_window,
  treatment_periods = treatment_duration,
  cpic = 7.5,
  fixed_effects = TRUE,
  side_of_test = "two_sided"
)

plot(power_data, show_mde = TRUE, smoothed_values = TRUE, breaks_x_axis = 10) +
  labs(caption = unique(power_data$location))

ArturoEsquerra commented 1 year ago

Hi @Snowcatcat! I meant the inputs to GeoLiftMarketSelection()

Snowcatcat commented 1 year ago

@ArturoEsquerra sorry for the confusion. I just used the sample code for market selection and I was able to get exactly the same results as shown in the walkthru:

[Screenshots: GeoMarketSelection1, GeoMarketSelection2]

ArturoEsquerra commented 1 year ago

Hi again @Snowcatcat and thanks for sharing more information. The results aren't exactly the same because GeoLiftPower() and GeoLiftMarketSelection() are using different granularities for the simulated lift (specifically, in the effect_size parameter). In the latter the simulated effects increase in steps of 5%, while in the former the steps between simulations are just 1%. Because of this difference, the first iteration of GeoLiftMarketSelection() that found a well-powered test was the one with effect_size = 10% (it first tried 5%, but that effect size didn't provide a robust model, so it "jumped" to 10%). In contrast, GeoLiftPower() was able to take smaller steps between iterations and found that effect_size = 9% was good enough.

If you specify the same sequence in the effect_size parameter, you'll obtain the same results with both functions.
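To make the granularity point concrete, here is a minimal base R sketch. It does not call GeoLift at all: `toy_power()` is a made-up power curve and the 0.8 target is an assumed threshold, both chosen only for illustration, and GeoLift's real criterion for a well-powered test may differ. It only shows how the same underlying curve produces a different "first powered" effect size on a 5% grid than on a 1% grid.

```r
# Hypothetical stand-in for a power curve. NOT GeoLift output:
# a smooth logistic curve chosen purely for illustration.
toy_power <- function(effect_size) {
  1 / (1 + exp(-(effect_size - 0.07) / 0.01))
}

# Smallest effect size on the grid whose (toy) power reaches the target.
# The 0.8 target is an assumption made for this sketch.
first_powered_effect <- function(grid, target = 0.8) {
  powered <- grid[toy_power(grid) >= target]
  if (length(powered) == 0) NA_real_ else min(powered)
}

coarse_grid <- seq(0, 0.5, by = 0.05)  # 5% steps
fine_grid   <- seq(0, 0.5, by = 0.01)  # 1% steps

coarse_mde <- first_powered_effect(coarse_grid)
fine_mde   <- first_powered_effect(fine_grid)

cat("coarse grid:", coarse_mde, "\n")
cat("fine grid:  ", fine_mde, "\n")
```

With this toy curve the coarse grid skips from 0.05 (not powered) straight to 0.10, while the fine grid can stop at 0.09. Passing the same grid to both calls removes that gap, because the candidate effect sizes are then identical.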

Snowcatcat commented 1 year ago

@ArturoEsquerra Thanks again. The effect size for the first test (market_id = 1) should be 5%, right? The effect size is 10% when market_id = 2, but we are looking at market_id = 1 (which is: chicago, cincinnati, houston, portland), as shown in the walkthru:

[Screenshot: GeoMarketSelection3]

This 5% can also be seen when we plot the results for MarketSelections:

[Image: plot of the MarketSelections results]

The walkthru also shows an example of GeoLiftPower for market_id = 2 (chicago, portland), not for market_id = 1. I was using the sample code and changed only one parameter to look at the effect size for market_id = 1, but I'm getting different results.

The ScaledL2Imbalance for the test (market_id = 1) is 0.1971864 when using GeoLiftMarketSelection().
The ScaledL2Imbalance for the test (market_id = 1) is 0.259691 when using GeoLiftPower().