facebookincubator / GeoLift

GeoLift is an end-to-end geo-experimental methodology based on Synthetic Control Methods used to measure the true incremental effect (Lift) of ad campaigns.
https://facebookincubator.github.io/GeoLift/
MIT License
175 stars, 54 forks

How to select the Best Markets out of GeoLift Results? #195

Open ankitpeltoton opened 2 weeks ago

ankitpeltoton commented 2 weeks ago

Bug description

When I run the GeoLift package with 80 markets, the model returns about 1000 combinations. Rank 1 shows AvgScaledL2Imbalance = 0.54, which is a very high value. I believe AvgScaledL2Imbalance is a measure of model fit, and a value very close to 0 would indicate a good, unbiased fit.

Q1) How does the model produce this ranking? On what factors is the ranking based? In my case

Session information

Please paste the output after running sessionInfo() in your R session.


library(GeoLift)

GeoLift_PreTest <- read.csv('geo_lift_transformed.csv')

GeoTestData_PreTest <- GeoDataRead(data = GeoLift_PreTest,
                                   date_id = "week",
                                   location_id = "DMA",
                                   Y_id = "traffic",
                                   X = c(), #empty list as we have no covariates
                                   format = "yyyy-mm-dd",
                                   summary = TRUE)

GeoPlot(GeoTestData_PreTest,
        Y_id = "Y",
        time_id = "time",
        location_id = "location")

MarketSelections <- GeoLiftMarketSelection(data = GeoTestData_PreTest,
                                           treatment_periods = c(35,42,49,56,63), #Duration
                                           N = c(10, 15, 20, 25), # Markets
                                           Y_id = "Y", 
                                           location_id = "location",
                                           time_id = "time",
                                           effect_size = seq(0, 0.25, 0.05), #Lift
                                           lookback_window = 1,
                                           holdout = c(0.2, 1.0), 
                                           cpic = 60,
                                           alpha = 0.05,
                                           Correlations = TRUE,
                                           fixed_effects = TRUE,
                                           side_of_test = "one_sided"
)

Reproduction steps

We have 3 columns in our data: week, DMA and traffic. The traffic data is available for multiple DMAs at a weekly level. We pass all of this data to GeoDataRead to get it into the required format. We then pass the result to GeoLiftMarketSelection with the following arguments:

  1. treatment_periods = c(35,42,49,56,63), which indicates we want the model to test durations of 5, 6, 7, 8 and 9 weeks.
  2. N = c(10, 15, 20, 25), which indicates we want the model to try 10, 15, 20 and 25 DMAs as test markets.
  3. effect_size = seq(0, 0.25, 0.05), which indicates we want the model to test effect sizes ranging from 0 to 0.25 in increments of 0.05 (5%).
  4. holdout = c(0.2, 1.0), which indicates we want the model to consider holdouts ranging from 20% to 100%.
  5. cpic = 60, a value we chose based on previous experiments.
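As a quick sanity check on the duration settings, the sketch below converts the intended test lengths into a number of data periods. It assumes that treatment_periods is counted in time periods of the input data (one period per row per location) rather than in calendar days; that assumption is worth checking against the GeoLift documentation. The variable names are illustrative only.

```r
# Intended test lengths, in weeks.
test_weeks <- 5:9

# Number of data periods each test would span, by data granularity.
periods_if_daily  <- test_weeks * 7  # one row per location per day
periods_if_weekly <- test_weeks      # one row per location per week

print(periods_if_daily)   # 35 42 49 56 63
print(periods_if_weekly)  # 5 6 7 8 9

# Size of the duration-by-market-count grid requested in the call above.
design_grid <- expand.grid(treatment_periods = periods_if_daily,
                           N = c(10, 15, 20, 25))
print(nrow(design_grid))  # 20
```

Under that assumption, the values 35 to 63 correspond to 5 to 9 weeks only when the data is daily; with one row per DMA per week they would describe tests of 35 to 63 weeks.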

Expected behavior

I expected Rank 1 to have a very low AvgScaledL2Imbalance value, so that those markets could be relied on to give statistically significant results. Instead, I see the opposite trend: the top-ranked options have high AvgScaledL2Imbalance values, while options further down the ranking have the lowest values. How can this happen? And how do we select the best markets among all the options?

Output

duration = 49
Effect Size = 0.05
Power = 1
AvgScaledL2Imbalance = 0.59
Holdout = 0.79
Rank = 1

Additional context

No. I only need to understand how to select the best markets out of these 1000 combinations. Suggestions would be really helpful.

ankitpeltoton commented 1 week ago

hello! wondering if there are any thoughts on this issue? Kindly let me know. Thanks!

michael-khalil commented 1 week ago

Q1) Ranking is based on a combination of (1) minimum detectable effect, (2) Scaled L2 imbalance, and (3) absolute lift in zero. Essentially, we want to find market combinations that detect as small a lift as possible, where the synthetic control method shows an improvement over a simple average, and that do not detect lift when no lift is present (i.e. they don't give false positives).
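To illustrate how a combined ranking can put a high-imbalance option first, here is a minimal sketch that averages per-metric ranks with equal weight. This is an assumption made for illustration, not GeoLift's actual ranking code; the data values are made up, and the column names are modeled on the market selection output.

```r
# Toy candidates; all numbers are invented for illustration.
candidates <- data.frame(
  ID = 1:4,
  EffectSize = c(0.05, 0.05, 0.10, 0.15),            # minimum detectable effect
  AvgScaledL2Imbalance = c(0.59, 0.20, 0.10, 0.05),
  abs_lift_in_zero = c(0.001, 0.010, 0.002, 0.020)
)

# Rank each metric separately (smaller is better for all three).
metric_ranks <- data.frame(
  mde  = rank(abs(candidates$EffectSize), ties.method = "min"),
  l2   = rank(candidates$AvgScaledL2Imbalance, ties.method = "min"),
  zero = rank(candidates$abs_lift_in_zero, ties.method = "min")
)

# Combine with an equal-weight average of ranks (hypothetical weighting).
candidates$composite_rank <- rank(rowMeans(metric_ranks), ties.method = "min")
print(candidates)
```

In this toy table, candidate 1 has the worst imbalance (0.59) but still comes out first, because it is best or tied for best on the other two metrics.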

A higher Scaled L2 imbalance (close to 1) means there is little to no benefit from using synthetic controls versus simply comparing an average of the control markets to the treatment markets. It doesn't necessarily mean that the markets are not usable. A lower Scaled L2 imbalance (close to 0), however, does mean there is close to a perfect fit between treatment and control markets.
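The interpretation above can be sketched numerically. The example assumes the scaled imbalance is the pre-period L2 distance between the treated series and the weighted controls, divided by the same distance under uniform (simple average) weights. The data, the weights and the helper l2_imbalance are hypothetical.

```r
# Pre-treatment outcomes: rows are periods, columns are control markets.
controls <- cbind(a = c(10, 12, 14, 16),
                  b = c(20, 18, 16, 14),
                  c = c(5, 5, 6, 6))

# Treated series built as 0.8 * a + 0.2 * b, so a perfect fit exists.
treated <- c(12, 13.2, 14.4, 15.6)

# L2 distance between the treated series and a weighted control average.
l2_imbalance <- function(weights) {
  sqrt(sum((treated - as.vector(controls %*% weights))^2))
}

uniform_w <- rep(1 / 3, 3)   # simple average of the controls
synth_w   <- c(0.8, 0.2, 0)  # weights a synthetic control could recover

scaled_synth   <- l2_imbalance(synth_w) / l2_imbalance(uniform_w)
scaled_uniform <- l2_imbalance(uniform_w) / l2_imbalance(uniform_w)

print(scaled_synth)    # approximately 0: near-perfect pre-period fit
print(scaled_uniform)  # 1: no better than the simple average
```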

Q2) Power is a function of the number of simulations you run, as set by the lookback_window parameter. If you run just one simulation, your power (the share of simulations that detected lift when it was simulated) is either 0 or 100%. Typically we use lookback_window = 1 so that the code runs faster, and then rerun the best combinations with a larger lookback window, using the GeoLiftPower function, to ensure the results generalize over multiple simulations/time periods.
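A small sketch of why a single simulation can only report a power of 0 or 1. It treats power as the share of simulations in which the simulated lift was detected; the helper names and the detection outcomes are hypothetical.

```r
# Power as the share of simulations that detected the simulated lift.
power_from_detections <- function(detected) mean(detected)

# Every power value that n_sims simulations can produce.
possible_power_values <- function(n_sims) (0:n_sims) / n_sims

# One simulation: the result is all or nothing.
print(possible_power_values(1))                           # 0 1
print(power_from_detections(TRUE))                        # 1

# Four simulations: finer-grained estimates become possible.
print(possible_power_values(4))                           # 0.00 0.25 0.50 0.75 1.00
print(power_from_detections(c(TRUE, TRUE, FALSE, TRUE)))  # 0.75
```

A reported Power = 1 from a single simulation therefore says only that the one simulated test detected the lift, which is why rerunning the shortlisted combinations over more simulations is informative.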

Q3) In addition to the three variables used to rank, you can look at correlation, the % of historical conversions in treatment/holdout, and the dispersion of locations throughout your geography (this one would be manual).
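One way to apply these extra checks is to filter the ranked table before picking a combination. The sketch below uses a toy data frame whose column names are assumed to resemble the market selection output; the values and the thresholds are purely illustrative, not recommendations.

```r
# Toy stand-in for the ranked results; all values are invented.
best_markets <- data.frame(
  ID = 1:5,
  rank = c(1, 2, 3, 4, 5),
  AvgScaledL2Imbalance = c(0.59, 0.45, 0.12, 0.30, 0.08),
  abs_lift_in_zero = c(0.001, 0.004, 0.006, 0.002, 0.030),
  correlation = c(0.70, 0.85, 0.95, 0.90, 0.97),
  Holdout = c(0.79, 0.60, 0.55, 0.40, 0.50)
)

# Hypothetical thresholds; tune them to your own data and risk tolerance.
shortlist <- subset(
  best_markets,
  AvgScaledL2Imbalance <= 0.35 &
    abs_lift_in_zero <= 0.01 &
    correlation >= 0.9
)

# Keep the package's own ordering among the options that pass the filters.
shortlist <- shortlist[order(shortlist$rank), ]
print(shortlist)
```

The geographic dispersion check mentioned above would still need to be done by hand on the shortlisted combinations.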