facebookincubator / GeoLift

GeoLift is an end-to-end geo-experimental methodology based on Synthetic Control Methods used to measure the true incremental effect (Lift) of ad campaigns.
https://facebookincubator.github.io/GeoLift/
MIT License

Power is equal to 1 when effect size is 1e-20 #171

Closed sdaza closed 4 months ago

sdaza commented 9 months ago

Bug description

When the effect size is 0 or very close to 0, power should by definition be close to alpha (0.05). When I run the code below, power is always equal to 1.
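
As a rough illustration of that expectation, here is a minimal base R sketch. It uses a generic two-sample t-test rather than GeoLift's own inference procedure, and the sample size, number of simulations, and seed are arbitrary assumptions. With no true effect, the share of simulated experiments that reject should land near alpha:

```r
# Illustrative only: a generic two-sample t-test, NOT GeoLift's inference.
set.seed(123)
alpha  <- 0.05
n_sims <- 2000  # hypothetical number of simulated experiments
n      <- 50    # hypothetical sample size per group

# Each simulation draws both groups from the same distribution
# (true effect = 0) and records whether the test rejects at level alpha.
rejections <- replicate(n_sims, {
  control   <- rnorm(n)
  treatment <- rnorm(n)  # no lift added
  t.test(treatment, control)$p.value < alpha
})

# Share of simulations that (falsely) detect an effect.
rejection_rate <- mean(rejections)
print(rejection_rate)
```

For a well-calibrated test, `rejection_rate` is an estimate of power at zero effect, which is why a value of 1 looks suspicious.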

Session information

R version 4.2.3 (2023-03-15)
Platform: aarch64-apple-darwin22.3.0 (64-bit)
Running under: macOS 14.0

Matrix products: default
BLAS:   /opt/homebrew/Cellar/openblas/0.3.23/lib/libopenblasp-r0.3.23.dylib
LAPACK: /opt/homebrew/Cellar/r/4.2.3/lib/R/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] dplyr_1.1.4   GeoLift_2.7.4 pwr_1.3-0    

loaded via a namespace (and not attached):
 [1] tidyr_1.3.0            jsonlite_1.8.4         foreach_1.5.2         
 [4] Formula_1.2-5          assertthat_0.2.1       doRNG_1.8.6           
 [7] progress_1.2.2         globals_0.16.2         MarketMatching_1.2.0  
[10] pillar_1.9.0           lattice_0.20-45        glue_1.6.2            
[13] quadprog_1.5-8         uuid_1.1-0             digest_0.6.33         
[16] colorspace_2.1-0       sandwich_3.0-2         htmltools_0.5.5       
[19] Matrix_1.6-3           plyr_1.8.9             pkgconfig_2.0.3       
[22] listenv_0.9.0          purrr_1.0.2            xtable_1.8-4          
[25] mvtnorm_1.2-3          scales_1.2.1           dtw_1.23-1            
[28] tibble_3.2.1           proxy_0.4-27           bsts_0.9.9            
[31] generics_0.1.3         ggplot2_3.4.4          withr_2.5.2           
[34] repr_1.1.6             cli_3.6.1              magrittr_2.0.3        
[37] crayon_1.5.2           evaluate_0.23          future_1.33.0         
[40] fansi_1.0.5            Boom_0.9.11            parallelly_1.36.0     
[43] doParallel_1.0.17      MASS_7.3-58.2          CausalImpact_1.3.0    
[46] xts_0.13.1             prettyunits_1.2.0      tools_4.2.3           
[49] directlabels_2023.8.25 hms_1.1.3              augsynth_0.2.0        
[52] lifecycle_1.0.4        stringr_1.5.1          munsell_0.5.0         
[55] BoomSpikeSlab_1.2.5    rngtools_1.5.2         compiler_4.2.3        
[58] lfe_2.9-0              panelView_1.1.17       rlang_1.1.2           
[61] grid_4.2.3             pbdZMQ_0.3-9           iterators_1.0.14      
[64] IRkernel_1.3.2         base64enc_0.1-3        gtable_0.3.4          
[67] codetools_0.2-19       abind_1.4-5            reshape2_1.4.4        
[70] R6_2.5.1               gridExtra_2.3          zoo_1.8-12            
[73] knitr_1.45             fastmap_1.1.1          utf8_1.2.4            
[76] gsynth_1.2.1           stringi_1.8.2          parallel_4.2.3        
[79] IRdisplay_1.1          Rcpp_1.0.11            vctrs_0.6.4           
[82] tidyselect_1.2.0       xfun_0.41     

Reproduction steps

Using the data from your tutorials:

library(GeoLift)
data(GeoLift_PreTest)
GeoTestData_PreTest <- GeoDataRead(data = GeoLift_PreTest,
    date_id = "date",
    location_id = "location",
    Y_id = "Y",
    X = c(),
    format = "yyyy-mm-dd",
    summary = TRUE)

MarketSelections <- GeoLiftMarketSelection(data = GeoTestData_PreTest,
        treatment_periods = c(10),
        N = c(3),
        Y_id = "Y",
        location_id = "location",
        time_id = "time",
        effect_size = c(1e-20),
        lookback_window = 1,
        # include_markets = c("chicago"),
        # exclude_markets = c("honolulu"),
        holdout = c(0.5, 1),
        cpic = 7.50,
        budget = 100000,
        alpha = 0.05,
        Correlations = TRUE,
        fixed_effects = TRUE,
        side_of_test = "two_sided")

Output

Power is always equal to 1, no matter how small the effect is.

Setting up cluster.

Importing functions into cluster.

Calculating which the best treatment groups are.

Deterministic setup with 3 locations in treatment.

  ID                                location duration EffectSize Power
1  1         kansas city, milwaukee, oakland       10      1e-20     1
2  2 cleveland, oklahoma city, san francisco       10      1e-20     1
3  3          austin, oakland, oklahoma city       10      1e-20     1
4  4       cleveland, oakland, oklahoma city       10      1e-20     1
5  5       atlanta, los angeles, san antonio       10      1e-20     1
6  6  jacksonville, los angeles, san antonio       10      1e-20     1
  AvgScaledL2Imbalance   Investment    AvgATT Average_MDE ProportionTotal_Y
1            0.2668047 1.720590e-14  598.5357  0.08491638        0.12663961
2            0.2509540 1.556460e-14 -686.9409 -0.09033298        0.12584605
3            0.3286840 1.866390e-14  771.4000  0.10252979        0.13760642
4            0.2699562 1.911405e-14  853.3734  0.11167236        0.14262462
5            0.8007780 9.627300e-15  476.3634  0.12527846        0.06528056
6            0.5992751 8.282925e-15  507.0723  0.15974666        0.05446245
  abs_lift_in_zero   Holdout rank correlation
1            0.085 0.8733604    1   0.9676098
2            0.090 0.8741539    2   0.9626041
3            0.103 0.8623936    3   0.9588542
4            0.112 0.8573754    4   0.9677921
5            0.125 0.9347194    5   0.9141780
6            0.160 0.9455375    6   0.8636818
amanabdullayev commented 7 months ago

Any updates here? I am also facing a similar issue, where I am getting a power of 1, no matter how small the effect size I choose.

sdaza commented 7 months ago

Hi @amanabdulla296, I got this answer some time ago:

Power is computed by simulating on the historical data. The number of simulations is determined by the lookback window, which defaults to 1 in market selection, so power is either 0 or 100% for each effect size simulated. You can increase the lookback window to get more granular power output. This can be done in market selection, or by running the GeoLift power calculation as a second step after market selection to generate more robust power curves. With an effect size this small, the result is most likely a false positive, with the synthetic control consistently above the treatment. A quick check with more simulations would determine whether it’s truly a strong experimental design.
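
The granularity point can be sketched in base R. This is a simplified illustration, not GeoLift's actual procedure: it assumes power is the share of simulations flagged significant, treats simulations as independent draws, and uses a made-up 5% chance of a significant result under zero effect. The helper name `simulate_power` is hypothetical.

```r
# Simplified sketch: power estimated as the share of simulations flagged
# significant. Independent Bernoulli draws are an assumption; GeoLift
# simulates over historical windows instead.
set.seed(42)
true_rate <- 0.05  # assumed chance a zero-effect simulation is significant

# Hypothetical helper: estimated power from n_sims simulations.
simulate_power <- function(n_sims) {
  mean(rbinom(n_sims, size = 1, prob = true_rate))
}

# Repeat the estimate many times to see which values it can take.
one_sim   <- replicate(1000, simulate_power(1))   # one simulation per estimate
many_sims <- replicate(1000, simulate_power(20))  # twenty simulations per estimate

print(sort(unique(one_sim)))    # a single simulation can only yield 0 or 1
print(sort(unique(many_sims)))  # more simulations give steps of 1/20
```

In the call from the issue, the analogous change would be raising `lookback_window` above 1 so that each reported power value averages over more than one simulation.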

So, I concluded that:

amanabdullayev commented 7 months ago

Okay, thanks for the comprehensive explanation @sdaza !

JussanN commented 4 months ago

Hi, I'm closing this issue as it was answered above. Thank you.