Power is equal to 1 when effect size is 1e-20

sdaza commented 9 months ago

Bug description

When the effect size is 0 or pretty close to zero, the power by definition should be close to alpha (0.05). When I run the code, power is always equal to 1.

Session information

R version 4.2.3 (2023-03-15)
Platform: aarch64-apple-darwin22.3.0 (64-bit)
Running under: macOS 14.0

Matrix products: default
BLAS:   /opt/homebrew/Cellar/openblas/0.3.23/lib/libopenblasp-r0.3.23.dylib
LAPACK: /opt/homebrew/Cellar/r/4.2.3/lib/R/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] dplyr_1.1.4   GeoLift_2.7.4 pwr_1.3-0    

loaded via a namespace (and not attached):
 [1] tidyr_1.3.0            jsonlite_1.8.4         foreach_1.5.2         
 [4] Formula_1.2-5          assertthat_0.2.1       doRNG_1.8.6           
 [7] progress_1.2.2         globals_0.16.2         MarketMatching_1.2.0  
[10] pillar_1.9.0           lattice_0.20-45        glue_1.6.2            
[13] quadprog_1.5-8         uuid_1.1-0             digest_0.6.33         
[16] colorspace_2.1-0       sandwich_3.0-2         htmltools_0.5.5       
[19] Matrix_1.6-3           plyr_1.8.9             pkgconfig_2.0.3       
[22] listenv_0.9.0          purrr_1.0.2            xtable_1.8-4          
[25] mvtnorm_1.2-3          scales_1.2.1           dtw_1.23-1            
[28] tibble_3.2.1           proxy_0.4-27           bsts_0.9.9            
[31] generics_0.1.3         ggplot2_3.4.4          withr_2.5.2           
[34] repr_1.1.6             cli_3.6.1              magrittr_2.0.3        
[37] crayon_1.5.2           evaluate_0.23          future_1.33.0         
[40] fansi_1.0.5            Boom_0.9.11            parallelly_1.36.0     
[43] doParallel_1.0.17      MASS_7.3-58.2          CausalImpact_1.3.0    
[46] xts_0.13.1             prettyunits_1.2.0      tools_4.2.3           
[49] directlabels_2023.8.25 hms_1.1.3              augsynth_0.2.0        
[52] lifecycle_1.0.4        stringr_1.5.1          munsell_0.5.0         
[55] BoomSpikeSlab_1.2.5    rngtools_1.5.2         compiler_4.2.3        
[58] lfe_2.9-0              panelView_1.1.17       rlang_1.1.2           
[61] grid_4.2.3             pbdZMQ_0.3-9           iterators_1.0.14      
[64] IRkernel_1.3.2         base64enc_0.1-3        gtable_0.3.4          
[67] codetools_0.2-19       abind_1.4-5            reshape2_1.4.4        
[70] R6_2.5.1               gridExtra_2.3          zoo_1.8-12            
[73] knitr_1.45             fastmap_1.1.1          utf8_1.2.4            
[76] gsynth_1.2.1           stringi_1.8.2          parallel_4.2.3        
[79] IRdisplay_1.1          Rcpp_1.0.11            vctrs_0.6.4           
[82] tidyselect_1.2.0       xfun_0.41

Reproduction steps

Using the data from your tutorials:

library(GeoLift)
data(GeoLift_PreTest)
GeoTestData_PreTest <- GeoDataRead(data = GeoLift_PreTest,
    date_id = "date",
    location_id = "location",
    Y_id = "Y",
    X = c(),
    format = "yyyy-mm-dd",
    summary = TRUE)

MarketSelections <- GeoLiftMarketSelection(data = GeoTestData_PreTest,
        treatment_periods = c(10),
        N = c(3),
        Y_id = "Y",
        location_id = "location",
        time_id = "time",
        effect_size = c(1e-20),
        lookback_window = 1,
        # include_markets = c("chicago"),
        # exclude_markets = c("honolulu"),
        holdout = c(0.5, 1),
        cpic = 7.50,
        budget = 100000,
        alpha = 0.05,
        Correlations = TRUE,
        fixed_effects = TRUE,
        side_of_test = "two_sided")

Output

Power is always equal to 1, no matter how small the effect is.

Setting up cluster.

Importing functions into cluster.

Calculating which the best treatment groups are.

Deterministic setup with 3 locations in treatment.

  ID                                location duration EffectSize Power
1  1         kansas city, milwaukee, oakland       10      1e-20     1
2  2 cleveland, oklahoma city, san francisco       10      1e-20     1
3  3          austin, oakland, oklahoma city       10      1e-20     1
4  4       cleveland, oakland, oklahoma city       10      1e-20     1
5  5       atlanta, los angeles, san antonio       10      1e-20     1
6  6  jacksonville, los angeles, san antonio       10      1e-20     1
  AvgScaledL2Imbalance   Investment    AvgATT Average_MDE ProportionTotal_Y
1            0.2668047 1.720590e-14  598.5357  0.08491638        0.12663961
2            0.2509540 1.556460e-14 -686.9409 -0.09033298        0.12584605
3            0.3286840 1.866390e-14  771.4000  0.10252979        0.13760642
4            0.2699562 1.911405e-14  853.3734  0.11167236        0.14262462
5            0.8007780 9.627300e-15  476.3634  0.12527846        0.06528056
6            0.5992751 8.282925e-15  507.0723  0.15974666        0.05446245
  abs_lift_in_zero   Holdout rank correlation
1            0.085 0.8733604    1   0.9676098
2            0.090 0.8741539    2   0.9626041
3            0.103 0.8623936    3   0.9588542
4            0.112 0.8573754    4   0.9677921
5            0.125 0.9347194    5   0.9141780
6            0.160 0.9455375    6   0.8636818

amanabdullayev commented 7 months ago

Any updates here? I am facing a similar issue: I get a power of 1 no matter how small an effect size I choose.

sdaza commented 7 months ago

Hi @amanabdulla296, I got this answer some time ago:

Power is computed based on simulations over the historical data. The number of simulations is determined by the lookback window, which defaults to 1 in market selection, so power is either 0 or 100% for each simulated effect size. You can increase the lookback window to get more granular power estimates, either in market selection itself or by running GeoLiftPower as a second step after market selection to generate more robust power curves. With an effect size this small, the result is most likely a false positive in which the synthetic control is consistently above the treatment. A quick check with more simulations would determine whether it is truly a strong experimental design.
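To make the "0 or 100%" point concrete, here is a minimal base R sketch. It is not GeoLift's internal code: it assumes power is estimated as the share of simulated tests with a p-value below alpha, and it uses a one-sample t-test on pure noise as a stand-in for the synthetic control test. The helper `estimate_power` is hypothetical.

```r
# Toy illustration only: this is NOT how GeoLift computes power internally.
# Assumption: power = share of simulated tests whose p-value falls below alpha.
set.seed(42)

alpha <- 0.05

# Hypothetical helper: run `n_sims` placebo tests on data with no true effect
# and return the share that come out significant.
estimate_power <- function(n_sims, alpha) {
  p_values <- replicate(n_sims, t.test(rnorm(30), mu = 0)$p.value)
  mean(p_values < alpha)
}

# With a single simulation the estimate can only be 0 or 1.
power_one_sim <- estimate_power(1, alpha)

# With many simulations and no true effect it should land near alpha.
power_many_sims <- estimate_power(2000, alpha)

print(power_one_sim)
print(power_many_sims)
```

In the reproduction call, the analogous change would be raising `lookback_window` above 1 so that each design is evaluated over more than one simulated period.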

So, I concluded that:

The purpose of the simulation is to evaluate whether I can detect an effect of a specific magnitude, conditional on a model. In the extreme scenario of an effect size of 0, we are estimating power and, at the same time, assessing how well specified the synthetic control model is. This can lead to misleading conclusions when designing an experiment.
When evaluating power, it is important to examine abs_lift_in_zero closely. Ideally, it should be zero or very close to zero. If it is not, the power estimate is inaccurate: it reflects only the bias of the model and can lead to incorrect conclusions about our ability to detect an effect.
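As a rough sketch of that check, the snippet below rebuilds the relevant columns from the output posted earlier in this issue and flags designs whose abs_lift_in_zero exceeds a tolerance. The 0.05 cutoff and the column name `lift_in_zero_ok` are assumptions made for illustration, not GeoLift defaults.

```r
# Values copied from the market selection output earlier in this issue.
designs <- data.frame(
  ID = 1:6,
  Power = rep(1, 6),
  abs_lift_in_zero = c(0.085, 0.090, 0.103, 0.112, 0.125, 0.160)
)

# Assumed tolerance, chosen for illustration only; not a GeoLift default.
max_lift_in_zero <- 0.05

# Flag designs whose estimated lift under a (near) zero effect is small enough
# for the reported power to be taken at face value.
designs$lift_in_zero_ok <- designs$abs_lift_in_zero <= max_lift_in_zero

n_ok <- sum(designs$lift_in_zero_ok)
print(designs)
```

Under this assumed cutoff none of the six designs pass, which is consistent with reading the reported power of 1 as model bias and not as real sensitivity.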

amanabdullayev commented 7 months ago

Okay, thanks for the comprehensive explanation @sdaza !

JussanN commented 4 months ago

Hi, I'm closing this issue as it was answered above. Thank you.

facebookincubator / GeoLift