edunford / tidysynth

A tidy implementation of the synthetic control method in R

`generate_placebos = TRUE` does not match manual permutation #6

Closed: williamlief closed this issue 2 years ago

williamlief commented 3 years ago

Hello,

When we manually permute the tidysynth command chain over units, we get different results than when we use `generate_placebos = TRUE`, particularly for `pre_mspe`, `post_mspe`, and `mspe_ratio`. It appears that the designated treated unit is processed differently from the donor units: the output for California matches exactly across both versions.

The following code uses the example from your vignette.

This code can take a while to run; I've also attached compare.csv with the relevant output so you can inspect it more easily.

```r
library(tidysynth)
library(tidyverse)

set.seed(4810)

# Example implementation

smoking_out <-
  smoking %>%
  synthetic_control(outcome = cigsale,
                    unit = state,
                    time = year,
                    i_unit = "California",
                    i_time = 1988,
                    generate_placebos = TRUE) %>%
  generate_predictor(time_window = 1980:1988,
                     lnincome = mean(lnincome, na.rm = TRUE),
                     retprice = mean(retprice, na.rm = TRUE),
                     age15to24 = mean(age15to24, na.rm = TRUE)) %>%
  generate_predictor(time_window = 1984:1988,
                     beer = mean(beer, na.rm = TRUE)) %>%
  generate_predictor(time_window = 1975,
                     cigsale_1975 = cigsale) %>%
  generate_predictor(time_window = 1980,
                     cigsale_1980 = cigsale) %>%
  generate_predictor(time_window = 1988,
                     cigsale_1988 = cigsale) %>%
  generate_weights(optimization_window = 1970:1988,
                   Margin.ipop = .02, Sigf.ipop = 7, Bound.ipop = 6) %>%
  generate_control()

smoking_mspe <- smoking_out %>% 
  grab_signficance() 

# Manually run the permutation test ---------------------------------------

smoking_wrap <- function(i_unit, generate_placebos = FALSE) {
  smoking %>%
    synthetic_control(outcome = cigsale,
                      unit = state,
                      time = year,
                      i_unit = i_unit, # iterate the treated unit
                      i_time = 1988,
                      generate_placebos = generate_placebos) %>%
    generate_predictor(time_window = 1980:1988,
                       lnincome = mean(lnincome, na.rm = TRUE),
                       retprice = mean(retprice, na.rm = TRUE),
                       age15to24 = mean(age15to24, na.rm = TRUE)) %>%
    generate_predictor(time_window = 1984:1988,
                       beer = mean(beer, na.rm = TRUE)) %>%
    generate_predictor(time_window = 1975,
                       cigsale_1975 = cigsale) %>%
    generate_predictor(time_window = 1980,
                       cigsale_1980 = cigsale) %>%
    generate_predictor(time_window = 1988,
                       cigsale_1988 = cigsale) %>%
    generate_weights(optimization_window = 1970:1988,
                     Margin.ipop = .02, Sigf.ipop = 7, Bound.ipop = 6) %>%
    generate_control()
}

units <- unique(smoking$state)

manual_permute <- units %>%
  map(smoking_wrap,
      generate_placebos = FALSE) # for faster processing; we are highlighting that
                                 # when a unit is processed as the treated unit its
                                 # mspe values differ from when it is permuted

manual_mspe <- manual_permute %>% 
  map_dfr(grab_signficance)

# -------------------------------------------------------------------------

compare <- manual_mspe %>% 
  left_join(smoking_mspe, by = "unit_name", 
            suffix = c(".manual", ".gen_pla")) 

# Big differences in pre_mspe
compare %>% mutate(diff = pre_mspe.manual - pre_mspe.gen_pla) %>% 
  pull(diff) %>% summary()
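
# The same check for the other metrics mentioned above (post_mspe and
# mspe_ratio); the suffixed column names assume the grab_signficance()
# output joined into `compare` above.
compare %>%
  mutate(diff_post  = post_mspe.manual - post_mspe.gen_pla,
         diff_ratio = mspe_ratio.manual - mspe_ratio.gen_pla) %>%
  select(starts_with("diff")) %>%
  summary()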

# California matches
compare %>% 
  filter(unit_name == "California") %>% 
  select(pre_mspe.gen_pla, pre_mspe.manual)

write_csv(compare, "compare.csv")
```

cc @davidnathanlang

edunford commented 3 years ago

Great point! So internally, the package uses the weights that are optimized for the treated unit when computing the placebos. Put differently, it's not re-optimizing for each placebo case. This has obvious efficiency benefits, as we only need to perform the nested optimization once, but as you noted, it yields slightly different results from the per-donor optimization approach. In my opinion, your example doesn't go far enough: if you were really tuning each placebo as if it were the treated unit, you'd also want to carefully select the predictors for each placebo to yield an optimal pre-period fit (i.e., feed each placebo a customized set of predictors so that the pre-period trends map onto one another as closely as possible).
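
To make the distinction concrete, here is a rough sketch of the two strategies. This is illustrative pseudo-R, not the package's internals: `optimize_weights()` and `pre_mspe()` are hypothetical stand-ins for the nested optimization and the pre-period fit calculation.

```r
# Illustrative pseudo-R; optimize_weights() and pre_mspe() are
# hypothetical stand-ins, not functions exported by tidysynth.
donors <- setdiff(unique(smoking$state), "California")

# generate_placebos = TRUE (per the explanation above): optimize once,
# for the treated unit, then score every placebo against that single
# specification.
w_treated   <- optimize_weights(smoking, treated = "California")
mspe_reused <- purrr::map_dbl(donors, ~ pre_mspe(smoking, .x, w_treated))

# Manual permutation: re-run the full nested optimization with each
# donor cast as the treated unit, then score it under its own weights.
mspe_reopt <- purrr::map_dbl(donors, function(d) {
  w_d <- optimize_weights(smoking, treated = d)
  pre_mspe(smoking, d, w_d)
})
```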

I stuck with convention when computing the placebos in the package, but that doesn't mean it's the best approach. I could see adding an argument that lets one select how the placebos are computed (re-using the initially optimized weights, or re-computing them for each placebo), but this still wouldn't solve the problem of the predictors not being optimally chosen for each placebo fit.
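
Such an argument might look something like this (purely hypothetical; no `placebo_weights` argument exists in the package):

```r
# Hypothetical interface, not implemented in tidysynth: a switch
# controlling how placebo weights are computed.
smoking %>%
  synthetic_control(outcome = cigsale, unit = state, time = year,
                    i_unit = "California", i_time = 1988,
                    generate_placebos = TRUE,
                    placebo_weights = "reoptimize") # vs. default "reuse"
```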

IMO, if one is preparing to publish their findings, it makes sense to go through the more arduous task of optimizing each placebo in terms of both predictors and weights, and then to use those results for the reported significance in the paper. Doing it that way would operate as a robustness check on whether the reported significance is an artifact of the chosen predictors and optimized weight matrix.
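
A rough sketch of that exercise, re-running the pipeline under several predictor specifications (the alternative averaging windows below are arbitrary choices for illustration, not a recommended set):

```r
library(purrr)

# A rough sketch: vary the predictor specification and check whether
# California's significance is stable across runs.
spec_windows <- list(1980:1988, 1975:1988, 1984:1988)

significance_by_spec <- map(spec_windows, function(w) {
  smoking %>%
    synthetic_control(outcome = cigsale, unit = state, time = year,
                      i_unit = "California", i_time = 1988,
                      generate_placebos = TRUE) %>%
    generate_predictor(time_window = w,
                       lnincome = mean(lnincome, na.rm = TRUE),
                       retprice = mean(retprice, na.rm = TRUE)) %>%
    generate_weights(optimization_window = 1970:1988) %>%
    generate_control() %>%
    grab_signficance()
})
# Compare California's mspe_ratio rank across the elements of
# significance_by_spec to gauge sensitivity to predictor choices.
```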

williamlief commented 3 years ago

Thank you for the explanation and for confirming that this is the intended behavior. Your point about results potentially being an artifact of the chosen predictors and weight matrix is very well taken. We are currently investigating that exact possibility.

edunford commented 3 years ago

This is great! Would you be willing to report some of your findings here? I'm quite curious how much this impacts the results of a synthetic control. I'm sure others who might stumble across this thread will be interested as well.

williamlief commented 2 years ago

Our results are out in the Journal of Experimental Political Science (open access link).

The multiverse analysis section investigates the sensitivity of synth results to modelling choices. Thanks for putting together this great package!

edunford commented 2 years ago

Amazing and congratulations on the publication!