edunford / tidysynth

A tidy implementation of the synthetic control method in R

Forming synthetic control without predictors #29

Closed ocramest closed 1 year ago

ocramest commented 1 year ago

Hello and thank you for your time on making this amazing library. Just a quick question: Is it possible to generate the weights and get the synthetic control without using predictor variables? I've seen that this is not an uncommon practice, to only employ the outcome variable in the optimization problem, with the idea that this takes into account all unobserved predictors that cannot be included in the model. Thanks again!

edunford commented 1 year ago

I see what you're getting at. It's not that you wouldn't include predictors; rather, you'd include only predictors that are prior values of the outcome in the pre-treatment period. Essentially, you just need to define which pre-treatment time points of the outcome you want to use in the optimization problem.

library(tidyverse)
library(tidysynth)

data("smoking")

smoking_out <-

  smoking %>%

  # Initialize the synthetic control object
  synthetic_control(outcome = cigsale, # outcome
                    unit = state, # unit index in the panel data
                    time = year, # time index in the panel data
                    i_unit = "California", # unit where the intervention occurred
                    i_time = 1988, # time period when the intervention occurred
                    generate_placebos = TRUE # generate placebo synthetic controls (for inference)
  ) %>%

  # Generate the aggregate predictors used to fit the weights

  # Lagged cigarette sales 
  generate_predictor(time_window = 1975,
                     cigsale_1975 = cigsale) %>%
  generate_predictor(time_window = 1980,
                     cigsale_1980 = cigsale) %>%
  generate_predictor(time_window = 1988,
                     cigsale_1988 = cigsale) %>%

  # Generate the fitted weights for the synthetic control
  generate_weights(optimization_window = 1970:1988, # time to use in the optimization task
                   margin_ipop = .02, sigf_ipop = 7, bound_ipop = 6 # optimizer options
  ) %>%

  # Generate the synthetic control
  generate_control()

smoking_out %>% plot_trends()
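
For intuition about what "outcome-only" fitting means, here is a minimal base-R sketch on simulated data. It is not tidysynth's own optimizer (which the `*_ipop` options above configure); it swaps in `optim()` with a softmax reparameterization so that it runs without extra packages. All names in it (`donors`, `treated`, `softmax`, `loss`) are made up for illustration.

```r
# Toy illustration on simulated data: choose donor weights that are
# non-negative and sum to one, using only pre-treatment outcomes.
set.seed(1)

n_pre    <- 10  # number of pre-treatment periods
n_donors <- 4   # number of donor (control) units

# Simulated pre-treatment outcomes: rows = periods, columns = donor units
donors <- matrix(rnorm(n_pre * n_donors, mean = 100, sd = 10),
                 nrow = n_pre, ncol = n_donors)

# Treated unit built as a known convex combination of the donors
true_w  <- c(0.6, 0.3, 0.1, 0.0)
treated <- as.vector(donors %*% true_w)

# Map unconstrained parameters onto the simplex (weights >= 0, sum to 1)
softmax <- function(theta) {
  z <- exp(theta - max(theta))
  z / sum(z)
}

# Squared pre-treatment gap between the treated unit and its synthetic control
loss <- function(theta) {
  w <- softmax(theta)
  sum((treated - donors %*% w)^2)
}

# Start from equal weights and minimize the pre-treatment gap
fit   <- optim(par = rep(0, n_donors), fn = loss, method = "BFGS")
w_hat <- softmax(fit$par)

round(w_hat, 3)
```

Each row of `donors` plays the role of one lagged-outcome predictor, so adding more pre-treatment years simply adds more rows to match on.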

I hope that helps! Also, could you attach a link/citation to a paper that does this? I haven't seen it done before (though this practice is essentially what makes the method work even when covariates are included). I'd love to see an example. Take care!

ocramest commented 1 year ago

Thank you very much. I implemented the following code so I didn't have to repeat the generate_predictor function for each year, just in case someone finds it useful:

library(dplyr)     # provides %>% and sym()
library(tidysynth)

data("smoking")

# Pre-treatment years whose outcome values serve as predictors
time_windows <- 1970:1988

smoking_out <-

  smoking %>%

  # Initialize the synthetic control object
  synthetic_control(outcome = cigsale, # outcome
                    unit = state, # unit index in the panel data
                    time = year, # time index in the panel data
                    i_unit = "California", # unit where the intervention occurred
                    i_time = 1988, # time period when the intervention occurred
                    generate_placebos = TRUE # generate placebo synthetic controls (for inference)
  ) %>%

  # Generate the aggregate predictors used to fit the weights

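  # Lagged cigarette sales: add one predictor per year in time_windows.
  # `.init = .` passes the piped object in as the starting value of the fold.
  purrr::reduce(
    .init = .,
    .f = function(df, time_window) {
      df %>%
        generate_predictor(
          time_window = time_window,
          # build the predictor name (e.g. cigsale_1970) programmatically
          !!sym(paste0("cigsale_", time_window)) := cigsale
        )
    },
    .x = time_windows
  ) %>%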

  # Generate the fitted weights for the synthetic control
  generate_weights(optimization_window = 1970:1988, # time to use in the optimization task
                   margin_ipop = .02, sigf_ipop = 7, bound_ipop = 6 # optimizer options
  ) %>%

  # Generate the synthetic control
  generate_control()

smoking_out %>% plot_trends()
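
In case the fold pattern is unfamiliar, here is a small package-free sketch of the same idea using base R's `Reduce()`. The helper `add_year` is a hypothetical stand-in for the `generate_predictor()` step: it takes the accumulated result plus one year and returns the updated result, just as the anonymous function passed to `purrr::reduce()` does with the pipeline object.

```r
# Fold a vector of years into an accumulator, one step per year.
# `add_year` is a hypothetical stand-in for a generate_predictor() call.
add_year <- function(acc, year) {
  c(acc, paste0("cigsale_", year))
}

years <- 1970:1973

# `init` is the starting value, like `.init = .` in the purrr version
predictor_names <- Reduce(f = add_year, x = years, init = character(0))

predictor_names
```

The result contains one name per year, in order, which mirrors how the real pipeline ends up with one lagged-outcome predictor per entry in `time_windows`.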