edunford / tidysynth

A tidy implementation of the synthetic control method in R

how to use year and month as time index #30

Closed schneiderpy closed 1 year ago

schneiderpy commented 1 year ago

Hello Eric,

I am struggling with how to use year and month as the time index in synthetic_control(), since (as it seems to me) it only accepts numeric values. Thank you.

ocramest commented 1 year ago

Hello Mr.,

To work around this problem, I created a vector of integers running from 1 to the last month of my sample and used it as the time index. To present my results, I then replaced the numbers with the corresponding dates, following the order assigned in the time index vector.
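The integer-index workaround described above could be sketched roughly as follows. This is only an illustration in base R: the `panel` data frame, the `time_id` column, and the `id_to_date()` helper are hypothetical names, not part of tidysynth.

```r
# Hypothetical monthly panel: two units observed January to April 2020.
panel <- expand.grid(
  unit  = c("A", "B"),
  month = 1:4,
  year  = 2020L,
  stringsAsFactors = FALSE
)
panel$date <- as.Date(sprintf("%d-%02d-01", panel$year, panel$month))

# Lookup table: one row per distinct month, numbered in chronological order.
lookup <- data.frame(date = sort(unique(panel$date)))
lookup$time_id <- seq_len(nrow(lookup))

# Attach the integer index; this column would serve as the time index.
panel$time_id <- lookup$time_id[match(panel$date, lookup$date)]

# When presenting results, translate index values back to dates.
id_to_date <- function(id) lookup$date[match(id, lookup$time_id)]
id_to_date(3)  # "2020-03-01"
```

Keeping the lookup table around makes the mapping back to dates explicit, rather than relying on row order.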

schneiderpy commented 1 year ago

Thank you

edunford commented 1 year ago

Apologies for the delay, @schneiderpy! @ocramest has the right idea. You can overcome this issue by generating an index, but it's even easier to combine your date fields into a new variable.

See below for what I mean. The key is to make sure your date strings are converted to the Date class, for example with as.Date() (the example uses lubridate's ymd(), which also returns a Date).

library(tidyverse)
library(lubridate) # provides ymd(); only attached by tidyverse itself from version 2.0.0
library(tidysynth)

data("smoking")

# Here let's simulate "as-if" conditions of you having a month field 
# This month is fixed, but this would work even if it varied.
new_smoking <- 
  smoking %>% 
  mutate(month = "jan") %>% 
  mutate(Date = ymd(paste(year,month,1,sep="-"))) 

# We can use dates just as easily as fixed integers (like years)
output <- 
  new_smoking %>%

  # Initialize the synthetic control object
  synthetic_control(outcome = cigsale, # outcome
                    unit = state, # unit index in the panel data
                    time = Date, # time index in the panel data
                    i_unit = "California", # unit where the intervention occurred
                    i_time = as.Date("1988-01-01"), # time period when the intervention occurred
                    generate_placebos = TRUE # generate placebo synthetic controls (for inference)
  ) %>%

  # Generate the aggregate predictors used to fit the weights
  # Note: `:` between two Dates returns plain integers (the Date class is
  # dropped), so the windows below are built with seq(), which returns Dates.

  # average log income, retail price of cigarettes, and proportion of the
  # population between 15 and 24 years of age from 1980 - 1988
  generate_predictor(time_window = seq(as.Date("1980-01-01"), as.Date("1988-01-01"), by = "day"),
                     ln_income = mean(lnincome, na.rm = TRUE),
                     ret_price = mean(retprice, na.rm = TRUE),
                     youth = mean(age15to24, na.rm = TRUE)) %>%

  # average beer consumption in the donor pool from 1984 - 1988
  generate_predictor(time_window = seq(as.Date("1984-01-01"), as.Date("1988-01-01"), by = "day"),
                     beer_sales = mean(beer, na.rm = TRUE)) %>%

  # Lagged cigarette sales 
  generate_predictor(time_window = as.Date("1975-01-01"),
                     cigsale_1975 = cigsale) %>%
  generate_predictor(time_window = as.Date("1980-01-01"),
                     cigsale_1980 = cigsale) %>%
  generate_predictor(time_window = as.Date("1988-01-01"),
                     cigsale_1988 = cigsale) %>%

  # Generate the fitted weights for the synthetic control
  generate_weights(optimization_window = seq(as.Date("1970-01-01"), as.Date("1988-01-01"), by = "day"),
                   margin_ipop = .02, sigf_ipop = 7, bound_ipop = 6 # optimizer options
  ) %>%

  # Generate the synthetic control
  generate_control()

# Here we can plot things to see that the dates carry through without issue. 
output %>%  plot_trends()

I hope this helps!
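For a panel where the month really does vary, the same idea might look like the base R sketch below. The `obs` data frame and the `window` vector are made up for illustration; the point is only that a Date built from year and month, and a window built with `seq(..., by = "month")`, both keep the Date class and so can be compared directly.

```r
# Hypothetical long-format data with integer year and month columns.
obs <- data.frame(
  year  = c(2019L, 2019L, 2020L, 2020L),
  month = c(11L, 12L, 1L, 2L)
)

# Build a Date pinned to the first day of each month.
obs$date <- as.Date(sprintf("%d-%02d-01", obs$year, obs$month))

# A window that steps month by month and keeps the Date class.
window <- seq(as.Date("2019-12-01"), as.Date("2020-02-01"), by = "month")

class(window)         # "Date"
obs$date %in% window  # FALSE TRUE TRUE TRUE
```

A window like this could then be passed wherever a time window is expected, assuming the time index column holds first-of-month Dates as well.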

schneiderpy commented 1 year ago

Thank you, Eric. I solved this issue some days ago. My question arose because my year_month column (created with lubridate) already had class Date, but tidysynth did not work with it. With the help of the Internet I found a solution: I converted the column with as.Date() again, and it worked fine. However, I appreciate your answer. Thank you.
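The thread does not show the data involved, so the exact cause is unknown. As a general precaution, one could check the class of the time column before fitting and coerce it explicitly. The sketch below assumes a hypothetical scenario in which a date-like column is actually stored as a date-time (POSIXct); the `year_month` and `i_time` objects are illustrative only.

```r
# Hypothetical year-month column stored as a date-time (POSIXct):
# it looks date-like but is not of class "Date".
year_month <- as.POSIXct(c("2020-01-01", "2020-02-01"), tz = "UTC")
class(year_month)            # "POSIXct" "POSIXt"

# Coerce explicitly so the column and the intervention time share one class.
year_month <- as.Date(year_month)
i_time <- as.Date("2020-02-01")

stopifnot(inherits(year_month, "Date"), inherits(i_time, "Date"))
which(year_month == i_time)  # 2
```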