edunford / tidysynth

A tidy implementation of the synthetic control method in R

Small but inconvenient bug in tidysynth when using only 1 pre-treatment period #9

Closed hjuerges closed 2 years ago

hjuerges commented 2 years ago

Thanks for writing this great package. I started to change my teaching material from Synth to tidysynth but then encountered a small issue. For demonstration I use a small selection of the smoking data (with Colorado and Utah as donors and 1988 cigsale and age15to24 as predictors). Here are the v and w weights when using Synth. (I attach the full replicable R syntax in this file):

    $tab.v
              v.weights
    cigsale   1
    age15to24 0

    $tab.w
      w.weights unit.names unit.numbers
    2     0.886          2            2
    3     0.114          3            3


This only replicates with tidysynth if I drop all data from before 1988 from the data.frame.

    grab_unit_weights(df_out)
    A tibble: 2 x 2
      unit     weight
    1 Colorado  0.886
    2 Utah      0.114

    grab_predictor_weights(df_out)
    A tibble: 2 x 2
      variable         weight
    1 age15to24_1988 1.30e-25
    2 cigsale_1988   1   e+ 0

That does not allow me to plot the data from 1970 onward, only from 1988 onward.

Using the full data (to be able to draw the graph across the entire range) yields a (wrong) solution that ignores pre-treatment cigsale:

    grab_unit_weights(df_out)
    A tibble: 2 x 2
      unit     weight
    1 Colorado  0.668
    2 Utah      0.332

    grab_predictor_weights(df_out)
    A tibble: 2 x 2
      variable         weight
    1 age15to24_1988 1.00e+ 0
    2 cigsale_1988   6.83e-14

This feels like a minor glitch and I hope you can fix it.

Kind regards
Hendrik

edunford commented 2 years ago

Thank you for pointing this out! This was a subtle bug that took me a while to track down. The issue was in how the donor units are ordered when the predictor matrix is generated. That ordering wasn't guaranteed to be the same across different subsets of the input data (e.g. Utah could come before Colorado when generating the matrix), and the difference in ordering was driving the discrepancy. I've now implemented logic that guarantees a consistent ordering regardless of how the data is subset (f27859a6).
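
To illustrate the idea of a guaranteed ordering, here is a minimal base R sketch. It is not the code from the commit above: `build_predictor_matrix` is a hypothetical helper, and the toy donor values are made up purely for illustration. It only shows that sorting by unit name before building the matrix makes the column order independent of the row order of the input.

```r
# Hypothetical helper (not part of tidysynth): build a predictors-by-units
# matrix whose columns always follow sorted unit order.
build_predictor_matrix <- function(df, unit_col, predictor_cols) {
  # Sort rows by unit name so the result does not depend on input row order.
  df <- df[order(df[[unit_col]]), , drop = FALSE]
  # Transpose so that rows are predictors and columns are units.
  mat <- t(as.matrix(df[, predictor_cols, drop = FALSE]))
  colnames(mat) <- df[[unit_col]]
  mat
}

# Toy donor data; these numbers are invented, not taken from the smoking data.
donors <- data.frame(
  state          = c("Utah", "Colorado"),
  cigsale_1988   = c(50, 80),
  age15to24_1988 = c(0.20, 0.16),
  stringsAsFactors = FALSE
)
predictors <- c("cigsale_1988", "age15to24_1988")

# Same data, two different row orders.
m1 <- build_predictor_matrix(donors, "state", predictors)
m2 <- build_predictor_matrix(donors[2:1, ], "state", predictors)

# Both matrices have columns in the order Colorado, Utah.
print(identical(m1, m2))
```

With a fixed column order, any optimizer that consumes the matrix sees the same input no matter how the data frame was filtered or arranged beforehand.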

That said, I think this is a prime example of how fragile this method is. Given minor changes in the ordering of the predictor matrix, the weights will change. Though this doesn't substantively change the pre-treatment fit, it's something to keep in mind when implementing the method. Thanks again for the detailed description of the bug! Really helpful.

edunford commented 2 years ago

Note: the dev version of the package contains the necessary fix. I'll push an updated version of the package to CRAN later this month.