edunford / tidysynth

A tidy implementation of the synthetic control method in R

Odd errors that only occur in certain cases: synthetic control estimation of the effects of South Africa's entrance into BRIC(S) #32

Status: closed by PedroMilreuCunha 2 months ago

PedroMilreuCunha commented 7 months ago

Hello!

I am having a bit of trouble with your package. For context, I am trying to estimate the impact of South Africa's entrance into BRIC(S) in 2010 on several outcomes. The structure of my data is as follows (the complete dataset is in the attached file):

'data.frame':   1044 obs. of  16 variables:
 $ country_code             : chr  "AUS" "AUS" "AUS" "AUS" ...
 $ year                     : num  2002 2003 2004 2005 2006 ...
 $ gdp_per_capita           : num  47535 48453 49959 50911 51605 ...
 $ gdp_growth               : num  3.99 3.11 4.22 3.15 2.74 3.78 3.57 1.87 2.21 2.39 ...
 $ imports                  : num  117 132 150 169 183 ...
 $ exports                  : num  171 172 174 180 185 ...
 $ trade_openness           : num  40.3 37.1 39.3 41.6 42.1 ...
 $ population               : num  1.95e+08 1.97e+08 1.99e+08 2.02e+08 2.05e+08 ...
 $ mortality_rate           : num  6 5.9 5.8 5.7 5.6 5.4 5.2 5 4.8 4.5 ...
 $ life_expectancy          : num  79.9 80.2 80.5 80.8 81 81.3 81.4 81.5 81.7 81.9 ...
 $ inflation                : num  3 2.7 2.3 2.7 3.6 2.3 4.4 1.8 2.9 3.3 ...
 $ foreign_direct_investment: num  -7.68e+10 9.61e+10 -3.31e+11 -7.62e+10 -6.47e+10 ...
 $ gross_capital_formation  : num  24.4 25.9 27.1 27.5 27.5 27.5 28.6 27.4 26.8 26.5 ...
 $ country_name             : chr  "Australia" "Australia" "Australia" "Australia" ...
 $ imports_bric             : num  10.4 13.6 19 22.6 25.4 ...
 $ exports_bric             : num  6.78 9.08 13.33 19.24 22.99 ...

When I run the code for the effects on exports and imports (below), it works normally:

library(tidyverse)
library(tidysynth)

# Import panel with complete data ---- 

df <- readRDS("data/cleaned_data/complete_data.rds") %>%
      as.data.frame()

# 1. Effect on exports ---- good evidence of a structural break in the series due to the 2008-2009 crisis

synthetic_out_exports <- df %>% 
    synthetic_control(
        outcome = exports,
        unit = country_name,
        time = year,
        i_unit = "South Africa",
        i_time = 2010,
        generate_placebos = TRUE) %>%
    generate_predictor( # Lagged dependent variable
        time_window = 2002:2009,
        exports = mean(exports, na.rm = TRUE)
    ) %>%
    generate_weights(optimization_window = 2002:2009, optimization_method = "All") %>%
    generate_control()

synthetic_out_exports %>% plot_trends()
synthetic_out_exports %>% plot_differences()
synthetic_out_exports %>% plot_weights()
synthetic_out_exports %>% plot_placebos()
synthetic_out_exports %>% plot_mspe_ratio()
synthetic_out_exports %>% grab_balance_table()
synthetic_out_exports %>% grab_significance()

# 2. Effect on imports ---- good evidence of a structural break in the series due to the 2008-2009 crisis

synthetic_out_imports <- df %>% 
    synthetic_control(
        outcome = imports,
        unit = country_name,
        time = year,
        i_unit = "South Africa",
        i_time = 2010,
        generate_placebos = TRUE) %>%
    generate_predictor( # Lagged dependent variable
        time_window = 2002:2009,
        imports = mean(imports, na.rm = TRUE)
    ) %>%
    generate_weights(optimization_window = 2002:2009, optimization_method = "All") %>%
    generate_control()

synthetic_out_imports %>% plot_trends()
synthetic_out_imports %>% plot_differences()
synthetic_out_imports %>% plot_weights()
synthetic_out_imports %>% plot_placebos()
synthetic_out_imports %>% plot_mspe_ratio()
synthetic_out_imports %>% grab_balance_table()
synthetic_out_imports %>% grab_significance()

As I mentioned, this code works just fine. However, once I move on to any of the other variables, such as gdp_per_capita, using the same code:

# 3. Effect on gdp per capita ---- 

synthetic_out_gdp_per_capita <- df %>% 
    synthetic_control(
        outcome = gdp_per_capita,
        unit = country_code,
        time = year,
        i_unit = "South Africa",
        i_time = 2010,
        generate_placebos = TRUE) %>%
    generate_predictor( # Lagged dependent variable
        time_window = 2002:2009,
        gdp_per_capita = mean(gdp_per_capita, na.rm = TRUE)
    ) %>%
    generate_weights(optimization_window = 2002:2009, optimization_method = "All") %>%
    generate_control()

synthetic_out_gdp_per_capita %>% plot_trends()
synthetic_out_gdp_per_capita %>% plot_differences()
synthetic_out_gdp_per_capita %>% plot_weights()
synthetic_out_gdp_per_capita %>% plot_placebos()
synthetic_out_gdp_per_capita %>% plot_mspe_ratio()
synthetic_out_gdp_per_capita %>% grab_balance_table()
synthetic_out_gdp_per_capita %>% grab_significance()

I get the errors:

> synthetic_out_gdp_per_capita %>% plot_trends()
Error in `dplyr::filter()`:
ℹ In argument: `time_unit %in% time_window`.
Caused by error in `time_unit %in% time_window`:
! object 'time_unit' not found
Run `rlang::last_trace()` to see where the error occurred.

> synthetic_out_gdp_per_capita %>% plot_differences()
Error in `dplyr::mutate()`:
ℹ In argument: `diff = real_y - synth_y`.
Caused by error:
! object 'real_y' not found
Run `rlang::last_trace()` to see where the error occurred.

> synthetic_out_gdp_per_capita %>% plot_weights()
Error in `dplyr::rename()`:
! Can't rename columns that don't exist.
✖ Column `variable` doesn't exist.
Run `rlang::last_trace()` to see where the error occurred.

> synthetic_out_gdp_per_capita %>% plot_placebos()
Error in `dplyr::filter()`:
ℹ In argument: `sqrt(pre_mspe) <= thres * 2`.
Caused by error:
! `..1` must be of size 19 or 1, not size 0.
Run `rlang::last_trace()` to see where the error occurred.

> synthetic_out_gdp_per_capita %>% plot_mspe_ratio()
(runs, but outputs a figure with only the "Donor" category; the figure is also attached to the issue)

> synthetic_out_gdp_per_capita %>% grab_balance_table()
Error in `tidyr::gather()`:
! Can't subset columns that don't exist.
✖ Column `variable` doesn't exist.
Run `rlang::last_trace()` to see where the error occurred.

> synthetic_out_gdp_per_capita %>% grab_significance()
(runs, but the table shows South Africa as a donor and no treated unit at all)


The errors seem to be related either to the masking of dplyr functions or to some other problem that keeps South Africa from being recognized as the treated unit.

What do you think? I'd appreciate some help here.

Thank you very much.

Congrats on the great package!

Kind regards, Pedro Cunha

edunford commented 2 months ago

Hi @PedroMilreuCunha! My goodness! So sorry for the delayed response. I'm sure you've figured this out by now, but I'll answer here for anyone in the future who comes across this issue and runs into something similar.

The problem with your third example has to do with the argument you're supplying to unit =. In that example, you're supplying country_code rather than country_name, and the country_code variable contains no "South Africa" value. So you either need to change unit back to country_name, as in your other examples, or supply the matching intervention unit label to the i_unit = argument, which in this case would be i_unit = "ZAF".
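
To see the mismatch directly, here is a minimal base-R sketch on a tiny hypothetical panel that mimics the structure of the data above (the values are invented; only the two identifier columns matter). Counting rows that carry the treated label finds none in country_code and one per year in country_name, and a simple name-to-code lookup recovers the label to pass as i_unit when the unit is country_code.

```r
# Tiny hypothetical panel: values are invented; only the ID columns matter
panel <- data.frame(
  country_code   = c("AUS", "AUS", "ZAF", "ZAF"),
  country_name   = c("Australia", "Australia", "South Africa", "South Africa"),
  year           = c(2009, 2010, 2009, 2010),
  gdp_per_capita = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

# Count rows whose unit label equals the treated label, per candidate column
n_by_code <- sum(panel$country_code == "South Africa")  # codes are "AUS"/"ZAF"
n_by_name <- sum(panel$country_name == "South Africa")  # one row per year

# One-to-one name -> code lookup gives the label to use with unit = country_code
lookup <- unique(panel[, c("country_name", "country_code")])
i_unit_code <- lookup$country_code[lookup$country_name == "South Africa"]

cat("rows matched on country_code:", n_by_code, "\n")
cat("rows matched on country_name:", n_by_name, "\n")
cat("i_unit for unit = country_code:", i_unit_code, "\n")
```

In terms of the original code, the two consistent pairings are unit = country_name with i_unit = "South Africa", or unit = country_code with i_unit = "ZAF".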

Admittedly, it isn't helpful that the package doesn't throw an error telling you that you've made this mistake. The errors that do emerge come later, when downstream functions reference variables that don't exist yet. This is a good reminder that the existing version of the package needs better error messaging so that these kinds of mistakes are easier to identify and correct.
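
As a rough idea of what earlier, clearer messaging could look like, here is a minimal base-R sketch. check_i_unit() is a hypothetical helper, not part of tidysynth, and for simplicity it takes the unit column name as a string rather than as a bare column name the way synthetic_control() does. It stops before any estimation if the treated label is missing from the unit column and, when the label shows up in another character column, suggests that column instead.

```r
# Hypothetical pre-flight check (NOT part of tidysynth): fail early with a
# readable message when the treated unit label is missing from the unit column.
check_i_unit <- function(data, unit, i_unit) {
  if (!unit %in% names(data)) {
    stop(sprintf("Column '%s' not found in data.", unit), call. = FALSE)
  }
  labels <- unique(as.character(data[[unit]]))
  if (!i_unit %in% labels) {
    # Search the other character columns for the label to suggest a fix
    char_cols  <- names(data)[vapply(data, is.character, logical(1))]
    other_cols <- setdiff(char_cols, unit)
    hits <- other_cols[vapply(other_cols,
                              function(col) i_unit %in% data[[col]],
                              logical(1))]
    hint <- if (length(hits) > 0) {
      sprintf(" It does appear in: %s. Did you mean unit = %s?",
              paste(hits, collapse = ", "), hits[1])
    } else {
      ""
    }
    stop(sprintf("i_unit = \"%s\" not found in column '%s'.%s",
                 i_unit, unit, hint), call. = FALSE)
  }
  invisible(TRUE)
}

# Usage on a tiny hypothetical data frame
panel <- data.frame(
  country_code = c("AUS", "ZAF"),
  country_name = c("Australia", "South Africa"),
  stringsAsFactors = FALSE
)
ok  <- check_i_unit(panel, "country_name", "South Africa")  # passes silently
msg <- tryCatch(check_i_unit(panel, "country_code", "South Africa"),
                error = conditionMessage)
cat(msg, "\n")  # the message points to country_name as the column to use
```

Running a check like this on df before calling synthetic_control() would flag the country_code / "South Africa" mismatch from the third example immediately, instead of letting it surface later as "object not found" errors.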

edunford commented 2 months ago

FYI, I opened #33, which specifies the feature request underlying this issue.