ebenmichael / augsynth

Augmented Synthetic Control Method
MIT License

Bug when time_cohort = T #66

Open jtorcasso opened 2 years ago

jtorcasso commented 2 years ago

I get a subscript out of bounds error when I restrict to time_cohort = T:

ppool_syn <- multisynth(y ~ treat, zip, t, 
    fit.data, lambda=0.4, n_leads=13, n_lags=18, time_cohort=T)

Error in donors[[j]]: subscript out of bounds
Traceback:

1. multisynth(lops ~ ssd, zip, t, fit.data, lambda = 0.4, n_leads = 13, 
 .     n_lags = 18, time_cohort = T)
2. multisynth_formatted(wide = wide, relative = T, n_leads = n_leads, 
 .     n_lags = n_lags, nu = nu, lambda = lambda, V = V, force = force, 
 .     n_factors = n_factors, scm = scm, time_cohort = time_cohort, 
 .     time_w = F, lambda_t = 0, fit_resids = TRUE, eps_abs = eps_abs, 
 .     eps_rel = eps_rel, verbose = verbose, long_df = long_df, 
 .     how_match = how_match, ...)   # at line 90-98 of file <text>
3. multisynth_qp(X = bal_mat, trt = wide$trt, mask = wide$mask, 
 .     Z = wide$Z[, !colnames(wide$Z) %in% wide$match_covariates, 
 .         drop = F], n_leads = n_leads, n_lags = n_lags, relative = relative, 
 .     nu = 0, lambda = lambda, V = V, time_cohort = time_cohort, 
 .     donors = donors, eps_rel = eps_rel, eps_abs = eps_abs, verbose = verbose)   # at line 263-278 of file <text>
4. lapply(1:nrow(mask), function(j) X[[j]][donors[[j]], mask[j, 
 .     ] == 1, drop = F])   # at line 541-542 of file <text>
5. FUN(X[[i]], ...)

The error occurs in multisynth_qp, in the following block:

    ## handle X differently if it is a list
    if(typeof(X) == "list") {
        x_t <- lapply(1:J, function(j) colSums(X[[j]][which_t[[j]], mask[j,]==1, drop=F]))

        # Xc contains pre-treatment data for valid donor units
        Xc <- lapply(1:nrow(mask),
                 function(j) X[[j]][donors[[j]], mask[j,]==1, drop=F])

        # std dev of outcomes for first treatment time
        sdx <- sd(X[[1]][is.finite(trt)])
    } else {
        x_t <- lapply(1:J, function(j) colSums(X[which_t[[j]], mask[j,]==1, drop=F]))        

        # Xc contains pre-treatment data for valid donor units
        Xc <- lapply(1:nrow(mask),
                 function(j) X[donors[[j]], mask[j,]==1, drop=F])

        # std dev of outcomes
        sdx <- sd(X[is.finite(trt)])
    }

If I print the dimensions of mask and donors, I get (21, 38) and (17,), respectively. Since the lapply call iterates over 1:nrow(mask), donors has no entries for j = 18, ..., 21, which is why the indexing fails. Looking more closely at mask, four rows contain NAs (only in the last 6 columns). With time_cohort=F, the dimensions are (1580, 38) and (1580,), and the error does not occur, but the problem is too large to solve in that case.
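The mismatch can be reproduced in isolation. The sketch below uses toy stand-ins with the reported shapes (a 21-row mask and a 17-element donors list). The contents are made up, and check_lengths is a hypothetical guard for illustration, not a function in augsynth.

```r
# Toy stand-ins for the objects inside multisynth_qp. Only the shapes
# (21 x 38 and length 17) come from the report; the values are invented.
mask   <- matrix(1, nrow = 21, ncol = 38)
donors <- replicate(17, c(TRUE, FALSE, TRUE), simplify = FALSE)

# Iterating over the rows of mask runs past the end of donors,
# so donors[[18]] raises the error. tryCatch captures its message.
res <- tryCatch(
  lapply(seq_len(nrow(mask)), function(j) donors[[j]]),
  error = function(e) conditionMessage(e)
)
print(res)

# Hypothetical guard (an assumption, not augsynth code): fail early with a
# readable message when the two objects disagree in length.
check_lengths <- function(mask, donors) {
  if (nrow(mask) != length(donors)) {
    stop(sprintf("mask has %d rows but donors has %d entries",
                 nrow(mask), length(donors)))
  }
  invisible(TRUE)
}
```

Calling check_lengths(mask, donors) on these toy objects stops with a message naming both sizes, which is easier to trace back to the input data than the bare indexing error.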

ebenmichael commented 2 years ago

Is the panel balanced? That is, do you have measured outcomes for all units at all times? If not, that might be what's causing this.

mikeguggis commented 8 months ago

I had the same problem. It was fixed by ensuring I had a balanced panel. It would be nice to have a pre-processing check for panel balance.
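A check along these lines could be run before calling multisynth. This is only a sketch in base R: check_panel_balance is a hypothetical helper, not part of augsynth, and the column names zip, t, and y are assumptions borrowed from the call in the original report.

```r
# Hypothetical helper (not part of augsynth): lists every unit-period cell
# that does not have exactly one non-missing outcome.
check_panel_balance <- function(data, unit, time, outcome) {
  units    <- factor(data[[unit]])
  times    <- factor(data[[time]])
  observed <- !is.na(data[[outcome]])

  # Subsetting a factor keeps all of its levels, so cells with no
  # observations appear in the table with a count of zero.
  counts <- as.data.frame(
    table(unit = units[observed], time = times[observed]),
    responseName = "n_obs",
    stringsAsFactors = FALSE
  )

  # A balanced panel has exactly one observation in every cell.
  counts[counts$n_obs != 1, , drop = FALSE]
}

# Toy panel: 3 units observed over 4 periods.
panel <- expand.grid(zip = c("a", "b", "c"), t = 1:4, stringsAsFactors = FALSE)
panel$y <- seq_len(nrow(panel))

# Drop one unit-period row to make the panel unbalanced.
panel_unbalanced <- panel[-5, ]

gaps <- check_panel_balance(panel_unbalanced, "zip", "t", "y")
if (nrow(gaps) > 0) {
  print(gaps)
}
```

An empty result means every unit has one measured outcome in every period. Any rows returned identify the unit-period cells to fill in or drop before fitting.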