"Unbalanced Panel" when groups are different sizes

alecmcclean commented 4 years ago

Hi Guys,

First - thanks a bunch for translating this package to R, I really appreciate it. I just wanted to flag a small issue I've found when using the bacon() function.

It seems that bacon() does not currently allow our groups to be different sizes. I've appended the code to generate a minimal example. In the dataset I create, we have 3 groups (group_id == 1, 2, 3), where groups 1 and 3 each contain one individual and group 2 contains two individuals (ind_id is the individual ID).

If I run bacon(id_var = "group_id", ...) the function will throw an error for an "Unbalanced Panel", because group 2 has twice as many observations in each time period as group 1 (because there are two individuals in group 2).

But, I don't think you want to call that an error; otherwise, you cannot demonstrate 2x2 weighting heterogeneity arising from the size of the groups. And, from what I understand, this is one of the key takeaways of the Bacon decomposition: the larger groups retain higher weights in the 2x2.

Alternatively, if you do want to call that an unbalanced panel, I don't think you need the code calculating "n_k, n_u, n_ku", because n_k = n_u by definition and n_ku = 0.5.

Thanks again, Alec

library(dplyr)

df <- 
  expand.grid(
    group_id = c(1, 2, 3), # Group ID (treatment level ID)
    t  = c(0, 1, 2)  # Time
  ) %>%
  mutate(
    # Treatment status
    a = case_when(
      group_id == 2 & t > 0 ~ 1, # 1 time period untreated 2 periods treated
      group_id == 3 & t > 1 ~ 1, # 2 untreated 1 treated
      TRUE ~ 0 # group_id == 1 is never treated
    )
  )

# Expand dataset with "individual" level observations 
df <- df %>% left_join(
  expand.grid(
    group_id = c(1, 2, 3), 
    ind_id = seq(1, 2)
    ) %>%
    filter(group_id == 2 | ind_id < 2), # Keep two individuals only in group_id == 2
  by = "group_id" # Join on the group identifier explicitly
  ) %>%
  select(group_id, ind_id, everything()) %>%
  arrange(group_id, ind_id, t)
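To make the size difference concrete, here is a minimal sketch that rebuilds the same panel in base R, counts rows per group and period, and computes the relative group size. The formula n_ku = n_k / (n_k + n_u), with n_k the sample share of group k, is my reading of the notation used in the comment above and is an assumption, not something taken from the package source.

```r
# Rebuild the example panel: groups 1 and 3 have one individual, group 2 has two.
df <- merge(
  expand.grid(group_id = c(1, 2, 3), t = c(0, 1, 2)),
  data.frame(group_id = c(1, 2, 2, 3), ind_id = c(1, 1, 2, 1)),
  by = "group_id"
)

# Rows per (group_id, t) cell. Group 2 has two rows in every period, so
# group_id alone does not identify a single unit within a period.
cell_counts <- table(df$group_id, df$t)
print(cell_counts)

# Sample share of each group (assumed meaning of n_k).
n <- sapply(split(df, df$group_id), nrow) / nrow(df)

# Relative size of group 2 (k) against group 1 (u): assumed n_ku = n_k / (n_k + n_u).
n_ku <- n[["2"]] / (n[["2"]] + n[["1"]])
n_ku
```

With equal group sizes this ratio would be 0.5 for every pair, which is the point made in the comment; here group 2 is twice as large as group 1, so the ratio moves away from 0.5.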

EdJeeOnGitHub commented 4 years ago

Hi Alec,

Sorry for the delay in replying.

We'll get to the bottom of this - it looks like we went a bit over the top sanitising user inputs.

EdJeeOnGitHub commented 4 years ago

This should have been fixed in the latest PR #72 @evanjflack

library(dplyr)
library(bacondecomp) # provides bacon()
set.seed(938)

df <- 
  expand.grid(
    group_id = c(1, 2, 3), # Group ID (treatment level ID)
    t  = c(0, 1, 2)  # Time
  ) %>%
  mutate(
    # Treatment status
    a = case_when(
      group_id == 2 & t > 0 ~ 1, # 1 time period untreated 2 periods treated
      group_id == 3 & t > 1 ~ 1, # 2 untreated 1 treated
      TRUE ~ 0 # group_id == 1 is never treated
    )
  )

# Expand dataset with "individual" level observations 
df <- df %>% left_join(
  expand.grid(
    group_id = c(1, 2, 3), 
    ind_id = seq(1, 2)
  ) %>%
    filter(group_id == 2 | ind_id < 2), # Keep two individuals only in group_id == 2
  by = "group_id" # Join on the group identifier explicitly
) %>%
  select(group_id, ind_id, everything()) %>%
  arrange(group_id, ind_id, t) %>% 
  mutate(y = rnorm(nrow(.)))

bacon_res <- df %>% 
  bacon(formula = y ~ a,
        id_var = "group_id",
        time_var = "t")

bacon_res

with results:

                      type weight  avg_est
1 Earlier vs Later Treated    0.2  0.30438
2 Later vs Earlier Treated    0.2  0.12943
3     Treated vs Untreated    0.6 -0.44938

  treated untreated   estimate weight                     type
2       1     99999 -0.8366176    0.4     Treated vs Untreated
3       2     99999  0.3250949    0.2     Treated vs Untreated
6       2         1  0.1294280    0.2 Later vs Earlier Treated
8       1         2  0.3043799    0.2 Earlier vs Later Treated
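As a quick sanity check on the printed output, the sketch below re-enters the four 2x2 rows by hand and confirms that the weights sum to one and that the "Treated vs Untreated" summary line is the weight-averaged estimate of its two rows. The data frame name two_by_twos is only a label chosen for this illustration, and the numbers are copied from the output above rather than recomputed with bacon().

```r
# The four 2x2 comparisons, copied from the printed output above.
two_by_twos <- data.frame(
  treated   = c(1, 2, 2, 1),
  untreated = c(99999, 99999, 1, 2),
  estimate  = c(-0.8366176, 0.3250949, 0.1294280, 0.3043799),
  weight    = c(0.4, 0.2, 0.2, 0.2),
  type      = c("Treated vs Untreated", "Treated vs Untreated",
                "Later vs Earlier Treated", "Earlier vs Later Treated")
)

# The decomposition weights should sum to one.
total_weight <- sum(two_by_twos$weight)

# Weight-averaged estimate within the "Treated vs Untreated" comparisons.
tvu <- subset(two_by_twos, type == "Treated vs Untreated")
tvu_avg <- weighted.mean(tvu$estimate, tvu$weight)

# Weighted sum over all 2x2 comparisons.
overall <- sum(two_by_twos$estimate * two_by_twos$weight)

c(total_weight = total_weight, tvu_avg = tvu_avg, overall = overall)
```

The larger group (treated at t = 1) carries weight 0.4 against the untreated group, twice the weight of the smaller group treated at t = 2, which is the size-driven heterogeneity the original report asked about.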

alecmcclean commented 4 years ago

Great, thank you!

hyeunjung commented 3 years ago

Thank you for this package! I tested the example code above, but it does not run for me. I made sure that I have the most recent version of the bacondecomp package, but I still get an "unbalanced panel" error. Could you please check whether the fix for unbalanced panels is included in the current version of the bacondecomp package in R?

Thank you so much for your help!

EdJeeOnGitHub commented 3 years ago

Hi,

Did you use the latest version from GitHub or CRAN?

I believe this is fixed on GitHub but looking back at the logs I'm not sure if @evanjflack pushed the patch to CRAN.

If it's broken on GitHub too I'll have another look.

Thanks, Ed

hyeunjung commented 3 years ago

Hi Ed,

Yes, I used the latest version from CRAN. I think the updates were not pushed to CRAN.

I edited the source code locally and it works fine, but I think it would be great to push the update to CRAN.

Thanks, Elina


PromiseKamanga commented 2 years ago

Following this thread, I got the impression that the error of "unbalanced panel" was already fixed. However, I just downloaded the package from GitHub today and I still got the same error when I tried to use it. The data I am using involves bilateral trade values of multiple countries. As such I have duplicate "country-year" combinations because I observe a country's trade with all its partners in a given year. Could that explain the error? Do you have a suggestion on how I should proceed?

kylebutts commented 2 years ago

Hi @PromiseKamanga, could you open a new issue and include the code you're trying to run that fails? I'll be happy to help.

ridwandse commented 1 year ago

Hi @EdJeeOnGitHub, could you generate the same simulated dataset in Stata and post the code here, or share the data generated in R? I just want to see whether Stata's ddtiming gives me the same diff-in-diff estimate, with the same DD comparisons and weights. Just curious to learn. Thanks

EdJeeOnGitHub commented 1 year ago

Hi @ridwandse,

The code here will provide the exact same dataset since the seed has been set.

Something like write.csv(df, "my-df.csv") will save the file for loading into Stata.
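A minimal sketch of that export step follows. The small data frame is placeholder data standing in for the simulated df, and the temporary file path is an assumption for illustration. Setting row.names = FALSE keeps R from writing an extra index column, which would otherwise show up as an unnamed variable after import.

```r
# Placeholder data standing in for the simulated df from the example above.
df <- data.frame(
  group_id = c(1, 2, 2, 3),
  ind_id   = c(1, 1, 2, 1),
  t        = c(0, 0, 0, 0),
  a        = c(0, 0, 0, 0),
  y        = c(0.12, -0.45, 0.30, 1.05)
)

# Write to a temporary location; replace with a real path such as "my-df.csv".
out_file <- file.path(tempdir(), "my-df.csv")
write.csv(df, out_file, row.names = FALSE)

# Read the file back to confirm the columns survived the round trip.
check <- read.csv(out_file)
names(check)
```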

ridwandse commented 1 year ago

Thanks, @EdJeeOnGitHub, I will follow up on this. I actually have unbalanced data, and Stata's bacondecomp Y D, ddetail does not work with unbalanced data; it requires the data to be strongly balanced. However, another way of obtaining the Bacon decomposition is to use ddtiming, i.e., ddtiming Y D, i(id) t(year). This runs in the unbalanced case. I am not sure whether to proceed with bacondecomp in the balanced case or with ddtiming on the unbalanced data. If you have any leads on that, please guide me. Thanks

kylebutts commented 1 year ago

@ridwandse I think this is incorrect. Just because something "runs" and spits out numbers does not mean it "works". The weights it reports are not correct. The Bacon decomposition holds only in the strongly balanced case (it's an algebraic relationship between the TWFE OLS coefficient and a bunch of different averages).

In the unbalanced case, you can calculate the weights by hand (it's a bunch of n's, basically), which is what ddtiming does. The weights do not mean anything, though.
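For readers unsure whether their data qualify as strongly balanced, here is a minimal sketch of a check. The helper is_strongly_balanced is hypothetical and not part of bacondecomp or ddtiming. It treats a panel as strongly balanced when every unit appears exactly once in every time period, so both missing unit-periods and duplicated unit-periods (such as repeated country-year rows) fail the check.

```r
# Hypothetical helper, not part of bacondecomp: TRUE when every unit
# appears exactly once in every time period.
is_strongly_balanced <- function(data, id_var, time_var) {
  counts <- table(data[[id_var]], data[[time_var]])
  all(counts == 1)
}

# A complete 3-unit, 3-year panel.
panel <- expand.grid(id = 1:3, year = 2000:2002)

balanced   <- is_strongly_balanced(panel, "id", "year")
missing    <- is_strongly_balanced(panel[-1, ], "id", "year")             # one unit-year dropped
duplicated <- is_strongly_balanced(rbind(panel, panel[1, ]), "id", "year") # one unit-year repeated

c(balanced = balanced, missing = missing, duplicated = duplicated)
```

Only the complete panel passes; the other two cases are the kinds of input for which the reported weights should not be interpreted as a decomposition.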

ridwandse commented 1 year ago

Thank you @kylebutts, this was very useful. Yes, you are right. I also calculated all the DD comparison weights by hand, as a combination of group sizes and treatment indicator (D) averages over i and t, and got the same results as with ddtiming. Thanks

evanjflack / bacondecomp

"Unbalanced Panel" when groups are different sizes #71