alecmcclean opened this issue 4 years ago
Hi Alec,
Sorry for the delay in replying.
We'll get to the bottom of this; it looks like we went a bit over the top sanitising user inputs.
This should be fixed by the latest PR, #72 (cc @evanjflack).
```r
library(dplyr)
library(bacondecomp)

set.seed(938)

df <-
  expand.grid(
    group_id = c(1, 2, 3), # Group ID (treatment level ID)
    t = c(0, 1, 2)         # Time
  ) %>%
  mutate(
    # Treatment status
    a = case_when(
      group_id == 2 & t > 0 ~ 1, # 1 period untreated, 2 periods treated
      group_id == 3 & t > 1 ~ 1, # 2 periods untreated, 1 treated
      TRUE ~ 0                   # group_id == 1 is never treated
    )
  )

# Expand dataset with "individual"-level observations
df <- df %>%
  left_join(
    expand.grid(
      group_id = c(1, 2, 3),
      ind_id = seq(1, 2)
    ) %>%
      filter(group_id == 2 | ind_id < 2), # Only group_id == 2 keeps two individuals
    by = "group_id"
  ) %>%
  select(group_id, ind_id, everything()) %>%
  arrange(group_id, ind_id, t) %>%
  mutate(y = rnorm(nrow(.)))

bacon_res <- df %>%
  bacon(formula = y ~ a,
        id_var = "group_id",
        time_var = "t")

bacon_res
```
which gives the following results:

```
                      type weight  avg_est
1 Earlier vs Later Treated    0.2  0.30438
2 Later vs Earlier Treated    0.2  0.12943
3     Treated vs Untreated    0.6 -0.44938

  treated untreated   estimate weight                     type
2       1     99999 -0.8366176    0.4     Treated vs Untreated
3       2     99999  0.3250949    0.2     Treated vs Untreated
6       2         1  0.1294280    0.2 Later vs Earlier Treated
8       1         2  0.3043799    0.2 Earlier vs Later Treated
```
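As a quick consistency check on the output above: each summary row should be the total weight and weight-averaged estimate of the matching 2x2 rows, and the overall two-way fixed effects (TWFE) estimate should be the weighted sum of all the 2x2 estimates. The sketch below only re-types the printed numbers (as rounded in the output) and recomputes both with base R; it does not call bacondecomp, and the object names are illustrative.

```r
# 2x2 estimates and weights copied from the printed bacon() output above.
# `treated`/`untreated` are treatment timings; 99999 marks the never-treated group.
two_by_twos <- data.frame(
  treated   = c(1, 2, 2, 1),
  untreated = c(99999, 99999, 1, 2),
  estimate  = c(-0.8366176, 0.3250949, 0.1294280, 0.3043799),
  weight    = c(0.4, 0.2, 0.2, 0.2),
  type      = c("Treated vs Untreated", "Treated vs Untreated",
                "Later vs Earlier Treated", "Earlier vs Later Treated")
)

# In a balanced panel the 2x2 weights sum to one
total_weight <- sum(two_by_twos$weight)

# Rebuild the summary table: total weight and weighted mean estimate per type
by_type <- split(two_by_twos, two_by_twos$type)
summary_tbl <- data.frame(
  type    = names(by_type),
  weight  = sapply(by_type, function(d) sum(d$weight)),
  avg_est = sapply(by_type, function(d) weighted.mean(d$estimate, d$weight)),
  row.names = NULL
)
print(summary_tbl)

# Overall TWFE coefficient implied by the decomposition
twfe_estimate <- sum(two_by_twos$estimate * two_by_twos$weight)
print(round(twfe_estimate, 5))  # [1] -0.18287
```

The "Treated vs Untreated" row works out to (0.4 × -0.8366176 + 0.2 × 0.3250949) / 0.6 ≈ -0.44938, matching the printed summary.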
Great, thank you!
Thank you for this package! I tested the example code above, but it doesn't run for me. I made sure I have the latest version of the bacondecomp package, but I still get an "Unbalanced Panel" error. Could you please check whether this fix for unbalanced panels is included in the current release of bacondecomp?
Thank you so much for your help!
Hi,
Did you use the latest version from GitHub or CRAN?
I believe this is fixed on GitHub, but looking back at the logs I'm not sure whether @evanjflack pushed the patch to CRAN.
If it's broken on GitHub too I'll have another look.
Thanks, Ed
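To answer "GitHub or CRAN?" directly from an R session, one option is to read the installed package's DESCRIPTION. The helper below is hypothetical (not part of bacondecomp); it assumes, as is the case for packages installed with `remotes::install_github()`, that GitHub installs record a `RemoteType: github` field, while CRAN builds record `Repository: CRAN`.

```r
# Hypothetical helper: report where an installed package came from.
package_source <- function(pkg) {
  # packageDescription() returns NA (with a warning) if the package is not installed
  desc <- suppressWarnings(utils::packageDescription(pkg))
  if (!inherits(desc, "packageDescription")) {
    return("not installed")
  }
  # Fields written by remotes::install_github() and similar installers
  if (identical(desc$RemoteType, "github")) {
    return(paste0("GitHub (", desc$RemoteUsername, "/", desc$RemoteRepo, ")"))
  }
  # Field written into packages built and distributed by CRAN
  if (identical(desc$Repository, "CRAN")) {
    return("CRAN")
  }
  "other/unknown"
}

package_source("bacondecomp")

# To install the GitHub version instead of the CRAN release:
# install.packages("remotes")
# remotes::install_github("evanjflack/bacondecomp")
```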
Hi Ed,
Yes, I used the latest version from CRAN, so I think the updates were not pushed there.
I edited the source code myself and it works fine, but it would be great to push the update to CRAN.
Thanks, Elina
(Sent by email in reply to Ed Jee's message of Nov 9, 2020, 3:45 PM.)
Following this thread, I got the impression that the "Unbalanced Panel" error had already been fixed. However, I downloaded the package from GitHub today and still get the same error. My data are bilateral trade values for multiple countries, so I have duplicate country-year combinations: I observe each country's trade with all of its partners in a given year. Could that explain the error? Do you have any suggestions on how to proceed?
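A quick way to see whether duplicate unit-period rows are present is to count rows per country-year. The sketch below uses made-up bilateral trade data (the column names `reporter`, `partner`, `year`, `value` are hypothetical). Defining the panel unit as the country pair is shown only as one possible option, an assumption not confirmed in this thread; whether it makes sense depends on the level at which the treatment varies.

```r
# Toy bilateral trade data (hypothetical): one row per reporter-partner-year flow
trade <- data.frame(
  reporter = c("A", "A", "A", "A", "B", "B", "B", "B"),
  partner  = c("B", "C", "B", "C", "A", "C", "A", "C"),
  year     = c(2000, 2000, 2001, 2001, 2000, 2000, 2001, 2001),
  value    = c(10, 5, 12, 6, 8, 3, 9, 4)
)

# Count rows per country-year; any count above 1 means there is more than
# one observation per unit and period
counts <- aggregate(value ~ reporter + year, data = trade, FUN = length)
names(counts)[names(counts) == "value"] <- "n_rows"
dups <- counts[counts$n_rows > 1, ]
print(dups)

# Assumed alternative unit: the country pair, which is unique within each year here
trade$pair_id <- paste(trade$reporter, trade$partner, sep = "-")
any(duplicated(trade[, c("pair_id", "year")]))  # [1] FALSE
```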
Hi @PromiseKamanga, could you open a new issue with the code you're running that fails? I'll be happy to help.
Hi @EdJeeOnGitHub, could you generate the same simulated dataset in Stata and post the code here, or share the data generated in R? I just want to see whether Stata's ddtiming gives me the same diff-in-diff estimate, with the same DD comparisons and weights. Just curious to learn. Thanks
Hi @ridwandse,
The code in this thread will reproduce exactly the same dataset, since the seed is set.
Something like `write.csv(df, "my-df.csv", row.names = FALSE)` will save the file for loading into Stata.
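A minimal sketch of the export, with a read-back check that nothing was lost. The small `df` here is a hypothetical stand-in for the simulated data from the example (so the block runs on its own), and the file goes to a temporary directory. `row.names = FALSE` keeps `write.csv()` from adding an unnamed index column that Stata would otherwise import as an extra variable.

```r
# Hypothetical stand-in for the simulated `df` from the example above
df <- data.frame(group_id = c(1L, 2L, 2L), ind_id = c(1L, 1L, 2L),
                 t = c(0L, 1L, 1L), a = c(0L, 1L, 1L), y = c(0.5, -1.2, 0.3))

path <- file.path(tempdir(), "my-df.csv")
# row.names = FALSE: write only the data columns, no index column
write.csv(df, path, row.names = FALSE)

# Read it back to confirm the round trip preserved the data
df_check <- read.csv(path)
all.equal(df, df_check)  # [1] TRUE

# In Stata, the file can then be loaded with:
#   import delimited "my-df.csv", clear
```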
Thanks @EdJeeOnGitHub, I'll follow up on this.
Actually, I have unbalanced data, and Stata's `bacondecomp Y D, ddetail` does not work with unbalanced data: it requires the panel to be strongly balanced. However, another way of obtaining the Bacon decomposition is `ddtiming`, i.e., `ddtiming Y D, i(id) t(year)`, which works in the unbalanced case. I am not sure whether to proceed with `bacondecomp` in the balanced case or with `ddtiming` on the unbalanced data. If you have any leads on that, please guide me through.
Thanks
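"Strongly balanced" here means every unit is observed exactly once in every time period. A minimal base-R check of that condition is sketched below; the helper name is hypothetical and it is not taken from either the R or the Stata package.

```r
# Hypothetical helper: TRUE if every unit appears exactly once in every period
is_strongly_balanced <- function(data, id_var, time_var) {
  # Cross-tabulate unit x period; a strongly balanced panel has a 1 in every cell
  cells <- table(data[[id_var]], data[[time_var]])
  all(cells == 1)
}

balanced <- expand.grid(id = 1:3, year = 2000:2002)
is_strongly_balanced(balanced, "id", "year")                        # [1] TRUE
is_strongly_balanced(balanced[-1, ], "id", "year")                  # [1] FALSE: unit 1 missing in 2000
is_strongly_balanced(rbind(balanced, balanced[1, ]), "id", "year")  # [1] FALSE: duplicated row
```

A zero cell flags a missing unit-period and a cell above one flags duplicates (as with the country-year rows in bilateral trade data); both break strong balance.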
@ridwandse I think this is incorrect. Just because something "runs" and spits out numbers does not mean it "works"; the weights it reports are not correct. The Bacon decomposition holds only in the strongly balanced case (it's an algebraic relationship between the TWFE OLS coefficient and a set of different averages).
In the unbalanced case, you can still calculate the weights by hand (they're basically a bunch of n's), which is what ddtiming does. The weights don't mean anything, though.
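One concrete place where the balanced-panel algebra shows up: in a balanced panel, subtracting unit means and time means and adding back the grand mean gives exactly the residual from regressing on unit and time dummies, so the TWFE slope can be computed from the double-demeaned treatment (Frisch-Waugh-Lovell). In an unbalanced panel that shortcut is generally no longer exact. The sketch below illustrates this with base R on simulated data; it is an illustration of why the algebra is specific to balanced panels, not the Bacon decomposition itself, and the function names are hypothetical.

```r
# Two-way within transformation: x_it - mean_i(x) - mean_t(x) + mean(x)
two_way_demean <- function(x, id, t) {
  x - ave(x, id) - ave(x, t) + mean(x)
}

# TWFE slope via Frisch-Waugh-Lovell, using the double-demeaned treatment
twfe_demeaned <- function(y, d, id, t) {
  d_tilde <- two_way_demean(d, id, t)
  sum(d_tilde * y) / sum(d_tilde^2)
}

# TWFE slope from OLS with explicit unit and time dummies
twfe_lm <- function(y, d, id, t) {
  unname(coef(lm(y ~ d + factor(id) + factor(t)))["d"])
}

# Balanced 3 x 3 panel with the same staggered timing as the example above
set.seed(1)
panel <- expand.grid(id = 1:3, t = 0:2)
panel$d <- as.numeric((panel$id == 2 & panel$t > 0) | (panel$id == 3 & panel$t > 1))
panel$y <- rnorm(nrow(panel))

# Balanced: the two numbers agree (up to floating point)
with(panel, c(demeaned = twfe_demeaned(y, d, id, t), lm = twfe_lm(y, d, id, t)))

# Drop one observation: the shortcut and the regression generally disagree
unbal <- panel[-5, ]
with(unbal, c(demeaned = twfe_demeaned(y, d, id, t), lm = twfe_lm(y, d, id, t)))
```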
Thank you @kylebutts, this was very useful. Yes, you are right. I also calculated all the DD comparison weights by hand, as a combination of group sizes and averages of the treatment indicator (D) over i and t, and got the same results as ddtiming. Thanks
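The two ingredients mentioned here, group sizes and the average of the treatment indicator over i and t within each timing group, can be computed directly from the panel. The sketch below does this in base R for the design from the example in this thread (groups 1 and 3 with one individual, group 2 with two), assuming individuals are the units being counted.

```r
# Design from the example: groups 1 and 3 have one individual, group 2 has two;
# every individual is observed in periods 0, 1, 2
units <- data.frame(group_id = c(1, 2, 2, 3), ind_id = c(1, 1, 2, 1))
panel <- merge(units, data.frame(t = 0:2))  # no common columns: cross join
panel$a <- as.numeric((panel$group_id == 2 & panel$t > 0) |
                      (panel$group_id == 3 & panel$t > 1))

# n_k: share of individuals in each treatment-timing group
unit_key <- paste(panel$group_id, panel$ind_id)
n_ind <- tapply(unit_key, panel$group_id, function(u) length(unique(u)))
n_k <- n_ind / sum(n_ind)

# D_bar_k: mean of the treatment indicator over i and t within each group
d_bar <- tapply(panel$a, panel$group_id, mean)

rbind(n_k = n_k, d_bar = d_bar)
```

Here n_k comes out as 0.25, 0.5, 0.25 and D_bar_k as 0, 2/3, 1/3 for groups 1, 2 and 3.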
Hi Guys,
First - thanks a bunch for translating this package to R, I really appreciate it. I just wanted to flag a small issue I've found when using the bacon() function.
It seems that bacon() does not currently allow groups to be different sizes. I've appended code to generate a minimal example. In the dataset I create, there are 3 groups (group_id == 1, 2, 3), where groups 1 and 3 each contain one individual and group 2 contains two individuals (ind_id is the individual ID).
If I run bacon(id_var = "group_id", ...), the function throws an "Unbalanced Panel" error, because group 2 has twice as many observations per time period as group 1 (there are two individuals in group 2).
But I don't think that should be an error; otherwise, you cannot demonstrate 2x2 weighting heterogeneity arising from group size. And, from what I understand, this is one of the key takeaways of the Bacon decomposition: larger groups get higher weights in the 2x2 comparisons.
Alternatively, if you do want to call that an unbalanced panel, I don't think you need the code calculating "n_k, n_u, n_ku", because n_k = n_u by definition and n_ku = 0.5.
Thanks again, Alec
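To see how group size enters the weights, here is a sketch of the 2x2 weight numerators following the formulas in Goodman-Bacon (2021) as we read them, applied to the example's ingredients (shares n_k counted over individuals, D_bar_k the share of periods treated). This is not bacondecomp's code; the function names are hypothetical, and the numerators are normalized by their sum rather than by computing the variance of the demeaned treatment directly, which is equivalent here because in a balanced panel the weights sum to one.

```r
# Ingredients for the example, counting individuals as units:
# U = never treated (group 1), early = group 2 (treated from t = 1),
# late = group 3 (treated from t = 2)
n    <- c(U = 0.25, early = 0.50, late = 0.25)  # group shares n_k
dbar <- c(U = 0, early = 2/3, late = 1/3)       # share of periods treated, D_bar_k

# Relative size within a pair, e.g. n_kU = n_k / (n_k + n_U)
rel_size <- function(nk, nl) nk / (nk + nl)

# Numerator of the weight for timing group k vs the never-treated group
w_treated_vs_untreated <- function(k) {
  nkU <- rel_size(n[[k]], n[["U"]])
  (n[[k]] + n[["U"]])^2 * nkU * (1 - nkU) * dbar[[k]] * (1 - dbar[[k]])
}

# Numerator for earlier-treated k vs later-treated l (l is the control)
w_earlier_vs_later <- function(k, l) {
  nkl <- rel_size(n[[k]], n[[l]])
  ((n[[k]] + n[[l]]) * (1 - dbar[[l]]))^2 * nkl * (1 - nkl) *
    ((dbar[[k]] - dbar[[l]]) / (1 - dbar[[l]])) *
    ((1 - dbar[[k]]) / (1 - dbar[[l]]))
}

# Numerator for later-treated l vs earlier-treated k (k is the control)
w_later_vs_earlier <- function(k, l) {
  nkl <- rel_size(n[[k]], n[[l]])
  ((n[[k]] + n[[l]]) * dbar[[k]])^2 * nkl * (1 - nkl) *
    (dbar[[l]] / dbar[[k]]) *
    ((dbar[[k]] - dbar[[l]]) / dbar[[k]])
}

raw <- c(
  early_vs_untreated = w_treated_vs_untreated("early"),
  late_vs_untreated  = w_treated_vs_untreated("late"),
  earlier_vs_later   = w_earlier_vs_later("early", "late"),
  later_vs_earlier   = w_later_vs_earlier("early", "late")
)

# Normalize so the weights sum to one
w_norm <- raw / sum(raw)
round(w_norm, 4)
```

For this example the normalized weights come out as 0.4, 0.2, 0.2 and 0.2, matching what bacon() reports above. Note that (n_k + n_U)^2 n_kU (1 - n_kU) simplifies to n_k n_U, and groups 2 and 3 have the same D_bar_k (1 - D_bar_k) = 2/9, so group 2's twice-as-large size is exactly why its treated-vs-untreated comparison gets twice the weight (0.4 vs 0.2).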