asheshrambachan / HonestDiD

Robust inference in difference-in-differences and event study designs

constructOriginalCS does not give original confidence intervals + guidance on event studies with controls #33

Closed michaeltopper1 closed 1 year ago

michaeltopper1 commented 1 year ago

1) Apologies if this is an incorrect interpretation of the desired behavior, but I believe that constructOriginalCS does not quite reproduce the confidence interval of the original event study passed in. Below is an example using data and calculations from the README:

library(tidyverse)
library(fixest)
library(broom)
library(haven)

df <- haven::read_dta("https://raw.githubusercontent.com/Mixtape-Sessions/Advanced-DID/main/Exercises/Data/ehec_data.dta")

#Keep years before 2016. Drop the 2016 cohort
df_nonstaggered <- df %>% filter(year < 2016 & 
                                   (is.na(yexp2)| yexp2 != 2015) )

#Create a treatment dummy
df_nonstaggered <- df_nonstaggered %>% mutate(D = case_when( yexp2 == 2014 ~ 1,
                                                             TRUE ~ 0)) 

#Run the TWFE spec
twfe_results <- fixest::feols(dins ~ i(year, D, ref = 2013) | stfips + year, 
                              cluster = "stfips",
                              data = df_nonstaggered)

betahat <- summary(twfe_results)$coefficients #save the coefficients
sigma <- summary(twfe_results)$cov.scaled #save the covariance matrix

originalResults <- HonestDiD::constructOriginalCS(betahat = betahat,
                                                  sigma = sigma,
                                                  numPrePeriods = 4,
                                                  numPostPeriods = 3)

Note that I changed numPrePeriods to 4 and numPostPeriods to 3 for this example. My understanding is that this will give the original confidence interval (the one specified in the event study) for year 2012. Now when I compare this to the original confidence intervals using broom::tidy(conf.int = T), I get slightly different results.

twfe_results %>% 
  broom::tidy(conf.int = T) %>% 
  slice(5)

While I do not think this is likely a problem when it comes to analysis, I do think it causes some confusion when trying to understand the package. For instance, I was trying to perform sensitivity analysis on different pre/post-treatment periods (in this case, year 2012), and wanted to create a sensitivity plot for each. However, this discrepancy had (has) me worried that I am misinterpreting how to use the package and incorrectly specifying which period I am testing.

2) I am curious how to use this package when adding controls to an event study. Here is an example using the data from above, but adding in a fake control column:

df <- haven::read_dta("https://raw.githubusercontent.com/Mixtape-Sessions/Advanced-DID/main/Exercises/Data/ehec_data.dta")

#Keep years before 2016. Drop the 2016 cohort
df_nonstaggered <- df %>% filter(year < 2016 & 
                                   (is.na(yexp2)| yexp2 != 2015) )

#Create a treatment dummy
df_nonstaggered <- df_nonstaggered %>% mutate(D = case_when( yexp2 == 2014 ~ 1,
                                                             TRUE ~ 0)) 

# CREATING A FAKE CONTROL HERE
df_nonstaggered <- df_nonstaggered %>% 
  mutate(control = rnorm(n(), 0, 1)) # n() draws one value per row instead of hard-coding 344

#Run the TWFE spec
twfe_results <- fixest::feols(dins ~ i(year, D, ref = 2013) + control | stfips + year, 
                              cluster = "stfips",
                              data = df_nonstaggered)

betahat <- summary(twfe_results)$coefficients #save the coefficients
sigma <- summary(twfe_results)$cov.scaled #save the covariance matrix

# PERFORMING THE SENSITIVITY ANALYSIS

HonestDiD::createSensitivityResults(betahat = betahat,
                                    sigma = sigma,
                                    numPrePeriods = 5,
                                    numPostPeriods = 2,
                                    Mvec = seq(from = 0, to = 0.05, by =0.01))

Note that calling createSensitivityResults here raises an error, which I think is because the betahat and sigma passed in include the control's coefficient and therefore have the wrong dimensions. When I dropped the control's entries from both, I did not receive an error message:

## FIXING THE COEFFICIENTS AND STANDARD ERROR MATRICES
betahat <- summary(twfe_results)$coefficients[1:7] #save the coefficients
sigma <- summary(twfe_results)$cov.scaled[1:7,1:7] #save the covariance matrix

HonestDiD::createSensitivityResults(betahat = betahat,
                                    sigma = sigma,
                                    numPrePeriods = 5,
                                    numPostPeriods = 2,
                                    Mvec = seq(from = 0, to = 0.05, by =0.01))

I am wondering if this is the process that should be taken to correctly do this analysis. If so, is it possible that the function could check for controls and omit them?

Thanks!

jonathandroth commented 1 year ago

Thanks for your Qs.

Re 1), constructOriginalCS constructs an asymptotic CI of the form estimate +/- 1.96 * se. The broom::tidy function appears to use a critical value from the t distribution instead, which in this example appears to be 2.02 rather than 1.96. This leads to very minor discrepancies (at least for me).
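To give a sense of how small this gap is, here is a minimal sketch in base R. The estimate, standard error, and degrees of freedom are made-up values, not taken from the regression above; 42 degrees of freedom is an assumption chosen only because it yields a critical value close to 2.02.

```r
# Hypothetical point estimate and standard error (illustrative only)
estimate <- 0.05
se <- 0.01

# Assumed degrees of freedom for the t-based interval (not read from the model)
df_t <- 42

# Critical values for a two-sided 95% interval
z_crit <- qnorm(0.975)          # normal approximation, about 1.96
t_crit <- qt(0.975, df = df_t)  # t distribution, slightly larger

# Intervals of the form estimate +/- critical value * se
normal_ci <- estimate + c(-1, 1) * z_crit * se
t_ci      <- estimate + c(-1, 1) * t_crit * se

print(c(z_crit = z_crit, t_crit = t_crit))
print(rbind(normal = normal_ci, t_based = t_ci))
```

Both intervals are centered on the same estimate; the t-based one is only a few percent wider, which would be consistent with the slight mismatch described above.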

Re 2), yes, beta and Sigma need to be the event-study estimates and their variance-covariance matrix. If your original coefficients vector and vcv matrix contain control variables, you should subset to the coefficients used for the event-study before passing to HonestDiD.
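One way to make the subsetting less dependent on coefficient order is to select by name rather than by position. The sketch below uses a made-up coefficient vector and covariance matrix in base R; the "year::2008:D" style names are an assumption about how the event-study terms are labeled, and check_event_study_dims is a hypothetical helper, not part of HonestDiD.

```r
# Made-up names: seven event-study terms plus one control (naming is an assumption)
coef_names <- c(paste0("year::", c(2008:2012, 2014:2015), ":D"), "control")
k <- length(coef_names)

# Made-up coefficients and a symmetric positive semi-definite covariance matrix
set.seed(1)
betahat_full <- setNames(rnorm(k), coef_names)
A <- matrix(rnorm(k * k), nrow = k)
sigma_full <- crossprod(A)
dimnames(sigma_full) <- list(coef_names, coef_names)

# Keep only the event-study coefficients, selected by name
es_idx  <- grepl("^year::", names(betahat_full))
betahat <- betahat_full[es_idx]
sigma   <- sigma_full[es_idx, es_idx, drop = FALSE]

# Hypothetical helper: do the inputs match numPrePeriods + numPostPeriods?
check_event_study_dims <- function(betahat, sigma, numPrePeriods, numPostPeriods) {
  n_expected <- numPrePeriods + numPostPeriods
  length(betahat) == n_expected &&
    all(dim(sigma) == c(n_expected, n_expected))
}

print(check_event_study_dims(betahat, sigma, numPrePeriods = 5, numPostPeriods = 2))
print(check_event_study_dims(betahat_full, sigma_full, numPrePeriods = 5, numPostPeriods = 2))
```

With a real model, the same pattern would apply to the coefficient vector and covariance matrix extracted from the fit, after confirming the actual coefficient names with names(betahat).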

Best, J

On Wed, Jul 19, 2023 at 1:40 PM Michael Topper @.***> wrote:


jonathandroth commented 1 year ago

PS I added a section to the README discussing how to do things with controls, using this example. So thanks for bringing this up!

michaeltopper1 commented 1 year ago

Awesome! Thanks so much for the help and quick response.