DeclareDesign / estimatr

estimatr: Fast Estimators for Design-Based Inference
https://declaredesign.org/r/estimatr
Other
131 stars 20 forks source link

FE case where clustered se_type = "stata" SEs do not agree with stata #337

Open graemeblair opened 4 years ago

graemeblair commented 4 years ago

Stata SE's don't agree in the below case with two-way fixed effects when I expected them to. If this is expected feel free to close. Data is RDS saved from R from repl.dta read in via code below. repl.RDS.zip

Agrees w/o fixed effects:

 . use repl.dta
 . reg conflict dlpi_lev, robust cluster(ccode2)

 Linear regression                               Number of obs     =      2,425
 F(1, 96)          =      10.80
 Prob > F          =     0.0014
 R-squared         =     0.0021
 Root MSE          =     .38377

 (Std. Err. adjusted for 97 clusters in ccode2)
 ------------------------------------------------------------------------------
   |               Robust
 conflict |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
 -------------+----------------------------------------------------------------
   dlpi_lev |  -.0182238    .005546    -3.29   0.001    -.0292326   -.0072151
 _cons |   .1827658   .0307032     5.95   0.000     .1218203    .2437113
 ------------------------------------------------------------------------------
repl <- haven::read_dta("repl.dta")

lm_robust(conflict ~ dlpi_lev, clusters = ccode2, se_type = "stata", data = repl)
                Estimate  Std. Error   t value     Pr(>|t|)   CI Lower     CI Upper DF
 (Intercept)  0.18276580 0.030703249  5.952653 4.305921e-08  0.1218203  0.243711268 96
 dlpi_lev    -0.01822383 0.005546027 -3.285925 1.420160e-03 -0.0292326 -0.007215052 96

Does not agree when we add two way FE:

 . tsset ccode2 year
 . xtreg conflict dlpi_lev yr*, fe robust cluster(ccode2)
 note: yr25 omitted because of collinearity

 Fixed-effects (within) regression               Number of obs     =      2,425
 Group variable: ccode2                          Number of groups  =         97

 R-sq:                                           Obs per group:
   within  = 0.0364                                         min =         25
 between = 0.0027                                         avg =       25.0
 overall = 0.0159                                         max =         25

 F(25,96)          =       1.20
 corr(u_i, Xb)  = 0.0042                         Prob > F          =     0.2602

 (Std. Err. adjusted for 97 clusters in ccode2)
 ------------------------------------------------------------------------------
   |               Robust
 conflict |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
 -------------+----------------------------------------------------------------
   dlpi_lev |  -.0108859   .0133478    -0.82   0.417    -.0373811    .0156093
 (year FE omitted)
 _cons |   .2603708   .0315889     8.24   0.000     .1976673    .3230742
 -------------+----------------------------------------------------------------
   sigma_u |  .29765047
 sigma_e |  .24593433
 rho |  .59428563   (fraction of variance due to u_i)
 ------------------------------------------------------------------------------
lm_robust(conflict ~ dlpi_lev, fixed_effects = ~ year + ccode2, clusters = ccode2, se_type = "stata", data = repl)
             Estimate Std. Error    t value Pr(>|t|)  CI Lower  CI Upper       DF
 dlpi_lev -0.01088586 0.01362317 -0.7990694 0.4262228 -0.03792764 0.01615592 96
nfultz commented 4 years ago

There's a stata note yr25 omitted - can you check if we match when you manually drop that year?

~Also check your design is in fact balanced, IIRC we don't yet support imbalanced designs for multiple FEs.~ It looks like all groups have 25 in stata.


~This comment makes me think we have some numerical flakyness in the FE standard errors:~

https://github.com/DeclareDesign/estimatr/blob/a1215e01245a592833106fcb73b833f94bba8fc6/R/estimatr_lm_robust.R#L67-L70


Looking at the #s in the output, is there a difference in a denominator somewhere relating to n within groups ? eg:

> (.0133478 / 0.01362317)**2
[1] 0.9599818656
> 24/25
[1] 0.96

Could be coincidence but that's really close. Can you increase the cformat in stata and rerun? Also try excluding a year and see if the ratio changes to 23/24.