DeclareDesign / estimatr

estimatr: Fast Estimators for Design-Based Inference
https://declaredesign.org/r/estimatr

lh_robust gives inconsistent confidence intervals if using clusters #405

Open rcragun opened 1 year ago


Overview

If you specify clusters for lh_robust, the confidence intervals (CIs) and p-values in $lh are inconsistent with those in $lm_robust.

Reproduce

The problem is easiest to see with a hypothesis that a single coefficient equals 0, because the $lh row should then reproduce that coefficient's row in $lm_robust exactly.

Simple example data:

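library(estimatr)
# No seed was set, so rerunning this gives numbers different from the output below
nSize <- 12
dat <- data.frame(
  x = rnorm(nSize),
  e = rnorm(nSize),
  # Cluster labels (2 clusters) unrelated to the errors
  eg = sample(2, nSize, replace = TRUE)
)
dat$z <- dat$x + dat$e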

CIs match when not correcting for error correlation:

> lh_robust(z~x, data=dat, se_type='HC2', linear_hypothesis='x=0')
$lm_robust
             Estimate Std. Error    t value   Pr(>|t|)    CI Lower CI Upper DF
(Intercept) -0.137880  0.2850087 -0.4837747 0.63896458 -0.77291900 0.497159 10
x            0.620707  0.3135789  1.9794287 0.07594477 -0.07799024 1.319404 10

$lh
    Estimate Std. Error t value Pr(>|t|) CI Lower CI Upper DF
x=0   0.6207     0.3136   1.979  0.07594 -0.07799    1.319 10

CIs don't match when correcting for error correlation:

> lh_robust(z~x, data=dat, clusters=eg, se_type='stata', linear_hypothesis='x=0')
$lm_robust
             Estimate Std. Error    t value  Pr(>|t|)  CI Lower CI Upper DF
(Intercept) -0.137880  0.4824367 -0.2857991 0.8227790 -6.267819 5.992059  1
x            0.620707  0.4538092  1.3677710 0.4019017 -5.145485 6.386899  1

$lh
    Estimate Std. Error t value Pr(>|t|) CI Lower CI Upper DF
x=0   0.6207     0.4538   1.368   0.2013  -0.3904    1.632 10

Using other se_type values does not change this: the CIs match without clusters and differ with them.

Additional notes

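The problem may be due to a difference in the degrees of freedom: the estimate and standard error are identical in both tables, but the DF column shows 1 for $lm_robust and 10 for $lh. I am therefore unsure whether this is the same issue as https://github.com/DeclareDesign/estimatr/issues/289.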
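As a quick check of the degrees-of-freedom explanation, both sets of clustered CIs and p-values can be reproduced from the printed estimate and standard error alone, using a t distribution with 1 and 10 degrees of freedom respectively. This is a minimal base-R sketch; the helper name `t_ci_p` is hypothetical and not part of estimatr.

```r
# Hypothetical helper (not part of estimatr): t-based CI and two-sided p-value
t_ci_p <- function(est, se, df, level = 0.95) {
  crit <- qt(1 - (1 - level) / 2, df)  # critical t value for the given df
  t_stat <- est / se
  c(lower = est - crit * se,
    upper = est + crit * se,
    p = 2 * pt(-abs(t_stat), df))
}

est <- 0.620707   # Estimate for x, copied from the clustered output above
se  <- 0.4538092  # Std. Error for x, the same in $lm_robust and $lh

t_ci_p(est, se, df = 1)   # reproduces the $lm_robust row (DF = 1)
t_ci_p(est, se, df = 10)  # reproduces the $lh row (DF = 10)
```

Both calls use the same estimate and standard error, so the only thing separating the two intervals is the reference t distribution, which is consistent with the mismatch coming from the degrees of freedom.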

System info

> sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22621)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] estimatr_1.0.0

loaded via a namespace (and not attached):
 [1] httr_1.4.5      compiler_4.1.2  R6_2.5.1        cli_3.6.0       generics_0.1.3  tools_4.1.2    
 [7] abind_1.4-5     rstudioapi_0.14 car_3.1-2       Rcpp_1.0.9      carData_3.0-5   mvtnorm_1.1-3  
[13] texreg_1.38.6   Formula_1.2-5   rlang_1.1.0