does SE clustering actually work?

thorek1 commented 3 years ago

Hi,

I was wondering whether standard error clustering actually works. In the code example you can see that clustering does not change the standard errors whereas I would expect them to move and fixest estimates them to change:

n <- 1000

set.seed(1)
dat <- data.frame(y = runif(n) > .5,
                  x1 = rnorm(n),
                  x2 = rnorm(n) + 2,
                  x2 = (rnorm(n) + 2) / 2,
                  x3 = sample(LETTERS,n, T),
                  x4 = sample(letters,n, T))

library(alpaca)

alpaca::feglm(y ~ x1 + x2 | x3, dat) |> summary()
#binomial - logit link
#
#y ~ x1 + x2 | x3
#
#Estimates:
#   Estimate Std. error z value Pr(> |z|)
#x1 -0.01404    0.06397  -0.219     0.826
#x2 -0.01322    0.06193  -0.213     0.831
#
#residual deviance= 1341.74,
#null deviance= 1384.69,
#n= 1000, l= [26]
#
#Number of Fisher Scoring Iterations: 4 
alpaca::feglm(y ~ x1 + x2 | x3 | x4, dat) |> summary()
#binomial - logit link
#
#y ~ x1 + x2 | x3 | x4
#
#Estimates:
#   Estimate Std. error z value Pr(> |z|)
#x1 -0.01404    0.06397  -0.219     0.826
#x2 -0.01322    0.06193  -0.213     0.831
#
#residual deviance= 1341.74,
#null deviance= 1384.69,
#n= 1000, l= [26]
#
#Number of Fisher Scoring Iterations: 4 

library(fixest)

fixest::feglm(y ~ x1 + x2 | x3, dat, family = 'binomial')
#GLM estimation, family = binomial, Dep. Var.: y
#Observations: 1,000 
#Fixed-effects: x3: 26
#Standard-errors: Clustered (x3) 
#    Estimate Std. Error   t value Pr(>|t|)) 
#x1 -0.014040   0.054451 -0.257846  0.796526 
#x2 -0.013218   0.055863 -0.236617  0.812954 
#---
#Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#Log-Likelihood:  -670.9   Adj. Pseudo R2: -0.007979
#           BIC: 1,535.2     Squared Cor.:  0.041794
fixest::feglm(y ~ x1 + x2 | x3, dat, family = 'binomial', cluster = 'x4')
#GLM estimation, family = binomial, Dep. Var.: y
#Observations: 1,000 
#Fixed-effects: x3: 26
#Standard-errors: Clustered (x4) 
#    Estimate Std. Error   t value Pr(>|t|)) 
#x1 -0.014040   0.058249 -0.241034  0.809529 
#x2 -0.013218   0.058738 -0.225035  0.821952 
#---
#Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#Log-Likelihood:  -670.9   Adj. Pseudo R2: -0.007979
#           BIC: 1,535.2     Squared Cor.:  0.041794

amrei-stammann commented 3 years ago

Hello,

summary() returns per default standard errors based on the inverse of the negative Hessian (not clustered).

If you want to compute clustered standard errors, you need to specify further arguments, see example below:

# clustered by x3
summary(feglm(y ~ x1 + x2 | x3, dat) , type = "clustered", cluster = ~ x3)

# clustered by x4
summary(feglm(y ~ x1 + x2 | x3 | x4, dat) , type = "clustered", cluster = ~ x4)

A further example is available in the vignette: https://cran.r-project.org/web/packages/alpaca/vignettes/trade.html

Best wishes, Amrei

thorek1 commented 3 years ago

Thanks for your answer. I should have read the documentation more thoroughly.

In the second example you give, why would you need to specify the x4 in the formula? It seems redundant and does not change the functionality (not needed in your first example). I would have expected, from reading the help page of feglm that specifying it there would be enough to have clustered standard errors printed.

Using summary to print the clustered standard errors unfortunately makes a typical workflow of mine impossible:

results <- list()
alpaca::feglm(y ~ x1 + x2 | x3, dat) |> summary() -> results[[1]]
alpaca::feglm(y ~ x1 + x2 | x3 | x4, dat) |> summary() -> results[[2]]

library(texreg)
results |> texreg()
#Error in extract(l[[i]], ...) : 
#  Neither texreg nor broom supports models of class summary.feglm.

alpaca::feglm(y ~ x1 + x2 | x3, dat) -> results[[1]]
alpaca::feglm(y ~ x1 + x2 | x3 | x4, dat) -> results[[2]]

library(texreg)
results |> texreg()
# no clustered standard errors

Integration with other tools (such as texreg, stargazer,...) makes life a lot easer as is the case for fixest.

pachadotdev commented 3 months ago

here is a a modification that I made to simplify the clustering https://github.com/amrei-stammann/alpaca/issues/17#issuecomment-2244095567

amrei-stammann / alpaca

does SE clustering actually work? #16