amrei-stammann / alpaca

An R-package for fitting glm's with high-dimensional k-way fixed effects

Cluster specification in `feglm` formula has no effect? #17

Open jay-sf opened 2 years ago

jay-sf commented 2 years ago

Hi,

first of all, thanks for maintaining the alpaca package! I noticed that the formula specification with fixed effects and cluster is very similar to `lfe::felm`, which is very neat! I just recommended it on Stack Overflow.

However, I am not sure whether the cluster specification in the third part of the formula actually works and yields clustered standard errors as expected. We only seem to get those if we specify `type="clustered", cluster=~i` in the `summary` call; adding `| i` as the third part of the formula leaves the standard errors unchanged:

```r
> set.seed(42); data <- simGLM(1000L, 20L, 1805L, model = "logit")

> summary(feglm(y ~ x1 + x2 + x3 | i + t, data), type="sandwich")$cm
    Estimate Std. error   z value Pr(> |z|)
x1  1.090884 0.02458502  44.37190         0
x2 -1.106484 0.02424425 -45.63902         0
x3  1.123164 0.02453891  45.77074         0
> summary(feglm(y ~ x1 + x2 + x3 | i + t | i, data), type="sandwich")$cm
    Estimate Std. error   z value Pr(> |z|)
x1  1.090884 0.02458502  44.37190         0
x2 -1.106484 0.02424425 -45.63902         0
x3  1.123164 0.02453891  45.77074         0
> summary(feglm(y ~ x1 + x2 + x3 | i + t, data), type="clustered", cluster=~i)$cm
    Estimate Std. error   z value Pr(> |z|)
x1  1.090884 0.02482350  43.94562         0
x2 -1.106484 0.02444224 -45.26933         0
x3  1.123164 0.02620042  42.86817         0
> summary(feglm(y ~ x1 + x2 + x3 | i + t | i, data), type="clustered", cluster=~i)$cm
    Estimate Std. error   z value Pr(> |z|)
x1  1.090884 0.02482350  43.94562         0
x2 -1.106484 0.02444224 -45.26933         0
x3  1.123164 0.02620042  42.86817         0
```

I noticed that a similar observation was made in an earlier issue.

It would be great if you could have a look at this.

Cheers!

amrei-stammann commented 2 years ago

Hi, alpaca only computes clustered standard errors if you specify `type="clustered", cluster=~i` in `summary`. The third part of the formula only has to be specified if the cluster variable is not already part of the model specification; it does not change the type of standard errors by itself. Since `i` is already specified in the second part of the formula, you don't need to specify it again in the third part.

Best wishes, Amrei
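
To illustrate the case described above, where the cluster variable is not part of the model, here is a minimal sketch. The grouping variable `g` is hypothetical and built only for this example (it is not in the output of `simGLM`); it is passed through the third part of the formula so that it is available to `summary`, and clustering is still requested explicitly via `type` and `cluster`.

```r
library(alpaca)

# Simulated panel as in the original post
data <- simGLM(1000L, 20L, 1805L, model = "logit")

# Hypothetical cluster variable (an assumption for illustration):
# assign each individual i to one of 50 coarser groups
data$g <- as.integer(factor(data$i)) %% 50L

# g is neither a regressor nor a fixed effect, so it is passed
# in the third part of the formula
fit <- feglm(y ~ x1 + x2 + x3 | i + t | g, data)

# Clustered standard errors still have to be requested in summary()
res <- summary(fit, type = "clustered", cluster = ~ g)
res$cm
```

If `g` were left out of the formula, `summary` would presumably not find it, because only the variables named in the formula are carried along with the fitted object.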

pachadotdev commented 4 months ago

Hi @amrei-stammann, I made my own modification of Alpaca (Capybara), and one of the changes is to pass the cluster as `y ~ a + b | fixedeffects | cluster`: https://github.com/pachadotdev/capybara