MatthieuStigler / Misconometrics

A bunch of small R scripts useful in econometrics
GNU General Public License v3.0
6 stars 4 forks source link

Error message with felm #7

Closed estnzhrn closed 1 year ago

estnzhrn commented 1 year ago

Hey Matthieu,

First, thank you so much for sharing the code! I attempted to use the dec_covar function following these steps:

library(lfe)
library(tidyverse)
library(broom)
library(Formula)

devtools::source_url("https://raw.githubusercontent.com/MatthieuStigler/Misconometrics/master/Gelbach_decompo/dec_covar.R")
model <- felm(y~ var_main + control1 + control2 + control3 + control4 |fixedeffects|0|id, data = df)
dec <- dec_covar(object = model, var_main = "var_main")

However, I get an error message stating, "Error in formula.default(object, env = baseenv()): invalid formula." I suspect that the issue lies with the "| fixedeffects | 0 | id" part. If I remove it, the code runs without errors. Nevertheless, I would like to account for fixed effects and cluster at id level. Do you have an idea on how to address this and obtain the decomposition?

Thank you very much in advance!

MatthieuStigler commented 1 year ago

I can't reproduce the error... could you please provide a self-reproducible example? Thanks

estnzhrn commented 1 year ago

Yes, thank you very much in advance!

library(lfe) 
library(broom)
library(Formula)
library(devtools)

source_url("https://raw.githubusercontent.com/MatthieuStigler/Misconometrics/master/Gelbach_decompo/dec_covar.R")

set.seed(12345)
my.df <- data.frame(  id = sample(c(1:14000), 30000, replace = TRUE),
                      y = sample(c(4:150), 30000, replace = TRUE),
                      dummy = sample(c(1,2), 30000, replace = TRUE),
                      control1 = sample(c(1:380), 30000, replace = TRUE),
                      control2 = sample(c(1:4), 30000, replace = TRUE),
                      fe = sample(c(1:70), 30000, replace = TRUE))

# Works if I do not account for fixed effects and clustering (i.e., disregarding "|fe|0|id")
regression = felm(y ~ dummy + control1 + control2, data = my.df)

dec <- dec_covar(object = regression, var_main = "dummy")
dec_long <- dec_covar(object = regression, var_main = "dummy", format = "long", add_coefs = TRUE, conf.int = TRUE)
plot_dec(dec_long) + ggtitle("Effect of each covariate on the main variable's coef")

# Does not work if I account for fixed effects and clustering:
regression = felm(y ~ dummy + control1 + control2 |fe|0|id, data = my.df)

dec <- dec_covar(object = regression, var_main = "dummy")
# Error Mesage: "Error in formula.default(object, env = baseenv()) : invalid formula"
MatthieuStigler commented 1 year ago

ok thanks, confirmed!

The problem is actually not about the FE, but about the fact that your lfe formula contains some 0. That's a problem with lfe which I had reported ong ago: https://github.com/sgaure/lfe/issues/6.

So the easy answer for you is: if you are ready to drop the clustered-error, and hence only use y ~ dummy + control1 + control2 |fe this will work.

library(lfe) 
#> Loading required package: Matrix
library(Formula) 

source("https://raw.githubusercontent.com/MatthieuStigler/Misconometrics/master/Gelbach_decompo/dec_covar.R")

set.seed(12345)
my.df <- data.frame(  id = sample(c(1:14000), 30000, replace = TRUE),
                      y = sample(c(4:150), 30000, replace = TRUE),
                      dummy = sample(c(1,2), 30000, replace = TRUE),
                      control1 = sample(c(1:380), 30000, replace = TRUE),
                      control2 = sample(c(1:4), 30000, replace = TRUE),
                      fe = sample(c(1:70), 30000, replace = TRUE))

regression_1 = felm(y ~ dummy + control1 + control2, data = my.df)
regression_2 = felm(y ~ dummy + control1 + control2 |0|0|0, data = my.df)

dec <- dec_covar(object = regression_1, var_main = "dummy")
dec <- dec_covar(object = regression_2, var_main = "dummy")
#> Error in formula.default(object, env = baseenv()): invalid formula

Created on 2023-06-19 with reprex v2.0.2

MatthieuStigler commented 1 year ago

Good news is that dec_covar seems to run smoothly with fiexst::feols instead:

library(lfe) 
#> Loading required package: Matrix
library(fixest)

source("https://raw.githubusercontent.com/MatthieuStigler/Misconometrics/master/Gelbach_decompo/dec_covar.R")

set.seed(12345)
my.df <- data.frame(  id = sample(c(1:14000), 30000, replace = TRUE),
                      y = sample(c(4:150), 30000, replace = TRUE),
                      dummy = sample(c(1,2), 30000, replace = TRUE),
                      control1 = sample(c(1:380), 30000, replace = TRUE),
                      control2 = sample(c(1:4), 30000, replace = TRUE),
                      fe = sample(c(1:70), 30000, replace = TRUE))

regression_2_felm = felm(y ~ dummy + control1 + control2 |fe, data = my.df)
regression_2_feols_noClus = feols(y ~ dummy + control1 + control2 |fe, data = my.df, 
                                  vcov="iid")
regression_2_feols_clus = feols(y ~ dummy + control1 + control2 |fe, data = my.df,
                                cluster="id")

all.equal(vcov(regression_2_felm), vcov(regression_2_feols_noClus))
#> [1] TRUE

## wide: all the same as no inference
dec_2_felm <- dec_covar(object = regression_2_felm, var_main = "dummy")
dec_2_feols_noClus <- dec_covar(object = regression_2_feols_noClus, var_main = "dummy")
#> Warning: attributes are not identical across measure variables; they will be
#> dropped
dec_2_feols_clus <- dec_covar(object = regression_2_feols_clus, var_main = "dummy")
#> Warning: attributes are not identical across measure variables; they will be
#> dropped

all.equal(dec_2_felm, dec_2_feols_noClus)
#> [1] TRUE
all.equal(dec_2_felm, dec_2_feols_clus)
#> [1] TRUE

## long: feols and felm same when using same vcov
dec_2_felm_long <- dec_covar(object = regression_2_felm, var_main = "dummy", format = "long", add_coefs = TRUE, conf.int = TRUE)
dec_2_feols_long_noClus <- dec_covar(object = regression_2_feols_noClus, var_main = "dummy", format = "long", add_coefs = TRUE, conf.int = TRUE)
dec_2_feols_long_clus <- dec_covar(object = regression_2_feols_clus, var_main = "dummy", format = "long", add_coefs = TRUE, conf.int = TRUE)

all.equal(dec_2_felm_long, dec_2_feols_long_noClus, check.attributes=FALSE)
#> [1] TRUE
all.equal(dec_2_felm_long, dec_2_feols_long_clus, check.attributes=FALSE)
#> [1] "Component \"beta_K_low\": Mean relative difference: 0.0004760047" 
#> [2] "Component \"beta_K_high\": Mean relative difference: 0.0005893639"
#> [3] "Component \"gamma_low\": Mean relative difference: 0.008025734"   
#> [4] "Component \"gamma_high\": Mean relative difference: 0.008754324"

Created on 2023-06-19 with reprex v2.0.2

estnzhrn commented 1 year ago

Thanks a lot for your kind help and prompt response! Works perfectly!

MatthieuStigler commented 1 year ago

glad to hear that!

If this code was useful for your research, I would be thankful if you could acknowledge it:

If you use this script in an academic publication, I would appreciate if you could cite it accordingly:

Stigler, Matthieu (2018) "dec_covar, an R implementation of Gelbach's covariate decomposition". https://github.com/MatthieuStigler/Misconometrics/tree/master/Gelbach_decompo