FixedEffects / FixedEffectModels.jl

Fast Estimation of Linear Models with IV and High Dimensional Categorical Variables

The degrees of freedom and Std.Error are not calculated correctly when there are collinear fixed effects. #199

Closed xiaobaaaa closed 5 months ago

xiaobaaaa commented 2 years ago

Thanks for the great package! I have been using this package for quite some time now and its high performance has solved many of my problems. I found some bugs during use.

I use the nlswork dataset for the tests. For Stata:

clear all
webuse nlswork
reghdfe ln_wage age ttl_exp tenure not_smsa , absorb(i.year#i.occ_code)
est store hd1
reghdfe ln_wage age ttl_exp tenure not_smsa , absorb(i.year##i.occ_code)
est store hd2
reghdfe ln_wage age ttl_exp tenure not_smsa , absorb(i.year i.occ_code i.year#i.occ_code)
est store hd3
reg ln_wage age ttl_exp tenure not_smsa i.year#i.occ_code
est store reg1
reg ln_wage age ttl_exp tenure not_smsa i.year##i.occ_code
est store reg2
reg ln_wage age ttl_exp tenure not_smsa i.year i.occ_code i.year#i.occ_code
est store reg3
esttab hd1 hd2 hd3 reg1 reg2 reg3, keep(age ttl_exp tenure not_smsa) se stats(df_) nostar

*save data for julia
export delimited using "~\nlswork.csv", replace

For Julia:

using CSV, DataFrames, FixedEffectModels, RegressionTables
df = DataFrame(CSV.File("~\\nlswork.csv"))
reg1 = reg(df, @formula(ln_wage ~ age + ttl_exp + tenure + not_smsa + fe(year) & fe(occ_code)))
reg2 = reg(df, @formula(ln_wage ~ age + ttl_exp + tenure + not_smsa + fe(year) * fe(occ_code)))
reg3 = reg(df, @formula(ln_wage ~ age + ttl_exp + tenure + not_smsa + fe(year) + fe(occ_code) + fe(year) & fe(occ_code)))

reg2 and reg3 report identical standard errors, but those of reg1 differ slightly. For age, the Std.Errors in reg1, reg2, and reg3 are 3.93088, 3.92897, and 3.92897, respectively.

In Stata, however, reghdfe and reg give identical results across all specifications, and the year and occ_code fixed effects are omitted due to collinearity. The discrepancy in Julia arises because reg2 and reg3 have different degrees of freedom than reg1 (171 for reg1, and 198 for reg2 and reg3). Maybe the degrees of freedom are not calculated correctly when there are collinear fixed effects.

BTW, FixedEffectModels.jl also dropped the year and occ_code fixed effects, and the coefficients are all the same in reg1, reg2, and reg3.

I have also tested with other datasets and found similar problems. What worries me is that it is still unclear to me whether the degrees of freedom and standard errors are also miscalculated when several fixed effects are partially, but not completely, collinear.
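One way to narrow down where such a discrepancy comes from is to print the number of observations, the residual degrees of freedom, and the standard error side by side for each specification. The following is only a minimal sketch on synthetic data: the columns g1, g2, x, and y are made up for illustration and are not from nlswork, and it assumes StatsBase is installed alongside FixedEffectModels.

```julia
using DataFrames, FixedEffectModels, Random
# Assumption: StatsBase is installed; these accessors may already be
# re-exported by FixedEffectModels, in which case this line is harmless.
using StatsBase: coef, stderror, nobs, dof_residual

# Synthetic data (hypothetical, for illustration only)
Random.seed!(1)
n = 2_000
df = DataFrame(g1 = rand(1:10, n), g2 = rand(1:5, n), x = randn(n))
df.y = 0.5 .* df.x .+ 0.1 .* df.g1 .+ 0.2 .* df.g2 .+ randn(n)

# Interaction only, versus main effects that are collinear with the interaction
m1 = reg(df, @formula(y ~ x + fe(g1) & fe(g2)))
m2 = reg(df, @formula(y ~ x + fe(g1) + fe(g2) + fe(g1) & fe(g2)))

# With fixed effects absorbed, x is the only reported coefficient
for (name, m) in (("interaction only", m1), ("main effects + interaction", m2))
    println(name, ": nobs = ", nobs(m),
            ", dof_residual = ", dof_residual(m),
            ", coef(x) = ", coef(m)[1],
            ", se(x) = ", stderror(m)[1])
end
```

If the coefficients and the number of observations agree while dof_residual differs, the gap in the standard errors can be attributed to the degrees-of-freedom adjustment rather than to the estimation itself.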

matthieugomez commented 2 years ago

It turns out that computing the right degrees of freedom with high-dimensional fixed effects is a hard problem (see the documentation of reghdfe about this). PRs to be as good as reghdfe on this are welcome!

That being said, for low-dimensional fixed effects, as in your example, you can obtain the correct standard errors by using categorical variables in the formula (instead of using fe).
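The workaround above could look roughly like the following. This is a sketch under assumptions, not the package's documented recipe: the data are synthetic (g1, g2, x, y are hypothetical names), and the group columns are converted to strings so that the formula treats them as categorical regressors rather than as continuous variables. It also assumes StatsBase is installed.

```julia
using DataFrames, FixedEffectModels, Random
# Assumption: StatsBase is installed; these accessors may already be
# re-exported by FixedEffectModels, in which case this line is harmless.
using StatsBase: coef, coefnames, stderror, dof_residual

# Synthetic data (hypothetical, for illustration only)
Random.seed!(1)
n = 2_000
df = DataFrame(g1 = rand(1:10, n), g2 = rand(1:5, n), x = randn(n))
df.y = 0.5 .* df.x .+ 0.1 .* df.g1 .+ 0.2 .* df.g2 .+ randn(n)

# String columns are treated as categorical when the formula is applied
df.g1s = string.(df.g1)
df.g2s = string.(df.g2)

# Absorbed fixed effects versus explicit categorical dummies
m_fe  = reg(df, @formula(y ~ x + fe(g1) & fe(g2)))
m_cat = reg(df, @formula(y ~ x + g1s * g2s))

# Locate x among the (many) coefficients of the categorical model
ix = findfirst(==("x"), string.(coefnames(m_cat)))

println("fe():        se(x) = ", stderror(m_fe)[1],
        ", dof_residual = ", dof_residual(m_fe))
println("categorical: se(x) = ", stderror(m_cat)[ix],
        ", dof_residual = ", dof_residual(m_cat))
```

The categorical version estimates every dummy explicitly, so it is only practical when the number of groups is small.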

IljaK91 commented 2 years ago

I can confirm that I observed the same behavior as described in the opening post when running a regression with country*year fixed effects.

Moreover, it seems that when running a regression programmatically, the notation fe(d1)*fe(d2) does not work, whereas fe(d1)&fe(d2) does. The former fails with the following error:

ERROR: MethodError: no method matching *(::FixedEffectModels.FixedEffectTerm, ::FixedEffectModels.FixedEffectTerm)
Closest candidates are:
  *(::Any, ::Any, ::Any, ::Any...) at C:\Users\kantorov\.julia\juliaup\julia-1.7.3+0~x64\share\julia\base\operators.jl:655
  *(::Union{MathOptInterface.ScalarAffineFunction{T}, MathOptInterface.ScalarQuadraticFunction{T}, MathOptInterface.VectorAffineFunction{T}, MathOptInterface.VectorQuadraticFunction{T}}, ::T) where T at C:\Users\kantorov\.julia\packages\MathOptInterface\kCmJV\src\Utilities\functions.jl:3270
  *(::SpecialFunctions.SimplePoly, ::Any) at C:\Users\kantorov\.julia\packages\SpecialFunctions\jqvAz\src\expint.jl:8
  ...
Stacktrace:
 [1] top-level scope
   @ l:\Localbitcoin\Code_Ilja\main.jl:124

matthieugomez commented 2 years ago

@IljaK91 Please open a separate issue with a fully reproducible code example.