FixedEffects / Vcov.jl

Variance Covariance Matrices for developers

Feature parity with Wooldridge and other packages #6

Open azev77 opened 3 years ago

azev77 commented 3 years ago

Wooldridge tweeted that every econometrics package should come with the following options for standard errors for virtually every command:

"In Stata, I'd like to type things like" glm y x1 ... xK, fam(logit) vce(hac nw 4) xtpoisson y x1 ... xK, fe vce(shac lat(Lx) long(Ly) dist(50)) qreg y x1 ... xK, q(.25) vce(dk 4)

azev77 commented 3 years ago

I have written some (unoptimized) code for this myself:

using LinearAlgebra, BenchmarkTools, DataFrames;
x =[ones(506) randn(506,3)]; n,p =size(x);
y = x*[1;2;3;4] + randn(506);

b_ls = x\y; y_ls = x*b_ls; e_ls = y-y_ls; s2_ls = ((e_ls'e_ls)/(n-p))[1];

# Sandwich: v(Ω; x=x) = (x'x)^-1 * (x'Ω*x) * (x'x)^-1
v(Ω; x=x) = (x'x) \ (x'Ω*x) / (x'x)

# Newey West Lags 
relu(x) = max(x,0)
NW(nlags,n) = [relu(1 - abs(j-i)/(nlags+1)) for i=1:n, j=1:n]

# SE: 12 methods
function SE(x, y, n, p, e_ls, s2_ls)
    H = x/(x'x) * x'     #H = x*inv(x'x)*x'     ŷ = Hy
    M = I - H            #                      ε̂ = My
    δ = min.(4,n/p .* diag(H) )
    δm = min.(1,n/p .* diag(H) ) + min.(1.5,n/p  .* diag(H) )
    mx = max(n*0.7*maximum(diag(H))/p, 4)
    α  = min.(mx, n/p .* diag(H))
    e_ti = e_ls ./ (diag(I-H))
    #
    HS   = v( Diagonal(fill(s2_ls,n)) )                      #Spherical
    HC0  = v( Diagonal(e_ls*e_ls') * 1.00 )
    HC1  = v( Diagonal(e_ls*e_ls') * (n/(n-p)) )
    HC2  = v( Diagonal(e_ls*e_ls') / (Diagonal(I-H)) )
    HC3  = v( Diagonal(e_ls*e_ls') / ((Diagonal(I-H))^2 ) )
    HC4  = v( Diagonal(e_ls*e_ls') / ((Diagonal(I-H)) .^ δ ) )
    HC4m = v( Diagonal(e_ls*e_ls') / ((Diagonal(I-H)) .^ δm ) )
    HC5  = v( Diagonal(e_ls*e_ls') / sqrt.((Diagonal(I-H)) .^ α ) )
    HCJ  = v( ((n-1)/n) * (Diagonal(e_ti*e_ti') - (1/n)*e_ti*e_ti') )
    # CRVE example w/ 3 clusters
    ix =[1:200 , 201:400 , 401:506]
    blocks = [e_ls[ix[1]]*e_ls[ix[1]]', e_ls[ix[2]]*e_ls[ix[2]]', e_ls[ix[3]]*e_ls[ix[3]]']
    #D      = Diagonal(blocks)
    G = size(ix,1);
    HLZ = v( cat(blocks...,dims=(1,2)) * ((G)/(G-1)) * ((n-1)/(n-p)) )
    #HBC = v( M^(-1/2) * cat(blocks...,dims=(1,2)) * M^(-1/2) )
    HBC = v( real(sqrt(Hermitian(M^-1))) * cat(blocks...,dims=(1,2)) * real(sqrt(Hermitian(M^-1))) )
    # Multi-way clustering
    # NW
    lags = Int(max(round(n^0.25),1.0))
    HNW = v( (e_ls*e_ls') .* NW(lags,n) )
    #
    # Spatially Correlated SE: Conley, Driscoll-Kraay (see the sketch after this code block)
    # Bootstrap: Wild Cluster Bootstrap
    #
    #return HS, HC0, HC1, HC2, HC3, HC4, HC4m, HC5, HCJ, HLZ, HNW
    cov = [HS, HC0, HC1, HC2, HC3, HC4, HC4m, HC5, HCJ, HLZ, HBC, HNW]
    se = [sqrt.(diag(c)) for c in cov]   # loop variable renamed to avoid shadowing the design matrix x
    #
    se = reshape(vcat(se...), 4, 12)
    se = DataFrame(round.(se, sigdigits=3), :auto)   # :auto column names (required by DataFrames ≥ 1.0 for a matrix)
    rename!(se, [:HS, :HC0, :HC1, :HC2, :HC3, :HC4, :HC4m, :HC5, :HCJ, :HLZ, :HBC, :HNW])
    insertcols!(se, 1, :coef => ["β0","β1","β2","β3"])
    return se
end

se1 = SE(x, y, n, p, e_ls, s2_ls)

Here is a screenshot with part of the table: [screenshot of the resulting SE table omitted]
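The Conley / Driscoll-Kraay entries above are left as comments. Below is a minimal, unoptimized sketch of Driscoll-Kraay for the OLS case, assuming each observation carries a time index t (the function name and arguments are mine, not part of Vcov.jl); it uses the same bread as v(Ω) above but builds the meat from per-period score sums instead of an n×n Ω:

```julia
using LinearAlgebra

# Driscoll-Kraay covariance for OLS (illustrative sketch; assumes nlags < number of periods).
# x: n×p regressors, e: n-vector of OLS residuals, t: n-vector of time indices.
function driscoll_kraay(x, e, t, nlags)
    periods = sort(unique(t))
    p = size(x, 2)
    # per-period cross-sectional sums of scores: h_s = Σ_i x_is * e_is
    h = [vec(sum(x[t .== s, :] .* e[t .== s], dims = 1)) for s in periods]
    T = length(h)
    S = zeros(p, p)
    for l in 0:nlags
        w = 1 - l / (nlags + 1)                     # Bartlett weight
        Ω = sum(h[s] * h[s-l]' for s in (l+1):T)    # lag-l autocovariance of the summed scores
        S .+= l == 0 ? Ω : w .* (Ω + Ω')
    end
    return (x'x) \ S / (x'x)                        # same sandwich bread as v(Ω)
end
```

With one observation per period this collapses to plain Newey-West, so (for data sorted by time and the same lag length) it can be checked against HNW above.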

PaulSoderlind commented 3 years ago

I have several of those, but in somewhat crude versions. In particular, the combination of autocorrelation and (some kind of) clustering tends to make things messy. I have the code, but it's not pretty.

Still, if anyone takes the lead on creating a common approach, I am happy to contribute.

/Paul S

PaulSoderlind commented 3 years ago

Another thought/observation: things like HC0 = v( Diagonal(e_ls*e_ls') * 1.00 ) work on balanced panels.

It becomes substantially messier with unbalanced panels. To keep the matrix notation, a set of interactive "data exists" dummies can be used (see Hayashi's text). Alternatively, create a loop.
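A minimal sketch of the loop alternative (my illustration of the point, not Vcov.jl's code), here for the cluster-robust meat: each cluster contributes only the observations it actually has, so an unbalanced panel needs no padding or "data exists" dummies.

```julia
using LinearAlgebra

# Cluster-robust "meat" built with a loop.
# x: n×p regressors, e: n-vector of OLS residuals, cl: n-vector of cluster ids.
function cluster_meat(x, e, cl)
    p = size(x, 2)
    S = zeros(p, p)
    for g in unique(cl)
        idx = findall(==(g), cl)
        sg = x[idx, :]' * e[idx]    # score of cluster g: Σ_{i in g} x_i * e_i
        S .+= sg * sg'
    end
    return S
end

# Plug into the same sandwich as above (small-sample corrections omitted):
# vcov_cl = (x'x) \ cluster_meat(x, e_ls, cl) / (x'x)
```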

matthieugomez commented 3 years ago

@azev77 Yes, it'd be great to have more types of standard errors! @PaulSoderlind Yes, being able to write these loops without losing speed is definitely one of the advantages of Julia. This is already the case for clustered standard errors, see https://github.com/matthieugomez/Vcov.jl/blob/b324f45b788b06e46301963fadb9da8ceaceb150/src/covarianceestimators/vcovcluster.jl#L172

azev77 commented 3 years ago

@matthieugomez have you defined the scope of Vcov.jl?

matthieugomez commented 3 years ago

The goal is for developers (GLM.jl, etc.) to depend on this package and use it to compute standard errors. This would simplify things for users (i.e. they simply specify Vcov.robust() as an argument of fit) without requiring every package to implement standard-error computation itself.
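That pattern already exists for FixedEffectModels.jl, which takes a Vcov.jl estimator as an argument to reg; a small usage sketch with made-up data (the closing comment is the aspirational part):

```julia
using DataFrames, FixedEffectModels, Vcov   # Vcov is also re-exported by FixedEffectModels

df = DataFrame(y = randn(100), x = randn(100), id = rand(1:10, 100))

reg(df, @formula(y ~ x), Vcov.robust())       # heteroskedasticity-robust
reg(df, @formula(y ~ x), Vcov.cluster(:id))   # clustered by :id

# The hope in this issue: other estimators (GLM.jl, quantile regression, ...) accept
# the same objects, and Vcov.jl grows new ones (HAC, Conley, Driscoll-Kraay, ...).
```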

IljaK91 commented 2 years ago

Is there any movement in this direction? Working out these things would allow many people to move some workflows from R to Julia (regression -> custom standard errors -> LaTeX regression table -> inclusion in a paper). I am happy to join efforts if someone else has a good idea of how to tackle this.

azev77 commented 2 years ago

Btw, @droodman has: https://github.com/droodman/WildBootTests.jl