FixedEffects / FixedEffectModels.jl

Fast Estimation of Linear Models with IV and High Dimensional Categorical Variables

[Bug] Multi-way clustered standard error when missing is allowed #195

Closed caibengbu closed 2 years ago

caibengbu commented 2 years ago

Hello all,

Thanks a lot for this amazing package!

However, I am running into a small bug. I am using FixedEffectModels.jl v1.6.5 on macOS. Here is a minimal reproducible example:

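using DataFrames, FixedEffectModels

# Two cluster variables with no missing values, but columns that allow missing
df = DataFrame(Y=rand(10), X=rand(10), cat1=repeat([1,2], inner=5), cat2=repeat([1,2], outer=5))
allowmissing!(df)
reg(df, @formula(Y ~ X + fe(cat1) + fe(cat2)), Vcov.cluster(:cat1, :cat2))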

The bug persists when an actual missing value is added.

I get the following error: ERROR: BoundsError: attempt to access 3-element GroupedArrays.GroupedRefPool{Union{Missing, Int64}} with indices 0:2 at index [3]. From a brief look at the code, it seems that a GroupedRefPool with indices 0:2, as defined in https://github.com/FixedEffects/GroupedArrays.jl/blob/dc5ecc3f897366c541b17462cae56b90f8f95a75/src/GroupedArrays.jl#L150-L163, cannot be handled by the findfirst call here: https://github.com/FixedEffects/GroupedArrays.jl/blob/dc5ecc3f897366c541b17462cae56b90f8f95a75/src/utils.jl#L270-L273

I think the issue is that LinearIndices is not defined for GroupedRefPool, so Julia falls back to Base.LinearIndices(x::AbstractArray), which returns [1,2,3,...] instead of [0,1,2,...]. The problem does not appear when the model has a one-way cluster, because refpool is then an IntegerRefpool, or when missing is disallowed, because GroupedRefPool's indices then start at 1.
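To make the index mismatch concrete, here is a minimal sketch of the failure mode. It does not use the actual GroupedArrays code: ToyPool, validindices, and lookup are hypothetical names invented for illustration, and the only assumption carried over from the report is that slot 0 is reserved for missing, so the valid indices are 0:ngroups. Walking such a pool with 1-based indices goes one past the end, while walking it with its own index range stays in bounds.

```julia
# Hypothetical stand-in for a reference pool whose slot 0 holds `missing`.
# This is NOT the GroupedArrays.GroupedRefPool implementation.
struct ToyPool
    ngroups::Int
end

# Valid indices run from 0 (the missing slot) up to ngroups.
validindices(p::ToyPool) = 0:p.ngroups

# Bounds-checked lookup: index 0 maps to missing, index i > 0 maps to group i.
function lookup(p::ToyPool, i::Integer)
    i in validindices(p) || throw(BoundsError(p, i))
    return i == 0 ? missing : i
end

pool = ToyPool(2)
n = length(validindices(pool))   # three slots: 0, 1, 2

# Assuming 1-based indices visits 1, 2, 3, and index 3 is out of bounds.
bad = try
    [lookup(pool, i) for i in 1:n]
catch err
    err
end

# Using the pool's own index range visits 0, 1, 2 and succeeds.
good = [lookup(pool, i) for i in validindices(pool)]
```

Here bad ends up holding the caught BoundsError, which mirrors the "with indices 0:2 at index [3]" message above, and good holds the three pool values. Any routine that derives its indices from the container's declared index range, rather than from 1:length, avoids the problem.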

I have opened a pull request, FixedEffects/GroupedArrays.jl#5, to address this issue.

matthieugomez commented 2 years ago

Just tagged a new version of GroupedArrays to correct this. Thanks again for your help.