FixedEffects / FixedEffectModels.jl

Fast Estimation of Linear Models with IV and High Dimensional Categorical Variables

Ignore rows with `Inf`s? #214

Open jmboehm opened 2 years ago

jmboehm commented 2 years ago

The following seems like a 'classic' trap:

using FixedEffectModels, RDatasets
df = dataset("plm", "Cigar")
# set one entry to zero so that its log is -Inf
df.Sales[1] = 0.0
df.logsales = log.(df.Sales)
reg(df, @formula(logsales ~ NDI + fe(State) + fe(Year)), Vcov.cluster(:State), weights = :Pop)

gives

ERROR: "Some observations for the dependent variable are infinite"
Stacktrace:
 [1] reg(df::Any, formula::FormulaTerm, vcov::StatsBase.CovarianceEstimator; contrasts::Dict, weights::Union{Nothing, Symbol}, save::Union{Bool, Symbol}, method::Symbol, nthreads::Integer, double_precision::Bool, tol::Real, maxiter::Integer, drop_singletons::Bool, progress_bar::Bool, dof_add::Integer, subset::Union{Nothing, AbstractVector}, first_stage::Bool)
   @ FixedEffectModels ~/.julia/packages/FixedEffectModels/kJPKw/src/fit.jl:176
 [2] top-level scope
   @ REPL[9]:1

I feel the package could automatically drop rows where the regressand or one of the regressors is infinite, similarly to how it does with missings. What's the argument against that?

matthieugomez commented 1 year ago

It's a bit tricky because Inf is meaningful. I will leave this issue open so that people can report if they encounter the same issue.

junder873 commented 1 year ago

I run into this sometimes as well. The R fixest package also automatically drops those observations.

nilshg commented 10 months ago

I think a "do what I mean" approach (dropping Inf) is unidiomatic in the Julia Stats ecosystem. If rows are being ignored, at a minimum there should be a warning.
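
Until the package behavior is settled, the drop-with-a-warning idea can be approximated on the user side. The following is a minimal sketch using only Base Julia; `finite_rows` is a hypothetical helper, not part of FixedEffectModels, and the small vectors stand in for the `logsales` and `NDI` columns from the example above.

```julia
# Hypothetical helper (not part of FixedEffectModels): returns a mask that is
# true only where every given column holds a non-missing, finite value.
function finite_rows(cols::AbstractVector...)
    n = length(first(cols))
    mask = trues(n)
    for col in cols
        for i in 1:n
            v = col[i]
            mask[i] &= !ismissing(v) && isfinite(v)
        end
    end
    return mask
end

# Stand-in data: log(0.0) is -Inf, and one regressor value is Inf.
logsales = log.([0.0, 1.0, 10.0])
ndi = [100.0, 200.0, Inf]

keep = finite_rows(logsales, ndi)
n_dropped = count(!, keep)

# Make the dropped rows visible instead of ignoring them silently.
n_dropped > 0 && @warn "Dropping $n_dropped rows with non-finite values"
```

With the data from the original post, the same mask could presumably be built as `finite_rows(df.logsales, df.NDI)` and the filtered table `df[keep, :]` passed to `reg`, so that the choice to drop rows stays explicit in user code.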