FixedEffects / FixedEffectModels.jl

Fast Estimation of Linear Models with IV and High Dimensional Categorical Variables

`predict` for fixed effects #243

Closed: jariji closed this 10 months ago

jariji commented 1 year ago

`predict` is not implemented for models with fixed effects, but I would like to use this functionality.

https://github.com/FixedEffects/FixedEffectModels.jl/blob/851eca92998133fbb2780c4db1898c3f903d1d8f/src/FixedEffectModel.jl#L132-L139

That code looks okay to me, but the comment says it's wrong, so I'm reluctant to implement it myself in case I get it wrong. What is the problem with this code?

matthieugomez commented 1 year ago

It does not work if there are missing values in the original DataFrame, or if the fixed effects are of the form `fe(id)&fe(year)` (i.e., id-year fixed effects). It would be awesome if you could write code that handles these two cases.

Here is some background: https://github.com/FixedEffects/FixedEffectModels.jl/issues/204
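
A minimal sketch of how both cases could be handled, assuming the saved fixed effects come as one estimate per row of the original data (as `m.fe` does with `save = true`) and that rows outside the estimation sample carry `missing`. The lookup is keyed on the full tuple of grouping values, so an `fe(id)&fe(year)` effect is found by `(id, year)`, and any key not seen during estimation predicts `missing` instead of shifting rows out of alignment. `build_fe_lookup` and `predict_fe` are hypothetical names, and plain vectors stand in for DataFrame columns.

```julia
# Hypothetical helpers (not part of FixedEffectModels): plain vectors stand
# in for DataFrame columns.

# Map each key tuple, e.g. (id, year) for fe(id)&fe(year), to its estimated
# fixed effect. Rows whose key or estimate is `missing` (assumed here to be
# observations outside the estimation sample) are skipped.
function build_fe_lookup(groupkeys::AbstractVector{<:Tuple}, est::AbstractVector)
    lookup = Dict{Any,Float64}()
    for (k, v) in zip(groupkeys, est)
        (ismissing(v) || any(ismissing, k)) && continue
        lookup[k] = v
    end
    return lookup
end

# Return one prediction per input row so the output stays aligned with the
# original data; keys never seen during estimation give `missing`.
predict_fe(lookup::AbstractDict, newkeys::AbstractVector) =
    [get(lookup, k, missing) for k in newkeys]

# Usage with made-up numbers: row 3 stands for an observation dropped
# because of missing data.
ids    = [1, 1, 2, 2]
years  = [2000, 2001, 2000, 2001]
est    = [0.5, 0.7, missing, -0.1]
lookup = build_fe_lookup(collect(zip(ids, years)), est)

predict_fe(lookup, [(1, 2001), (2, 2000), (3, 2000)])
# → [0.7, missing, missing]
```

Keying on the whole tuple means an interacted effect needs no special handling: it is just a lookup on more than one column.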

jariji commented 1 year ago

Setting the missing-values issue aside for now, I'm looking at the case of interacted fixed effects. The naive approach (joining the estimated fixed effects back onto the data by their keys and summing them) seems to work here. Am I missing something?

julia> using DataFrames, FixedEffectModels

julia> df = let
           halfX = allcombinations(DataFrame, :a => 1:3, :b => 10:10:30)
           X = vcat(halfX, halfX)
           d = DataFrame(X)
           d.y = rand(nrow(d))
           d
       end
18×3 DataFrame
 Row │ a      b      y         
     │ Int64  Int64  Float64   
─────┼─────────────────────────
   1 │     1     10  0.634415
   2 │     2     10  0.10137
   3 │     3     10  0.619162
   4 │     1     20  0.308558
   5 │     2     20  0.673735
   6 │     3     20  0.0323582
   7 │     1     30  0.0197685
   8 │     2     30  0.22085
   9 │     3     30  0.875045
  10 │     1     10  0.747533
  11 │     2     10  0.150399
  12 │     3     10  0.82051
  13 │     1     20  0.259925
  14 │     2     20  0.728193
  15 │     3     20  0.340064
  16 │     1     30  0.983969
  17 │     2     30  0.376881
  18 │     3     30  0.799643

julia> m = FixedEffectModels.reg(df, @formula(y ~ fe(a) * fe(b)), save = true)
                       FixedEffectModel                       
==============================================================
Number of obs:              18  Converged:                true
dof (model):                 0  dof (residuals):             3
R²:                      0.668  R² adjusted:            -0.880
F-statistic:               NaN  P-value:                   NaN
R² within:              -0.000  Iterations:                  3
==============================================================
  Estimate  Std. Error  t-stat  Pr(>|t|)  Lower 95%  Upper 95%
──────────────────────────────────────────────────────────────

==============================================================

julia> m.fe
18×5 DataFrame
 Row │ a      b      fe_a      fe_b        fe_a&fe_b  
     │ Int64  Int64  Float64?  Float64?    Float64?   
─────┼────────────────────────────────────────────────
   1 │     1     10  0.487636   0.0146608   0.188678
   2 │     2     10  0.429074   0.0146608  -0.31785
   3 │     3     10  0.53202    0.0146608   0.173155
   4 │     1     20  0.487636  -0.046219   -0.157175
   5 │     2     20  0.429074  -0.046219    0.318109
   6 │     3     20  0.53202   -0.046219   -0.29959
   7 │     1     30  0.487636   0.0315582  -0.0173249
   8 │     2     30  0.429074   0.0315582  -0.161766
   9 │     3     30  0.53202    0.0315582   0.273766
  10 │     1     10  0.487636   0.0146608   0.188678
  11 │     2     10  0.429074   0.0146608  -0.31785
  12 │     3     10  0.53202    0.0146608   0.173155
  13 │     1     20  0.487636  -0.046219   -0.157175
  14 │     2     20  0.429074  -0.046219    0.318109
  15 │     3     20  0.53202   -0.046219   -0.29959
  16 │     1     30  0.487636   0.0315582  -0.0173249
  17 │     2     30  0.429074   0.0315582  -0.161766
  18 │     3     30  0.53202    0.0315582   0.273766

julia> unique(m.fe)
9×5 DataFrame
 Row │ a      b      fe_a      fe_b        fe_a&fe_b  
     │ Int64  Int64  Float64?  Float64?    Float64?   
─────┼────────────────────────────────────────────────
   1 │     1     10  0.487636   0.0146608   0.188678
   2 │     2     10  0.429074   0.0146608  -0.31785
   3 │     3     10  0.53202    0.0146608   0.173155
   4 │     1     20  0.487636  -0.046219   -0.157175
   5 │     2     20  0.429074  -0.046219    0.318109
   6 │     3     20  0.53202   -0.046219   -0.29959
   7 │     1     30  0.487636   0.0315582  -0.0173249
   8 │     2     30  0.429074   0.0315582  -0.161766
   9 │     3     30  0.53202    0.0315582   0.273766

julia> fes = leftjoin(df, unique(m.fe); on=m.fekeys, makeunique=true)
18×6 DataFrame
 Row │ a      b      y          fe_a      fe_b        fe_a&fe_b  
     │ Int64  Int64  Float64    Float64?  Float64?    Float64?   
─────┼───────────────────────────────────────────────────────────
   1 │     1     10  0.634415   0.487636   0.0146608   0.188678
   2 │     2     10  0.10137    0.429074   0.0146608  -0.31785
   3 │     3     10  0.619162   0.53202    0.0146608   0.173155
   4 │     1     20  0.308558   0.487636  -0.046219   -0.157175
   5 │     2     20  0.673735   0.429074  -0.046219    0.318109
   6 │     3     20  0.0323582  0.53202   -0.046219   -0.29959
   7 │     1     30  0.0197685  0.487636   0.0315582  -0.0173249
   8 │     2     30  0.22085    0.429074   0.0315582  -0.161766
   9 │     3     30  0.875045   0.53202    0.0315582   0.273766
  10 │     1     10  0.747533   0.487636   0.0146608   0.188678
  11 │     2     10  0.150399   0.429074   0.0146608  -0.31785
  12 │     3     10  0.82051    0.53202    0.0146608   0.173155
  13 │     1     20  0.259925   0.487636  -0.046219   -0.157175
  14 │     2     20  0.728193   0.429074  -0.046219    0.318109
  15 │     3     20  0.340064   0.53202   -0.046219   -0.29959
  16 │     1     30  0.983969   0.487636   0.0315582  -0.0173249
  17 │     2     30  0.376881   0.429074   0.0315582  -0.161766
  18 │     3     30  0.799643   0.53202    0.0315582   0.273766

julia> combine(fes, AsTable(Not(m.fekeys)) => sum => :prediction)
18×1 DataFrame
 Row │ prediction 
     │ Float64    
─────┼────────────
   1 │   1.32539
   2 │   0.227254
   3 │   1.339
   4 │   0.592799
   5 │   1.3747
   6 │   0.218569
   7 │   0.521638
   8 │   0.519716
   9 │   1.71239
  10 │   1.43851
  11 │   0.276283
  12 │   1.54035
  13 │   0.544166
  14 │   1.42916
  15 │   0.526274
  16 │   1.48584
  17 │   0.675747
  18 │   1.63699
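
Note that `Not(m.fekeys)` also selects `y`, so the `prediction` column above is `y` plus the fixed effects (row 1: 0.634415 + 0.487636 + 0.0146608 + 0.188678 ≈ 1.32539). Below is a minimal plain-Julia sketch of the same join-and-sum that adds up only the fixed-effect components, using the nine unique rows of `m.fe` printed above; the `NamedTuple` layout and the `fe_prediction` name are illustrative assumptions. Because `fe(a) * fe(b)` expands to `fe(a) + fe(b) + fe(a)&fe(b)`, the model is saturated here, so each prediction should equal the mean of `y` in its (a, b) cell up to the printed rounding.

```julia
# The nine unique rows of m.fe printed above: keys (a, b) plus the three
# estimated components fe_a, fe_b and fe_a&fe_b (named `ab` here).
fe_rows = [
    (a = 1, b = 10, fe_a = 0.487636, fe_b =  0.0146608, ab =  0.188678),
    (a = 2, b = 10, fe_a = 0.429074, fe_b =  0.0146608, ab = -0.31785),
    (a = 3, b = 10, fe_a = 0.53202,  fe_b =  0.0146608, ab =  0.173155),
    (a = 1, b = 20, fe_a = 0.487636, fe_b = -0.046219,  ab = -0.157175),
    (a = 2, b = 20, fe_a = 0.429074, fe_b = -0.046219,  ab =  0.318109),
    (a = 3, b = 20, fe_a = 0.53202,  fe_b = -0.046219,  ab = -0.29959),
    (a = 1, b = 30, fe_a = 0.487636, fe_b =  0.0315582, ab = -0.0173249),
    (a = 2, b = 30, fe_a = 0.429074, fe_b =  0.0315582, ab = -0.161766),
    (a = 3, b = 30, fe_a = 0.53202,  fe_b =  0.0315582, ab =  0.273766),
]

# Join key (a, b) -> sum of the fixed-effect components only; y is not included.
fe_sum = Dict((r.a, r.b) => r.fe_a + r.fe_b + r.ab for r in fe_rows)
fe_prediction(a, b) = fe_sum[(a, b)]

# The two observed y values in each (a, b) cell, from the table above.
y_cells = Dict(
    (1, 10) => [0.634415, 0.747533],   (2, 10) => [0.10137, 0.150399],
    (3, 10) => [0.619162, 0.82051],    (1, 20) => [0.308558, 0.259925],
    (2, 20) => [0.673735, 0.728193],   (3, 20) => [0.0323582, 0.340064],
    (1, 30) => [0.0197685, 0.983969],  (2, 30) => [0.22085, 0.376881],
    (3, 30) => [0.875045, 0.799643],
)

# Compare each prediction with the cell mean of y.
for ((a, b), ys) in sort(collect(y_cells))
    mean_y = sum(ys) / length(ys)
    println((a, b), ": prediction = ", round(fe_prediction(a, b); digits = 6),
            ", cell mean of y = ", round(mean_y; digits = 6))
end
```

In DataFrames terms, the corresponding change would be to sum only the fixed-effect columns, e.g. `AsTable(r"^fe_")` in place of `AsTable(Not(m.fekeys))`.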

matthieugomez commented 1 year ago

Hmm... maybe what was missing was the interaction of a fixed effect with a continuous variable, like `y & fe(a)`?
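
For an interaction of a fixed effect with a continuous variable, such as `x & fe(a)`, the value estimated for each level of `a` is assumed to act as a slope on `x`, so its contribution to a prediction is that slope times the row's `x`; summing the saved column directly, as in the join above, would add the slope itself. A minimal sketch under that assumption, with a hypothetical `slope_contribution` helper, plain vectors instead of DataFrame columns, and made-up slope values:

```julia
# Hypothetical sketch: contribution of an interaction x & fe(a), where the
# saved estimate for each level of `a` is assumed to be a slope on x.
function slope_contribution(slopes::AbstractDict, a::AbstractVector, x::AbstractVector)
    # Multiply the level-specific slope by the row's x; unknown levels -> missing.
    return [haskey(slopes, ai) ? slopes[ai] * xi : missing for (ai, xi) in zip(a, x)]
end

slopes = Dict(1 => 0.5, 2 => -2.0)   # illustrative values, not from the thread
a = [1, 2, 1, 3]
x = [2.0, 1.5, 4.0, 1.0]

slope_contribution(slopes, a, x)
# → [1.0, -3.0, 2.0, missing]
```

Rows 1 and 3 share level `a = 1` but get different contributions because their `x` differs, which a plain join on `a` cannot reproduce.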

nilshg commented 11 months ago

I had completely forgotten about #204; the discussion died down after my suggestion for dealing with the missing-values issue. Could you point me to an example of the interacted FE issue? It would be really good to get `predict` back; we just need a more comprehensive test set that covers the issues raised with my old `predict` implementation.