jariji closed this 10 months ago
It does not work if there are missing variables in the original dataframe or if fixed effects are of the form fe(id)&fe(year) (i.e. id-year fixed effects). It would be awesome if you could write code that handles these two things.
Here is some background: https://github.com/FixedEffects/FixedEffectModels.jl/issues/204
Setting the missing issue aside for now, I'm looking at the case of interacted fixed effects. Doing the naive thing seems to work here. Am I missing something?
julia> using DataFrames, FixedEffectModels
julia> df = let
           halfX = allcombinations(DataFrame, :a => 1:3, :b => 10:10:30)
           X = vcat(halfX, halfX)
           d = DataFrame(X)
           d.y = rand(nrow(d))
           d
       end
18×3 DataFrame
Row │ a b y
│ Int64 Int64 Float64
─────┼─────────────────────────
1 │ 1 10 0.634415
2 │ 2 10 0.10137
3 │ 3 10 0.619162
4 │ 1 20 0.308558
5 │ 2 20 0.673735
6 │ 3 20 0.0323582
7 │ 1 30 0.0197685
8 │ 2 30 0.22085
9 │ 3 30 0.875045
10 │ 1 10 0.747533
11 │ 2 10 0.150399
12 │ 3 10 0.82051
13 │ 1 20 0.259925
14 │ 2 20 0.728193
15 │ 3 20 0.340064
16 │ 1 30 0.983969
17 │ 2 30 0.376881
18 │ 3 30 0.799643
julia> m = FixedEffectModels.reg(df, @formula(y ~ fe(a) * fe(b)), save = true)
                       FixedEffectModel
==============================================================
Number of obs:              18  Converged:                true
dof (model):                 0  dof (residuals):             3
R²:                      0.668  R² adjusted:            -0.880
F-statistic:               NaN  P-value:                   NaN
R² within:              -0.000  Iterations:                  3
==============================================================
  Estimate  Std. Error  t-stat  Pr(>|t|)  Lower 95%  Upper 95%
──────────────────────────────────────────────────────────────
==============================================================
julia> m.fe
18×5 DataFrame
Row │ a b fe_a fe_b fe_a&fe_b
│ Int64 Int64 Float64? Float64? Float64?
─────┼────────────────────────────────────────────────
1 │ 1 10 0.487636 0.0146608 0.188678
2 │ 2 10 0.429074 0.0146608 -0.31785
3 │ 3 10 0.53202 0.0146608 0.173155
4 │ 1 20 0.487636 -0.046219 -0.157175
5 │ 2 20 0.429074 -0.046219 0.318109
6 │ 3 20 0.53202 -0.046219 -0.29959
7 │ 1 30 0.487636 0.0315582 -0.0173249
8 │ 2 30 0.429074 0.0315582 -0.161766
9 │ 3 30 0.53202 0.0315582 0.273766
10 │ 1 10 0.487636 0.0146608 0.188678
11 │ 2 10 0.429074 0.0146608 -0.31785
12 │ 3 10 0.53202 0.0146608 0.173155
13 │ 1 20 0.487636 -0.046219 -0.157175
14 │ 2 20 0.429074 -0.046219 0.318109
15 │ 3 20 0.53202 -0.046219 -0.29959
16 │ 1 30 0.487636 0.0315582 -0.0173249
17 │ 2 30 0.429074 0.0315582 -0.161766
18 │ 3 30 0.53202 0.0315582 0.273766
julia> unique(m.fe)
9×5 DataFrame
Row │ a b fe_a fe_b fe_a&fe_b
│ Int64 Int64 Float64? Float64? Float64?
─────┼────────────────────────────────────────────────
1 │ 1 10 0.487636 0.0146608 0.188678
2 │ 2 10 0.429074 0.0146608 -0.31785
3 │ 3 10 0.53202 0.0146608 0.173155
4 │ 1 20 0.487636 -0.046219 -0.157175
5 │ 2 20 0.429074 -0.046219 0.318109
6 │ 3 20 0.53202 -0.046219 -0.29959
7 │ 1 30 0.487636 0.0315582 -0.0173249
8 │ 2 30 0.429074 0.0315582 -0.161766
9 │ 3 30 0.53202 0.0315582 0.273766
julia> fes = leftjoin(df, unique(m.fe); on=m.fekeys, makeunique=true)
18×6 DataFrame
Row │ a b y fe_a fe_b fe_a&fe_b
│ Int64 Int64 Float64 Float64? Float64? Float64?
─────┼───────────────────────────────────────────────────────────
1 │ 1 10 0.634415 0.487636 0.0146608 0.188678
2 │ 2 10 0.10137 0.429074 0.0146608 -0.31785
3 │ 3 10 0.619162 0.53202 0.0146608 0.173155
4 │ 1 20 0.308558 0.487636 -0.046219 -0.157175
5 │ 2 20 0.673735 0.429074 -0.046219 0.318109
6 │ 3 20 0.0323582 0.53202 -0.046219 -0.29959
7 │ 1 30 0.0197685 0.487636 0.0315582 -0.0173249
8 │ 2 30 0.22085 0.429074 0.0315582 -0.161766
9 │ 3 30 0.875045 0.53202 0.0315582 0.273766
10 │ 1 10 0.747533 0.487636 0.0146608 0.188678
11 │ 2 10 0.150399 0.429074 0.0146608 -0.31785
12 │ 3 10 0.82051 0.53202 0.0146608 0.173155
13 │ 1 20 0.259925 0.487636 -0.046219 -0.157175
14 │ 2 20 0.728193 0.429074 -0.046219 0.318109
15 │ 3 20 0.340064 0.53202 -0.046219 -0.29959
16 │ 1 30 0.983969 0.487636 0.0315582 -0.0173249
17 │ 2 30 0.376881 0.429074 0.0315582 -0.161766
18 │ 3 30 0.799643 0.53202 0.0315582 0.273766
julia> combine(fes, AsTable(Not(m.fekeys)) => sum => :prediction)
18×1 DataFrame
Row │ prediction
│ Float64
─────┼────────────
1 │ 1.32539
2 │ 0.227254
3 │ 1.339
4 │ 0.592799
5 │ 1.3747
6 │ 0.218569
7 │ 0.521638
8 │ 0.519716
9 │ 1.71239
10 │ 1.43851
11 │ 0.276283
12 │ 1.54035
13 │ 0.544166
14 │ 1.42916
15 │ 0.526274
16 │ 1.48584
17 │ 0.675747
18 │ 1.63699
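One thing to double-check in the last step: Not(m.fekeys) drops only a and b, so the y column is still part of the sum. Row 1 above (1.32539) matches y + fe_a + fe_b + fe_a&fe_b, not the fixed effects alone. Below is a minimal sketch of summing only the fixed-effect columns. It assumes a small hand-built stand-in for fes (values copied from rows 1 and 10 of the join above) and a hypothetical fekeys vector in place of m.fekeys, so it runs without fitting a model.

```julia
using DataFrames

# Stand-in for the joined table `fes`; values copied from rows 1 and 10 above.
fes = DataFrame(
    a = [1, 1],
    b = [10, 10],
    y = [0.634415, 0.747533],
    fe_a = [0.487636, 0.487636],
    fe_b = [0.0146608, 0.0146608],
)
fes[!, "fe_a&fe_b"] = [0.188678, 0.188678]

# Hypothetical stand-in for m.fekeys.
fekeys = [:a, :b]

# Same call as in the thread: only the keys are excluded, so `y` is summed too.
with_y = combine(fes, AsTable(Not(fekeys)) => sum => :prediction)

# Excluding the outcome as well leaves just the fixed-effect columns.
fe_only = combine(fes, AsTable(Not(vcat(fekeys, :y))) => sum => :prediction)
```

With y excluded, both rows come out to about 0.691, which is also the mean of the two y values in the (a = 1, b = 10) cell. That is what one would expect from a model with only fully interacted fixed effects, but it is a check on this toy example only, not a general test of predict.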
Hmm, maybe what was missing was an interaction with a continuous variable, like y & fe(a)?
I had completely forgotten about #204 and the discussion had died down after my suggestion for dealing with the missing issue. Could you point me to an example of the interacted FE issue? It would be really good to get predict back, we just need a more comprehensive testset that covers the issues raised with my old predict implementation.
predict is not implemented for models with fixed effects but I would like to use this functionality. https://github.com/FixedEffects/FixedEffectModels.jl/blob/851eca92998133fbb2780c4db1898c3f903d1d8f/src/FixedEffectModel.jl#L132-L139
That code looks okay to me but the comment says it's wrong, so I'm reluctant to try implementing it myself lest I get it wrong. What is the problem with this code?