FixedEffects / FixedEffectModels.jl

Fast Estimation of Linear Models with IV and High Dimensional Categorical Variables
Other
227 stars 46 forks source link

Differences in R2 and tss compared to GLM #179

Closed mdkeehan closed 7 months ago

mdkeehan commented 3 years ago

Hello, Thank you for writing this package I wanted to use it as a go faster drop in replacement for the GLM.jl package. As a test I tried fitting the same model using both GLM and FixedEffectsModels and noticed differences in the R^2 reported. I have tried to replicate the difference in a minimum working example which I have attached. I have only been able to show a small difference which comes down to the following: @formula(obs~0+year+treatment) vs @formula(obs~0+year+fe(treatment)). It may be that my understanding of the the statistical model difference between the two formula is incomplete in which case a faq or a note in the documentation may help someone in the future. Here is my small working example which I hope might be the basis for a test.

using CUDA
using FixedEffectModels, GLM
using DataFrames, CategoricalArrays
using Test

@testset "Compare FE to GLM" begin
    treatmentD = ["A","A","A","B","B","B","B"]
    yearD = [0,1,2,       0,1,2,3]

    testframe=DataFrame(treatment = treatmentD, year = yearD,
                    obs = rand(length(treatmentD)) )

    testframe[!,:treatment]=categorical(testframe[:,:treatment])

    ols1=fit(LinearModel,@formula(obs~0+year+treatment),testframe)

    fesol=reg(testframe,@formula(obs~0+year+fe(treatment)),save=true)
    @test isapprox(r2(fesol),r2(ols1))
    @test isapprox(adjr2(fesol),adjr2(ols1))
    @test isapprox(coef(fesol)[1],coef(ols1)[1])
    @test isapprox(residuals(fesol),residuals(ols1))

    nofesol=reg(testframe,@formula(obs~0+year+treatment),save=true)
    @test isapprox(r2(nofesol),r2(ols1))
    @test isapprox(adjr2(nofesol),adjr2(ols1))
    @test isapprox(coef(nofesol),coef(ols1))
    @test isapprox(residuals(nofesol,testframe),residuals(ols1))

    @test isapprox(nofesol.tss, fesol.tss) #this fails
    @test isapprox(fesol.tss, nulldeviance(ols1))

    @test isapprox(nofesol.rss, fesol.rss)
    @test isapprox(nofesol.rss, deviance(ols1))

    nofesolgpu=reg(testframe,@formula(obs~0+year+treatment),method=:gpu, double_precision=true,save=true)
    @test isapprox(r2(nofesolgpu),r2(ols1))
    @test isapprox(adjr2(nofesolgpu),adjr2(ols1))

    fesolgpu=reg(testframe,@formula(obs~0+year+fe(treatment)),method=:gpu, double_precision=true,save=true)
    @test isapprox(r2(fesolgpu),r2(ols1))
    @test isapprox(adjr2(fesolgpu),adjr2(ols1))

    @test isapprox(r2(fesolgpu),r2(fesol))
    @test isapprox(adjr2(fesolgpu),adjr2(fesol))
    @test isapprox(coef(fesolgpu),coef(fesol))

    @test isapprox(r2(nofesolgpu),r2(nofesol))
    @test isapprox(adjr2(nofesolgpu),adjr2(nofesol))
    @test isapprox(coef(fesolgpu),coef(fesol))
end
matthieugomez commented 3 years ago

I am not sure this is an error. The definition of r2 is ambiguous in the absence of an intercept FixedEffectModels reports the same thing as Stata.

See also: https://github.com/FixedEffects/FixedEffectModels.jl/issues/173