JuliaStats / GLM.jl

Generalized linear models in Julia
Other
584 stars 114 forks source link

`ftest` and `-0.0` #461

Closed palday closed 1 year ago

palday commented 2 years ago

Currently, ftest uses the unadjusted R2, which is constrained by definition to be on [0, 1]. As it stands in the tests and doctests, we have several cases where the floating point results for a null model give an R2 of -0.0. That's fine numerically, but realistically we know that this should be positive zero. Moreover, if you use a different BLAS (e.g. MKL), then you do get nonnegative 0.0. This means that the doctests and tests for show methods can break based on the BLAS in a way that's much more annoying to fix than simply using isapprox instead of isequal.

Should we replace

https://github.com/JuliaStats/GLM.jl/blob/3f9c8e44f85d1d997c38a30580ac447ac9bac55a/src/ftest.jl#L162

with

    return FTestResult(Int(nobs(mods[1])), SSR, df, max.(r2.(mods), 0), fstat, pval)

?

This is in some sense an upstream issue (i.e. the definition of StatsBase.r2), but I'm hesitant to apply a constraint at that point because a clearly negative R2 could be a decent test for pathological method definitions of deviance and nulldeviance for a particular type.

nalimilan commented 2 years ago

Maybe we could use some tolerance when clamping to 0? That way we wouldn't hide completely incorrect results at least.

palday commented 2 years ago

So something like

    r2vals = [(-1e-5 < r <= 0.0 ? zero(r) : r) for r in r2.(mods)]
    return FTestResult(Int(nobs(mods[1])), SSR, df, r2vals, fstat, pval)

?

nalimilan commented 2 years ago

Yeah. The threshold is pretty arbitrary but I guess in practice it should be fine.