JuliaStats / GLM.jl

Generalized linear models in Julia

Do not mention pseudo-R² in manual #477

Closed: nalimilan closed this 2 years ago

nalimilan commented 2 years ago

r2 does not currently work for GLMs (even with two arguments), as they don't implement nullloglikelihood. Also add a mention of adjr2.

Fixes https://github.com/JuliaStats/GLM.jl/issues/475.

If somebody has an idea of how nullloglikelihood could be implemented: maybe all we need to do is define, for each link, what the prediction would be for every observation under the null model, and call loglik_obs on that as we do for loglikelihood?
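A minimal sketch of that idea, written in Python rather than GLM.jl's Julia. It assumes an unweighted intercept-only model with no offset. In that case every observation gets the same fitted mean, and for Bernoulli and Poisson responses that common mean is the mean response, as long as the link can reach it. The null log-likelihood is then the per-observation log-likelihood evaluated at that constant. The helpers loglik_obs_bernoulli, loglik_obs_poisson and null_loglikelihood are hypothetical names for illustration; they are not GLM.jl's loglik_obs signatures.

```python
import math

# Hypothetical per-observation log-likelihoods (illustration only,
# not GLM.jl's loglik_obs signatures).
def loglik_obs_bernoulli(y, mu):
    # log P(Y = y) for a 0/1 response with mean mu, 0 < mu < 1
    return math.log(mu) if y == 1 else math.log(1.0 - mu)

def loglik_obs_poisson(y, mu):
    # log P(Y = y) for a Poisson count with mean mu > 0
    return y * math.log(mu) - mu - math.lgamma(y + 1)

def null_loglikelihood(y, loglik_obs):
    # Assumes no offset and no weights: the null model y ~ 1 predicts
    # the mean response for every observation, whatever the link.
    mu0 = sum(y) / len(y)
    return sum(loglik_obs(yi, mu0) for yi in y)

# Example: Bernoulli responses with mean 0.75
print(null_loglikelihood([1, 0, 1, 1], loglik_obs_bernoulli))
```

With an offset or prior weights the fitted null means are no longer a plain mean, so this shortcut would need adjusting for those cases.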

codecov-commenter commented 2 years ago

Codecov Report

Merging #477 (c34c21a) into master (42a0d04) will not change coverage. The diff coverage is n/a.

@@           Coverage Diff           @@
##           master     #477   +/-   ##
=======================================
  Coverage   85.12%   85.12%           
=======================================
  Files           7        7           
  Lines         827      827           
=======================================
  Hits          704      704           
  Misses        123      123

palday commented 2 years ago

@nalimilan The null model is "just" y ~ 1, so it would be trivial to create a null model using the response of the original model and a column of ones as X.

Or am I missing something?
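Fitting y ~ 1 directly means iterating on a single intercept, which is the same as using a design matrix with one column of ones. Here is a Python sketch, assuming a Bernoulli response with the canonical logit link and an optional offset. In that case Newton's method coincides with IRLS/Fisher scoring, and in general the intercept has no closed form. fit_null_logit and expit are hypothetical helpers, not GLM.jl functions.

```python
import math

def expit(eta):
    # Inverse logit link
    return 1.0 / (1.0 + math.exp(-eta))

def fit_null_logit(y, offset=None, tol=1e-10, maxiter=50):
    """Intercept-only logistic model: logit(mu_i) = b0 + offset_i.

    Equivalent to fitting with X = a single column of ones.
    Returns (b0, fitted means). Assumes 0 < mean(y) < 1."""
    n = len(y)
    if offset is None:
        offset = [0.0] * n
    ybar = sum(y) / n
    b0 = math.log(ybar / (1.0 - ybar))  # exact MLE when there is no offset
    for _ in range(maxiter):
        mu = [expit(b0 + o) for o in offset]
        score = sum(yi - mi for yi, mi in zip(y, mu))   # d loglik / d b0
        info = sum(mi * (1.0 - mi) for mi in mu)        # -d2 loglik / d b0^2
        step = score / info
        b0 += step
        if abs(step) < tol:
            break
    return b0, [expit(b0 + o) for o in offset]

b0, mu = fit_null_logit([1, 0, 1, 1], offset=[0.0, 0.5, -0.5, 1.0])
print(b0, mu)
```

Without an offset the starting value logit(mean(y)) is already the maximum-likelihood estimate, so the loop exits after one step. Iteration only matters once an offset is present.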

nalimilan commented 2 years ago

Yeah, I just wonder whether we should compute this more efficiently. After looking into it, I have a commit here that computes the null deviance and log-likelihood correctly simply by taking the predicted response under the null model to be the mean response. This works for all models we currently test, except those with offsets. Do you think a direct formula exists for that case too?
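As an illustration of the mean-response shortcut, here is a Python sketch of the null deviance for a Poisson model without offset or weights. It uses the standard Poisson unit deviance 2(y log(y/mu) - (y - mu)), with y log(y/mu) taken as 0 when y = 0. The function names are hypothetical and are not GLM.jl's API.

```python
import math

def poisson_deviance(y, mu):
    # Sum of unit deviances 2*(y*log(y/mu) - (y - mu)); y*log(y/mu) := 0 when y == 0
    total = 0.0
    for yi, mi in zip(y, mu):
        term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += 2.0 * (term - (yi - mi))
    return total

def poisson_null_deviance(y):
    # Shortcut (no offset, no weights): the null model predicts mean(y) everywhere
    ybar = sum(y) / len(y)
    return poisson_deviance(y, [ybar] * len(y))

print(poisson_null_deviance([2, 3, 7]))
```

Because mean(y) minimizes the deviance over constant predictions, no iteration is needed here. That property is lost once an offset makes the null predictions differ between observations.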

palday commented 2 years ago

@nalimilan did you push the commit? For Poisson at least, I think there might be a few easy/fast cases even with offsets. I'll have to work through the algebra later to be sure.
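One offset case that does reduce to a closed form is Poisson with the canonical log link. Setting the intercept's score, sum(y_i - exp(b0 + offset_i)), to zero gives exp(b0) = sum(y) / sum(exp(offset_i)). Each fitted null mean is then exp(offset_i) times that constant. Below is a Python sketch under that assumption; other links need not simplify this way. poisson_null_mu_with_offset is a hypothetical name.

```python
import math

def poisson_null_mu_with_offset(y, offset):
    """Fitted means of the intercept-only Poisson model log(mu_i) = b0 + offset_i.

    Solving sum(y_i - exp(b0 + offset_i)) = 0 gives
    exp(b0) = sum(y) / sum(exp(offset_i)), so no iteration is needed."""
    scale = sum(y) / sum(math.exp(o) for o in offset)
    return [scale * math.exp(o) for o in offset]

# Exposures of 1 and 2 entered as log offsets
print(poisson_null_mu_with_offset([1, 4], [0.0, math.log(2)]))
```

With all offsets equal to zero this reduces to mean(y) for every observation, matching the no-offset shortcut.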

nalimilan commented 2 years ago

I've just filed https://github.com/JuliaStats/GLM.jl/pull/479. Let me know if you can find the formulas. If some cases have no closed-form solution, though, we'll have to fall back to fitting the null model, at least for those...

FWIW, R's ?glm has this warning about offsets:

null.deviance: The deviance for the null model, comparable with ‘deviance’. The null model will include the offset, and an intercept if there is one in the model. Note that this will be incorrect if the link function depends on the data other than through the fitted mean: specify a zero offset to force a correct calculation.

ParadaCarleton commented 2 years ago

Does r2 have to return a pseudo-R²? I ask because there are many definitions of pseudo-R², so it can be confusing. It could also lead to mistakes by users who expect the function to return only the ordinary R². (I would assume this myself.)

nalimilan commented 2 years ago

r2 only returns the pseudo-R² if you pass a second argument specifying which variant you want.
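For reference, the commonly reported pseudo-R² variants all depend on the null log-likelihood ll0, which is why a missing nullloglikelihood breaks the two-argument form. Here is a Python sketch of three standard definitions: McFadden, Cox and Snell, and Nagelkerke. The string variant names are illustrative only and may not match the arguments GLM.jl or StatsBase actually accept.

```python
import math

def pseudo_r2(ll, ll0, n, variant):
    """Pseudo-R2 from the fitted (ll) and null (ll0) log-likelihoods, n observations.

    Variant names are illustrative, not a real library's argument values."""
    if variant == "McFadden":
        return 1.0 - ll / ll0
    cox_snell = 1.0 - math.exp(2.0 * (ll0 - ll) / n)
    if variant == "CoxSnell":
        return cox_snell
    if variant == "Nagelkerke":
        # Rescales Cox-Snell so its maximum attainable value is 1
        return cox_snell / (1.0 - math.exp(2.0 * ll0 / n))
    raise ValueError(f"unknown pseudo-R2 variant: {variant}")

print(pseudo_r2(-50.0, -100.0, 100, "McFadden"))
```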

nalimilan commented 2 years ago

Superseded by https://github.com/JuliaStats/GLM.jl/pull/479.