easystats / performance

:muscle: Models' quality and performance metrics (R2, ICC, LOO, AIC, BF, ...)
https://easystats.github.io/performance/
GNU General Public License v3.0
1.02k stars 89 forks source link

r2_nagelkerke.glm #300

Closed SingerAl closed 3 years ago

SingerAl commented 3 years ago

Dear performance package authors, thank you for developping the performance package. Testing function 'r2_nagelkerke' I had an issue with the outcome. The critical code line is: r2_nagelkerke <- r2cox/(1 - exp(-model$null.deviance/insight::n_obs(model))) According to https://de.wikipedia.org/wiki/Pseudo-Bestimmtheitsma%C3%9F the part (1 - exp(-model$null.deviance/insight::n_obs(model))) is wrong. The code should be something like ll_null <- insight::loglikelihood(insight::null_model(model)) r2_nagelkerke <- r2cox/-expm1(2/insight::n_obs(model) * ll_null) With best regards, Alex

strengejacke commented 3 years ago

hm, I can't see any discrepancies between the R2 values from these 4 different packages:

set.seed(101)
n <- 200
x <- rnorm(n)
a <- 1
b <- -2
p <- exp(a + b * x) / (1 + exp(a + b * x))
y <- factor(ifelse(runif(n) < p, 1, 0), levels = 0:1)
d <- data.frame(x, y)
mod1 <- glm(y ~ x, data = d, family = binomial())

performance::r2_nagelkerke(mod1)
#> Nagelkerke's R2 
#>       0.5852906

fmsb::NagelkerkeR2(mod1)
#> $N
#> [1] 200
#> 
#> $R2
#> [1] 0.5852906

blorr::blr_rsq_nagelkerke(mod1)
#> [1] 0.5852906

mod1b <- rms::lrm(y ~ x, data = d)
mod1b
#> Logistic Regression Model
#>  
#>  rms::lrm(formula = y ~ x, data = d)
#>  
#>                         Model Likelihood    Discrimination    Rank Discrim.    
#>                               Ratio Test           Indexes          Indexes    
#>  Obs           200    LR chi2     112.22    R2       0.585    C       0.902    
#>   0             75    d.f.             1    g        2.894    Dxy     0.805    
#>   1            125    Pr(> chi2) <0.0001    gr      18.058    gamma   0.805    
#>  max |deriv| 3e-11                          gp       0.377    tau-a   0.379    
#>                                             Brier    0.119                     
#>  
#>            Coef    S.E.   Wald Z Pr(>|Z|)
#>  Intercept  0.8893 0.2189  4.06  <0.0001 
#>  x         -2.6368 0.3778 -6.98  <0.0001 
#> 

Created on 2021-05-19 by the reprex package (v2.0.0)

strengejacke commented 3 years ago

Also, your code returns the same result. I think it's just a different way of writing the formula for Nagelkerke's R2.

set.seed(101)
n <- 200
x <- rnorm(n)
a <- 1
b <- -2
p <- exp(a + b * x) / (1 + exp(a + b * x))
y <- factor(ifelse(runif(n) < p, 1, 0), levels = 0:1)
d <- data.frame(x, y)
mod1 <- glm(y ~ x, data = d, family = binomial())
library(performance)

unname(r2_coxsnell(mod1) / (1 - exp(-mod1$null.deviance/insight::n_obs(mod1))))
#> [1] 0.5852906

ll_null <- insight::loglikelihood(insight::null_model(mod1))
unname(r2_coxsnell(mod1) / -expm1(2 / insight::n_obs(mod1) * ll_null))
#> 'log Lik.' 0.5852906 (df=1)

Created on 2021-05-19 by the reprex package (v2.0.0)

Please re-open if I'm mistaken here.

SingerAl commented 3 years ago

Dear Daniel,

thank you for checking and sorry that I had not sent my test case. I worked on a different type of model: ratios of success/failure.

My test code is: set.seed(101) library(performance) ratios <- cbind( ok = c(3, 6, 4, 8, 2), faulty = c(5, 10, 5, 2, 6) ) x <- rnorm(5) mod1 <- glm(ratios ~ x, family = "binomial")

r2_coxsnell(mod1) r2_nagelkerke(mod1) rcompanion::nagelkerke(mod1)$Pseudo.R.squared.for.model.vs.null[c("Cox and Snell (ML)", "Nagelkerke (Cragg and Uhler)"),]

ll_null <- insight::loglikelihood(insight::null_model(mod1)) as.numeric(r2_coxsnell(mod1) / -expm1(2 / insight::n_obs(mod1) * ll_null))

My console output for this test case:

set.seed(101) library(performance) ratios <- cbind(

  • ok = c(3, 6, 4, 8, 2),
  • faulty = c(5, 10, 5, 2, 6)
  • ) x <- rnorm(5) mod1 <- glm(ratios ~ x, family = "binomial")

r2_coxsnell(mod1) Cox & Snell's R2 0.002606127 r2_nagelkerke(mod1) Nagelkerke's R2 0.00344032 rcompanion::nagelkerke(mod1)$Pseudo.R.squared.for.model.vs.null[c("Cox and Snell (ML)", "Nagelkerke (Cragg and Uhler)"),] Cox and Snell (ML) Nagelkerke (Cragg and Uhler) 0.00260613 0.00265250

ll_null <- insight::loglikelihood(insight::null_model(mod1)) as.numeric(r2_coxsnell(mod1) / -expm1(2 / insight::n_obs(mod1) * ll_null)) [1] 0.002652501

I still thin k there is a problem in the code. The deviance of the null model is D = 2 (ll_saturated_model - ll_null_model). If the loglikelihood of the saturated model is 0 (perfect fit) , the null model deviance D = -2 ll_null_model. In this case, r2_nagelkerke gives the same result as the other approaches. But, if ll_saturated_modell < 0, results are different.

With best regards, Alex

Von: Daniel @.> Gesendet: Mittwoch, 19. Mai 2021 17:16 An: easystats/performance @.> Cc: Alexander Singer @.>; Author @.> Betreff: Re: [easystats/performance] r2_nagelkerke.glm (#300)

Also, your code returns the same result. I think it's just a different way of writing the formula for Nagelkerke's R2.

set.seed(101)

n <- 200

x <- rnorm(n)

a <- 1

b <- -2

p <- exp(a + b x) / (1 + exp(a + b x))

y <- factor(ifelse(runif(n) < p, 1, 0), levels = 0:1)

d <- data.frame(x, y)

mod1 <- glm(y ~ x, data = d, family = binomial())

library(performance)

unname(r2_coxsnell(mod1) / (1 - exp(-mod1$null.deviance/insight::n_obs(mod1))))

> [1] 0.5852906

ll_null <- insight::loglikelihood(insight::null_model(mod1))

unname(r2_coxsnell(mod1) / -expm1(2 / insight::n_obs(mod1) * ll_null))

> 'log Lik.' 0.5852906 (df=1)

Created on 2021-05-19 by the reprex packagehttps://reprex.tidyverse.org (v2.0.0)

Please re-open if I'm mistaken here.

- You are receiving this because you authored the thread. Reply to this email directly, view it on GitHubhttps://github.com/easystats/performance/issues/300#issuecomment-844203902, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AK2YVHKGQTMKSOVOKJOZQUTTOPI2NANCNFSM45D7LATQ.

[cid:rifcon_logo04.jpg] Dr. Alexander Singer | Effect Modelling and Statistics RIFCON GmbH | Goldbeckstra?e 13 | 69493 Hirschberg T. +49 6201 84528-62 | Fax: +49 (0)6201 8452899 @.*** | MEET US! https://rifcon.de/event-directory/ | www.rifcon.de http://www.rifcon.de/

RIFCON GmbH Goldbeckstra?e 13 - D-69493 Hirschberg Amtsgericht Mannheim | HRB 433053 | Ust.IdNr. DE 814188954 Gesch?ftsf?hrer / Managing Directors: Dr. Michael Riffel, Juergen Riffel, Ute Terberger

Please think twice before you print this email !

DISCLAIMER: This e-mail transmission may contain confidential or legally privileged information that is intended only for the individual or entity named in the e-mail address. If you are not the intended recipient, you are hereby notified that any disclosure, copying, distribution or reliance upon the contents of this e-mail is strictly prohibited. If you have received this e-mail transmission in error, please reply to the sender, so that we can arrange for proper delivery, and then please delete the message from your system. The original of this e-mail was scanned for viruses, but you should always use your own virus-scanning software to ensure mail and attachments are safe to open. This e-mail does not constitute a consent to the use of sender's contact information for direct marketing purposes or for transfers of data to third parties.

strengejacke commented 3 years ago

Ah, ok. If not bernoulli, R2 is a bit more complicated to calculate, because the loglikelihood isn't correct. However, there is a suggestion how to get the correct loglikelihood (and hence, R2), which is discussed here: https://github.com/easystats/performance/issues/165

Either way, #165 would address your issue, so we can keep this closed as duplicate.