Closed SingerAl closed 3 years ago
hm, I can't see any discrepancies between the R2 values from these 4 different packages:
set.seed(101)
n <- 200
x <- rnorm(n)
a <- 1
b <- -2
p <- exp(a + b * x) / (1 + exp(a + b * x))
y <- factor(ifelse(runif(n) < p, 1, 0), levels = 0:1)
d <- data.frame(x, y)
mod1 <- glm(y ~ x, data = d, family = binomial())
performance::r2_nagelkerke(mod1)
#> Nagelkerke's R2
#> 0.5852906
fmsb::NagelkerkeR2(mod1)
#> $N
#> [1] 200
#>
#> $R2
#> [1] 0.5852906
blorr::blr_rsq_nagelkerke(mod1)
#> [1] 0.5852906
mod1b <- rms::lrm(y ~ x, data = d)
mod1b
#> Logistic Regression Model
#>
#> rms::lrm(formula = y ~ x, data = d)
#>
#> Model Likelihood Discrimination Rank Discrim.
#> Ratio Test Indexes Indexes
#> Obs 200 LR chi2 112.22 R2 0.585 C 0.902
#> 0 75 d.f. 1 g 2.894 Dxy 0.805
#> 1 125 Pr(> chi2) <0.0001 gr 18.058 gamma 0.805
#> max |deriv| 3e-11 gp 0.377 tau-a 0.379
#> Brier 0.119
#>
#> Coef S.E. Wald Z Pr(>|Z|)
#> Intercept 0.8893 0.2189 4.06 <0.0001
#> x -2.6368 0.3778 -6.98 <0.0001
#>
Created on 2021-05-19 by the reprex package (v2.0.0)
Also, your code returns the same result. I think it's just a different way of writing the formula for Nagelkerke's R2.
set.seed(101)
n <- 200
x <- rnorm(n)
a <- 1
b <- -2
p <- exp(a + b * x) / (1 + exp(a + b * x))
y <- factor(ifelse(runif(n) < p, 1, 0), levels = 0:1)
d <- data.frame(x, y)
mod1 <- glm(y ~ x, data = d, family = binomial())
library(performance)
unname(r2_coxsnell(mod1) / (1 - exp(-mod1$null.deviance/insight::n_obs(mod1))))
#> [1] 0.5852906
ll_null <- insight::loglikelihood(insight::null_model(mod1))
unname(r2_coxsnell(mod1) / -expm1(2 / insight::n_obs(mod1) * ll_null))
#> 'log Lik.' 0.5852906 (df=1)
Created on 2021-05-19 by the reprex package (v2.0.0)
Please re-open if I'm mistaken here.
Dear Daniel,
thank you for checking and sorry that I had not sent my test case. I worked on a different type of model: ratios of success/failure.
My test code is: set.seed(101) library(performance) ratios <- cbind( ok = c(3, 6, 4, 8, 2), faulty = c(5, 10, 5, 2, 6) ) x <- rnorm(5) mod1 <- glm(ratios ~ x, family = "binomial")
r2_coxsnell(mod1) r2_nagelkerke(mod1) rcompanion::nagelkerke(mod1)$Pseudo.R.squared.for.model.vs.null[c("Cox and Snell (ML)", "Nagelkerke (Cragg and Uhler)"),]
ll_null <- insight::loglikelihood(insight::null_model(mod1)) as.numeric(r2_coxsnell(mod1) / -expm1(2 / insight::n_obs(mod1) * ll_null))
My console output for this test case:
set.seed(101) library(performance) ratios <- cbind(
- ok = c(3, 6, 4, 8, 2),
- faulty = c(5, 10, 5, 2, 6)
- ) x <- rnorm(5) mod1 <- glm(ratios ~ x, family = "binomial")
r2_coxsnell(mod1) Cox & Snell's R2 0.002606127 r2_nagelkerke(mod1) Nagelkerke's R2 0.00344032 rcompanion::nagelkerke(mod1)$Pseudo.R.squared.for.model.vs.null[c("Cox and Snell (ML)", "Nagelkerke (Cragg and Uhler)"),] Cox and Snell (ML) Nagelkerke (Cragg and Uhler) 0.00260613 0.00265250
ll_null <- insight::loglikelihood(insight::null_model(mod1)) as.numeric(r2_coxsnell(mod1) / -expm1(2 / insight::n_obs(mod1) * ll_null)) [1] 0.002652501
I still thin k there is a problem in the code. The deviance of the null model is D = 2 (ll_saturated_model - ll_null_model). If the loglikelihood of the saturated model is 0 (perfect fit) , the null model deviance D = -2 ll_null_model. In this case, r2_nagelkerke gives the same result as the other approaches. But, if ll_saturated_modell < 0, results are different.
With best regards, Alex
Von: Daniel @.> Gesendet: Mittwoch, 19. Mai 2021 17:16 An: easystats/performance @.> Cc: Alexander Singer @.>; Author @.> Betreff: Re: [easystats/performance] r2_nagelkerke.glm (#300)
Also, your code returns the same result. I think it's just a different way of writing the formula for Nagelkerke's R2.
set.seed(101)
n <- 200
x <- rnorm(n)
a <- 1
b <- -2
p <- exp(a + b x) / (1 + exp(a + b x))
y <- factor(ifelse(runif(n) < p, 1, 0), levels = 0:1)
d <- data.frame(x, y)
mod1 <- glm(y ~ x, data = d, family = binomial())
library(performance)
unname(r2_coxsnell(mod1) / (1 - exp(-mod1$null.deviance/insight::n_obs(mod1))))
ll_null <- insight::loglikelihood(insight::null_model(mod1))
unname(r2_coxsnell(mod1) / -expm1(2 / insight::n_obs(mod1) * ll_null))
Created on 2021-05-19 by the reprex packagehttps://reprex.tidyverse.org (v2.0.0)
Please re-open if I'm mistaken here.
- You are receiving this because you authored the thread. Reply to this email directly, view it on GitHubhttps://github.com/easystats/performance/issues/300#issuecomment-844203902, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AK2YVHKGQTMKSOVOKJOZQUTTOPI2NANCNFSM45D7LATQ.
[cid:rifcon_logo04.jpg] Dr. Alexander Singer | Effect Modelling and Statistics RIFCON GmbH | Goldbeckstra?e 13 | 69493 Hirschberg T. +49 6201 84528-62 | Fax: +49 (0)6201 8452899 @.*** | MEET US! https://rifcon.de/event-directory/ | www.rifcon.de http://www.rifcon.de/
RIFCON GmbH Goldbeckstra?e 13 - D-69493 Hirschberg Amtsgericht Mannheim | HRB 433053 | Ust.IdNr. DE 814188954 Gesch?ftsf?hrer / Managing Directors: Dr. Michael Riffel, Juergen Riffel, Ute Terberger
Please think twice before you print this email !
DISCLAIMER: This e-mail transmission may contain confidential or legally privileged information that is intended only for the individual or entity named in the e-mail address. If you are not the intended recipient, you are hereby notified that any disclosure, copying, distribution or reliance upon the contents of this e-mail is strictly prohibited. If you have received this e-mail transmission in error, please reply to the sender, so that we can arrange for proper delivery, and then please delete the message from your system. The original of this e-mail was scanned for viruses, but you should always use your own virus-scanning software to ensure mail and attachments are safe to open. This e-mail does not constitute a consent to the use of sender's contact information for direct marketing purposes or for transfers of data to third parties.
Ah, ok. If not bernoulli, R2 is a bit more complicated to calculate, because the loglikelihood isn't correct. However, there is a suggestion how to get the correct loglikelihood (and hence, R2), which is discussed here: https://github.com/easystats/performance/issues/165
Either way, #165 would address your issue, so we can keep this closed as duplicate.
Dear performance package authors, thank you for developping the performance package. Testing function 'r2_nagelkerke' I had an issue with the outcome. The critical code line is: r2_nagelkerke <- r2cox/(1 - exp(-model$null.deviance/insight::n_obs(model))) According to https://de.wikipedia.org/wiki/Pseudo-Bestimmtheitsma%C3%9F the part (1 - exp(-model$null.deviance/insight::n_obs(model))) is wrong. The code should be something like ll_null <- insight::loglikelihood(insight::null_model(model)) r2_nagelkerke <- r2cox/-expm1(2/insight::n_obs(model) * ll_null) With best regards, Alex