easystats / performance

:muscle: Models' quality and performance metrics (R2, ICC, LOO, AIC, BF, ...)
https://easystats.github.io/performance/
GNU General Public License v3.0
1.03k stars 94 forks source link

Improving `binned_residuals()` #382

Closed strengejacke closed 1 year ago

strengejacke commented 2 years ago

This isn't the place, but I may have some Opinions about binned_model (I would like a smoothed line added, binomial CIs on the points; I would rather have binomial CIs on the individual points and a confidence interval on the smooth rather than have a dichotomizing region that indicates 'good' and 'bad' points ...). I see that this is copying what arm::binnedplot does though ...

Originally posted by @bbolker in https://github.com/easystats/performance/issues/376#issuecomment-989431839

mattansb commented 2 years ago

The docs for binned_residuals are also lacking - they do not explain what is being checked, and there is a line half-missing: image

strengejacke commented 2 years ago

@bbolker regarding the smooth line: does it make much sense to add such a line? Even for models with a larger number of observations, due to binning, there are still just a few data points.

library(lme4)
#> Loading required package: Matrix
library(HSAUR3)
#> Loading required package: tools
library(performance)

gm2 <- glmer(outcome ~ treatment * visit + (1 | patientID),
             data = toenail,
             family = binomial)

nobs(gm2)
#> [1] 1908

binned_residuals(gm2) |> plot() + see::theme_lucid()

Created on 2022-02-25 by the reprex package (v2.0.1)

strengejacke commented 2 years ago

(it would look like something like this)

library(lme4)
#> Loading required package: Matrix
library(HSAUR3)
#> Loading required package: tools
library(performance)

gm2 <- glmer(outcome ~ treatment * visit + (1 | patientID),
             data = toenail,
             family = binomial)

nobs(gm2)
#> [1] 1908

binned_residuals(gm2) |> 
  plot() + 
  see::theme_lucid() + 
  ggplot2::stat_smooth(ggplot2::aes(y = ybar, colour = group), se = FALSE)
#> `geom_smooth()` using method = 'loess' and formula 'y ~ x'

binned_residuals(gm2) |> 
  plot() + 
  see::theme_lucid() + 
  ggplot2::stat_smooth(ggplot2::aes(y = ybar), se = FALSE)
#> `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Created on 2022-02-25 by the reprex package (v2.0.1)

bwiernik commented 2 years ago

https://gist.github.com/bbolker/945e684dc148239364ab0f99c9a1beed

I realized that the residuals from binomials are not binomial (so constructing binomial CIs isn't useful); however I think that a binned plot using mean_cl_boot() to generate CIs is probably fine. See examples and decide what you think is useful or not ...

bbolker commented 2 years ago

Don't know. I think I like the smoothed line, but don't have super-strong feelings.

bwiernik commented 2 years ago

Maybe tell it to always use gam() with thin-plate splines and pretty tight penalty instead of loess?

strengejacke commented 2 years ago

And we could add an argument that lets users optionally add such a smooth-line.

bwiernik commented 2 years ago

I mean we should have a smooth line I think, but a more penalized one

strengejacke commented 2 years ago

How to to this with stat_smooth()?

strengejacke commented 2 years ago

I just saw that binned_residuals() includes SEs, maybe we can use those for constructing CIs? It would probably good to compare against mean_ci_boot, but I'm not sure how...

library(ggplot2)
library(performance)
model <- glm(vs ~ wt + mpg, data = mtcars, family = "binomial")
result <- binned_residuals(model)
plot(result) + geom_errorbar(aes(ymin = CI_low, ymax = CI_high, colour = group))

aa <- broom::augment(model)

gg1 <- ggplot(aa, aes(.fitted, .resid)) +
  stat_summary_bin(fun.data = mean_cl_boot) +
  geom_hline(yintercept = 0, lty = 2) +
  geom_smooth()

gg1
#> `geom_smooth()` using method = 'loess' and formula 'y ~ x'
#> Warning: Removed 11 rows containing missing values (geom_segment).

Created on 2022-02-28 by the reprex package (v2.0.1)

strengejacke commented 2 years ago

Another example:

library(ggplot2)
library(performance)
library(HSAUR3)
#> Loading required package: tools

gm2 <- glm(outcome ~ treatment * visit,
             data = toenail,
             family = binomial)

nobs(gm2)
#> [1] 1908

binned_residuals(gm2) |> 
  plot() + 
  stat_smooth(method = "gam", ggplot2::aes(y = ybar), colour = "grey70", se = FALSE) +
  geom_errorbar(aes(ymin = CI_low, ymax = CI_high, colour = group)) +
  see::theme_lucid()
#> `geom_smooth()` using formula 'y ~ s(x, bs = "cs")'

Created on 2022-02-28 by the reprex package (v2.0.1)

bwiernik commented 2 years ago

I think the CI_low and CI_high values might be wrong as currently written:

d <- suppressWarnings(lapply(1:n_bins, function(.x) {
    items <- (1:length(pred))[model.binned == .x]
    model.range <- range(pred[items], na.rm = TRUE)
    xbar <- mean(pred[items], na.rm = TRUE)
    ybar <- mean(y[items], na.rm = TRUE)
    n <- length(items)
    sdev <- stats::sd(y[items], na.rm = TRUE)

    data.frame(
      xbar = xbar,
      ybar = ybar,
      n = n,
      x.lo = model.range[1],
      x.hi = model.range[2],
      se = 2 * sdev / sqrt(n)
    )
  }))

  d <- do.call(rbind, d)
  d <- d[stats::complete.cases(d), ]

  # CIs
  d$CI_low <- d$ybar - stats::qnorm(.975) * d$se
  d$CI_high <- d$ybar + stats::qnorm(.975) * d$se

The computation of se already has 2 * in it, so I think those are 95% confidence margins, not SEs. But maybe I'm misunderstanding

strengejacke commented 2 years ago

Let me look into the Gelman/Hill book, I'm, not sure, but I think I roughly adopted the code from that book, or the arm-package.

strengejacke commented 2 years ago

I think you're right, this looks better to me:

library(ggplot2)
library(performance)
library(HSAUR3)
#> Loading required package: tools
#> Loading required package: tools

gm2 <- glm(outcome ~ treatment * visit,
           data = toenail,
           family = binomial)

binned_residuals(gm2) |> 
  plot() + 
  stat_smooth(method = "gam", aes(y = ybar), colour = "grey70", se = FALSE) +
  geom_errorbar(aes(ymin = CI_low, ymax = CI_high, colour = group)) +
  see::theme_lucid()
#> `geom_smooth()` using formula 'y ~ s(x, bs = "cs")'

Created on 2022-03-01 by the reprex package (v2.0.1)

bwiernik commented 2 years ago

Yeah, Gelman tends to approximate the critical value for 95% intervals as "2", so that's what jumped out to me in the code

strengejacke commented 2 years ago

The range of the CI is then identical to the range of error bound per bin - it's just that due to the point estimate, error bars of the point estimates cross the error bounds. So, the "se" refers to the error bounds, probably not to the point estimates (unless these are the same)

bwiernik commented 2 years ago

They are the same. The SE is the standard error of the point estimate (it's the typical standard error of the mean SD / sqrt(N) ). It's just a question of whether the error bands are drawn around the point estimate or the null hypothesis value.

I think I like it with both the error band around zero and the error bars on the points. The error bars help to interpret the smooth line I think.

What do you think?

strengejacke commented 2 years ago

Yes, I'd also say we should keep the error bars. What about the smooth line, you were suggesting using method = "gam", is this enough?

stat_smooth(method = "gam", aes(y = ybar), colour = "grey70", se = FALSE)
strengejacke commented 2 years ago

btw, when the term argument is used, we already add a smooth-line:

library(ggplot2)
library(performance)
library(HSAUR3)
#> Loading required package: tools

gm2 <- glm(outcome ~ treatment * visit,
           data = toenail,
           family = binomial)

binned_residuals(gm2, term = "visit") |> 
  plot()

Created on 2022-03-01 by the reprex package (v2.0.1)

bwiernik commented 2 years ago

I think that looks good. Should we make the line color the same as in the other plots? (Green I think?)

strengejacke commented 2 years ago

(its already green) should we use "loess" or "gam" (and if gam, some special formula?

strengejacke commented 2 years ago
library(performance)
library(HSAUR3)
#> Loading required package: tools

gm2 <- glm(outcome ~ treatment * visit,
           data = toenail,
           family = binomial)

binned_residuals(gm2) |> plot()

Created on 2022-03-01 by the reprex package (v2.0.1)

bwiernik commented 2 years ago

Looks good to me! Let's add theme_modern() to be line the others in check_model()

strengejacke commented 2 years ago

Yeah, this was the standalone-plot. for check_model, this is the default:

image

Don't forget https://github.com/easystats/performance/issues/376#issuecomment-1051088177 :-)

bwiernik commented 2 years ago

Thanks for the reminder

bwiernik commented 2 years ago

Also going to make a better PP check plot for categorical responses

bbolker commented 1 year ago

I might like to reopen this. The first plot is from check_model(), the second is from plot(binned_residuals()) (not sure why these differ?) -- the latter is incomplete with warnings etc. This is a case where the Gaussian approx for the distribution of binomial residuals is really bad. I know we talked earlier about using bootstrap residuals; a shifted exact binomial CI might work as well ...

It would also be nice to have an option to do the binned-residuals plot on a logit-logit scale, as that would show nonlinear patterns on the link scale more reliably ...

library(performance)

library(ggplot2); theme_set(theme_bw())
data("kyphosis", package = "rpart")
kyphosis <- transform(
    kyphosis,
    Kyphosis = as.numeric(factor(Kyphosis))-1
)
m <- glm(
    Kyphosis~Age+Start+Number,
    data=kyphosis,
    family = binomial(link = "logit")
)
check_model(m)
plot(binned_residuals(m))

Screenshot from 2023-10-22 19-58-47

Screenshot from 2023-10-22 19-59-08

bwiernik commented 1 year ago

The missing points and error bars might be due to them going out of bounds? Could you elaborate on the details of what you're suggesting with logit-logit?

bbolker commented 1 year ago

At least for the x axis, it would be nice to be able to plot on the log-odds rather than the probability scale. Not sure whether this makes sense for the y-axis or not (which shows residuals, rather than predictions — not sure what kinds of residuals these are, or whether it makes sense to think about them on the link or the linear predictor scale; if they're Pearson, then it's probably OK).

strengejacke commented 1 year ago

Dot issue should be fixed:

library(performance)
library(ggplot2)
theme_set(theme_bw())
data("kyphosis", package = "rpart")
kyphosis <- transform(
  kyphosis,
  Kyphosis = as.numeric(factor(Kyphosis)) - 1
)
m <- glm(
  Kyphosis ~ Age + Start + Number,
  data = kyphosis,
  family = binomial(link = "logit")
)
check_model(m)

plot(binned_residuals(m))


check_model(m, show_dots = FALSE)

plot(binned_residuals(m, show_dots = FALSE))


# or
plot(binned_residuals(m), show_dots = FALSE)

You need:

packageVersion("see")
#> [1] '0.8.0.6'
packageVersion("performance")
#> [1] '0.10.6.2'

(should be available via easystats::install_latest(force = TRUE) in 1-2 hours).

strengejacke commented 1 year ago

How would we achieve using a log-odds scale? Binned residuals are based on fitted values, and the "mean" of 0 and 1 for a certain range of the predictors, i.e. everything is on the probability scale by default.

bbolker commented 1 year ago

I agree that putting the y-axis on the log-odds scale isn't practical (and if the machinery is using Pearson residuals — I haven't checked — that's probably fine). Thanks for fixing the other issue. The main other adjustment I think would be worthwhile is something to handle the failure of the Gaussian approximation - either bootstrap or shifted exact CIs (what's going on in this case with those lowest four bins is that there are 0/9 successes).

Looks like + scale_x_continuous(trans = "logit") does approximately what I want with the x-axis.

I've been hacking around on my fork. Bootstrapped residuals didn't make much difference. I'm not sure whether shifted the exact binomial CI makes sense, but it seems to ...

Note that the example given in arm::binnedplot uses deviance residuals (which are more similar to Pearson than to raw residuals), whereas binned_residuals effectively uses raw/response-scale residuals (predicted - fitted) - that may be a large source of the issue ...

Here's what I get with my hacked version (shifted exact CIs and a log-odds scale y axis) (there are a few issues left: (1) I haven't adjusted the envelope; (2) I'm not sure what's up with the last bin where the range doesn't overlap the points, maybe shifting doesn't always work? (3) I get "Warning in eval_tidy(x[[2]], data, env) : restarting interrupted promise evaluation" warnings for reasons I don't understand ...

Screenshot from 2023-10-24 10-04-12

strengejacke commented 1 year ago

I have opened a PR (#641) - I first tried to open a PR based on your fork, but this somehow did not work.

Regarding:

(3) I get "Warning in eval_tidy(x[[2]], data, env) : restarting interrupted promise evaluation" warnings for reasons I don't understand ...

This is due to

stats:::binom.test(sum(y0[items]), n)$conf.int - fv

stats:::binom.test(sum(y0[items]), n)$conf.int returns a vector of 2 (ci low/high), while fv are fitted values of the model, thus exact returns a much larger vector than expected:

stats:::binom.test(sum(y0[items]), n)$conf.int
 num [1:2] 0 0.336
y0[items]
 num [1:9] 0 0 0 0 0 0 0 0 0
fv
 Named num [1:81] 0.257 0.1225 0.493 0.458 0.0299 ...

I think this is also the reason for

(2) I'm not sure what's up with the last bin where the range doesn't overlap the points, maybe shifting doesn't always work?

Maybe you can directly commit to #641?

strengejacke commented 1 year ago

What about this?

library(performance)
data("kyphosis", package = "rpart")
kyphosis <- transform(
  kyphosis,
  Kyphosis = as.numeric(factor(Kyphosis)) - 1
)
m <- glm(
  Kyphosis ~ Age + Start + Number,
  data = kyphosis,
  family = binomial(link = "logit")
)

plot(binned_residuals(m, ci_type = "exact"), show_dots = TRUE)

plot(binned_residuals(m, ci_type = "gaussian"), show_dots = TRUE)

plot(binned_residuals(m, ci_type = "boot"), show_dots = TRUE)


plot(binned_residuals(m, ci_type = "exact", residuals = "response"), show_dots = TRUE)

plot(binned_residuals(m, ci_type = "exact", residuals = "deviance"), show_dots = TRUE)

plot(binned_residuals(m, ci_type = "exact", residuals = "pearson"), show_dots = TRUE)


plot(binned_residuals(m, ci_type = "gaussian", residuals = "response"), show_dots = TRUE)

plot(binned_residuals(m, ci_type = "gaussian", residuals = "deviance"), show_dots = TRUE)

plot(binned_residuals(m, ci_type = "gaussian", residuals = "pearson"), show_dots = TRUE)

Created on 2023-10-24 with reprex v2.0.2

strengejacke commented 1 year ago

@bbolker We should decide on sensible defaults. For now, I used deviance residuals and exact CI.