Allow draw() to plot on the response scale

gavinsimpson / gratia

ggplot-based graphics and useful functions for GAMs fitted using the mgcv package

https://gavinsimpson.github.io/gratia/

Other

206 stars 28 forks source link

Allow draw() to plot on the response scale #79

Open gavinsimpson opened 4 years ago

gavinsimpson commented 4 years ago

Implement the equivalent of the trans and shift arguments that mgcv:::plot.gam() has so that users can pass in a function (e.g. the inverse link function) and a constant (e.g. the intercept), to get plots on the response scale.

Should these be arguments to the evaluate_smooth() methods or something draw() does internally?

~~Either way, the confidence bands aren't going to include the uncertainty in the constant, so shift is purely to move the origin (y-axis) around.~~ This is done in draw() and the various methods.

Want to have a simpler way for the user to express that they want the plots on the response scale. So need a new argument (response_scale?) that if set to TRUE, automatically set constant and fun to the correct values/functions such that we get values and intervals on the response scale. This will override anything passed to constant and fun obviously.

Should we include the overall uncertainty as we're bringing in the model constant here?

What about factor by smooths? These should really have the constant plus the parametric term for their level added as a constant.

Checklist:

[x] add code to handle this
[x] write tests
[ ] #253
[ ] #254

barbmuhling commented 1 year ago

Hello! Are there any plans to incorporate this option into smooth_estimates() any time soon? That would be super useful...

gavinsimpson commented 1 year ago

@barbmuhling the draw() method for smooth_estimates() has this (at least in the development version here on GH). If you mean to modify the values returned by smooth_estimates(), I think I won't be adding support for that as you can already do what you want using transform_fun() and a bit of dplyr fu:

library("mgcv")
library("gratia")
library("dplyr")

dat <- data_sim("eg1", n = 400, dist = "normal", scale = 2, seed = 2)
m1 <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat, method = "REML")

my_const <- coef(m1)[1]

smooth_estimates(m1) |>
  add_confint() |>
  mutate(across(tidyselect::all_of(c("est", "lower_ci", "upper_ci")),
                .fns = \(x) x + my_const)) |>
  transform_fun(fun = \(x) x^2)

to apply any function you want to the estimate and credible interval. (This is what is used internally by the draw() methods.)

I could add something to transform_fun() to allow you to pass the constant in as well, to avoid the mutate() step?

barbmuhling commented 1 year ago

Ah, this will work perfectly, thanks so much!

tkiff commented 11 months ago

Hello Gavin, I came across this thread looking for help plotting by factors on top of each other. Apologies for the clunky code, but would this be the appropriate method?

Please let me know if this should go on stack overflow instread!

library("mgcv")
library("gratia")
library("dplyr")

dat <- data_sim("eg4", seed = 42)
m1 <- gam(y ~  fac + s(x2, by = fac), data = dat, method = "REML")

my_const <- coef(m1)[1]
my_constFac2 <- coef(m1)[2]
my_constFac3 <- coef(m1)[3]

parametric_effects(m1)

s2fac1 = smooth_estimates(m1) |>
  filter(smooth == "s(x2):fac1") |>
  add_confint() |>
  mutate(across(tidyselect::all_of(c("est", "lower_ci", "upper_ci")),
                .fns = \(x) x + my_const))

s2fac2 = smooth_estimates(m1) |>
  filter(smooth == "s(x2):fac2") |>
  add_confint() |>
  mutate(across(tidyselect::all_of(c("est", "lower_ci", "upper_ci")),
                .fns = \(x) x + my_const + my_constFac2))

s2fac3 =smooth_estimates(m1) |>
  filter(smooth == "s(x2):fac3") |>
  add_confint() |>
  mutate(across(tidyselect::all_of(c("est", "lower_ci", "upper_ci")),
                .fns = \(x) x + my_const + my_constFac3))

ggplot() +
  # Fac 1
  geom_line(data = s2fac1, aes(x = x2, est), color = "red") +
  geom_ribbon(data = s2fac1, aes(ymin = lower_ci, ymax = upper_ci, x = x2),
              alpha = 0.3, fill = "red") +
  # Fac 2
  geom_line(data = s2fac2, aes(x = x2, est), color = "black") +
  geom_ribbon(data = s2fac2, aes(ymin = lower_ci, ymax = upper_ci, x = x2),
              alpha = 0.3, fill = "black") +
  # Fac 3
  geom_line(data = s2fac3, aes(x = x2, est), color = "blue") +
  geom_ribbon(data = s2fac3, aes(ymin = lower_ci, ymax = upper_ci, x = x2),
              alpha = 0.3, fill = "blue")

gavinsimpson commented 11 months ago

@tkiff That works, but only for simple examples like this with a single factor.

I think you're better off using data_slice() to create the data for covariates that you want to evaluate the model at, including the values of other covariates in the model, and then use fitted_values() to get the data you need to plot:

library("mgcv")
library("gratia")
library("dplyr")
library("ggplot2")

dat <- data_sim("eg4", seed = 42)
m1 <- gam(y ~  fac + s(x2, by = fac) + s(x0) + s(x1), data = dat, method = "REML")

ds <- data_slice(m1, x2 = evenly(x2), fac = evenly(fac))
fv <- fitted_values(m1, data = ds)

fv |>
  ggplot(aes(x = x2, y = .fitted, group = fac)) +
  geom_ribbon(aes(ymin = .lower_ci, ymax = .upper_ci, fill = fac), alpha = 0.2) +
  geom_line(aes(colour = fac))

gratia-issue-79-plt1

Notice that data_slice() provides values for any covariates not specified but which are in the model. This forces you to be explicit about what you are conditioning on.

In this case, you can exclude the effects of the other smooths when generating fitted values

fv2 <- fitted_values(m1, data = ds, exclude = c("s(x0)", "s(x1)"))

fv2 |>
  ggplot(aes(x = x2, y = .fitted, group = fac)) +
  geom_ribbon(aes(ymin = .lower_ci, ymax = .upper_ci, fill = fac), alpha = 0.2) +
  geom_line(aes(colour = fac))

gratia-issue-79-plt2

There's no difference here because of how data_slice() finds typical values for the unmentioned coavriates.

gavinsimpson commented 11 months ago

@tkiff

Please let me know if this should go on stack overflow instread!

If you have questions about gratia you could ask them on SO but better might be to use the Discussion here on GitHub: https://github.com/gavinsimpson/gratia/discussions

gavinsimpson commented 8 months ago

This is essentially complete except for the two stretch goals