gavinsimpson / gratia

ggplot-based graphics and useful functions for GAMs fitted using the mgcv package
https://gavinsimpson.github.io/gratia/
Other
206 stars 28 forks source link

Show confidence interval for bivariate smooths #252

Closed gavinsimpson closed 8 months ago

gavinsimpson commented 8 months ago

This was posted by @psychobas as part of #1, but it is a separate issue:


Hi, thanks for the great package! Is it possible to show significance in draws of 2D smooths (e.g., linetype differs depending on whether the estimates are significant or not)?

Here is an example of what I mean based on the example data you provided above:

library("mgcv")
library("gratia")
library("dplyr")
library("ggplot2")
set.seed(2)
dat <- data_sim("eg2", n = 4000, dist = "normal", scale = 1, seed = 2)
m2 <- gam(y ~ s(x, z, k = 30), data = dat, method = "REML")

test1 <- confint(m2, parm = "s(x,z)")

test1 |>
  mutate(sig = (.lower_ci < 0 & .upper_ci < 0) | (.lower_ci > 0 & .upper_ci > 0)) |>
  ggplot(aes(x = x, y = z, fill = .estimate)) + 
  geom_raster(aes(alpha = sig)) + 
  geom_contour(aes(z = .estimate, linetype = sig)) +
  scale_alpha_manual(values = c("TRUE" = 1, "FALSE" = 0.7)) +
  scale_fill_distiller(palette = "RdBu", type = "div")

image

This looks identical to draw(m2, rug = FALSE) but with the linetype varying by whether the estimates are significant or not.

image

I could use this workaround but it would be great if this would be possible in gratia to not loose the other features provided by gratia::draw.

gavinsimpson commented 8 months ago

@psychobas

would this suffice:

sm <- smooth_estimates(m2) |>
  add_confint()

sm |>
  mutate(sig = (.lower_ci < 0 & .upper_ci < 0) | (.lower_ci > 0 & .upper_ci > 0)) |>
  draw(contour = FALSE) +
  geom_contour(colour = "black", aes(x = x, y = z, z = .estimate, linetype = sig))

bivariate-smooths-confint-#252

Note that I've updated this to be using recent gratia tools and also to match the changes I have made in the GitHub version to create objects with names that are prefixed with a period ".". Also, I have focused only on the contour line aspect; the alpha bit is confusing and I don't think it's necessary as with the correct diverging palette centred on 0, the alpha bit just works on the colours that are close to white.

I'm not convinced this is a good general way to show this kind of information so don't think I'll be adding it as an option to gratia::draw(), either the "gam" or "smooth_estimates" methods, as it is pretty easy to add oneself.

One reason I think the contours are confusing is because it looks like there are more than two line types on the plot (some parts look like the "dashed" type, where the contour is being drawn effectively on the border of significance.

psychobas commented 8 months ago

Thanks a lot @gavinsimpson, exactly what I was looking for!

I'm not convinced this is a good general way to show this kind of information

Any ideas on how to show this in a better way? For context, I want to want to show the estimated smooths from a model that contains both univariate and bivariate smooths with gratia::draw(). I want to use this plot in a paper and I'm sure reviewers would complain if the plot wouldn't show significance for the 2D smooths.

gavinsimpson commented 8 months ago

Part of the problem is that you are doing a large set of multiple comparisons and while the confidence intervals are actually Bayesian credible intervals if you use method = "REML" or "ML", and have an across-the-function frequentist interpretation, you're still going to run into multiple comparisons issues. A way round this is to use a simultaneous interval, which is available in confint() methods, but not yet for add_confint().

You're also, I feel, focusing on the wrong thing; you are not showing the significance or otherwise of the function (which is in the output shown by summary()) but are showing where on the smooth the effect is large enough to reject a point-wise null hypothesis.

If you meant "show uncertainty for the 2d smooths", I think you can do that by plotting the standard error of the estimated smooth.

I would also say that the focus in applied science work shouldn't be on p values. Instead focus on effect sizes; is the magnitude of change from the average (the white bits) relevant within the domain you are working? It shouldn't matter then as much what the uncertainty is. Also, uncertainty for smooths isn't what most people think of: I would show some draws from the posterior distribution of the smooth (using smooth_samples(m2, seed = 42) |> draw()) to visualise the uncertainty in the smooth.