easystats / modelbased

:chart_with_upwards_trend: Estimate effects, contrasts and means based on statistical models
https://easystats.github.io/modelbased/
GNU General Public License v3.0
234 stars 19 forks source link

Visualization: Johnson-Neyman type plots #128

Closed mattansb closed 3 years ago

mattansb commented 3 years ago

As I understand we're using emmeans as the backend, here's how to do it with emmeans:

library(emmeans)
library(ggplot2)

JN_plot <- function(model, x_name, mod_name, n = 200, pad = 0.25) {
  seq_min_max <- function(x){
    pad <- diff(range(x)) * pad
    seq(min(x) - pad, max(x) + pad, length.out = n)
  }

  trends <- emtrends(model, mod_name , x_name,
                     cov.reduce = setNames(list(seq_min_max), mod_name))

  orig_data <- insight::get_data(model)
  trend_name <- paste0(x_name, ".trend")

  trends |>
    summary(infer = TRUE) |> 
    as.data.frame() |>
    transform(.sig = p.value < .05) |>
    transform(.part.group = {function(x) rep(seq_along(x$lengths), x$length)}(rle(.sig))) |> # trust me this ugliness is needed
    ggplot(aes(.data[[mod_name]], .data[[trend_name]])) + 
    geom_ribbon(aes(ymin = lower.CL, ymax = upper.CL,
                    group = .part.group, fill = .sig),
                alpha = 0.4) + 
    geom_line(size = 1) + 
    geom_hline(yintercept = 0, linetype = "dashed") + 
    geom_line(aes(y = lower.CL, x = range(orig_data[[mod_name]]), group = NULL, size = "Range of obs data"), 
              data = function(x) subset(x, lower.CL == min(lower.CL))[c(1,1),])

}

Linear interaction

mod <- lm(mpg ~ wt * disp, data = mtcars)

JN_plot(mod, "disp", "wt")
#> Warning: Using size for a discrete variable is not advised.

GAMs

eg <- mgcv::gamSim(2, n = 500, scale = .1)
#> Bivariate smoothing example
b5 <- mgcv::gam(y ~ s(x, z, k = 40), data = eg$data)

JN_plot(b5, "z", "x", pad = 1)
#> Warning: Using size for a discrete variable is not advised.

Created on 2021-07-06 by the reprex package (v2.0.0)

DominiqueMakowski commented 3 years ago

Isn't it what's there https://github.com/easystats/modelbased#understand-interactions-between-two-continuous-variables ?

mattansb commented 3 years ago

Oh it is!

Need some vignettes........ (:

mattansb commented 3 years ago

But actually, can add the split into sig/non-sig regions?

bwiernik commented 3 years ago

Let's make the argument control fill either as a value to threshold or a continuous gradient (probably of log(p)).

DominiqueMakowski commented 3 years ago

image

you mean here to map the sig/non-sig aesthetic unto the ribbon's fill rather than the line color? I initially tried that, but ended up with the problem that the same colour parts get connected together, creating junction artifacts.

How could we change the plot?

library(modelbased)

model <- lm(mpg ~ hp * wt, data = mtcars)

slopes <- estimate_slopes(model, trend = "hp", modulate = "wt")

modelbased::visualisation_recipe(slopes)
#> Layer 1
#> --------
#> Geom type: hline
#> data = [10 x 10]
#> yintercept = 0
#> linetype = 'dashed'
#> 
#> Layer 2
#> --------
#> Geom type: ribbon
#> data = [10 x 10]
#> aes_string(
#>   y = 'Coefficient'
#>   x = 'wt'
#>   ymin = 'CI_low'
#>   ymax = 'CI_high'
#> )
#> alpha = 0.3333333
#> 
#> Layer 3
#> --------
#> Geom type: line
#> data = [10 x 10]
#> aes_string(
#>   y = 'Coefficient'
#>   x = 'wt'
#>   color = 'Confidence'
#>   group = 1
#> )
#> 
#> Layer 4
#> --------
#> Geom type: labs
#> y = 'Effect of hp'
#> color = 'Confidence'
#> title = 'Estimated Coefficients (mpg ~ hp * wt)'

Created on 2021-07-07 by the reprex package (v2.0.0)

bwiernik commented 3 years ago

How do we do it with the bayestestR density regions? How does ggdist do it (and the gradient versions)?

mattansb commented 3 years ago

I initially tried that, but ended up with the problem that the same colour parts get connected together, creating junction artifacts.

Ah, that what this super ugly part of code does!

    transform(.part.group = {function(x) rep(seq_along(x$lengths), x$length)}(rle(.sig))) |> # trust me this ugliness is needed
DominiqueMakowski commented 3 years ago

where's this code from?

mattansb commented 3 years ago

Its part of the code I wrote in the first comment on this issue.

Or do you mean how on earth did I figure that out? Because man, that was not easy at all...............

DominiqueMakowski commented 3 years ago

haha yes it seems to work, in that case indeed I can swap the line's color for the ribbon's fill

DominiqueMakowski commented 3 years ago

👍

library(modelbased)

model <- lm(Petal.Length ~ poly(Sepal.Width, 4), data = iris)
x <- estimate_slopes(model, modulate = "Sepal.Width", length = 20)
#> No numeric variable was specified for slope estimation. Selecting `trend = "Sepal.Width"`.
plot(visualisation_recipe(x))

Created on 2021-07-09 by the reprex package (v2.0.0)

bwiernik commented 3 years ago

What are those spaces?

IndrajeetPatil commented 3 years ago

Marginally significant

tenor

DominiqueMakowski commented 3 years ago

Spaces can be minimized by increasing the length

model <- lm(Petal.Length ~ poly(Sepal.Width, 4), data = iris)
plot(modelbased::estimate_slopes(model, modulate = "Sepal.Width", length = 500))
#> No numeric variable was specified for slope estimation. Selecting `trend = "Sepal.Width"`.

Created on 2021-07-09 by the reprex package (v2.0.0)

bwiernik commented 3 years ago

Oh, it's the same dealio as with visualization grid generally. Got it

mattansb commented 3 years ago

Beautiful!

mattansb commented 3 years ago

FYI: It's common to mark the range of observed values in the data (I think in case the region of sig is just on the edge of this range? dunno...)

DominiqueMakowski commented 3 years ago

Yeah in the "interaction" packages that implements the JN plots the range is indeed extended by some amount of both sides, and then the actual range is marked. IMHO it clutters the plot, and it suggests that the results beyond the observed range are fine to be interpreted (and opens the door to statements like "even though in our data the effect of var1 is not significant BUT IF var2 was bigger then it would have been - so we can say it's significant 👍"). We shouldn't extrapolate too much, and I think it's just more readable and clear to show what's happening within the range of the data

bwiernik commented 3 years ago

I agree. I like keeping the plot showing the actual data range