easystats / parameters

:bar_chart: Computation and processing of models' parameters
https://easystats.github.io/parameters/
GNU General Public License v3.0

Vignette about "Nested" interactions #332

Open strengejacke opened 4 years ago

strengejacke commented 4 years ago

We should use "*" consistently for interactions. Currently, we use ":" to denote nested effects, but this might be confusing, because ":" is also used to add specific interactions in formulas. Maybe another symbol, such as a dash or "/", would be more straightforward.

Mmh, let me try to recap, as I feel this is pretty nightmarish stuff, and getting it right would tremendously help users understand the meaning of the parameters. Warning: it might be hair-pulling.

All interactions '*' vs. specific interactions ':' vs. nested effects '/'

Simple case 1 - factor and numeric

'*' vs. ':'

library(see)
library(ggplot2)
library(dplyr)

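# m1: full interaction, shorthand for Species + Petal.Length + Species:Petal.Length
m1 <- lm(Sepal.Length ~ Species * Petal.Length, data = iris)
# m2: Species-specific intercepts and Species-specific slopes (no common slope term)
m2 <- lm(Sepal.Length ~ Species + Species:Petal.Length, data = iris)
# m3: one shared intercept, slope of Petal.Length modulated by Species
m3 <- lm(Sepal.Length ~ Petal.Length + Species:Petal.Length, data = iris)
# m4: same terms as m1, written out explicitly in a different order
m4 <- lm(Sepal.Length ~ Petal.Length + Species + Species:Petal.Length, data = iris)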

p1 <- modelbased::estimate_link(m1, preserve_range=FALSE) %>% 
  ggplot(aes(x=Petal.Length, y=Predicted, color=Species)) + 
  geom_line()
p2 <- modelbased::estimate_link(m2, preserve_range=FALSE) %>% 
  ggplot(aes(x=Petal.Length, y=Predicted, color=Species)) + 
  geom_line()
p3 <- modelbased::estimate_link(m3, preserve_range=FALSE) %>% 
  ggplot(aes(x=Petal.Length, y=Predicted, color=Species)) + 
  geom_line()
p4 <- modelbased::estimate_link(m4, preserve_range=FALSE) %>% 
  ggplot(aes(x=Petal.Length, y=Predicted, color=Species)) + 
  geom_line()
see::plots(p1, p2, p3, p4)

Created on 2020-11-10 by the reprex package (v0.3.0)

So here, m1, m2 and m4 are the same model: they all allow a different intercept for each Species plus a Species-specific slope of Petal.Length, and differ only in how these parameters are coded. m3 is different: while the slope is allowed to vary by Species, there is no separate intercept per Species, so all the lines must start from the same point (the shared intercept) at Petal.Length = 0.
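A quick way to check which formulas describe the same model is to compare fitted values: equivalent parameterizations give identical predictions, while m3 does not. The sketch below uses only base R (`lm()`, `fitted()`, `all.equal()`, `anova()`) and the built-in `iris` data; it is an illustration, not part of the parameters package.

```r
# Fit the four formulas from the reprex (base R only, built-in iris data)
m1 <- lm(Sepal.Length ~ Species * Petal.Length, data = iris)
m2 <- lm(Sepal.Length ~ Species + Species:Petal.Length, data = iris)
m3 <- lm(Sepal.Length ~ Petal.Length + Species:Petal.Length, data = iris)
m4 <- lm(Sepal.Length ~ Petal.Length + Species + Species:Petal.Length, data = iris)

# Equivalent parameterizations give identical fitted values (up to numerical tolerance)
isTRUE(all.equal(fitted(m1), fitted(m2)))  # TRUE
isTRUE(all.equal(fitted(m1), fitted(m4)))  # TRUE

# m3 forces one shared intercept, so its predictions differ
isTRUE(all.equal(fitted(m1), fitted(m3)))  # FALSE

# m3 is a restricted version of m1 (it drops the two Species intercept shifts),
# so an F-test can compare them
anova(m3, m1)
```

Comparing fitted values rather than coefficients is deliberate: the coefficients of m1 and m2 differ in meaning, but the predictions cannot.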

That said, it's all about interactions here (a * b is shorthand for a + b + a:b), so in parameters all of them should be denoted by "*". We use "*" for interactions instead of ":" because I find it clearer, and also because R uses ":" both for interactions and for nested effects, so we need to differentiate them. This is not the case currently:

parameters::parameters(m1)
#> Parameter                           | Coefficient |   SE |         95% CI | t(144) |      p
#> -------------------------------------------------------------------------------------------
#> (Intercept)                         |        4.21 | 0.41 | [ 3.41,  5.02] |  10.34 | < .001
#> Species [versicolor]                |       -1.81 | 0.60 | [-2.99, -0.62] |  -3.02 | 0.003 
#> Species [virginica]                 |       -3.15 | 0.63 | [-4.41, -1.90] |  -4.97 | < .001
#> Petal.Length                        |        0.54 | 0.28 | [ 0.00,  1.09] |   1.96 | 0.052 
#> Species [versicolor] * Petal.Length |        0.29 | 0.30 | [-0.30,  0.87] |   0.97 | 0.334 
#> Species [virginica] * Petal.Length  |        0.45 | 0.29 | [-0.12,  1.03] |   1.56 | 0.120
parameters::parameters(m2)
#> Parameter                           | Coefficient |   SE |         95% CI | t(144) |      p
#> -------------------------------------------------------------------------------------------
#> (Intercept)                         |        4.21 | 0.41 | [ 3.41,  5.02] |  10.34 | < .001
#> Species [versicolor]                |       -1.81 | 0.60 | [-2.99, -0.62] |  -3.02 | 0.003 
#> Species [virginica]                 |       -3.15 | 0.63 | [-4.41, -1.90] |  -4.97 | < .001
#> Species [setosa] : Petal.Length     |        0.54 | 0.28 | [ 0.00,  1.09] |   1.96 | 0.052 
#> Species [versicolor] : Petal.Length |        0.83 | 0.10 | [ 0.63,  1.03] |   8.10 | < .001
#> Species [virginica] : Petal.Length  |        1.00 | 0.09 | [ 0.82,  1.17] |  11.43 | < .001
parameters::parameters(m3)
#> Parameter                        | Coefficient |   SE |         95% CI | t(146) |      p
#> ----------------------------------------------------------------------------------------
#> (Intercept)                      |        2.74 | 0.27 | [ 2.20,  3.28] |  10.00 | < .001
#> Petal.Length                     |        1.54 | 0.19 | [ 1.16,  1.91] |   8.16 | < .001
#> Petal.Length : Speciesversicolor |       -0.78 | 0.13 | [-1.03, -0.53] |  -6.19 | < .001
#> Petal.Length : Speciesvirginica  |       -0.84 | 0.14 | [-1.12, -0.56] |  -5.97 | < .001
parameters::parameters(m4)
#> Parameter                           | Coefficient |   SE |         95% CI | t(144) |      p
#> -------------------------------------------------------------------------------------------
#> (Intercept)                         |        4.21 | 0.41 | [ 3.41,  5.02] |  10.34 | < .001
#> Petal.Length                        |        0.54 | 0.28 | [ 0.00,  1.09] |   1.96 | 0.052 
#> Species [versicolor]                |       -1.81 | 0.60 | [-2.99, -0.62] |  -3.02 | 0.003 
#> Species [virginica]                 |       -3.15 | 0.63 | [-4.41, -1.90] |  -4.97 | < .001
#> Petal.Length * Species [versicolor] |        0.29 | 0.30 | [-0.30,  0.87] |   0.97 | 0.334 
#> Petal.Length * Species [virginica]  |        0.45 | 0.29 | [-0.12,  1.03] |   1.56 | 0.120

Created on 2020-11-10 by the reprex package (v0.3.0)
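The tables above also show how the two codings relate: in m1 each Species-specific slope is the setosa slope plus an interaction term, whereas m2 reports the within-Species slopes directly (e.g., 0.54 + 0.29 = 0.83 for versicolor). A minimal base-R sketch that checks this, together with the expansion of `*`, might look like this (coefficient names are the ones `lm()` produces for these formulas):

```r
# `*` expands into both main effects plus their interaction (base R `terms()`)
attr(terms(Sepal.Length ~ Species * Petal.Length), "term.labels")
# returns c("Species", "Petal.Length", "Species:Petal.Length")

m1 <- lm(Sepal.Length ~ Species * Petal.Length, data = iris)
m2 <- lm(Sepal.Length ~ Species + Species:Petal.Length, data = iris)
b1 <- coef(m1)
b2 <- coef(m2)

# m1 (interaction coding): slope for a species = setosa slope + its interaction term
slopes_m1 <- c(
  setosa     = unname(b1["Petal.Length"]),
  versicolor = unname(b1["Petal.Length"] + b1["Speciesversicolor:Petal.Length"]),
  virginica  = unname(b1["Petal.Length"] + b1["Speciesvirginica:Petal.Length"])
)

# m2 (nested coding): the within-species slopes are the coefficients themselves
slopes_m2 <- c(
  setosa     = unname(b2["Speciessetosa:Petal.Length"]),
  versicolor = unname(b2["Speciesversicolor:Petal.Length"]),
  virginica  = unname(b2["Speciesvirginica:Petal.Length"])
)

all.equal(slopes_m1, slopes_m2)  # TRUE: same slopes, different parameterization
round(slopes_m2, 2)
```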

'*' vs. '/'

library(see)
library(ggplot2)
library(dplyr)

m1 <- lm(Sepal.Length ~ Species * Petal.Length, data = iris)
m2 <- lm(Sepal.Length ~ Species / Petal.Length, data = iris)

p1 <- modelbased::estimate_link(m1) %>%
  ggplot(aes(x=Petal.Length, y=Predicted, color=Species)) +
  geom_line()
p2 <- modelbased::estimate_link(m2) %>%
  ggplot(aes(x=Petal.Length, y=Predicted, color=Species)) +
  geom_line()
see::plots(p1, p2)


parameters::parameters(m1)
#> Parameter                           | Coefficient |   SE |         95% CI | t(144) |      p
#> -------------------------------------------------------------------------------------------
#> (Intercept)                         |        4.21 | 0.41 | [ 3.41,  5.02] |  10.34 | < .001
#> Species [versicolor]                |       -1.81 | 0.60 | [-2.99, -0.62] |  -3.02 | 0.003 
#> Species [virginica]                 |       -3.15 | 0.63 | [-4.41, -1.90] |  -4.97 | < .001
#> Petal.Length                        |        0.54 | 0.28 | [ 0.00,  1.09] |   1.96 | 0.052 
#> Species [versicolor] * Petal.Length |        0.29 | 0.30 | [-0.30,  0.87] |   0.97 | 0.334 
#> Species [virginica] * Petal.Length  |        0.45 | 0.29 | [-0.12,  1.03] |   1.56 | 0.120
parameters::parameters(m2)
#> Parameter                           | Coefficient |   SE |         95% CI | t(144) |      p
#> -------------------------------------------------------------------------------------------
#> (Intercept)                         |        4.21 | 0.41 | [ 3.41,  5.02] |  10.34 | < .001
#> Species [versicolor]                |       -1.81 | 0.60 | [-2.99, -0.62] |  -3.02 | 0.003 
#> Species [virginica]                 |       -3.15 | 0.63 | [-4.41, -1.90] |  -4.97 | < .001
#> Species [setosa] : Petal.Length     |        0.54 | 0.28 | [ 0.00,  1.09] |   1.96 | 0.052 
#> Species [versicolor] : Petal.Length |        0.83 | 0.10 | [ 0.63,  1.03] |   8.10 | < .001
#> Species [virginica] : Petal.Length  |        1.00 | 0.09 | [ 0.82,  1.17] |  11.43 | < .001

Created on 2020-11-10 by the reprex package (v0.3.0)

Here, for a numeric variable nested in a factor, the two formulas give the same model (spoiler: it gets weird when the variable is nested in a numeric), but the parameters represent different things. In the nested model, the Petal.Length coefficients are pretty much regular effects (i.e., slopes) estimated "within" each level of Species. This is conceptually different from an interaction, which estimates the change in another effect (here, how the Petal.Length slope of each species differs from the setosa slope). Currently, we denote nested effects by ":", but this might be confusing, so maybe we should replace it by "/" or "|" to show that it's "the effect of x within the factor level".
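The "within each level" reading can be illustrated with base R: `/` expands `a / b` into `a + a:b`, so `Species / Petal.Length` is exactly the m2 formula from the first reprex, and its Petal.Length coefficients match the slopes obtained by fitting a separate regression within each species. This sketch compares point estimates only; standard errors differ because the single model pools the residual variance across species.

```r
# `/` expands a / b into a + a:b
attr(terms(Sepal.Length ~ Species / Petal.Length), "term.labels")
# returns c("Species", "Species:Petal.Length")

m_nested   <- lm(Sepal.Length ~ Species / Petal.Length, data = iris)
m_explicit <- lm(Sepal.Length ~ Species + Species:Petal.Length, data = iris)
all.equal(coef(m_nested), coef(m_explicit))  # TRUE: identical coefficients

# "Effect of Petal.Length within each species": one separate regression per species
within_slopes <- sapply(split(iris, iris$Species), function(d) {
  unname(coef(lm(Sepal.Length ~ Petal.Length, data = d))["Petal.Length"])
})

# The nested slopes are the coefficients ending in ":Petal.Length"
# (same order as the factor levels: setosa, versicolor, virginica)
nested_slopes <- unname(coef(m_nested)[grep(":Petal.Length$", names(coef(m_nested)))])

all.equal(unname(within_slopes), nested_slopes)  # point estimates agree
```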

Now I'm trying to wrap my head around nested effects for continuous variables... and I didn't even want to look at more than two variables ^^
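One observation that may help when the outer variable is numeric (a base-R sketch, not something the parameters package does): `Petal.Length / Species` still expands to `a + a:b`, i.e. `Petal.Length + Petal.Length:Species`, which is the same model as m3 above. Because Petal.Length is then a main effect, Species is contrast-coded inside the interaction, so the result is a single shared intercept with Species-specific slope adjustments relative to setosa, rather than separate within-level effects.

```r
# `/` with a numeric on the left: a / b still expands to a + a:b
attr(terms(Sepal.Length ~ Petal.Length / Species), "term.labels")
# returns c("Petal.Length", "Petal.Length:Species")

m_num <- lm(Sepal.Length ~ Petal.Length / Species, data = iris)
m3    <- lm(Sepal.Length ~ Petal.Length + Species:Petal.Length, data = iris)

# Same coefficients as m3: one shared intercept, and no separate setosa slope column,
# because Petal.Length already appears as a main effect
names(coef(m_num))
all.equal(coef(m_num), coef(m3))
```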

Originally posted by @DominiqueMakowski in https://github.com/easystats/parameters/issues/330#issuecomment-724442409

strengejacke commented 4 years ago

See #330 and #155