easystats / modelbased

:chart_with_upwards_trend: Estimate effects, contrasts and means based on statistical models
https://easystats.github.io/modelbased/
GNU General Public License v3.0
234 stars 19 forks source link

Strange output of `estimate_expectation()`? #198

Closed strengejacke closed 2 years ago

strengejacke commented 2 years ago

See example. Sepal.Width is predicted, and appears in the output column. The footer says Predictors controlled: Sepal.Length, but misses Petal.Width.

m <- lm(Sepal.Width ~ Petal.Length + Petal.Width + Species, data = iris)
modelbased::estimate_expectation(m, insight::get_datagrid(iris, "Species"))
#> Model-based Expectation
#> 
#> Species    | Sepal.Width | Petal.Length | Petal.Width | Predicted |   SE |       95% CI | Residuals
#> ---------------------------------------------------------------------------------------------------
#> setosa     |        3.06 |         3.76 |        1.20 |      4.38 | 0.15 | [4.08, 4.67] |     -1.32
#> versicolor |        3.06 |         3.76 |        1.20 |      2.61 | 0.05 | [2.51, 2.71] |      0.44
#> virginica  |        3.06 |         3.76 |        1.20 |      2.18 | 0.12 | [1.94, 2.43] |      0.88
#> 
#> Variable predicted: Sepal.Width
#> Predictors modulated: Species
#> Predictors controlled: Sepal.Length

Created on 2022-08-10 by the reprex package (v2.0.1)

DominiqueMakowski commented 2 years ago

Indeed, will give this a look

bwiernik commented 2 years ago

It's even weirder. It says Sepal.Length but that variable isn't even in the model

bwiernik commented 2 years ago

I bet there's an extraneous ! somewhere so it's showing the columns in the data that aren't the remaining predictors, rather than the ones that are

strengejacke commented 2 years ago

One issue seems to be in insight::data_grid(). Variables adjusted for are stored as attribute, which is created with ifelse(), however, there's an "unexpected" behaviour, which caused wrong attributes here:

x <- c("a", "b")
ifelse(length(x) >= 1, x, NA)
#> [1] "a"

Created on 2022-08-14 by the reprex package (v2.0.1)

strengejacke commented 2 years ago

Ok, one problem seems to be passing data grids as data argument.

str(insight::get_datagrid(iris, "Species"))
#> Classes 'datagrid', 'visualisation_matrix' and 'data.frame': 3 obs. of  5 variables:
#>  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 2 3
#>  $ Sepal.Length: num  5.84 5.84 5.84
#>  $ Sepal.Width : num  3.06 3.06 3.06
#>  $ Petal.Length: num  3.76 3.76 3.76
#>  $ Petal.Width : num  1.2 1.2 1.2
#>  - attr(*, "adjusted_for")= chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
#>  - attr(*, "at_specs")='data.frame': 1 obs. of  3 variables:
...

As you can see, get_datagrid() correctly "adjusts for" 4 remaining variables, as we specified only "Species". estimate_expectation() just uses the attributes from get_datagrid() to get adjusted-for values, independent of the model-specification.

I'm not sure, maybe we just need read the at attribute, and then intersect "adjusted for"-variables with the remaining (non-focal) predictors of the model?

strengejacke commented 2 years ago
library(modelbased)
m <- lm(Sepal.Width ~ Petal.Length + Petal.Width + Species, data = iris)
modelbased::estimate_expectation(m, insight::get_datagrid(iris, "Species"))
#> Model-based Expectation
#> 
#> Species    | Predicted |   SE |       95% CI | Residuals
#> --------------------------------------------------------
#> setosa     |      4.38 | 0.15 | [4.08, 4.67] |     -1.32
#> versicolor |      2.61 | 0.05 | [2.51, 2.71] |      0.44
#> virginica  |      2.18 | 0.12 | [1.94, 2.43] |      0.88
#> 
#> Variable predicted: Sepal.Width
#> Predictors modulated: Species
#> Predictors controlled: Petal.Length (3.8), Petal.Width (1.2)

Created on 2022-08-14 by the reprex package (v2.0.1)

(see #200)

strengejacke commented 2 years ago

btw, I added some more options to create a reference grid to insight::get_datagrid():

m <- lm(Sepal.Width ~ Petal.Length + Species * Petal.Width, data = iris)

grid <- insight::get_datagrid(iris, "Species * Petal.Width")
modelbased::estimate_expectation(m, data = grid)
#> Model-based Expectation
#> 
#> Petal.Width |    Species | Predicted |   SE |       95% CI | Residuals
#> ----------------------------------------------------------------------
#> 0.10        |     setosa |      3.63 | 0.18 | [3.28, 3.98] |     -0.58
#> 0.37        |     setosa |      3.84 | 0.17 | [3.51, 4.16] |     -0.78
#> 0.63        |     setosa |      4.04 | 0.22 | [3.61, 4.47] |     -0.98
#> 0.90        |     setosa |      4.24 | 0.30 | [3.65, 4.83] |     -1.18
#> 1.17        |     setosa |      4.45 | 0.39 | [3.67, 5.22] |     -1.39
#> 1.43        |     setosa |      4.65 | 0.49 | [3.67, 5.62] |     -1.59
#> 1.70        |     setosa |      4.85 | 0.60 | [3.67, 6.03] |     -1.79
#> 1.97        |     setosa |      5.05 | 0.70 | [3.67, 6.44] |     -2.00
#> 2.23        |     setosa |      5.26 | 0.81 | [3.67, 6.85] |     -2.20
#> 2.50        |     setosa |      5.46 | 0.91 | [3.66, 7.26] |     -2.40
#> 0.10        | versicolor |      1.72 | 0.29 | [1.14, 2.30] |      1.33
#> 0.37        | versicolor |      1.94 | 0.23 | [1.49, 2.39] |      1.12
#> 0.63        | versicolor |      2.15 | 0.16 | [1.83, 2.47] |      0.91
#> 0.90        | versicolor |      2.36 | 0.10 | [2.16, 2.56] |      0.70
#> 1.17        | versicolor |      2.57 | 0.06 | [2.46, 2.68] |      0.48
#> 1.43        | versicolor |      2.79 | 0.07 | [2.65, 2.92] |      0.27
#> 1.70        | versicolor |      3.00 | 0.12 | [2.76, 3.24] |      0.06
#> 1.97        | versicolor |      3.21 | 0.19 | [2.85, 3.58] |     -0.15
#> 2.23        | versicolor |      3.42 | 0.25 | [2.93, 3.92] |     -0.37
#> 2.50        | versicolor |      3.64 | 0.32 | [3.01, 4.26] |     -0.58
#> 0.10        |  virginica |      1.68 | 0.30 | [1.08, 2.28] |      1.37
#> 0.37        |  virginica |      1.83 | 0.26 | [1.30, 2.35] |      1.23
#> 0.63        |  virginica |      1.97 | 0.23 | [1.52, 2.42] |      1.09
#> 0.90        |  virginica |      2.12 | 0.19 | [1.73, 2.50] |      0.94
#> 1.17        |  virginica |      2.26 | 0.16 | [1.94, 2.58] |      0.80
#> 1.43        |  virginica |      2.41 | 0.14 | [2.13, 2.68] |      0.65
#> 1.70        |  virginica |      2.55 | 0.13 | [2.30, 2.80] |      0.51
#> 1.97        |  virginica |      2.70 | 0.13 | [2.44, 2.95] |      0.36
#> 2.23        |  virginica |      2.84 | 0.14 | [2.56, 3.12] |      0.22
#> 2.50        |  virginica |      2.98 | 0.17 | [2.65, 3.31] |      0.07
#> 
#> Variable predicted: Sepal.Width
#> Predictors modulated: Species, Petal.Width
#> Predictors controlled: Petal.Length (3.8)

grid <- insight::get_datagrid(iris, "Species * Petal.Width", range = "grid")
modelbased::estimate_expectation(m, data = grid)
#> Model-based Expectation
#> 
#> Petal.Width |    Species | Predicted |   SE |       95% CI | Residuals
#> ----------------------------------------------------------------------
#> 0.44        |     setosa |      3.89 | 0.17 | [3.55, 4.23] |     -0.83
#> 1.20        |     setosa |      4.47 | 0.40 | [3.67, 5.27] |     -1.41
#> 1.96        |     setosa |      5.05 | 0.70 | [3.67, 6.43] |     -1.99
#> 0.44        | versicolor |      1.99 | 0.21 | [1.58, 2.41] |      1.06
#> 1.20        | versicolor |      2.60 | 0.05 | [2.49, 2.71] |      0.46
#> 1.96        | versicolor |      3.21 | 0.18 | [2.84, 3.57] |     -0.15
#> 0.44        |  virginica |      1.87 | 0.25 | [1.36, 2.37] |      1.19
#> 1.20        |  virginica |      2.28 | 0.16 | [1.96, 2.59] |      0.78
#> 1.96        |  virginica |      2.69 | 0.13 | [2.44, 2.94] |      0.36
#> 
#> Variable predicted: Sepal.Width
#> Predictors modulated: Species, Petal.Width
#> Predictors controlled: Petal.Length (3.8)

grid <- insight::get_datagrid(iris, c("Species", "Petal.Width = [sd]"))
modelbased::estimate_expectation(m, data = grid)
#> Model-based Expectation
#> 
#> Petal.Width |    Species | Predicted |   SE |       95% CI | Residuals
#> ----------------------------------------------------------------------
#> 0.44        |     setosa |      3.89 | 0.17 | [3.55, 4.23] |     -0.83
#> 1.20        |     setosa |      4.47 | 0.40 | [3.67, 5.27] |     -1.41
#> 1.96        |     setosa |      5.05 | 0.70 | [3.67, 6.43] |     -1.99
#> 0.44        | versicolor |      1.99 | 0.21 | [1.58, 2.41] |      1.06
#> 1.20        | versicolor |      2.60 | 0.05 | [2.49, 2.71] |      0.46
#> 1.96        | versicolor |      3.21 | 0.18 | [2.84, 3.57] |     -0.15
#> 0.44        |  virginica |      1.87 | 0.25 | [1.36, 2.37] |      1.19
#> 1.20        |  virginica |      2.28 | 0.16 | [1.96, 2.59] |      0.78
#> 1.96        |  virginica |      2.69 | 0.13 | [2.44, 2.94] |      0.36
#> 
#> Variable predicted: Sepal.Width
#> Predictors modulated: Species, Petal.Width = [sd]
#> Predictors controlled: Petal.Length (3.8)

grid <- insight::get_datagrid(iris, c("Species", "Petal.Width = [3,9]"))
modelbased::estimate_expectation(m, data = grid)
#> Model-based Expectation
#> 
#> Petal.Width |    Species | Predicted |   SE |        95% CI | Residuals
#> -----------------------------------------------------------------------
#> 3.00        |     setosa |      5.84 | 1.11 | [3.65,  8.04] |     -2.79
#> 3.67        |     setosa |      6.35 | 1.38 | [3.63,  9.08] |     -3.29
#> 4.33        |     setosa |      6.86 | 1.65 | [3.60, 10.11] |     -3.80
#> 5.00        |     setosa |      7.37 | 1.92 | [3.58, 11.15] |     -4.31
#> 5.67        |     setosa |      7.87 | 2.18 | [3.56, 12.19] |     -4.82
#> 6.33        |     setosa |      8.38 | 2.45 | [3.53, 13.23] |     -5.33
#> 7.00        |     setosa |      8.89 | 2.72 | [3.51, 14.27] |     -5.83
#> 7.67        |     setosa |      9.40 | 2.99 | [3.49, 15.31] |     -6.34
#> 8.33        |     setosa |      9.91 | 3.26 | [3.46, 16.35] |     -6.85
#> 9.00        |     setosa |     10.41 | 3.53 | [3.44, 17.39] |     -7.36
#> 3.00        | versicolor |      4.03 | 0.44 | [3.17,  4.90] |     -0.98
#> 3.67        | versicolor |      4.57 | 0.60 | [3.37,  5.76] |     -1.51
#> 4.33        | versicolor |      5.10 | 0.77 | [3.57,  6.62] |     -2.04
#> 5.00        | versicolor |      5.63 | 0.94 | [3.78,  7.48] |     -2.57
#> 5.67        | versicolor |      6.16 | 1.10 | [3.98,  8.34] |     -3.10
#> 6.33        | versicolor |      6.69 | 1.27 | [4.18,  9.20] |     -3.63
#> 7.00        | versicolor |      7.22 | 1.44 | [4.38, 10.06] |     -4.16
#> 7.67        | versicolor |      7.75 | 1.60 | [4.58, 10.92] |     -4.69
#> 8.33        | versicolor |      8.28 | 1.77 | [4.79, 11.78] |     -5.23
#> 9.00        | versicolor |      8.81 | 1.94 | [4.99, 12.64] |     -5.76
#> 3.00        |  virginica |      3.26 | 0.23 | [2.80,  3.71] |     -0.20
#> 3.67        |  virginica |      3.62 | 0.32 | [2.98,  4.26] |     -0.56
#> 4.33        |  virginica |      3.98 | 0.42 | [3.14,  4.82] |     -0.92
#> 5.00        |  virginica |      4.34 | 0.53 | [3.30,  5.38] |     -1.28
#> 5.67        |  virginica |      4.70 | 0.63 | [3.45,  5.95] |     -1.64
#> 6.33        |  virginica |      5.06 | 0.74 | [3.61,  6.52] |     -2.01
#> 7.00        |  virginica |      5.43 | 0.84 | [3.76,  7.09] |     -2.37
#> 7.67        |  virginica |      5.79 | 0.95 | [3.91,  7.66] |     -2.73
#> 8.33        |  virginica |      6.15 | 1.06 | [4.06,  8.23] |     -3.09
#> 9.00        |  virginica |      6.51 | 1.16 | [4.21,  8.81] |     -3.45
#> 
#> Variable predicted: Sepal.Width
#> Predictors modulated: Species, Petal.Width = [3,9]
#> Predictors controlled: Petal.Length (3.8)

grid <- insight::get_datagrid(iris, c("Species", "Petal.Width = [terciles]"))
modelbased::estimate_expectation(m, data = grid)
#> Model-based Expectation
#> 
#> Petal.Width |    Species | Predicted |   SE |       95% CI | Residuals
#> ----------------------------------------------------------------------
#> 0.87        |     setosa |      4.22 | 0.29 | [3.65, 4.78] |     -1.16
#> 1.60        |     setosa |      4.78 | 0.56 | [3.67, 5.88] |     -1.72
#> 0.87        | versicolor |      2.34 | 0.11 | [2.12, 2.55] |      0.72
#> 1.60        | versicolor |      2.92 | 0.10 | [2.72, 3.12] |      0.14
#> 0.87        |  virginica |      2.10 | 0.20 | [1.71, 2.49] |      0.96
#> 1.60        |  virginica |      2.50 | 0.13 | [2.24, 2.75] |      0.56
#> 
#> Variable predicted: Sepal.Width
#> Predictors modulated: Species, Petal.Width = [terciles]
#> Predictors controlled: Petal.Length (3.8)

Created on 2022-08-14 by the reprex package (v2.0.1)