easystats / modelbased

Estimate effects, contrasts and means based on statistical models
https://easystats.github.io/modelbased/
GNU General Public License v3.0

estimate_expectation() with matrix response doesn't work #164

Open bwiernik opened 3 years ago

bwiernik commented 3 years ago

When the response of a model is matrix-like, get_response() returns it as a data frame with correct names. However, get_data() returns the response as a nested matrix column in the data.frame:

library(lme4)
#> Loading required package: Matrix
(gm1 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
              data = cbpp, family = binomial))
#> Generalized linear mixed model fit by maximum likelihood (Laplace
#>   Approximation) [glmerMod]
#>  Family: binomial  ( logit )
#> Formula: cbind(incidence, size - incidence) ~ period + (1 | herd)
#>    Data: cbpp
#>      AIC      BIC   logLik deviance df.resid 
#> 194.0531 204.1799 -92.0266 184.0531       51 
#> Random effects:
#>  Groups Name        Std.Dev.
#>  herd   (Intercept) 0.6421  
#> Number of obs: 56, groups:  herd, 15
#> Fixed Effects:
#> (Intercept)      period2      period3      period4  
#>     -1.3983      -0.9919      -1.1282      -1.5797
insight::get_response(gm1) |> head()
#>   incidence size
#> 1         2   14
#> 2         3   12
#> 3         4    9
#> 4         0    5
#> 5         3   22
#> 6         1   18
insight::get_data(gm1) |> head()
#>   cbind(incidence, size - incidence).incidence
#> 1                                            2
#> 2                                            3
#> 3                                            4
#> 4                                            0
#> 5                                            3
#> 6                                            1
#>   cbind(incidence, size - incidence).V2 period herd incidence size
#> 1                                    12      1    1         2   14
#> 2                                     9      2    1         3   12
#> 3                                     5      3    1         4    9
#> 4                                     5      4    1         0    5
#> 5                                    19      1    2         3   22
#> 6                                    17      2    2         1   18

Created on 2021-07-09 by the reprex package (v2.0.0)
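
For readers unfamiliar with matrix columns, here is a minimal base R sketch of how one arises. It assumes a toy data frame `d` (a stand-in for `cbpp`, with values copied from the first rows above) and uses `model.frame()` directly. Whether `get_data()` builds its output this way internally is not stated in this thread.

```r
# Toy stand-in for lme4's cbpp data (first four rows shown above)
d <- data.frame(
  incidence = c(2, 3, 4, 0),
  size      = c(14, 12, 9, 5),
  period    = factor(1:4)
)

# model.frame() evaluates the left-hand side of the formula, so the
# cbind() call is stored as a single matrix-valued column
mf <- model.frame(cbind(incidence, size - incidence) ~ period, data = d)

names(mf)           # the response column is named after the cbind() call
ncol(mf)            # 2: the response matrix counts as one column
is.matrix(mf[[1]])  # TRUE
dim(mf[[1]])        # 4 rows, 2 columns inside that one "column"
```

Printing or flattening such a frame yields names prefixed with the full `cbind(...)` expression, as in the `get_data()` output above.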

This is producing errors when the data are used by other functions:

library(lme4)
#> Loading required package: Matrix
(gm1 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
              data = cbpp, family = binomial))
#> Generalized linear mixed model fit by maximum likelihood (Laplace
#>   Approximation) [glmerMod]
#>  Family: binomial  ( logit )
#> Formula: cbind(incidence, size - incidence) ~ period + (1 | herd)
#>    Data: cbpp
#>      AIC      BIC   logLik deviance df.resid 
#> 194.0531 204.1799 -92.0266 184.0531       51 
#> Random effects:
#>  Groups Name        Std.Dev.
#>  herd   (Intercept) 0.6421  
#> Number of obs: 56, groups:  herd, 15
#> Fixed Effects:
#> (Intercept)      period2      period3      period4  
#>     -1.3983      -0.9919      -1.1282      -1.5797
modelbased::estimate_expectation(gm1, include_random = TRUE)
#> Error in cbind(incidence, size - incidence): object 'incidence' not found

Created on 2021-07-09 by the reprex package (v2.0.0)

@DominiqueMakowski See the problem this is causing with estimate_prediction()

DominiqueMakowski commented 3 years ago

Indeed, we should probably add a step to get_data() to sanitize the output, right?

bwiernik commented 3 years ago

Yeah, that would be good.

strengejacke commented 3 years ago

Both columns incidence and size are present in the returned data frame, so I'm not sure if this is an issue of get_data()?

strengejacke commented 3 years ago

Any comments on my comment? :-)

bwiernik commented 3 years ago

At a minimum, the response matrix-column probably shouldn't be there.
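
A rough sketch of what dropping the matrix column could look like. `drop_matrix_columns()` is a hypothetical helper written for illustration, not a function from insight, and the toy data frame is an assumption.

```r
# Hypothetical helper (not part of insight): remove matrix-valued columns
drop_matrix_columns <- function(data) {
  is_mat <- vapply(data, is.matrix, logical(1))
  data[!is_mat]
}

# Toy data with a nested response matrix next to the plain columns
d <- data.frame(
  incidence = c(2, 3, 4, 0),
  size      = c(14, 12, 9, 5),
  period    = factor(1:4)
)
d$resp <- cbind(d$incidence, d$size - d$incidence)  # stored as a matrix column

clean <- drop_matrix_columns(d)
names(clean)  # only the plain columns remain
```

This keeps `incidence` and `size` available under their own names, so code that re-evaluates the formula can still find them.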

strengejacke commented 2 years ago

Not a get_data() issue: it returns only the four plain columns here, yet estimate_expectation() still fails:

library(insight)
library(lme4)
#> Loading required package: Matrix
(gm1 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
              data = cbpp, family = binomial))
#> Generalized linear mixed model fit by maximum likelihood (Laplace
#>   Approximation) [glmerMod]
#>  Family: binomial  ( logit )
#> Formula: cbind(incidence, size - incidence) ~ period + (1 | herd)
#>    Data: cbpp
#>      AIC      BIC   logLik deviance df.resid 
#> 194.0531 204.1799 -92.0266 184.0531       51 
#> Random effects:
#>  Groups Name        Std.Dev.
#>  herd   (Intercept) 0.6421  
#> Number of obs: 56, groups:  herd, 15
#> Fixed Effects:
#> (Intercept)      period2      period3      period4  
#>     -1.3983      -0.9919      -1.1282      -1.5797

get_data(gm1) |> str()
#> 'data.frame':    56 obs. of  4 variables:
#>  $ period   : Factor w/ 4 levels "1","2","3","4": 1 2 3 4 1 2 3 1 2 3 ...
#>  $ herd     : Factor w/ 15 levels "1","2","3","4",..: 1 1 1 1 2 2 2 3 3 3 ...
#>  $ incidence: num  2 3 4 0 3 1 1 8 2 0 ...
#>  $ size     : num  14 12 9 5 22 18 21 22 16 16 ...
get_data(gm1) |> head()
#>   period herd incidence size
#> 1      1    1         2   14
#> 2      2    1         3   12
#> 3      3    1         4    9
#> 4      4    1         0    5
#> 5      1    2         3   22
#> 6      2    2         1   18

modelbased::estimate_expectation(gm1, include_random = TRUE)
#> Error in cbind(incidence, size - incidence): object 'incidence' not found

Created on 2021-12-30 by the reprex package (v2.0.1)
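
One plausible mechanism for this error, which the thread does not confirm: if the formula's `cbind()` response is re-evaluated against a data grid that holds only predictors, the lookup of `incidence` fails. The sketch below reproduces that situation in base R with a made-up `newdata`. It is an assumption about the cause, not a description of what modelbased does internally.

```r
f <- cbind(incidence, size - incidence) ~ period

# A prediction grid typically holds predictors only (hypothetical example)
newdata <- data.frame(period = factor(1:4))

# Evaluating the full formula needs the response variables, so this fails
msg <- tryCatch(
  {
    model.frame(f, data = newdata)
    NA_character_
  },
  error = function(e) conditionMessage(e)
)
msg

# Dropping the response from the terms avoids evaluating cbind() at all
rhs <- delete.response(terms(f))
mf <- model.frame(rhs, data = newdata)
names(mf)
```

If this is indeed the cause, the fix would belong in the code that builds the model frame for predictions rather than in `get_data()`.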