easystats / performance

:muscle: Models' quality and performance metrics (R2, ICC, LOO, AIC, BF, ...)
https://easystats.github.io/performance/
GNU General Public License v3.0
1.03k stars 94 forks source link

r2_nakagawa breaking when used with purrr::map #145

Open seanhardison1 opened 4 years ago

seanhardison1 commented 4 years ago

I've found that r2_nakagawa breaks when used in a purrr::map context.

Here's a simple example where r2_nakagawa() is used with purrr::map. The same two models are fitted to each species in the Salamanders data set, and Nakagawa's R^2 is collected for each.

library(performance)
library(glmmTMB)
library(tidyverse)

mod_func <- function(df){

  form_list <- list()
  form_list[[1]] <- formula(count ~ mined + (1|site))
  form_list[[2]] <- formula(count ~ mined + DOY + (1|site))
  fam <- "poisson"

  r2_out <- NULL
  for (i in 1:length(form_list)){
    mod <- glmmTMB(form_list[[i]], data = df, family = fam)
    r2_out[[i]] <- r2_nakagawa(mod)
  }
  return(r2_out)
}

nested_mods <- 
  Salamanders %>% 
  group_by(spp) %>% 
  nest() %>% 
  mutate(model = map(data, mod_func))

nested_mods$model

The output gives conditional R^2 = 1 for each model

However, when models are specified individually, r2_nakagawa works as expected

# but in isolation r2_nakagawa works, even when formula is specified as a list
form_list <- list()
form_list[[1]] <- formula(count ~ mined + (1|site))
fam <- "poisson"
mod <- glmmTMB(form_list[[1]], data = Salamanders %>% 
                 dplyr::filter(spp == "GP"), family = fam)
r2_nakagawa(mod)
zmeers commented 4 years ago

I'm also having an issue with purrr::map() and the performance::r2() function. I have re-installed performance from Github, but the issue still persists. For me, it doesn't recognise the data frame. The error I get is:

Error: Problem with `mutate()` input `r2`.
x object '.x' not found
ℹ Input `r2` is `map(multinomial_regression, ~r2(.x))`.

I don't have this issue with performance::check_collinearity().

My session info:

sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin19.5.0 (64-bit)
Running under: macOS Catalina 10.15.7

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /usr/local/Cellar/openblas/0.3.10_1/lib/libopenblasp-r0.3.10.dylib

locale:
[1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] performance_0.5.0 ggthemes_4.2.0    ggtext_0.1.0      nnet_7.3-14       forcats_0.5.0     stringr_1.4.0     dplyr_1.0.2       purrr_0.3.4       readr_1.3.1      
[10] tidyr_1.1.2       tibble_3.0.4      ggplot2_3.3.2     tidyverse_1.3.0  

loaded via a namespace (and not attached):
  [1] readxl_1.3.1         backports_1.1.10     plyr_1.8.6           igraph_1.2.5         listenv_0.8.0        crosstalk_1.1.0.1    rstantools_2.1.1     inline_0.3.15       
  [9] digest_0.6.26        htmltools_0.5.0      rsconnect_0.8.16     fansi_0.4.1          magrittr_1.5         globals_0.12.5       brms_2.13.5          modelr_0.1.8        
 [17] RcppParallel_5.0.2   matrixStats_0.56.0   MCMCpack_1.4-9       xts_0.12.1           sandwich_2.5-1       prettyunits_1.1.1    colorspace_1.4-1     blob_1.2.1          
 [25] rvest_0.3.6          haven_2.3.1          callr_3.5.1          crayon_1.3.4         jsonlite_1.7.1       zoo_1.8-8            glue_1.4.2           gtable_0.3.0        
 [33] MatrixModels_0.4-1   V8_3.2.0             pkgbuild_1.1.0       rstan_2.21.2         future.apply_1.6.0   abind_1.4-5          SparseM_1.78         scales_1.1.1        
 [41] mvtnorm_1.1-1        DBI_1.1.0            miniUI_0.1.1.1       Rcpp_1.0.5           xtable_1.8-4         gridtext_0.1.1       tmvnsim_1.0-2        stats4_4.0.2        
 [49] StanHeaders_2.21.0-6 DT_0.15              htmlwidgets_1.5.1    httr_1.4.2           threejs_0.3.3        lavaan_0.6-7         ellipsis_0.3.1       pkgconfig_2.0.3     
 [57] loo_2.3.1            dbplyr_1.4.4         here_0.1             tidyselect_1.1.0     rlang_0.4.8          reshape2_1.4.4       later_1.1.0.1        munsell_0.5.0       
 [65] cellranger_1.1.0     tools_4.0.2          cli_2.1.0            generics_0.0.2       broom_0.7.0          ggridges_0.5.2       fastmap_1.0.1        blavaan_0.3-10      
 [73] mcmc_0.9-7           processx_3.4.4       fs_1.5.0             future_1.18.0        nlme_3.1-148         mime_0.9             quantreg_5.70        xml2_1.3.2          
 [81] nonnest2_0.5-5       compiler_4.0.2       bayesplot_1.7.2      shinythemes_1.1.2    rstudioapi_0.11      curl_4.3             reprex_0.3.0         pbivnorm_0.6.0      
 [89] stringi_1.4.6        ps_1.4.0             Brobdingnag_1.2-6    lattice_0.20-41      Matrix_1.2-18        markdown_1.1         shinyjs_2.0.0        vctrs_0.3.4         
 [97] CompQuadForm_1.4.3   pillar_1.4.6         lifecycle_0.2.0      bridgesampling_1.0-0 insight_0.9.6.1      conquer_1.0.2        httpuv_1.5.4         R6_2.4.1            
[105] promises_1.1.1       gridExtra_2.3        codetools_0.2-16     colourpicker_1.1.0   MASS_7.3-51.6        gtools_3.8.2         assertthat_0.2.1     rprojroot_1.3-2     
[113] withr_2.3.0          shinystan_2.5.0      mnormt_2.0.2         bayestestR_0.7.2     parallel_4.0.2       hms_0.5.3            grid_4.0.2           coda_0.19-3         
[121] shiny_1.5.0          lubridate_1.7.9      base64enc_0.1-3      dygraphs_1.1.1.6    
strengejacke commented 4 years ago

@seanhardison1 I found the reason why this does not work... Due to the structure of the nested data / models, update() fails, which is required for the null-model:

library(performance)
library(glmmTMB)
library(tidyverse)

mod_func <- function(df){

  form_list <- list()
  form_list[[1]] <- formula(count ~ mined + (1|site))
  form_list[[2]] <- formula(count ~ mined + DOY + (1|site))
  fam <- "poisson"

  mod <- glmmTMB(form_list[[1]], data = df, family = fam)
}

nested_mods <- 
  Salamanders %>% 
  group_by(spp) %>% 
  nest() %>% 
  mutate(model = map(data, mod_func))

# doesn't work
insight::null_model(nested_mods$model[[1]])
#> Can't calculate null-model. Probably the data that was used to fit the model cannot be found.
#> NULL

# reason
update(nested_mods$model[[1]], formula. = ~1)
#> Error in glmmTMB(formula = count ~ 1, data = df, family = fam, ziformula = ~0, : object 'fam' not found

form_list <- list()
form_list[[1]] <- formula(count ~ mined + (1|site))
fam <- "poisson"
mod <- glmmTMB(form_list[[1]], data = Salamanders %>% 
                 dplyr::filter(spp == "GP"), family = fam)

# works
insight::null_model(mod)
#> Formula:          count ~ (1 | site)
#> Data: Salamanders %>% dplyr::filter(spp == "GP")
#>       AIC       BIC    logLik  df.resid 
#>  260.7390  265.7826 -128.3695        90 
#> Random-effects (co)variances:
#> 
#> Conditional model:
#>  Groups Name        Std.Dev.
#>  site   (Intercept) 1.845   
#> 
#> Number of obs: 92 / Conditional model: site, 23
#> 
#> Fixed Effects:
#> 
#> Conditional model:
#> (Intercept)  
#>     -0.9884

Created on 2020-10-22 by the reprex package (v0.3.0)

I'm not sure how to address this at the moment, but I'll let this issue open, in case I find a solution later.

strengejacke commented 4 years ago

@zmeers Do you have a small reproducible example?

strengejacke commented 4 years ago

Interestingly, for a simpler case, it works with mixed models:

library(mice)
library(lme4)
library(performance)
data("nhanes2")
imp <- mice(nhanes2)
models <- lapply(1:5, function(i) {
  lme4::lmer(bmi ~ age + hyp + (1 | chl), data = complete(imp, action = i))
})

lapply(models, r2)
#> Warning: Can't compute random effect variances. Some variance components equal zero.
#>   Solution: Respecify random structure!
#> Random effect variances not available. Returned R2 does not account for random effects.
#> [[1]]
#> # R2 for Mixed Models
#> 
#>   Conditional R2: 0.917
#>      Marginal R2: 0.383
#> 
#> [[2]]
#> # R2 for Mixed Models
#> 
#>   Conditional R2: 0.796
#>      Marginal R2: 0.283
#> 
#> [[3]]
#> # R2 for Mixed Models
#> 
#>   Conditional R2: 0.943
#>      Marginal R2: 0.303
#> 
#> [[4]]
#> # R2 for Mixed Models
#> 
#>   Conditional R2: NA
#>      Marginal R2: 0.286
#> 
#> [[5]]
#> # R2 for Mixed Models
#> 
#>   Conditional R2: 0.906
#>      Marginal R2: 0.278

Created on 2020-10-22 by the reprex package (v0.3.0)

DHLocke commented 4 years ago

I am new to posting issues, feedback is welcomed and I really appreciate your great software.

library(tidyverse)
library(performance)
library(nlme)
library(broom)

data(iris)

model <- nlme::lme(Sepal.Length ~ Petal.Length, random = ~1 | Species, data = iris)
r2(model)
icc(model)

# ICC for lmer models, if I could get performance::icc to work in the map / mutate that would be better.
ICC_lme <- function(out){
  varests <- as.numeric(VarCorr(out)[1:2])
  varests[1]/sum(varests)
}

boot_straps <- iris %>% 
  modelr::bootstrap(n = 100, id = 'model_iter') %>%
  mutate(mod_1 = map(strap, ~nlme::lme(Sepal.Length ~ Petal.Length, random = ~1 | Species, data =.)),
         mod_2 = map(strap, ~nlme::lme(Sepal.Length ~ Petal.Width, random = ~1 | Species, data =.)),

         tidy_mod_1 = map(mod_1, tidy),
         tidy_mod_2 = map(mod_2, tidy),

         AIC_1 = map(mod_1, AIC),
         AIC_2 = map(mod_2, AIC),

         ICC_1 = map(mod_1, ICC_lme),
         ICC_2 = map(mod_2, ICC_lme),

         R2_1 = map(mod_1, performance::r2),
         R2_2 = map(mod_2, performance::r2)
         )

boot_straps %>% 
  tidylog::select(model_iter, starts_with('AIC')) %>%
  unnest() %>% 
  pivot_longer(cols = -model_iter,
               names_to = 'model',
               values_to = 'AIC') 

boot_straps %>% 
  tidylog::select(model_iter, starts_with('R2_')) %>%
  unnest() 

boot_straps$R2_1
strengejacke commented 4 years ago

@DHLocke With this commit, I added an informative warning related to your issue.

This code-line:

map(strap, ~nlme::lme(Sepal.Length ~ Petal.Length, random = ~1 | Species, data =.))

causes that the model objects have no data:

boot_straps$mod_1[[1]]$data
#> NULL

while this works:

library(nlme)
data(iris)
model <- nlme::lme(Sepal.Length ~ Petal.Length, random = ~1 | Species, data = iris)
head(model$data)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1          5.1         3.5          1.4         0.2  setosa
#> 2          4.9         3.0          1.4         0.2  setosa
#> 3          4.7         3.2          1.3         0.2  setosa
#> 4          4.6         3.1          1.5         0.2  setosa
#> 5          5.0         3.6          1.4         0.2  setosa
#> 6          5.4         3.9          1.7         0.4  setosa

Not sure how to address this, but I'll try to investigate further...

strengejacke commented 4 years ago

@DHLocke The reason seems to be one of the tidyverse-quirks, which you can, however, workaround when not using anonymous functions inside map():

library(tidyverse)
library(performance)
library(nlme)
library(broom)
library(broom.mixed)
data(iris)

boot_straps <- iris %>% 
  modelr::bootstrap(n = 10, id = 'model_iter') %>%
  mutate(mod_1 = map(strap, function(d) nlme::lme(Sepal.Length ~ Petal.Length, random = ~1 | Species, data = as.data.frame(d))),
         mod_2 = map(strap, function(d) nlme::lme(Sepal.Length ~ Petal.Width, random = ~1 | Species, data = as.data.frame(d))),

         tidy_mod_1 = map(mod_1, tidy),
         tidy_mod_2 = map(mod_2, tidy),

         AIC_1 = map(mod_1, AIC),
         AIC_2 = map(mod_2, AIC),

         ICC_1 = map(mod_1, performance::icc),
         ICC_2 = map(mod_2, performance::icc),

         R2_1 = map(mod_1, performance::r2),
         R2_2 = map(mod_2, performance::r2)
  )

boot_straps$R2_1
#> [[1]]
#> # R2 for Mixed Models
#> 
#>   Conditional R2: 0.975
#>      Marginal R2: 0.621
#> 
#> [[2]]
#> # R2 for Mixed Models
#> 
#>   Conditional R2: 0.981
#>      Marginal R2: 0.646
#> 
#> [[3]]
#> # R2 for Mixed Models
#> 
#>   Conditional R2: 0.965
#>      Marginal R2: 0.642
#> 
#> [[4]]
#> # R2 for Mixed Models
#> 
#>   Conditional R2: 0.973
#>      Marginal R2: 0.643
#> 
#> [[5]]
#> # R2 for Mixed Models
#> 
#>   Conditional R2: 0.962
#>      Marginal R2: 0.658
#> 
#> [[6]]
#> # R2 for Mixed Models
#> 
#>   Conditional R2: 0.976
#>      Marginal R2: 0.674
#> 
#> [[7]]
#> # R2 for Mixed Models
#> 
#>   Conditional R2: 0.963
#>      Marginal R2: 0.618
#> 
#> [[8]]
#> # R2 for Mixed Models
#> 
#>   Conditional R2: 0.974
#>      Marginal R2: 0.632
#> 
#> [[9]]
#> # R2 for Mixed Models
#> 
#>   Conditional R2: 0.968
#>      Marginal R2: 0.629
#> 
#> [[10]]
#> # R2 for Mixed Models
#> 
#>   Conditional R2: 0.974
#>      Marginal R2: 0.641

Created on 2020-11-03 by the reprex package (v0.3.0)

strengejacke commented 4 years ago

@DHLocke The issue with non-working icc() for lme() should be fixed in the latest GitHub-version of performance as well.

DHLocke commented 4 years ago

Wonderful, thanks so much.

wmacnair commented 3 years ago

I think I'm having a similar issue, but when calling r2_nakagawa from inside a function (i.e. a more general issue than map).

Maybe stats::update expects to be called in the global scope and can't find the data otherwise? So it could perhaps be fixed by taking the data as an argument and somehow including it in the environment for that call? I don't know enough about scoping etc to be at all confident though...

Thanks for a nicely comprehensive package. Will

MWE

library("glmmTMB")
library("performance")

mwe <- function() {
  Owls <- transform(Owls,
    Nest=reorder(Nest,NegPerChick),
    NCalls=SiblingNegotiation,
    FT=FoodTreatment)
  fit <- glmmTMB(
    NCalls ~ (FT + ArrivalTime) * SexParent + offset(log(BroodSize)) + (1 | Nest),
    data=Owls, family=nbinom2)
  print(r2_nakagawa(fit))
}
mwe()
# Can't calculate null-model. Probably the data that was used to fit the model cannot be found.
# # R2 for Mixed Models

#   Conditional R2: 1.000
#      Marginal R2: 0.593

whereas if I call the same example in the global scope:

Owls <- transform(Owls,
  Nest=reorder(Nest,NegPerChick),
  NCalls=SiblingNegotiation,
  FT=FoodTreatment)
fit <- glmmTMB(
  NCalls ~ (FT + ArrivalTime) * SexParent + offset(log(BroodSize)) + (1 | Nest),
  data=Owls, family=nbinom2)
print(r2_nakagawa(fit))
# # R2 for Mixed Models

#   Conditional R2: 0.244
#      Marginal R2: 0.144

(I know that I'm slightly out of date on updating packages, but I'm hoping that's not the issue here.)

sessionInfo()
# R version 4.0.3 (2020-10-10)
# Platform: x86_64-conda-linux-gnu (64-bit)
# Running under: CentOS Linux 7 (Core)

# Matrix products: default
# BLAS/LAPACK: /pstore/home/macnairw/.conda/envs/r_4.0.3/lib/libopenblasp-r0.3.12.so

# locale:
#  [1] LC_CTYPE=C                 LC_NUMERIC=C
#  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
#  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
#  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
#  [9] LC_ADDRESS=C               LC_TELEPHONE=C
# [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

# attached base packages:
# [1] stats     graphics  grDevices utils     datasets  methods   base

# other attached packages:
# [1] performance_0.7.2 glmmTMB_1.0.2.1   colorout_1.2-2    workflowr_1.6.2

# loaded via a namespace (and not attached):
#  [1] Rcpp_1.0.6        pillar_1.6.1      compiler_4.0.3    later_1.2.0
#  [5] nloptr_1.2.2.2    TMB_1.7.20        tools_4.0.3       boot_1.3-28
#  [9] digest_0.6.27     lme4_1.1-27.1     evaluate_0.14     lifecycle_1.0.0
# [13] tibble_3.1.2      nlme_3.1-152      lattice_0.20-44   pkgconfig_2.0.3
# [17] rlang_0.4.11      Matrix_1.3-4      xfun_0.24         knitr_1.33
# [21] fs_1.5.0          vctrs_0.3.8       grid_4.0.3        R6_2.5.0
# [25] fansi_0.5.0       rmarkdown_2.8     minqa_1.2.4       magrittr_2.0.1
# [29] promises_1.2.0.1  ellipsis_0.3.2    htmltools_0.5.1.1 splines_4.0.3
# [33] MASS_7.3-54       insight_0.14.2    httpuv_1.6.1      utf8_1.2.1
# [37] crayon_1.4.1
bwiernik commented 3 years ago

This all sounds like an environment scoping issue @strengejacke

strengejacke commented 3 years ago

Yeah, I must see when I find some time to get back to this issue again...