easystats / insight

:crystal_ball: Easy access to model information for various model objects
https://easystats.github.io/insight/
GNU General Public License v3.0

insight::get_data issue with subset argument provided via eval(parse(text=...)) #819

Closed AlpDYel closed 4 days ago

AlpDYel commented 1 year ago

I fitted a number of models in a loop that uses eval(parse(text = thing_I_want)) to specify the model and family. After loading the resulting models, I call insight::get_data() on a model object and get this error:

Error in as.character(x) :
cannot coerce type 'closure' to vector of type 'character'
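
For context, this message is what base R produces when as.character() is applied to a function. The sketch below reproduces the message in isolation. The guess that a name such as data or subset resolves to the base R function of the same name somewhere in the get_data() code path is an assumption, not something confirmed in this thread.

```r
# as.character() cannot convert a function (a "closure") to a string,
# so the call below raises an error that we capture as text.
msg <- tryCatch(
  as.character(base::subset),
  error = function(e) conditionMessage(e)
)
print(msg)

# Both names are also functions shipped with R, which is why an
# unresolved symbol called `data` or `subset` can end up as a closure.
is_fun <- c(data = is.function(utils::data), subset = is.function(base::subset))
print(is_fun)
```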


strengejacke commented 1 year ago

Do you have a reproducible example?

AlpDYel commented 1 year ago

I will try to get one. I am using a custom function to fit and save models in parallel, and then load and name the resulting models, so I would have to simplify parts of the pipeline to reproduce the error without publishing a bunch of tangentially related code.

AlpDYel commented 9 months ago

So I had to revisit the issue in a new project and I went ahead and isolated the issue:

library(tidyverse)
library(lme4)
library(performance)

data(troutegg,package="faraway")

#This does not work
fit_mod <- function(formula,weights,data,family,subset){
  mod <-glmer(as.formula(formula),weight=eval(parse(text=weights)),
              data=data,family= eval(parse(text = family)),subset=eval(parse(text=subset)))
  return(mod)
}
mod <- fit_mod(formula = "survive/total~period+(1|location)",
               weights = "total",
               data=troutegg,
               family="binomial('logit')",
               subset = "total>0")

performance::binned_residuals(mod)
#This works
mod2 <-glmer(as.formula("survive/total~period+(1|location)"),weight=eval(parse(text="total")),
             data=troutegg,nAGQ=0,family= eval(parse(text = "binomial('logit')")))
performance::binned_residuals(mod2)

#Digging deeper, problem comes from
insight::get_data(mod)

#This does not work either...
fit_mod_alt <- function(formula,data,weights,troutegg,family,subset){
  mod <-glmer(as.formula(formula),weight=eval(parse(text=weights)),
              data=get(data,envir = .GlobalEnv),family= eval(parse(text = family)),subset=eval(parse(text=subset)))
  return(mod)
}
mod3 <- fit_mod_alt(formula = "survive/total~period+(1|location)",
               weights = "total",
               data="troutegg",
               family="binomial('logit')",
               subset = "total>0")
performance::binned_residuals(mod3)

If I understand correctly, the data for the model is recorded internally as data (rather than troutegg) when the model is fitted with the fit_mod function. This causes problems for insight's get_data function, which some performance functions such as binned_residuals use internally.
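
To illustrate the point about the stored name, here is a minimal base-R sketch that uses lm() and mtcars in place of glmer() and troutegg (both substitutions are assumptions made only so the example runs without extra packages). The wrapper name fit_wrapped is hypothetical. It shows that the model is fitted on the subset, while the stored call keeps only the unevaluated expressions from inside the wrapper.

```r
# Hypothetical wrapper that mirrors the structure of fit_mod,
# with lm() standing in for glmer().
fit_wrapped <- function(formula, data, subset) {
  lm(as.formula(formula), data = data, subset = eval(parse(text = subset)))
}

m <- fit_wrapped("mpg ~ wt", data = mtcars, subset = "cyl > 4")

# The fit itself respects the subset ...
n_used <- nrow(model.frame(m))

# ... but the stored call only keeps the wrapper's local names,
# not the name of the data set or the subset condition.
stored_call <- m$call
data_expr <- deparse(stored_call$data)
subset_expr <- deparse(stored_call$subset)
print(stored_call)
```

Anything that later tries to rebuild the data from this call has to work out what data and subset referred to, and those names no longer exist once the wrapper has returned.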

AlpDYel commented 9 months ago

I also tried assigning troutegg to data to see if that helps, but it did not:

#This also does not work
data <- troutegg
fit_mod_alt <- function(formula,data,weights,troutegg,family,subset){
  mod <-glmer(as.formula(formula),weight=eval(parse(text=weights)),
              data=data,family= eval(parse(text = family)),subset=eval(parse(text=subset)))
  return(mod)
}
mod3 <- fit_mod_alt(formula = "survive/total~period+(1|location)",
               weights = "total",
               data=data,
               family="binomial('logit')",
               subset = "total>0")
performance::binned_residuals(mod3)

AlpDYel commented 9 months ago

So I finally figured out the root of the issue:

The get_data function does not work as intended when I use the subset argument. It works OK when I subset the data before passing it as input (example with tidyverse filter below) as opposed to using the subset argument. It is likely a non-issue for 99% of cases.

#Works!
mod4 <- fit_mod_alt(formula = "survive/total~period+(1|location)",
                    weights = "total",
                    data= troutegg,
                    family="binomial('logit')")
insight::get_data(mod4)
#Alternative to subset with tidyverse
mod5 <- fit_mod_alt(formula = "survive/total~period+(1|location)",
                    weights = "total",
                    data= troutegg %>% filter(eval(parse(text="location !=5"))),
                    family="binomial('logit')")

insight::get_data(mod5)

strengejacke commented 4 days ago

The problem here is that the call doesn't tell us which variables were used:

library(lme4)
data(troutegg,package="faraway")

data <- troutegg
fit_mod_alt <- function(formula,data,weights,troutegg,family,subset){
  mod <-glmer(as.formula(formula),weight=eval(parse(text=weights)),
              data=data,family= eval(parse(text = family)),subset=eval(parse(text=subset)))
  return(mod)
}

mod3 <- fit_mod_alt(formula = "survive/total~period+(1|location)",
               weights = "total",
               data=data,
               family="binomial('logit')",
               subset = "total>0")

insight::get_call(mod3)
#> glmer(formula = survive/total ~ period + (1 | location), data = data, 
#>     family = eval(parse(text = family)), subset = eval(parse(text = subset)), 
#>     weights = eval(parse(text = weights)))

Created on 2024-11-04 with reprex v2.1.1
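
One possible way to keep the call informative is to build it from the strings before evaluating it, so that the stored call contains the actual data name and subset condition. This is only a sketch under assumptions: lm() and mtcars stand in for glmer() and troutegg, the helper name fit_with_explicit_call is hypothetical, and whether insight::get_data() then recovers the data for a glmer model has not been checked here.

```r
# Hypothetical helper: turn the strings into language objects first,
# then evaluate the finished call in the caller's environment.
fit_with_explicit_call <- function(formula, data_name, subset,
                                   envir = parent.frame()) {
  force(envir)  # resolve the caller's environment before building the call
  cl <- bquote(
    stats::lm(
      formula = .(str2lang(formula)),   # e.g. mpg ~ wt
      data = .(as.name(data_name)),     # e.g. the symbol mtcars
      subset = .(str2lang(subset))      # e.g. cyl > 4
    )
  )
  eval(cl, envir = envir)
}

m2 <- fit_with_explicit_call("mpg ~ wt", data_name = "mtcars", subset = "cyl > 4")

# The stored call now names the data set and the variables in the subset.
print(m2$call)
subset_vars <- all.vars(m2$call$subset)
```

The trade-off is that the data has to be passed by name and must be visible from the environment where the call is evaluated.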