ModelOriented / DALEX

moDel Agnostic Language for Exploration and eXplanation
https://dalex.drwhy.ai
GNU General Public License v3.0
1.35k stars, 166 forks

Option to specify random effects? #556

Open Jeffrothschild opened 1 year ago

Jeffrothschild commented 1 year ago

Hi, I'm wondering whether it would be possible (or would even make sense) to add an option to specify random effects in the model explainer?

I thought about this because, when looking at feature importance, the full-model RMSE reported by `feature_importance()` is quite different from the RMSE of the `lmer` fit, which accounts for the random effects. For example:


```r
library(tidyverse)
library(tidymodels)
library(multilevelmod) # provides the "lmer" engine for linear_reg()
library(lme4)
library(DALEXtra)      # explain_tidymodels(); also attaches DALEX

df <- nlme::Oxboys
df

# model using lmer
lmr_mod <- lme4::lmer(height ~ age + Occasion + (1 | Subject), df)
sjstats::rmse(lmr_mod)
# RMSE is 1.2

# model with tidymodels
mixed_model_spec <- linear_reg() %>% set_engine("lmer")

mixed_model_wf <- workflow() %>%
  add_model(mixed_model_spec, formula = height ~ age + Occasion + (1 | Subject)) %>%
  add_variables(outcomes = height, predictors = c(age, Occasion, Subject))

fit <- fit(mixed_model_wf, df)

explainer <-
  explain_tidymodels(
    fit,
    data = dplyr::select(df, c(age, Occasion, Subject)),
    y = df$height,
    label = "lmm",
    verbose = TRUE)

var_imp <-
  feature_importance(explainer)
var_imp
# full model RMSE is 8.0
```
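One way to see where a gap like 1.2 vs 8.0 could come from is to compute the `lmer` fit's RMSE twice: once with the subject random intercepts (conditional) and once without them (population level). This is a sketch under an assumption not confirmed in this thread, namely that the explainer's predictions are population-level; `re.form = NA` in lme4's `predict()` is what produces fixed-effects-only predictions here.

```r
library(lme4)

# Plain data.frame copy of the Oxboys data used above
df <- as.data.frame(nlme::Oxboys)

lmr_mod <- lmer(height ~ age + Occasion + (1 | Subject), data = df)

# Conditional RMSE: fitted values include each subject's estimated random intercept
rmse_conditional <- sqrt(mean(residuals(lmr_mod)^2))

# Population-level RMSE: re.form = NA drops all random effects,
# so every boy is predicted from the fixed effects alone
pred_fixed <- predict(lmr_mod, re.form = NA)
rmse_population <- sqrt(mean((df$height - pred_fixed)^2))

c(conditional = rmse_conditional, population = rmse_population)
```

If the population-level value lands near the explainer's full-model RMSE, that would suggest the two numbers measure different things rather than one of them being wrong.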
mayer79 commented 1 year ago

ML workflows with clustered data are a delicate thing. Using a clean train/test split (grouped by subject) and then evaluating the model on the test data is often a good choice. Then you won't have this problem.
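A minimal sketch of that suggestion, assuming rsample's `group_initial_split()` (rsample 1.1.0 or later) and the multilevelmod `lmer` engine; the seed, the 3/4 proportion and the object names are illustrative choices, not part of the original thread. Every subject ends up entirely in the training set or entirely in the test set, and the explainer only sees held-out subjects.

```r
library(tidymodels)     # rsample, parsnip, workflows
library(multilevelmod)  # registers the "lmer" engine for linear_reg()
library(DALEXtra)       # explain_tidymodels(); also attaches DALEX

set.seed(556)  # illustrative seed so the split is reproducible
df <- as.data.frame(nlme::Oxboys)

# Grouped split: no Subject appears in both training and testing
split <- group_initial_split(df, group = Subject, prop = 3/4)
train <- training(split)
test  <- testing(split)

mixed_model_wf <- workflow() %>%
  add_variables(outcomes = height, predictors = c(age, Occasion, Subject)) %>%
  add_model(linear_reg() %>% set_engine("lmer"),
            formula = height ~ age + Occasion + (1 | Subject))

fit_train <- fit(mixed_model_wf, data = train)

# Explain on held-out subjects only, so "_full_model_" is an out-of-subject error
explainer_test <- explain_tidymodels(
  fit_train,
  data = dplyr::select(test, age, Occasion, Subject),
  y = test$height,
  label = "lmm (grouped test set)",
  verbose = FALSE
)

var_imp_test <- feature_importance(explainer_test)
var_imp_test
```

With this setup the training subjects' random intercepts cannot help on the test rows, so the full-model RMSE and the permutation losses are all measured on the same footing.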